
[v2] ACPI / Processor: add sysfs support for low power idle

Message ID 1499968082-7753-1-git-send-email-pprakash@codeaurora.org (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

Prakash, Prashanth July 13, 2017, 5:48 p.m. UTC
Add support for exposing the idle statistics maintained by the
platform, along with other data of interest from each LPI (Low Power
Idle) state, to userspace via sysfs.

LPI, described in section 8.4.4 of ACPI spec 6.1, provides different
methods to obtain the idle statistics maintained by the platform.
These give a granular view of how each of the LPI states is being used
at different levels of the hierarchy. The data is exposed via sysfs by
creating a directory named 'lpi' at each level of the hierarchy and
presenting the LPI state information under it. Below is the
representation of the LPI information at one such level in the hierarchy:

.../ACPI00XX:XX/lpi
	|-> state0
	|	|-> desc
	|	|-> time
	|	|-> usage
	|	|-> latency
	|	|-> min_residency
	|
	<<more states>>

ACPI00XX can be ACPI0007(processor) or ACPI0010(processor container)

stateX contains information related to a specific LPI state defined
in the LPI ACPI tables.

Signed-off-by: Prashanth Prakash <pprakash@codeaurora.org>
---
v2
* Add Documentation/ABI/testing/sysfs-acpi-lpi
v1
* Drop flags, arch_flags and summary_stats field (Sudeep)
* Create sysfs entries after we know that LPI will be used (Sudeep & Rafael)

 Documentation/ABI/testing/sysfs-acpi-lpi |  47 +++++
 drivers/acpi/processor_idle.c            | 343 ++++++++++++++++++++++++++++++-
 include/acpi/processor.h                 |  14 ++
 3 files changed, 402 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-acpi-lpi
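For illustration, the per-level hierarchy described above could be consumed from user space with a small walker like the sketch below. Only the 'lpi/stateX' directory layout and the attribute file names come from this patch; the helper name and the demo tree are hypothetical, and the demo builds a fake directory since real /sys paths require matching firmware.

```python
# Sketch: walk an 'lpi' directory as laid out in the commit message and
# collect per-state attributes. Hypothetical helper; only the stateX/
# attribute names follow the proposed sysfs layout.
import os
import tempfile

ATTRS = ("desc", "time", "usage", "latency", "min_residency")

def read_lpi_states(lpi_dir):
    """Return {stateN: {attr: value}} for every stateN under lpi_dir."""
    states = {}
    for entry in sorted(os.listdir(lpi_dir)):
        state_dir = os.path.join(lpi_dir, entry)
        if not entry.startswith("state") or not os.path.isdir(state_dir):
            continue
        attrs = {}
        for attr in ATTRS:
            path = os.path.join(state_dir, attr)
            if os.path.exists(path):  # fields are optional per the spec
                with open(path) as f:
                    attrs[attr] = f.read().strip()
        states[entry] = attrs
    return states

if __name__ == "__main__":
    # Build a minimal fake hierarchy just to demonstrate the walker.
    with tempfile.TemporaryDirectory() as tmp:
        s0 = os.path.join(tmp, "state0")
        os.makedirs(s0)
        for attr, val in [("desc", "core retention"), ("time", "12345"),
                          ("usage", "42")]:
            with open(os.path.join(s0, attr), "w") as f:
                f.write(val + "\n")
        print(read_lpi_states(tmp))
```

On real hardware the lpi_dir argument would be a path such as the ACPI0007/ACPI0010 device node directory shown above.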

Comments

Sudeep Holla July 31, 2017, 4:25 p.m. UTC | #1
On 13/07/17 18:48, Prashanth Prakash wrote:
> Add support to expose idle statistics maintained by platform to
> userspace via sysfs in addition to other data of interest from
> each LPI(Low Power Idle) state.
> 
> LPI described in section 8.4.4 of ACPI spec 6.1 provides different
> methods to obtain idle statistics maintained by the platform. These
> show a granular view of how each of the LPI state is being used at
> different level of hierarchy. sysfs data is exposed at each level in
> the hierarchy by creating a directory named 'lpi' at each level and
> the LPI state information is presented under it. Below is the
> representation of LPI information at one such level in the hierarchy
> 
> .../ACPI00XX: XX/lpi
> 	|-> state0
> 	|	|-> desc
> 	|	|-> time
> 	|	|-> usage
> 	|	|-> latency
> 	|	|-> min_residency
> 	|
> 	<<more states>>
> 
> ACPI00XX can be ACPI0007(processor) or ACPI0010(processor container)
> 
> stateX contains information related to a specific LPI state defined
> in the LPI ACPI tables.
> 
> Signed-off-by: Prashanth Prakash <pprakash@codeaurora.org>
> ---
> v2
> * Add Documentation/ABI/testing/sysfs-acpi-lpi
> v1
> * Drop flags, arch_flags and summary_stats field (Sudeep)
> * Create sysfs entries after we know that LPI will be used (Sudeep & Rafael)
> 
>  Documentation/ABI/testing/sysfs-acpi-lpi |  47 +++++
>  drivers/acpi/processor_idle.c            | 343 ++++++++++++++++++++++++++++++-
>  include/acpi/processor.h                 |  14 ++
>  3 files changed, 402 insertions(+), 2 deletions(-)
>  create mode 100644 Documentation/ABI/testing/sysfs-acpi-lpi
> 
> diff --git a/Documentation/ABI/testing/sysfs-acpi-lpi b/Documentation/ABI/testing/sysfs-acpi-lpi
> new file mode 100644
> index 0000000..5b054a1
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-acpi-lpi

Sorry for the delay. I initially thought having this ABI under testing
was fine, as I really don't want any *real* user space programs to depend
on this, for various reasons stated in earlier threads, e.g. h/w
auto-promotable states, accuracy of the stats, etc.

But from Documentation/ABI/README, I see

testing/
	This directory documents interfaces that are felt to be stable,
	as the main development of this interface has been completed.
	The interface can be changed to add new features, but the
	current interface will not break by doing this, unless grave
	errors or security problems are found in them. User space
	programs can start to rely on these interfaces,...

which makes me worry. Since this is used purely for debug or
optimization purposes, I still prefer a simple single-file debugfs entry.
I still can't digest the claim that reading a single file is time
consuming, as we are not using this interface at runtime IIUC, i.e. the
statistics are collected and analyzed offline.

But if Rafael is fine with this, that's fine.
Prakash, Prashanth July 31, 2017, 6:18 p.m. UTC | #2
Hi Sudeep,

On 7/31/2017 10:25 AM, Sudeep Holla wrote:
> Sorry for the delay, I initial thought having this ABI under testing is
> fine as I really don't want any *real* user space programs to depend on
> this for various reasons stated in earlier threads, e.g. h/w auto
> promotable states, accuracy of the stats, ..etc
<sorry for repeating this part>
These fields are optional, so if there is no reliable way to keep track of the stats, the
platform can choose not to expose them. If a platform exposes inaccurate stats via this
ACPI interface, it is breaking the spec.
> But from Documentation/ABI/README, I see
>
> testing/
> 	This directory documents interfaces that are felt to be stable,
> 	as the main development of this interface has been completed.
> 	The interface can be changed to add new features, but the
> 	current interface will not break by doing this, unless grave
> 	errors or security problems are found in them. User space
> 	programs can start to rely on these interfaces,...
>
> which makes me worry. Since the use for this is purely for debug or
> optimization purposes, I still prefer simple single file debugfs entry.
> I still can't digest the fact that reading single file is time consuming
> as we are not using this interface at runtime IIUC. i.e. statistic are
> collected and analyzed offline.
These fields have the same utility/use-cases as the usage & time fields in cpuidle sysfs,
but provide more granularity: idle stats for the different levels of the hierarchy, and
accurate idle stats for states that require platform coordination.

The argument for having a single sysfs file per node was that reading individual
files might get expensive when taking a snapshot (not the other way around). But that
argument was weak, as we typically read these only in debug settings and not that
often at runtime. So the summary_stats file was removed in favor of one value per
file.
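The snapshot-then-analyze-offline workflow described above can be sketched as follows. This is illustrative only: the function names are hypothetical, and only the per-state 'usage' and 'time' files come from the proposed layout (both assumed present and monotonically increasing between snapshots).

```python
# Sketch of the offline debug workflow discussed in this thread: take two
# snapshots of the per-state counters and diff them afterwards.
import glob
import os

def snapshot(lpi_dir):
    """Read the usage/time counters of every stateN directory under lpi_dir."""
    snap = {}
    for state_dir in sorted(glob.glob(os.path.join(lpi_dir, "state*"))):
        counters = {}
        for attr in ("usage", "time"):
            path = os.path.join(state_dir, attr)
            if os.path.exists(path):  # these fields are optional per the spec
                with open(path) as f:
                    counters[attr] = int(f.read())
        snap[os.path.basename(state_dir)] = counters
    return snap

def delta(before, after):
    """Per-state counter increase between two snapshots taken with snapshot().
    Assumes the same states/attributes are present in both snapshots."""
    return {state: {attr: after[state][attr] - val
                    for attr, val in counters.items()}
            for state, counters in before.items()}
```

Two snapshot() calls around a workload, followed by delta(), give the per-state activity for that window; only a handful of small files are read, which is why per-value files are adequate for this kind of offline analysis.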

--
Thanks,
Prashanth
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sudeep Holla Aug. 1, 2017, 9:18 a.m. UTC | #3
On 31/07/17 19:18, Prakash, Prashanth wrote:
> Hi Sudeep,
> 
> On 7/31/2017 10:25 AM, Sudeep Holla wrote:
>> Sorry for the delay, I initial thought having this ABI under testing is
>> fine as I really don't want any *real* user space programs to depend on
>> this for various reasons stated in earlier threads, e.g. h/w auto
>> promotable states, accuracy of the stats, ..etc
> <sorry for repeating this part>
> These fields are optional, so if there is no reliable way to keep track of  stats, platform
> can choose not to expose it. If a platform is exposing inaccurate stats via this ACPI
> interface, it is breaking the spec.

Fair enough.

>> But from Documentation/ABI/README, I see
>>
>> testing/
>> 	This directory documents interfaces that are felt to be stable,
>> 	as the main development of this interface has been completed.
>> 	The interface can be changed to add new features, but the
>> 	current interface will not break by doing this, unless grave
>> 	errors or security problems are found in them. User space
>> 	programs can start to rely on these interfaces,...
>>
>> which makes me worry. Since the use for this is purely for debug or
>> optimization purposes, I still prefer simple single file debugfs entry.
>> I still can't digest the fact that reading single file is time consuming
>> as we are not using this interface at runtime IIUC. i.e. statistic are
>> collected and analyzed offline.
> These fields has the same utility/use-cases as the usage & time
> fields in cpuidle sysfs,
> but provides more granularity - idle stats for different levels hierarchy and accurate
> idle stats for states that require platform co-ordination.
>

I completely agree with that and that's not the argument.

> The argument for having a single sysfs file per node was that reading individual
> files might get expensive to get a snapshot(not the other way around). But, that
> argument was weak as we typically read these only in debug settings and not that
> often during runtime. So, the summary_stats file was removed and went with one
> value per file.
> 

You are contradicting yourself above :). You say the argument you made
is weak :), but still went ahead and dropped the single debugfs file in
favor of the standard per-entry sysfs files, which are an ABI.

Since we already have the cpuidle sysfs, which is an ABI, I am really not
sure we need another set of ABI files that are used only for debug
and optimization purposes. Why is a single debugfs file not sufficient?

As hardware evolves, most platforms can't provide this information
accurately. So if we are trying to address a problem that is
short-lived and limited to a very small class of platforms, I would avoid
creating a new ABI for it. That's my main argument against this
interface; I'd rather go with a debugfs entry. That's my opinion though.
Prakash, Prashanth Aug. 1, 2017, 5:15 p.m. UTC | #4
On 8/1/2017 3:18 AM, Sudeep Holla wrote:
>
> On 31/07/17 19:18, Prakash, Prashanth wrote:
>> Hi Sudeep,
>>
>> On 7/31/2017 10:25 AM, Sudeep Holla wrote:
>>> Sorry for the delay, I initial thought having this ABI under testing is
>>> fine as I really don't want any *real* user space programs to depend on
>>> this for various reasons stated in earlier threads, e.g. h/w auto
>>> promotable states, accuracy of the stats, ..etc
>> <sorry for repeating this part>
>> These fields are optional, so if there is no reliable way to keep track of  stats, platform
>> can choose not to expose it. If a platform is exposing inaccurate stats via this ACPI
>> interface, it is breaking the spec.
> Fair enough.
>
>>> But from Documentation/ABI/README, I see
>>>
>>> testing/
>>> 	This directory documents interfaces that are felt to be stable,
>>> 	as the main development of this interface has been completed.
>>> 	The interface can be changed to add new features, but the
>>> 	current interface will not break by doing this, unless grave
>>> 	errors or security problems are found in them. User space
>>> 	programs can start to rely on these interfaces,...
>>>
>>> which makes me worry. Since the use for this is purely for debug or
>>> optimization purposes, I still prefer simple single file debugfs entry.
>>> I still can't digest the fact that reading single file is time consuming
>>> as we are not using this interface at runtime IIUC. i.e. statistic are
>>> collected and analyzed offline.
>> These fields has the same utility/use-cases as the usage & time
>> fields in cpuidle sysfs,
>> but provides more granularity - idle stats for different levels hierarchy and accurate
>> idle stats for states that require platform co-ordination.
>>
> I completely agree with that and that's not the argument.
>
>> The argument for having a single sysfs file per node was that reading individual
>> files might get expensive to get a snapshot(not the other way around). But, that
>> argument was weak as we typically read these only in debug settings and not that
>> often during runtime. So, the summary_stats file was removed and went with one
>> value per file.
>>
> You are contradicting yourself above :). You say the argument you made
> is weak :) but still went ahead and dropped single debugfs file vs the
> standard per entry sysfs file which is an ABI.

To clarify, the first RFC patch had a sysfs entry called summary_stats which
provided all the stats for a specific node in the hierarchy via a single file.
The argument for having such a single file was weak. :)  There was never a
debugfs file to be dropped in the first place.

> Since we already have CPUIdle sysfs which is an ABI, I am really not
> sure if we need another set of ABI files which are used only for debug
> and optimization purposes. Why is single debugfs file not sufficient ?
>

The consumers of this data are not all kernel developers. We will have other
engineers looking at this data for power/performance optimizations, and it
would be nice to give them a consistent interface.
> As hardware evolves, most of the platforms can't provide these
> information accurately. So if we are trying to address a problem which
> is short-lived and on very small class of platforms, I would avoid
> creating a new ABI for it. That's my main argument against this
> interface instead go with debugfs entry. That's my opinion though.
I would like to think of it in terms of the ACPI spec rather than a subset of
platforms. If it is part of the spec, there should be no reason to speculate on
which platforms may or may not implement it. We implement to the spec, and
platform designers should ideally do the same.
Since sysfs vs debugfs is quite debatable :), I will wait for Rafael's input.

--
Thanks,
Prashanth
Sudeep Holla Aug. 1, 2017, 5:34 p.m. UTC | #5
On 01/08/17 18:15, Prakash, Prashanth wrote:
> 
> On 8/1/2017 3:18 AM, Sudeep Holla wrote:
>>
>> On 31/07/17 19:18, Prakash, Prashanth wrote:
>>> Hi Sudeep,
>>>
>>> On 7/31/2017 10:25 AM, Sudeep Holla wrote:
>>>> Sorry for the delay, I initial thought having this ABI under testing is
>>>> fine as I really don't want any *real* user space programs to depend on
>>>> this for various reasons stated in earlier threads, e.g. h/w auto
>>>> promotable states, accuracy of the stats, ..etc
>>> <sorry for repeating this part>
>>> These fields are optional, so if there is no reliable way to keep track of  stats, platform
>>> can choose not to expose it. If a platform is exposing inaccurate stats via this ACPI
>>> interface, it is breaking the spec.
>> Fair enough.
>>
>>>> But from Documentation/ABI/README, I see
>>>>
>>>> testing/
>>>> 	This directory documents interfaces that are felt to be stable,
>>>> 	as the main development of this interface has been completed.
>>>> 	The interface can be changed to add new features, but the
>>>> 	current interface will not break by doing this, unless grave
>>>> 	errors or security problems are found in them. User space
>>>> 	programs can start to rely on these interfaces,...
>>>>
>>>> which makes me worry. Since the use for this is purely for debug or
>>>> optimization purposes, I still prefer simple single file debugfs entry.
>>>> I still can't digest the fact that reading single file is time consuming
>>>> as we are not using this interface at runtime IIUC. i.e. statistic are
>>>> collected and analyzed offline.
>>> These fields has the same utility/use-cases as the usage & time
>>> fields in cpuidle sysfs,
>>> but provides more granularity - idle stats for different levels hierarchy and accurate
>>> idle stats for states that require platform co-ordination.
>>>
>> I completely agree with that and that's not the argument.
>>
>>> The argument for having a single sysfs file per node was that reading individual
>>> files might get expensive to get a snapshot(not the other way around). But, that
>>> argument was weak as we typically read these only in debug settings and not that
>>> often during runtime. So, the summary_stats file was removed and went with one
>>> value per file.
>>>
>> You are contradicting yourself above :). You say the argument you made
>> is weak :) but still went ahead and dropped single debugfs file vs the
>> standard per entry sysfs file which is an ABI.
> 
> To clarify, the first RFC patch had a sysfs entry called summary_stats which
> provided all the stats for a specific node in hierarchy via a single file. The
> argument for having such a single file was weak. :)  There was never a
> debugfs file to be dropped in first place.
> 

You are right, it was me who kept suggesting a debugfs file as a
replacement for your summary_stats sysfs file. Sorry for the confusion,
but I still insist on debugfs :)

>> Since we already have CPUIdle sysfs which is an ABI, I am really not
>> sure if we need another set of ABI files which are used only for debug
>> and optimization purposes. Why is single debugfs file not sufficient ?
>>
> 
> The consumers for this data are not all kernel developers. We will have other
> engineers looking at this data for power/performance optimizations and
> would be nice to give them a consistent interface.

Yes, that's the case with the users of any sysfs file, and that's why we
can't break the ABI once we expose it.

>> As hardware evolves, most of the platforms can't provide these
>> information accurately. So if we are trying to address a problem which
>> is short-lived and on very small class of platforms, I would avoid
>> creating a new ABI for it. That's my main argument against this
>> interface instead go with debugfs entry. That's my opinion though.
> I would like to think of it in terms of ACPI spec rather than a subset of platforms.
> If it is part of spec there should be no reason to speculate on which platform
> may or may not implement it. We implement to the spec and ideally platform
> designers should ideally do the same.
> Since sysfs vs debugfs is quite debatable :) , I will wait for Rafael's inputs.

Sure.
Prakash, Prashanth Aug. 23, 2017, 4:04 p.m. UTC | #6
Hi Rafael,

On 8/1/2017 11:34 AM, Sudeep Holla wrote:
>
> On 01/08/17 18:15, Prakash, Prashanth wrote:
>> On 8/1/2017 3:18 AM, Sudeep Holla wrote:
>>> On 31/07/17 19:18, Prakash, Prashanth wrote:
>>>> Hi Sudeep,
>>>>
>>>> On 7/31/2017 10:25 AM, Sudeep Holla wrote:
>>>>> Sorry for the delay, I initial thought having this ABI under testing is
>>>>> fine as I really don't want any *real* user space programs to depend on
>>>>> this for various reasons stated in earlier threads, e.g. h/w auto
>>>>> promotable states, accuracy of the stats, ..etc
>>>> <sorry for repeating this part>
>>>> These fields are optional, so if there is no reliable way to keep track of  stats, platform
>>>> can choose not to expose it. If a platform is exposing inaccurate stats via this ACPI
>>>> interface, it is breaking the spec.
>>> Fair enough.
>>>
>>>>> But from Documentation/ABI/README, I see
>>>>>
>>>>> testing/
>>>>> 	This directory documents interfaces that are felt to be stable,
>>>>> 	as the main development of this interface has been completed.
>>>>> 	The interface can be changed to add new features, but the
>>>>> 	current interface will not break by doing this, unless grave
>>>>> 	errors or security problems are found in them. User space
>>>>> 	programs can start to rely on these interfaces,...
>>>>>
>>>>> which makes me worry. Since the use for this is purely for debug or
>>>>> optimization purposes, I still prefer simple single file debugfs entry.
>>>>> I still can't digest the fact that reading single file is time consuming
>>>>> as we are not using this interface at runtime IIUC. i.e. statistic are
>>>>> collected and analyzed offline.
>>>> These fields has the same utility/use-cases as the usage & time
>>>> fields in cpuidle sysfs,
>>>> but provides more granularity - idle stats for different levels hierarchy and accurate
>>>> idle stats for states that require platform co-ordination.
>>>>
>>> I completely agree with that and that's not the argument.
>>>
>>>> The argument for having a single sysfs file per node was that reading individual
>>>> files might get expensive to get a snapshot(not the other way around). But, that
>>>> argument was weak as we typically read these only in debug settings and not that
>>>> often during runtime. So, the summary_stats file was removed and went with one
>>>> value per file.
>>>>
>>> You are contradicting yourself above :). You say the argument you made
>>> is weak :) but still went ahead and dropped single debugfs file vs the
>>> standard per entry sysfs file which is an ABI.
>> To clarify, the first RFC patch had a sysfs entry called summary_stats which
>> provided all the stats for a specific node in hierarchy via a single file. The
>> argument for having such a single file was weak. :)  There was never a
>> debugfs file to be dropped in first place.
>>
> You are right, it was me who keep suggesting debugfs file as a
> replacement for your summary_stats sysfs file. Sorry for the confusion
> but I still insist debugfs :)
>
>>> Since we already have CPUIdle sysfs which is an ABI, I am really not
>>> sure if we need another set of ABI files which are used only for debug
>>> and optimization purposes. Why is single debugfs file not sufficient ?
>>>
>> The consumers for this data are not all kernel developers. We will have other
>> engineers looking at this data for power/performance optimizations and
>> would be nice to give them a consistent interface.
> Yes that's the case with any sysfs file users and that's why we can't
> break ABI once we expose.
>
>>> As hardware evolves, most of the platforms can't provide these
>>> information accurately. So if we are trying to address a problem which
>>> is short-lived and on very small class of platforms, I would avoid
>>> creating a new ABI for it. That's my main argument against this
>>> interface instead go with debugfs entry. That's my opinion though.
>> I would like to think of it in terms of ACPI spec rather than a subset of platforms.
>> If it is part of spec there should be no reason to speculate on which platform
>> may or may not implement it. We implement to the spec and ideally platform
>> designers should ideally do the same.
>> Since sysfs vs debugfs is quite debatable :) , I will wait for Rafael's inputs.
Any inputs on this?

--
Thanks,
Prashanth
Rafael J. Wysocki Aug. 28, 2017, 11:56 p.m. UTC | #7
On Wednesday, August 23, 2017 6:04:42 PM CEST Prakash, Prashanth wrote:
> Hi Rafael,
> 
> On 8/1/2017 11:34 AM, Sudeep Holla wrote:
> >
> > On 01/08/17 18:15, Prakash, Prashanth wrote:
> >> On 8/1/2017 3:18 AM, Sudeep Holla wrote:
> >>> On 31/07/17 19:18, Prakash, Prashanth wrote:
> >>>> Hi Sudeep,
> >>>>
> >>>> On 7/31/2017 10:25 AM, Sudeep Holla wrote:
> >>>>> Sorry for the delay, I initial thought having this ABI under testing is
> >>>>> fine as I really don't want any *real* user space programs to depend on
> >>>>> this for various reasons stated in earlier threads, e.g. h/w auto
> >>>>> promotable states, accuracy of the stats, ..etc
> >>>> <sorry for repeating this part>
> >>>> These fields are optional, so if there is no reliable way to keep track of  stats, platform
> >>>> can choose not to expose it. If a platform is exposing inaccurate stats via this ACPI
> >>>> interface, it is breaking the spec.
> >>> Fair enough.
> >>>
> >>>>> But from Documentation/ABI/README, I see
> >>>>>
> >>>>> testing/
> >>>>> 	This directory documents interfaces that are felt to be stable,
> >>>>> 	as the main development of this interface has been completed.
> >>>>> 	The interface can be changed to add new features, but the
> >>>>> 	current interface will not break by doing this, unless grave
> >>>>> 	errors or security problems are found in them. User space
> >>>>> 	programs can start to rely on these interfaces,...
> >>>>>
> >>>>> which makes me worry. Since the use for this is purely for debug or
> >>>>> optimization purposes, I still prefer simple single file debugfs entry.
> >>>>> I still can't digest the fact that reading single file is time consuming
> >>>>> as we are not using this interface at runtime IIUC. i.e. statistic are
> >>>>> collected and analyzed offline.
> >>>> These fields has the same utility/use-cases as the usage & time
> >>>> fields in cpuidle sysfs,
> >>>> but provides more granularity - idle stats for different levels hierarchy and accurate
> >>>> idle stats for states that require platform co-ordination.
> >>>>
> >>> I completely agree with that and that's not the argument.
> >>>
> >>>> The argument for having a single sysfs file per node was that reading individual
> >>>> files might get expensive to get a snapshot(not the other way around). But, that
> >>>> argument was weak as we typically read these only in debug settings and not that
> >>>> often during runtime. So, the summary_stats file was removed and went with one
> >>>> value per file.
> >>>>
> >>> You are contradicting yourself above :). You say the argument you made
> >>> is weak :) but still went ahead and dropped single debugfs file vs the
> >>> standard per entry sysfs file which is an ABI.
> >> To clarify, the first RFC patch had a sysfs entry called summary_stats which
> >> provided all the stats for a specific node in hierarchy via a single file. The
> >> argument for having such a single file was weak. :)  There was never a
> >> debugfs file to be dropped in first place.
> >>
> > You are right, it was me who keep suggesting debugfs file as a
> > replacement for your summary_stats sysfs file. Sorry for the confusion
> > but I still insist debugfs :)
> >
> >>> Since we already have CPUIdle sysfs which is an ABI, I am really not
> >>> sure if we need another set of ABI files which are used only for debug
> >>> and optimization purposes. Why is single debugfs file not sufficient ?
> >>>
> >> The consumers for this data are not all kernel developers. We will have other
> >> engineers looking at this data for power/performance optimizations and
> >> would be nice to give them a consistent interface.
> > Yes that's the case with any sysfs file users and that's why we can't
> > break ABI once we expose.
> >
> >>> As hardware evolves, most of the platforms can't provide these
> >>> information accurately. So if we are trying to address a problem which
> >>> is short-lived and on very small class of platforms, I would avoid
> >>> creating a new ABI for it. That's my main argument against this
> >>> interface instead go with debugfs entry. That's my opinion though.
> >> I would like to think of it in terms of ACPI spec rather than a subset of platforms.
> >> If it is part of spec there should be no reason to speculate on which platform
> >> may or may not implement it. We implement to the spec and ideally platform
> >> designers should ideally do the same.
> >> Since sysfs vs debugfs is quite debatable :) , I will wait for Rafael's inputs.
> Any inputs on this?

Sorry for the slow response, but I'm still unconvinced about this.

I generally don't like exposing residency ("time" in your terminology, which is
quite confusing BTW) / usage information in sysfs under ACPI device nodes, as it
rather belongs in cpuidle.

I know that it is easy to expose it this way, but can't we do better?

Also, I'd use file names following the spec language. min_residency is fine,
but "latency" would better be called "wakeup_latency", and why use "desc"
rather than "state_name"? Plus, exposing "Enabled Parent State" might be
useful.

Thanks,
Rafael

Prakash, Prashanth Aug. 31, 2017, 5:30 p.m. UTC | #8
On 8/28/2017 5:56 PM, Rafael J. Wysocki wrote:
> On Wednesday, August 23, 2017 6:04:42 PM CEST Prakash, Prashanth wrote:
>> Hi Rafael,
>>
>> On 8/1/2017 11:34 AM, Sudeep Holla wrote:
>>> On 01/08/17 18:15, Prakash, Prashanth wrote:
>>>> On 8/1/2017 3:18 AM, Sudeep Holla wrote:
>>>>> On 31/07/17 19:18, Prakash, Prashanth wrote:
>>>>>> Hi Sudeep,
>>>>>>
>>>>>> On 7/31/2017 10:25 AM, Sudeep Holla wrote:
>>>>>>> Sorry for the delay, I initial thought having this ABI under testing is
>>>>>>> fine as I really don't want any *real* user space programs to depend on
>>>>>>> this for various reasons stated in earlier threads, e.g. h/w auto
>>>>>>> promotable states, accuracy of the stats, ..etc
>>>>>> <sorry for repeating this part>
>>>>>> These fields are optional, so if there is no reliable way to keep track of  stats, platform
>>>>>> can choose not to expose it. If a platform is exposing inaccurate stats via this ACPI
>>>>>> interface, it is breaking the spec.
>>>>> Fair enough.
>>>>>
>>>>>>> But from Documentation/ABI/README, I see
>>>>>>>
>>>>>>> testing/
>>>>>>> 	This directory documents interfaces that are felt to be stable,
>>>>>>> 	as the main development of this interface has been completed.
>>>>>>> 	The interface can be changed to add new features, but the
>>>>>>> 	current interface will not break by doing this, unless grave
>>>>>>> 	errors or security problems are found in them. User space
>>>>>>> 	programs can start to rely on these interfaces,...
>>>>>>>
>>>>>>> which makes me worry. Since the use for this is purely for debug or
>>>>>>> optimization purposes, I still prefer simple single file debugfs entry.
>>>>>>> I still can't digest the fact that reading single file is time consuming
>>>>>>> as we are not using this interface at runtime IIUC. i.e. statistic are
>>>>>>> collected and analyzed offline.
>>>>>> These fields has the same utility/use-cases as the usage & time
>>>>>> fields in cpuidle sysfs,
>>>>>> but provides more granularity - idle stats for different levels hierarchy and accurate
>>>>>> idle stats for states that require platform co-ordination.
>>>>>>
>>>>> I completely agree with that and that's not the argument.
>>>>>
>>>>>> The argument for having a single sysfs file per node was that reading individual
>>>>>> files might get expensive to get a snapshot(not the other way around). But, that
>>>>>> argument was weak as we typically read these only in debug settings and not that
>>>>>> often during runtime. So, the summary_stats file was removed and went with one
>>>>>> value per file.
>>>>>>
>>>>> You are contradicting yourself above :). You say the argument you made
>>>>> is weak :) but still went ahead and dropped single debugfs file vs the
>>>>> standard per entry sysfs file which is an ABI.
>>>> To clarify, the first RFC patch had a sysfs entry called summary_stats which
>>>> provided all the stats for a specific node in hierarchy via a single file. The
>>>> argument for having such a single file was weak. :)  There was never a
>>>> debugfs file to be dropped in first place.
>>>>
>>> You are right, it was me who keep suggesting debugfs file as a
>>> replacement for your summary_stats sysfs file. Sorry for the confusion
>>> but I still insist debugfs :)
>>>
>>>>> Since we already have CPUIdle sysfs which is an ABI, I am really not
>>>>> sure if we need another set of ABI files which are used only for debug
>>>>> and optimization purposes. Why is single debugfs file not sufficient ?
>>>>>
>>>> The consumers for this data are not all kernel developers. We will have other
>>>> engineers looking at this data for power/performance optimizations and
>>>> would be nice to give them a consistent interface.
>>> Yes that's the case with any sysfs file users and that's why we can't
>>> break ABI once we expose.
>>>
>>>>> As hardware evolves, most of the platforms can't provide these
>>>>> information accurately. So if we are trying to address a problem which
>>>>> is short-lived and on very small class of platforms, I would avoid
>>>>> creating a new ABI for it. That's my main argument against this
>>>>> interface instead go with debugfs entry. That's my opinion though.
>>>> I would like to think of it in terms of ACPI spec rather than a subset of platforms.
>>>> If it is part of spec there should be no reason to speculate on which platform
>>>> may or may not implement it. We implement to the spec and ideally platform
>>>> designers should do the same.
>>>> Since sysfs vs debugfs is quite debatable :) , I will wait for Rafael's inputs.
>> Any inputs on this?
> Sorry for the slow response, but I'm still unconvinced about this.
>
> I generally don't like residency ("time" in your terminology, which is quite
> confusing BTW) / usage information in sysfs under ACPI device nodes, as it
> belongs to cpuidle rather.
>
> I know that it is easy to expose it this way, but can't we do better?
Last time I went through the code, I didn't find a good way to expose this
outside the ACPI hierarchy, primarily because I couldn't find a good substitute
to represent the processor hierarchy.

I will take a look at it again.
> Also, I'd use file names following the spec language.  min_residency is fine,
> but "latency" would better be called "wakeup_latency" and why use "desc"
> rather than "state_name"?  Plus exposing "Enabled Parent State" might be
> useful.
I will switch the names to spec language on next version of this patch.

Thanks for your feedback!

--
Regards,
Prashanth


Patch

diff --git a/Documentation/ABI/testing/sysfs-acpi-lpi b/Documentation/ABI/testing/sysfs-acpi-lpi
new file mode 100644
index 0000000..5b054a1
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-acpi-lpi
@@ -0,0 +1,47 @@ 
+What:		LPI sysfs interface for processor devices and containers
+Date:		13-July-2017
+KernelVersion:	v4.13
+Contact:	linux-acpi@vger.kernel.org
+Description:	LPI (Low Power Idle), described in section 8.4.4 of ACPI spec 6.1,
+		provides different methods to obtain idle statistics maintained
+		by the platform. These show a granular view of how each of the
+		LPI states is being used at different levels of the processor hierarchy.
+		sysfs data is exposed at each level in the hierarchy by creating
+		a directory named 'lpi' at each level and the LPI state
+		information is presented under it. Below is the representation
+		of LPI information at one such level in the hierarchy.
+
+		.../ACPI00XX:YY/lpi
+			|-> state0
+			|	|-> desc
+			|	|-> time
+			|	|-> usage
+			|	|-> latency
+			|	|-> min_residency
+			|
+			|-> state1
+			|	|-> desc
+			|	|-> time
+			|	|-> usage
+			|	|-> latency
+			|	|-> min_residency
+			|
+			<<more states>>
+
+		ACPI00XX can be ACPI0007 (processor device) or ACPI0010
+		(processor container); the sysfs nodes for these ACPI devices
+		can be found under /sys/devices/LNXSYSTM:00
+
+		stateX contains information related to a specific local LPI
+		state defined in the LPI ACPI tables under an owning hierarchy
+		node (ACPI0007 or ACPI0010).
+
+		desc	- Description/name of the local LPI state
+		latency - Worst-case wakeup latency (in microseconds) for the
+			owning hierarchy node to exit this local idle state
+		min_residency - Time in microseconds after which this state
+			becomes more energy efficient than a shallower state
+		time	- Total time (in microseconds) spent by the owning
+			hierarchy node in this local idle state
+		usage	- Number of times the owning hierarchy node entered this
+			local idle state
\ No newline at end of file
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 5c8aa9c..d5c4d94 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -940,6 +940,10 @@  struct acpi_lpi_states_array {
 	struct acpi_lpi_state *composite_states[ACPI_PROCESSOR_MAX_POWER];
 };
 
+static int acpi_lpi_sysfs_init(acpi_handle h,
+			struct acpi_lpi_states_array *info);
+static int acpi_lpi_sysfs_exit(struct acpi_processor *pr);
+
 static int obj_get_integer(union acpi_object *obj, u32 *value)
 {
 	if (obj->type != ACPI_TYPE_INTEGER)
@@ -949,6 +953,24 @@  static int obj_get_integer(union acpi_object *obj, u32 *value)
 	return 0;
 }
 
+static int obj_get_generic_addr(union acpi_object *obj,
+				struct acpi_generic_address *addr)
+{
+	struct acpi_power_register *reg;
+
+	if (obj->type != ACPI_TYPE_BUFFER)
+		return -EINVAL;
+
+	reg = (struct acpi_power_register *)obj->buffer.pointer;
+	addr->space_id = reg->space_id;
+	addr->bit_width = reg->bit_width;
+	addr->bit_offset = reg->bit_offset;
+	addr->access_width = reg->access_size;
+	addr->address = reg->address;
+
+	return 0;
+}
+
 static int acpi_processor_evaluate_lpi(acpi_handle handle,
 				       struct acpi_lpi_states_array *info)
 {
@@ -1023,8 +1045,6 @@  static int acpi_processor_evaluate_lpi(acpi_handle handle,
 			continue;
 		}
 
-		/* elements[7,8] skipped for now i.e. Residency/Usage counter*/
-
 		obj = pkg_elem + 9;
 		if (obj->type == ACPI_TYPE_STRING)
 			strlcpy(lpi_state->desc, obj->string.pointer,
@@ -1052,9 +1072,16 @@  static int acpi_processor_evaluate_lpi(acpi_handle handle,
 
 		if (obj_get_integer(pkg_elem + 5, &lpi_state->enable_parent_state))
 			lpi_state->enable_parent_state = 0;
+
+		obj_get_generic_addr(pkg_elem + 7, &lpi_state->res_cntr);
+
+		obj_get_generic_addr(pkg_elem + 8, &lpi_state->usage_cntr);
 	}
 
 	acpi_handle_debug(handle, "Found %d power states\n", state_idx);
+
+	/* Set up LPI sysfs */
+	acpi_lpi_sysfs_init(handle, info);
 end:
 	kfree(buffer.pointer);
 	return ret;
@@ -1166,6 +1193,10 @@  static int acpi_processor_get_lpi_info(struct acpi_processor *pr)
 	if (!acpi_has_method(handle, "_LPI"))
 		return -EINVAL;
 
+	/* If we have already initialized just return */
+	if (pr->flags.has_lpi == 1)
+		return 0;
+
 	flat_state_cnt = 0;
 	prev = &info[0];
 	curr = &info[1];
@@ -1477,8 +1508,316 @@  int acpi_processor_power_exit(struct acpi_processor *pr)
 		acpi_processor_registered--;
 		if (acpi_processor_registered == 0)
 			cpuidle_unregister_driver(&acpi_idle_driver);
+
+		acpi_lpi_sysfs_exit(pr);
 	}
 
 	pr->flags.power_setup_done = 0;
 	return 0;
 }
+
+
+/*
+ * LPI sysfs support
+ */
+
+struct acpi_lpi_attr {
+	struct attribute attr;
+	ssize_t (*show)(struct kobject *kobj, struct attribute *attr,
+			char *buf);
+	ssize_t (*store)(struct kobject *kobj, struct attribute *attr,
+			const char *c, ssize_t count);
+};
+
+#define define_lpi_ro(_name) static struct acpi_lpi_attr _name =	\
+		__ATTR(_name, 0444, show_##_name, NULL)
+
+#define to_acpi_lpi_sysfs_state(k)				\
+	container_of(k, struct acpi_lpi_sysfs_state, kobj)
+
+#define to_acpi_lpi_state(k)				\
+	(&(to_acpi_lpi_sysfs_state(k)->lpi_state))
+
+static ssize_t show_desc(struct kobject *kobj, struct attribute *attr,
+			char *buf)
+{
+	struct acpi_lpi_state *lpi = to_acpi_lpi_state(kobj);
+
+	return scnprintf(buf, PAGE_SIZE, "%s\n", lpi->desc);
+}
+define_lpi_ro(desc);
+
+static int acpi_lpi_get_time(struct acpi_lpi_state *lpi, u64 *val)
+{
+	struct acpi_generic_address *reg;
+
+	if (!lpi)
+		return -EFAULT;
+
+	reg = &lpi->res_cntr;
+
+	/* Supporting only system memory */
+	if (reg->space_id != ACPI_ADR_SPACE_SYSTEM_MEMORY ||
+		!(lpi->flags & ACPI_LPI_STATE_FLAGS_ENABLED) ||
+		!reg->address || !lpi->res_cnt_freq)
+		return -EINVAL;
+
+	if (ACPI_FAILURE(acpi_read(val, reg)))
+		return -EFAULT;
+
+	*val = div_u64((*val * 1000000), lpi->res_cnt_freq);
+	return 0;
+
+}
+
+/* shows residency in us */
+static ssize_t show_time(struct kobject *kobj, struct attribute *attr,
+			char *buf)
+{
+	struct acpi_lpi_state *lpi = to_acpi_lpi_state(kobj);
+	u64 val = 0;
+	int ret;
+
+	ret = acpi_lpi_get_time(lpi, &val);
+
+	if (ret == -EINVAL)
+		return scnprintf(buf, PAGE_SIZE, "<unsupported>\n");
+
+	if (ret)
+		return ret;
+
+	return scnprintf(buf, PAGE_SIZE, "%llu\n", val);
+}
+define_lpi_ro(time);
+
+static int acpi_lpi_get_usage(struct acpi_lpi_state *lpi, u64 *val)
+{
+	struct acpi_generic_address *reg;
+
+	if (!lpi)
+		return -EFAULT;
+
+	reg = &lpi->usage_cntr;
+
+	/* Supporting only system memory now (FFH not supported) */
+	if (reg->space_id != ACPI_ADR_SPACE_SYSTEM_MEMORY ||
+		!(lpi->flags & ACPI_LPI_STATE_FLAGS_ENABLED) ||
+		!reg->address)
+		return -EINVAL;
+
+	if (ACPI_FAILURE(acpi_read(val, reg)))
+		return -EFAULT;
+
+	return 0;
+}
+
+static ssize_t show_usage(struct kobject *kobj, struct attribute *attr,
+			char *buf)
+{
+	struct acpi_lpi_state *lpi = to_acpi_lpi_state(kobj);
+	u64 val = 0;
+	int ret;
+
+	ret = acpi_lpi_get_usage(lpi, &val);
+
+	if (ret == -EINVAL)
+		return scnprintf(buf, PAGE_SIZE, "<unsupported>\n");
+
+	if (ret)
+		return ret;
+
+	return scnprintf(buf, PAGE_SIZE, "%llu\n", val);
+}
+define_lpi_ro(usage);
+
+static ssize_t show_min_residency(struct kobject *kobj, struct attribute *attr,
+				char *buf)
+{
+	struct acpi_lpi_state *lpi = to_acpi_lpi_state(kobj);
+
+	return scnprintf(buf, PAGE_SIZE, "%u\n", lpi->min_residency);
+}
+define_lpi_ro(min_residency);
+
+static ssize_t show_latency(struct kobject *kobj, struct attribute *attr,
+			char *buf)
+{
+	struct acpi_lpi_state *lpi = to_acpi_lpi_state(kobj);
+
+	return scnprintf(buf, PAGE_SIZE, "%u\n", lpi->wake_latency);
+}
+define_lpi_ro(latency);
+
+static struct attribute *acpi_lpi_state_attrs[] = {
+	&desc.attr,
+	&min_residency.attr,
+	&latency.attr,
+	&time.attr,
+	&usage.attr,
+	NULL
+};
+
+static struct kobj_type lpi_state_ktype = {
+	.sysfs_ops = &kobj_sysfs_ops,
+	.default_attrs = acpi_lpi_state_attrs,
+};
+
+static void acpi_lpi_sysfs_release(struct kobject *kobj)
+{
+	struct acpi_lpi_sysfs_data *sysfs_data =
+		container_of(kobj, struct acpi_lpi_sysfs_data, kobj);
+
+	kfree(sysfs_data->sysfs_states);
+	kfree(sysfs_data);
+}
+
+static struct kobj_type lpi_device_ktype = {
+	.sysfs_ops = &kobj_sysfs_ops,
+	.release = acpi_lpi_sysfs_release,
+};
+
+static int acpi_lpi_sysfs_get(struct acpi_lpi_sysfs_data *sysfs_data)
+{
+	int i;
+
+	if (!sysfs_data)
+		return -EFAULT;
+
+	for (i = 0; i < sysfs_data->state_count; i++)
+		kobject_get(&sysfs_data->sysfs_states[i].kobj);
+
+	kobject_get(&sysfs_data->kobj);
+
+	return 0;
+}
+
+static int acpi_lpi_sysfs_put(struct acpi_lpi_sysfs_data *sysfs_data)
+{
+	int i;
+
+	if (!sysfs_data)
+		return -EFAULT;
+
+	for (i = 0; i < sysfs_data->state_count; i++)
+		kobject_put(&sysfs_data->sysfs_states[i].kobj);
+
+	kobject_put(&sysfs_data->kobj);
+
+	return 0;
+}
+
+/*
+ * Given the parsed LPI info, create sysfs entries to expose the different
+ * LPI attributes and stats for all the "enabled" states.
+ */
+static int acpi_lpi_sysfs_init(acpi_handle h,
+			struct acpi_lpi_states_array *info)
+{
+	struct acpi_device *d;
+	struct acpi_lpi_sysfs_state *sysfs_state = NULL;
+	struct acpi_lpi_sysfs_data **lpi_sysfs_data;
+	struct acpi_lpi_sysfs_data *data = NULL;
+	int ret, i, j;
+
+	if (!info)
+		return -EINVAL;
+
+	ret = acpi_bus_get_device(h, &d);
+	if (ret)
+		return ret;
+
+	if (!strcmp(acpi_device_hid(d), ACPI_PROCESSOR_CONTAINER_HID))
+		lpi_sysfs_data = (struct acpi_lpi_sysfs_data **)&d->driver_data;
+	else {
+		struct acpi_processor *pr = acpi_driver_data(d);
+
+		lpi_sysfs_data = &pr->power.lpi_sysfs_data;
+	}
+
+	/* Already initialized, get a reference and return */
+	if (*lpi_sysfs_data) {
+		acpi_lpi_sysfs_get(*lpi_sysfs_data);
+		return 0;
+	}
+
+	data = kzalloc(sizeof(struct acpi_lpi_sysfs_data), GFP_KERNEL);
+	if (!data) {
+		ret = -ENOMEM;
+		goto kfree_and_return;
+	}
+
+	/* Count number of enabled states */
+	for (i = 0; i < info->size; i++)
+		if (info->entries[i].flags & ACPI_LPI_STATE_FLAGS_ENABLED)
+			data->state_count++;
+
+	sysfs_state = kcalloc(data->state_count,
+			sizeof(struct acpi_lpi_sysfs_state), GFP_KERNEL);
+	if (!sysfs_state) {
+		ret = -ENOMEM;
+		goto kfree_and_return;
+	}
+
+	ret = kobject_init_and_add(&data->kobj, &lpi_device_ktype, &d->dev.kobj,
+				"lpi");
+	if (ret)
+		goto kfree_and_return;
+
+	/* *lpi_sysfs_data is published only after all states are added below */
+	data->sysfs_states = sysfs_state;
+
+	for (i = 0, j = 0; i < info->size; i++) {
+		if (!(info->entries[i].flags & ACPI_LPI_STATE_FLAGS_ENABLED))
+			continue;
+		sysfs_state = data->sysfs_states + j;
+		memcpy(&sysfs_state->lpi_state, info->entries + i,
+			sizeof(struct acpi_lpi_state));
+		ret = kobject_init_and_add(&sysfs_state->kobj, &lpi_state_ktype,
+					&data->kobj, "state%d", j);
+		if (ret)
+			break;
+		j++;
+	}
+
+	if (ret) {
+		while (j > 0) {
+			j--;
+			sysfs_state = data->sysfs_states + j;
+			kobject_put(&sysfs_state->kobj);
+		}
+		kobject_put(&data->kobj);
+	} else
+		*lpi_sysfs_data = data;
+
+	return ret;
+
+kfree_and_return:
+	kfree(data);
+	kfree(sysfs_state);
+	return ret;
+}
+
+static int acpi_lpi_sysfs_exit(struct acpi_processor *pr)
+{
+	acpi_handle handle, p_handle;
+	struct acpi_device *d = NULL;
+	acpi_status status;
+
+	if (!pr)
+		return -ENODEV;
+
+	handle = pr->handle;
+	acpi_lpi_sysfs_put(pr->power.lpi_sysfs_data);
+
+	status = acpi_get_parent(handle, &p_handle);
+	while (ACPI_SUCCESS(status)) {
+		acpi_bus_get_device(p_handle, &d);
+		if (strcmp(acpi_device_hid(d), ACPI_PROCESSOR_CONTAINER_HID))
+			break;
+
+		acpi_lpi_sysfs_put((struct acpi_lpi_sysfs_data *)d->driver_data);
+		status = acpi_get_parent(p_handle, &p_handle);
+	}
+
+	return 0;
+}
diff --git a/include/acpi/processor.h b/include/acpi/processor.h
index c1ba00f..b99b84b 100644
--- a/include/acpi/processor.h
+++ b/include/acpi/processor.h
@@ -79,6 +79,19 @@  struct acpi_lpi_state {
 	u8 index;
 	u8 entry_method;
 	char desc[ACPI_CX_DESC_LEN];
+	struct acpi_generic_address res_cntr;
+	struct acpi_generic_address usage_cntr;
+};
+
+struct acpi_lpi_sysfs_state {
+	struct acpi_lpi_state lpi_state;
+	struct kobject kobj;
+};
+
+struct acpi_lpi_sysfs_data {
+	u8 state_count;
+	struct kobject kobj;
+	struct acpi_lpi_sysfs_state *sysfs_states;
 };
 
 struct acpi_processor_power {
@@ -88,6 +101,7 @@  struct acpi_processor_power {
 		struct acpi_lpi_state lpi_states[ACPI_PROCESSOR_MAX_POWER];
 	};
 	int timer_broadcast_on_state;
+	struct acpi_lpi_sysfs_data *lpi_sysfs_data;
 };
 
 /* Performance Management */