[RFC] ACPI / Processor: add sysfs support for low power idle

Message ID 1490832795-27441-1-git-send-email-pprakash@codeaurora.org (mailing list archive)
State RFC, archived

Commit Message

Prakash, Prashanth March 30, 2017, 12:13 a.m. UTC
Add support to expose idle statistics maintained by the platform to
userspace via sysfs, in addition to other data of interest from
each LPI (Low Power Idle) state.

LPI, described in section 8.4.4 of ACPI spec 6.1, provides different
methods to obtain idle statistics maintained by the platform. These
show a granular view of how each LPI state is being used at
different levels of the hierarchy. The sysfs data is exposed by
creating a directory named 'lpi' at each level in the hierarchy, and
the LPI state information is presented under it. Below is the
representation of LPI information at one such level in the hierarchy:

.../ACPI00XX:XX/lpi
	|-> summary_stats
	|-> state0
	|	|-> desc
	|	|-> time
	|	|-> usage
	|	|-> latency
	|	|-> min_residency
	|	|-> flags
	|	|-> arch_flags
	|
	<<more states>>

ACPI00XX can be ACPI0007 (processor) or ACPI0010 (processor container).

stateX contains information related to a specific LPI state defined
in the LPI ACPI tables.

summary_stats shows the stats (usage and time) from all the LPI states
under a device. The summary_stats file is provided to reduce the number
of files that userspace must access to capture a snapshot of the
idle statistics.
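
For illustration, a userspace snapshot of one state's counters could look
like the sketch below ('time' is the residency counter scaled to us, i.e.
counter * 1000000 / residency counter frequency; the ACPI0010:00 path and
the helper are made up for the example):

#include <stdio.h>

/* Read one u64-valued sysfs attribute. */
static int read_u64(const char *path, unsigned long long *val)
{
	FILE *f = fopen(path, "r");
	int ret;

	if (!f)
		return -1;
	ret = (fscanf(f, "%llu", val) == 1) ? 0 : -1;
	fclose(f);
	return ret;
}

int main(void)
{
	/* Hypothetical device path; the real one depends on the firmware tables */
	const char *s0 = "/sys/devices/LNXSYSTM:00/ACPI0010:00/lpi/state0";
	char path[128];
	unsigned long long usage, time_us;

	snprintf(path, sizeof(path), "%s/usage", s0);
	if (read_u64(path, &usage))
		return 1;
	snprintf(path, sizeof(path), "%s/time", s0);
	if (read_u64(path, &time_us))
		return 1;
	printf("state0: usage=%llu time=%llu us\n", usage, time_us);
	return 0;
}

Reading summary_stats instead returns the usage/time pairs for all states
of the device in one read, formatted as below (numbers invented):

state                usage             time(us)
    0              1234567            987654321
    1                12345              7654321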

Signed-off-by: Prashanth Prakash <pprakash@codeaurora.org>
---
 drivers/acpi/acpi_processor.c |  11 ++
 drivers/acpi/processor_idle.c | 345 +++++++++++++++++++++++++++++++++++++++++-
 include/acpi/processor.h      |  27 ++++
 3 files changed, 381 insertions(+), 2 deletions(-)

Comments

Rafael J. Wysocki April 18, 2017, 2:34 p.m. UTC | #1
On Wednesday, March 29, 2017 06:13:15 PM Prashanth Prakash wrote:
> Add support to expose idle statistics maintained by the platform to
> userspace via sysfs, in addition to other data of interest from
> each LPI (Low Power Idle) state.
> 
> LPI, described in section 8.4.4 of ACPI spec 6.1, provides different
> methods to obtain idle statistics maintained by the platform. These
> show a granular view of how each LPI state is being used at
> different levels of the hierarchy. The sysfs data is exposed by
> creating a directory named 'lpi' at each level in the hierarchy, and
> the LPI state information is presented under it. Below is the
> representation of LPI information at one such level in the hierarchy:
> 
> .../ACPI00XX:XX/lpi
> 	|-> summary_stats
> 	|-> state0
> 	|	|-> desc
> 	|	|-> time
> 	|	|-> usage
> 	|	|-> latency
> 	|	|-> min_residency
> 	|	|-> flags
> 	|	|-> arch_flags
> 	|
> 	<<more states>>
> 
> ACPI00XX can be ACPI0007 (processor) or ACPI0010 (processor container).
> 
> stateX contains information related to a specific LPI state defined
> in the LPI ACPI tables.
> 
> summary_stats shows the stats (usage and time) from all the LPI states
> under a device. The summary_stats file is provided to reduce the number
> of files that userspace must access to capture a snapshot of the
> idle statistics.
> 
> Signed-off-by: Prashanth Prakash <pprakash@codeaurora.org>

I'd like Sudeep to tell me what he thinks about this in the first place.

> ---
>  drivers/acpi/acpi_processor.c |  11 ++
>  drivers/acpi/processor_idle.c | 345 +++++++++++++++++++++++++++++++++++++++++-
>  include/acpi/processor.h      |  27 ++++
>  3 files changed, 381 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> index 0143135..a01368d 100644
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -570,9 +570,19 @@ void __init acpi_early_processor_osc(void)
>  static int acpi_processor_container_attach(struct acpi_device *dev,
>  					   const struct acpi_device_id *id)
>  {
> +	if (dev->status.present && dev->status.functional &&
> +		dev->status.enabled && dev->status.show_in_ui)
> +		acpi_lpi_sysfs_init(dev->handle,
> +				(struct acpi_lpi_sysfs_data **)&dev->driver_data);

This isn't the right place to do it IMO.

It should be done at the processor driver initialization, when it is known
that LPI is going to be used at all.

>  	return 1;
>  }
>  
> +static void acpi_processor_container_detach(struct acpi_device *dev)
> +{
> +	if (dev->driver_data)
> +		acpi_lpi_sysfs_exit((struct acpi_lpi_sysfs_data *)dev->driver_data);

And analogously here.

> +}
> +
>  static const struct acpi_device_id processor_container_ids[] = {
>  	{ ACPI_PROCESSOR_CONTAINER_HID, },
>  	{ }

Thanks,
Rafael

Sudeep Holla April 19, 2017, 3:37 p.m. UTC | #2
On 30/03/17 01:13, Prashanth Prakash wrote:
> Add support to expose idle statistics maintained by the platform to
> userspace via sysfs, in addition to other data of interest from
> each LPI (Low Power Idle) state.
> 

While I understand this information is useful for some optimization
and also for idle characterization with different workloads, I prefer
to keep this in debugfs, for two reasons:

1. We already have CPUIdle stats. This information may be more accurate,
   which is good for the reasons above, but user-space applications
   shouldn't depend on it and end up misusing it.
2. Also, as more features get pushed into hardware, even these stats may
   not remain as accurate as they are today, and hence it would be
   better if user-space applications never use/depend on them.

Let me know if there are conflicting reasons ?

> LPI, described in section 8.4.4 of ACPI spec 6.1, provides different
> methods to obtain idle statistics maintained by the platform. These
> show a granular view of how each LPI state is being used at
> different levels of the hierarchy. The sysfs data is exposed by
> creating a directory named 'lpi' at each level in the hierarchy, and
> the LPI state information is presented under it. Below is the
> representation of LPI information at one such level in the hierarchy:
> 
> .../ACPI00XX:XX/lpi
> 	|-> summary_stats

Not sure if it's of any use; see below.

> 	|-> state0
> 	|	|-> desc
> 	|	|-> time
> 	|	|-> usage
> 	|	|-> latency
> 	|	|-> min_residency

The flags/arch_flags are meaningless if they are exposed as is. I would
rather drop them.

> 	|	|-> flags
> 	|	|-> arch_flags
> 	|
> 	<<more states>>
> 
> ACPI00XX can be ACPI0007 (processor) or ACPI0010 (processor container).
> 
> stateX contains information related to a specific LPI state defined
> in the LPI ACPI tables.
> 
> summary_stats shows the stats (usage and time) from all the LPI states
> under a device. The summary_stats file is provided to reduce the number
> of files that userspace must access to capture a snapshot of the
> idle statistics.

Really ? What's the need to reduce the no. of file accesses ?

> 
> Signed-off-by: Prashanth Prakash <pprakash@codeaurora.org>
> ---
>  drivers/acpi/acpi_processor.c |  11 ++
>  drivers/acpi/processor_idle.c | 345 +++++++++++++++++++++++++++++++++++++++++-
>  include/acpi/processor.h      |  27 ++++
>  3 files changed, 381 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> index 0143135..a01368d 100644
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -570,9 +570,19 @@ void __init acpi_early_processor_osc(void)
>  static int acpi_processor_container_attach(struct acpi_device *dev,
>  					   const struct acpi_device_id *id)
>  {
> +	if (dev->status.present && dev->status.functional &&
> +		dev->status.enabled && dev->status.show_in_ui)
> +		acpi_lpi_sysfs_init(dev->handle,
> +				(struct acpi_lpi_sysfs_data **)&dev->driver_data);

Why not register it only if LPIs are detected/initialized as active ?

[...]

> +
> +
> +/*
> + * LPI sysfs support
> + * Exports two APIs that can be called as part of init and exit to setup the LPI
> + * sysfs entries either from processor or processor_container driver
> + */
> +

[...]

A lot of this sysfs handling looks similar to what we already have for
cpuidle. I don't have a quick solution to avoid the duplication, but Greg
really hates dealing with kobjects/ksets. I need to think if there's any
better way to do this. Sorry for just raising the issue without a solution.


> +int acpi_lpi_sysfs_init(acpi_handle h,
> +			struct acpi_lpi_sysfs_data **lpi_sysfs_data)
> +{
> +	struct acpi_device *d;
> +	struct acpi_lpi_states_array info;
> +	struct acpi_lpi_sysfs_state *sysfs_state = NULL;
> +	struct acpi_lpi_sysfs_data *data = NULL;
> +	int ret, i;
> +
> +	if (!lpi_sysfs_data)
> +		return -EINVAL;
> +
> +	ret = acpi_bus_get_device(h, &d);
> +	if (ret)
> +		return ret;
> +
> +	ret = acpi_processor_evaluate_lpi(h, &info);

Why do we need to reevaluate _LPI here ?
Prakash, Prashanth April 19, 2017, 10:57 p.m. UTC | #3
Hi Sudeep,

On 4/19/2017 9:37 AM, Sudeep Holla wrote:
> On 30/03/17 01:13, Prashanth Prakash wrote:
>> Add support to expose idle statistics maintained by the platform to
>> userspace via sysfs, in addition to other data of interest from
>> each LPI (Low Power Idle) state.
>>
> While I understand this information is useful for some optimization
> and also for idle characterization with different workloads, I prefer
> to keep this in debugfs, for two reasons:
>
> 1. We already have CPUIdle stats. This information may be more accurate,
>    which is good for the reasons above, but user-space applications
>    shouldn't depend on it and end up misusing it.
> 2. Also, as more features get pushed into hardware, even these stats may
>    not remain as accurate as they are today, and hence it would be
>    better if user-space applications never use/depend on them.
>
> Let me know if there are conflicting reasons ?
The information about the idle states of shared resources (cache, interconnect, ...),
which cannot be deduced from the cpuidle stats, is quite useful. We can use this
to analyze newer workloads and to explain their power consumption, especially as
the amount of time some of these shared resources spend in different LPI states
can be influenced by changes to the workload, kernel or firmware. And for
auto-promotable states this is the only way to capture idle stats.

Regarding 2, since these stats are clearly defined by the ACPI spec and maintained
by the platform, I think it is reasonable to expect them to be accurate. If they are
not accurate, it is likely that the platform is breaking the spec.

Given the above, I don't see much room for user-space applications to misuse this.
Given these are defined as optional in the spec, user-space applications should
use them only if/when available, and as complementary to the cpuidle stats.

Sounds reasonable?

>> LPI, described in section 8.4.4 of ACPI spec 6.1, provides different
>> methods to obtain idle statistics maintained by the platform. These
>> show a granular view of how each LPI state is being used at
>> different levels of the hierarchy. The sysfs data is exposed by
>> creating a directory named 'lpi' at each level in the hierarchy, and
>> the LPI state information is presented under it. Below is the
>> representation of LPI information at one such level in the hierarchy:
>>
>> .../ACPI00XX:XX/lpi
>> 	|-> summary_stats
> Not sure if it's of any use; see below.
>
>> 	|-> state0
>> 	|	|-> desc
>> 	|	|-> time
>> 	|	|-> usage
>> 	|	|-> latency
>> 	|	|-> min_residency
> The flags/arch_flags are meaningless if they are exposed as is. I would
> rather drop them.
Agree on arch_flags, will remove it.

We do decode the flags - Enabled/Disabled. I will make the changes to populate
sysfs only for enabled states and get rid of flags as well.
>
>> 	|	|-> flags
>> 	|	|-> arch_flags
>> 	|
>> 	<<more states>>
>>
>> ACPI00XX can be ACPI0007 (processor) or ACPI0010 (processor container).
>>
>> stateX contains information related to a specific LPI state defined
>> in the LPI ACPI tables.
>>
>> summary_stats shows the stats (usage and time) from all the LPI states
>> under a device. The summary_stats file is provided to reduce the number
>> of files that userspace must access to capture a snapshot of the
>> idle statistics.
> Really ? What's the need to reduce the no. of file accesses ?
When we have a large number of cores, with multiple idle states plus a few
auto-promotable states, the number of files we need to access to get a snapshot
before/after running a workload is quite high.

It gets worse if we want to keep the file handles open to sample a little more frequently,
to get a breakdown of the idle state distribution during different phases of a workload.
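
As a concrete example, a sampler would keep one descriptor per attribute open
and re-read it every period - roughly like the sketch below (the path and the
100ms period are made up). One such fd is needed per state and per attribute,
so both the descriptor count and the per-sample read count scale with
cpus x states x attributes:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	char buf[32];
	ssize_t n;
	int i;
	int fd = open("/sys/devices/LNXSYSTM:00/ACPI0010:00/lpi/state0/time",
		      O_RDONLY);

	if (fd < 0)
		return 1;

	for (i = 0; i < 10; i++) {
		/* sysfs regenerates the value on a read from offset 0 */
		n = pread(fd, buf, sizeof(buf) - 1, 0);
		if (n <= 0)
			break;
		buf[n] = '\0';
		printf("sample %d: %s", i, buf);
		usleep(100 * 1000);	/* 100ms between samples */
	}
	close(fd);
	return 0;
}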
>> Signed-off-by: Prashanth Prakash <pprakash@codeaurora.org>
>> ---
>>  drivers/acpi/acpi_processor.c |  11 ++
>>  drivers/acpi/processor_idle.c | 345 +++++++++++++++++++++++++++++++++++++++++-
>>  include/acpi/processor.h      |  27 ++++
>>  3 files changed, 381 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
>> index 0143135..a01368d 100644
>> --- a/drivers/acpi/acpi_processor.c
>> +++ b/drivers/acpi/acpi_processor.c
>> @@ -570,9 +570,19 @@ void __init acpi_early_processor_osc(void)
>>  static int acpi_processor_container_attach(struct acpi_device *dev,
>>  					   const struct acpi_device_id *id)
>>  {
>> +	if (dev->status.present && dev->status.functional &&
>> +		dev->status.enabled && dev->status.show_in_ui)
>> +		acpi_lpi_sysfs_init(dev->handle,
>> +				(struct acpi_lpi_sysfs_data **)&dev->driver_data);
> Why not register it only if LPIs are detected/initialized as active ?
Based on Rafael's feedback, I was planning to refactor the code to initialize
the sysfs entries as part of LPI initialization, so that should take care of this.
> [...]
>
>> +
>> +
>> +/*
>> + * LPI sysfs support
>> + * Exports two APIs that can be called as part of init and exit to setup the LPI
>> + * sysfs entries either from processor or processor_container driver
>> + */
>> +
> [...]
>
> A lot of this sysfs handling looks similar to what we already have for
> cpuidle. I don't have a quick solution to avoid the duplication, but Greg
> really hates dealing with kobjects/ksets. I need to think if there's any
> better way to do this. Sorry for just raising the issue without a solution.
Given the hierarchical nature of LPI and the presence of auto-promotable states,
it was hard to fit it into the cpuidle model. I will take a look at it again,
though I am not confident about finding a solution that avoids the duplication.
>
>> +int acpi_lpi_sysfs_init(acpi_handle h,
>> +			struct acpi_lpi_sysfs_data **lpi_sysfs_data)
>> +{
>> +	struct acpi_device *d;
>> +	struct acpi_lpi_states_array info;
>> +	struct acpi_lpi_sysfs_state *sysfs_state = NULL;
>> +	struct acpi_lpi_sysfs_data *data = NULL;
>> +	int ret, i;
>> +
>> +	if (!lpi_sysfs_data)
>> +		return -EINVAL;
>> +
>> +	ret = acpi_bus_get_device(h, &d);
>> +	if (ret)
>> +		return ret;
>> +
>> +	ret = acpi_processor_evaluate_lpi(h, &info);
> Why do we need to reevaluate _LPI here ?
Since I am calling acpi_lpi_sysfs_init() from both processor_driver and
processor_container, re-evaluating made the code much simpler.
Refactoring the code to initialize the LPI sysfs entries as part of LPI init
in the processor driver should likely remove the need to re-evaluate here.

Thanks a lot for your inputs!

--
Thanks,
Prashanth

Prakash, Prashanth April 19, 2017, 11 p.m. UTC | #4
Hi Rafael,

On 4/18/2017 8:34 AM, Rafael J. Wysocki wrote:
> On Wednesday, March 29, 2017 06:13:15 PM Prashanth Prakash wrote:
>> Add support to expose idle statistics maintained by the platform to
>> userspace via sysfs, in addition to other data of interest from
>> each LPI (Low Power Idle) state.
>>
>> LPI, described in section 8.4.4 of ACPI spec 6.1, provides different
>> methods to obtain idle statistics maintained by the platform. These
>> show a granular view of how each LPI state is being used at
>> different levels of the hierarchy. The sysfs data is exposed by
>> creating a directory named 'lpi' at each level in the hierarchy, and
>> the LPI state information is presented under it. Below is the
>> representation of LPI information at one such level in the hierarchy:
>>
>> .../ACPI00XX:XX/lpi
>> 	|-> summary_stats
>> 	|-> state0
>> 	|	|-> desc
>> 	|	|-> time
>> 	|	|-> usage
>> 	|	|-> latency
>> 	|	|-> min_residency
>> 	|	|-> flags
>> 	|	|-> arch_flags
>> 	|
>> 	<<more states>>
>>
>> ACPI00XX can be ACPI0007 (processor) or ACPI0010 (processor container).
>>
>> stateX contains information related to a specific LPI state defined
>> in the LPI ACPI tables.
>>
>> summary_stats shows the stats (usage and time) from all the LPI states
>> under a device. The summary_stats file is provided to reduce the number
>> of files that userspace must access to capture a snapshot of the
>> idle statistics.
>>
>> Signed-off-by: Prashanth Prakash <pprakash@codeaurora.org>
> I'd like Sudeep to tell me what he thinks about this in the first place.
>
>> ---
>>  drivers/acpi/acpi_processor.c |  11 ++
>>  drivers/acpi/processor_idle.c | 345 +++++++++++++++++++++++++++++++++++++++++-
>>  include/acpi/processor.h      |  27 ++++
>>  3 files changed, 381 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
>> index 0143135..a01368d 100644
>> --- a/drivers/acpi/acpi_processor.c
>> +++ b/drivers/acpi/acpi_processor.c
>> @@ -570,9 +570,19 @@ void __init acpi_early_processor_osc(void)
>>  static int acpi_processor_container_attach(struct acpi_device *dev,
>>  					   const struct acpi_device_id *id)
>>  {
>> +	if (dev->status.present && dev->status.functional &&
>> +		dev->status.enabled && dev->status.show_in_ui)
>> +		acpi_lpi_sysfs_init(dev->handle,
>> +				(struct acpi_lpi_sysfs_data **)&dev->driver_data);
> This isn't the right place to do it IMO.
>
> It should be done at the processor driver initialization, when it is known
> that LPI is going to be used at all.
I will make the changes to initialize this in the processor driver.

Thanks for the feedback!

--
Thanks,
Prashanth

Sudeep Holla April 20, 2017, 8:46 a.m. UTC | #5
On 19/04/17 23:57, Prakash, Prashanth wrote:
> Hi Sudeep,
> 
> On 4/19/2017 9:37 AM, Sudeep Holla wrote:
>> On 30/03/17 01:13, Prashanth Prakash wrote:
>>> Add support to expose idle statistics maintained by the platform to
>>> userspace via sysfs, in addition to other data of interest from
>>> each LPI (Low Power Idle) state.
>>>
>> While I understand this information is useful for some optimization
>> and also for idle characterization with different workloads, I prefer
>> to keep this in debugfs, for two reasons:
>>
>> 1. We already have CPUIdle stats. This information may be more accurate,
>>    which is good for the reasons above, but user-space applications
>>    shouldn't depend on it and end up misusing it.
>> 2. Also, as more features get pushed into hardware, even these stats may
>>    not remain as accurate as they are today, and hence it would be
>>    better if user-space applications never use/depend on them.
>>
>> Let me know if there are conflicting reasons ?
> The information about the idle states of shared resources (cache, interconnect, ...),
> which cannot be deduced from the cpuidle stats, is quite useful. We can use this
> to analyze newer workloads and to explain their power consumption, especially as
> the amount of time some of these shared resources spend in different LPI states
> can be influenced by changes to the workload, kernel or firmware. And for
> auto-promotable states this is the only way to capture idle stats.
> 

I agree that the stats are useful, no argument there. The question is
more in terms of whether it can be debugfs, which is just useful for
analysis and characterization, or sysfs, which becomes user ABI.

> Regarding 2, since these stats are clearly defined by the ACPI spec and maintained
> by the platform, I think it is reasonable to expect them to be accurate. If they are
> not accurate, it is likely that the platform is breaking the spec.
> 

I was considering firmware vs hardware here. If f/w tracks and updates
these statistics, they may not be so accurate in the future, if more of the
controls that are done in f/w today end up being done automatically in
h/w. If h/w updates these statistics, then yes, they will be accurate.

I am still trying to convince myself if we need this as sysfs user ABI,
so just thinking aloud.

> Given the above, I don't see much room for user-space applications to misuse this.

You never know :)

> Given these are defined as optional in the spec, user-space applications should
> use them only if/when available, and as complementary to the cpuidle stats.
> 

Fair enough.

[...]

>>
>>> 	|	|-> flags
>>> 	|	|-> arch_flags
>>> 	|
>>> 	<<more states>>
>>>
>>> ACPI00XX can be ACPI0007 (processor) or ACPI0010 (processor container).
>>>
>>> stateX contains information related to a specific LPI state defined
>>> in the LPI ACPI tables.
>>>
>>> summary_stats shows the stats (usage and time) from all the LPI states
>>> under a device. The summary_stats file is provided to reduce the number
>>> of files that userspace must access to capture a snapshot of the
>>> idle statistics.
>> Really ? What's the need to reduce the no. of file accesses ?
>> When we have a large number of cores, with multiple idle states plus a few
>> auto-promotable states, the number of files we need to access to get a snapshot
>> before/after running a workload is quite high.
> 

OK. Since I don't have much knowledge on that, I can't really comment,
but I wonder why that is not done for the many stats that are per-cpu today.

> It gets worse if we want to keep the file handles open to sample a little more frequently,
> to get a breakdown of the idle state distribution during different phases of a workload.

On the contrary, agreeing with you on the issue with a large number of file
handles, why not have a single stat file that provides all the
information you need and be done with it ? Why do we need all this
sysfs hierarchy if summary_stats can provide all that
information broken down ?

[...]

>>> +
>>> +
>>> +/*
>>> + * LPI sysfs support
>>> + * Exports two APIs that can be called as part of init and exit to setup the LPI
>>> + * sysfs entries either from processor or processor_container driver
>>> + */
>>> +
>> [...]
>>
>> A lot of this sysfs handling looks similar to what we already have for
>> cpuidle. I don't have a quick solution to avoid the duplication, but Greg
>> really hates dealing with kobjects/ksets. I need to think if there's any
>> better way to do this. Sorry for just raising the issue without a solution.
> Given the hierarchical nature of LPI and the presence of auto-promotable states,
> it was hard to fit it into the cpuidle model. I will take a look at it again,
> though I am not confident about finding a solution that avoids the duplication.

Not a problem. I will also have a look, we can even do that later if
required.
Prakash, Prashanth April 25, 2017, 1:17 a.m. UTC | #6
On 4/20/2017 2:46 AM, Sudeep Holla wrote:
>
> On 19/04/17 23:57, Prakash, Prashanth wrote:
>> Hi Sudeep,
>>
>> On 4/19/2017 9:37 AM, Sudeep Holla wrote:
>>> On 30/03/17 01:13, Prashanth Prakash wrote:
>>>> Add support to expose idle statistics maintained by the platform to
>>>> userspace via sysfs, in addition to other data of interest from
>>>> each LPI (Low Power Idle) state.
>>>>
>>> While I understand this information is useful for some optimization
>>> and also for idle characterization with different workloads, I prefer
>>> to keep this in debugfs, for two reasons:
>>>
>>> 1. We already have CPUIdle stats. This information may be more accurate,
>>>    which is good for the reasons above, but user-space applications
>>>    shouldn't depend on it and end up misusing it.
>>> 2. Also, as more features get pushed into hardware, even these stats may
>>>    not remain as accurate as they are today, and hence it would be
>>>    better if user-space applications never use/depend on them.
>>>
>>> Let me know if there are conflicting reasons ?
>> The information about the idle states of shared resources (cache, interconnect, ...),
>> which cannot be deduced from the cpuidle stats, is quite useful. We can use this
>> to analyze newer workloads and to explain their power consumption, especially as
>> the amount of time some of these shared resources spend in different LPI states
>> can be influenced by changes to the workload, kernel or firmware. And for
>> auto-promotable states this is the only way to capture idle stats.
>>
> I agree that the stats are useful, no argument there. The question is
> more in terms of whether it can be debugfs, which is just useful for
> analysis and characterization, or sysfs, which becomes user ABI.
>
>> Regarding 2, since these stats are clearly defined by the ACPI spec and maintained
>> by the platform, I think it is reasonable to expect them to be accurate. If they are
>> not accurate, it is likely that the platform is breaking the spec.
>>
> I was considering firmware vs hardware here. If f/w tracks and updates
> these statistics, they may not be so accurate in the future, if more of the
> controls that are done in f/w today end up being done automatically in
> h/w. If h/w updates these statistics, then yes, they will be accurate.
Agree with f/w tracking vs h/w tracking, but I am not sure if we can (or should even try
 to) handle those issues at LPI driver level.
> I am still trying to convince myself if we need this as sysfs user ABI,
> so just thinking aloud.
Sure.
>> Given the above, I don't see much room for user-space applications to misuse this.
> You never know :)
:-)
>
>> Given these are defined as optional in the spec, user-space applications should
>> use them only if/when available, and as complementary to the cpuidle stats.
>>
> Fair enough.
>
> [...]
>
>>>> 	|	|-> flags
>>>> 	|	|-> arch_flags
>>>> 	|
>>>> 	<<more states>>
>>>>
>>>> ACPI00XX can be ACPI0007 (processor) or ACPI0010 (processor container).
>>>>
>>>> stateX contains information related to a specific LPI state defined
>>>> in the LPI ACPI tables.
>>>>
>>>> summary_stats shows the stats (usage and time) from all the LPI states
>>>> under a device. The summary_stats file is provided to reduce the number
>>>> of files that userspace must access to capture a snapshot of the
>>>> idle statistics.
>>> Really ? What's the need to reduce the no. of file accesses ?
>>> When we have a large number of cores, with multiple idle states plus a few
>>> auto-promotable states, the number of files we need to access to get a snapshot
>>> before/after running a workload is quite high.
>>
> OK. Since I don't have much knowledge on that, I can't really comment,
> but I wonder why that is not done for the many stats that are per-cpu today.
Good point. The only example I can think of is the cpufreq stats, which do
something like this, though I am not sure of the exact motivation(s) behind it.

I think one could have made a similar case with the cpuidle sysfs as well. Moreover,
summary_stats only mitigates some of the overhead of frequently reading the stats and
doesn't really fix it in a scalable manner. So, I suppose the current reasoning to have
summary_stats is quite weak to start with.
>> It gets worse if we want to keep the file handles open to sample a little more frequently,
>> to get a breakdown of the idle state distribution during different phases of a workload.
> On the contrary, agreeing with you on the issue with a large number of file
> handles, why not have a single stat file that provides all the
> information you need and be done with it ? Why do we need all this
> sysfs hierarchy if summary_stats can provide all that
> information broken down ?
With the hierarchy, it is possible to get information about the relationship between
cpus and higher-order shared resources. If we flatten the hierarchy, we lose the
information about those relationships, and it gets a little hard to clearly represent
the idle states of shared resources and which cpus share those resources.
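
For example, on a hypothetical two-cluster system (device names invented) the
hierarchy could look like the below, with each level carrying the LPI states
of the resources shared at that level:

ACPI0010:00 (package)
	|-> lpi                    <- package-level states
	|-> ACPI0010:01 (cluster 0)
	|	|-> lpi            <- cluster-level states
	|	|-> ACPI0007:00 (cpu0)/lpi
	|	|-> ACPI0007:01 (cpu1)/lpi
	|-> ACPI0010:02 (cluster 1)
	|	|-> ...

A flat file would have to re-encode exactly this containment relationship.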

--
Thanks,
Prashanth
Sudeep Holla April 25, 2017, 10:28 a.m. UTC | #7
On 25/04/17 02:17, Prakash, Prashanth wrote:
> 
> 
> On 4/20/2017 2:46 AM, Sudeep Holla wrote:
>>
>> On 19/04/17 23:57, Prakash, Prashanth wrote:
>>> Hi Sudeep,
>>>
>>> On 4/19/2017 9:37 AM, Sudeep Holla wrote:
>>>> On 30/03/17 01:13, Prashanth Prakash wrote:
>>>>> Add support to expose idle statistics maintained by the platform to
>>>>> userspace via sysfs, in addition to other data of interest from
>>>>> each LPI (Low Power Idle) state.
>>>>>
>>>> While I understand this information is useful for some optimization
>>>> and also for idle characterization with different workloads, I prefer
>>>> to keep this in debugfs, for two reasons:
>>>>
>>>> 1. We already have CPUIdle stats. This information may be more accurate,
>>>>    which is good for the reasons above, but user-space applications
>>>>    shouldn't depend on it and end up misusing it.
>>>> 2. Also, as more features get pushed into hardware, even these stats may
>>>>    not remain as accurate as they are today, and hence it would be
>>>>    better if user-space applications never use/depend on them.
>>>>
>>>> Let me know if there are conflicting reasons ?
>>> The information about the idle states of shared resources (cache, interconnect, ...),
>>> which cannot be deduced from the cpuidle stats, is quite useful. We can use this
>>> to analyze newer workloads and to explain their power consumption, especially as
>>> the amount of time some of these shared resources spend in different LPI states
>>> can be influenced by changes to the workload, kernel or firmware. And for
>>> auto-promotable states this is the only way to capture idle stats.
>>>
>> I agree that the stats are useful, no argument there. The question is
>> more in terms of whether it can be debugfs, which is just useful for
>> analysis and characterization, or sysfs, which becomes user ABI.
>>
>>> Regarding 2, since these stats are clearly defined by the ACPI spec and maintained
>>> by the platform, I think it is reasonable to expect them to be accurate. If they are
>>> not accurate, it is likely that the platform is breaking the spec.
>>>
>> I was considering firmware vs hardware here. If f/w tracks and updates
>> these statistics, they may not be so accurate in the future, if more of the
>> controls that are done in f/w today end up being done automatically in
>> h/w. If h/w updates these statistics, then yes, they will be accurate.
>
> Agree with f/w tracking vs h/w tracking, but I am not sure if we can (or should even try
>  to) handle those issues at LPI driver level.

No, I was not saying to control it. I just raised it to make sure we
don't allow userspace to depend on these stats too much, beyond
characterization and analysis, as these might be lost in future platforms.

>> I am still trying to convince myself if we need this as sysfs user ABI,
>> so just thinking aloud.
> Sure.
>>> Given the above, I don't see much room for user-space applications to misuse this.
>> You never know :)
> :-)
>>
>>> Given these are defined as optional in the spec, user-space applications should
>>> use them only if/when available, and as complementary to the cpuidle stats.
>>>
>> Fair enough.
>>
>> [...]
>>
>>>>> 	|	|-> flags
>>>>> 	|	|-> arch_flags
>>>>> 	|
>>>>> 	<<more states>>
>>>>>
>>>>> ACPI00XX can be ACPI0007 (processor) or ACPI0010 (processor container).
>>>>>
>>>>> stateX contains information related to a specific LPI state defined
>>>>> in the LPI ACPI tables.
>>>>>
>>>>> summary_stats shows the stats (usage and time) from all the LPI states
>>>>> under a device. The summary_stats file is provided to reduce the number
>>>>> of files that userspace must access to capture a snapshot of the
>>>>> idle statistics.
>>>> Really ? What's the need to reduce the no. of file accesses ?
>>>> When we have a large number of cores, with multiple idle states plus a few
>>>> auto-promotable states, the number of files we need to access to get a snapshot
>>>> before/after running a workload is quite high.
>>>
>> OK. Since I don't have much knowledge on that, I can't really comment,
>> but I wonder why that is not done for the many stats that are per-cpu today.
> Good point. The only example I can think of is the cpufreq stats, which do
> something like this, though I am not sure of the exact motivation(s) behind it.
>
> I think one could have made a similar case with the cpuidle sysfs as well. Moreover,
> summary_stats only mitigates some of the overhead of frequently reading the stats and
> doesn't really fix it in a scalable manner. So, I suppose the current reasoning to have
> summary_stats is quite weak to start with.

After you brought up this point, I was thinking of using this instead of the
hierarchical representation. If we put this summary_stats file in debugfs,
then there is no need to worry about user ABI.

>>> It gets worse if we want to keep the file handles open to sample a little more frequently,
>>> to get a breakdown of the idle state distribution during different phases of a workload.
>> On the contrary, agreeing with you on the issue with a large number of file
>> handles, why not have a single stat file that provides all the
>> information you need and be done with it ? Why do we need all this
>> sysfs hierarchy if summary_stats can provide all that
>> information broken down ?
> With the hierarchy, it is possible to get information about the relationship between
> cpus and higher-order shared resources. If we flatten the hierarchy, we lose the
> information about those relationships, and it gets a little hard to clearly represent
> the idle states of shared resources and which cpus share those resources.
> 

No, what I meant is to have the complete information broken down to the
lowest possible level, but just one file access to get that information.

IIUC, you already had some format for the summary_stats file; just extend it
to include all the information you would get by reading the individual sysfs
files organized hierarchically. I don't think that should matter much, as you
would have all this information; just the representation differs. I am
assuming all this data is not used/interpreted on the fly, but mostly
used offline for analysis. Otherwise it's already misuse, and we don't
want to expose such a user ABI IMO.
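
Purely as an illustration (the format below is invented, not something from
the patch), a single flat file could still carry the containment information:

device       parent       state  usage     time(us)
ACPI0010:00  -            0      123       456789
ACPI0010:01  ACPI0010:00  0      234       345678
ACPI0007:00  ACPI0010:01  0      345       234567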

Just to summarize: though I agree these LPI stats are more accurate and
a more representative summary, I think they may fade away as we move towards
hardware-controlled lower power states. Since we already have cpuidle
stats, I prefer to keep this LPI stats interface to userspace as simple
and minimal as possible and yet helpful enough to get all the information.
Prakash, Prashanth April 26, 2017, 11:21 p.m. UTC | #8
Hi Sudeep,

On 4/25/2017 4:28 AM, Sudeep Holla wrote:
>> With the hierarchy, it is possible to get information about the relationship between
>> cpus and higher-order shared resources. If we flatten the hierarchy, we lose the
>> information about those relationships, and it gets a little hard to clearly represent
>> the idle states of shared resources and which cpus share those resources.
>>
> No, what I meant is to have the complete information broken down to the
> lowest possible level, but just one file access to get that information.
>
> IIUC, you already had some format for the summary_stats file; just extend it
> to include all the information you would get by reading the individual sysfs
> files organized hierarchically. I don't think that should matter much, as you
> would have all this information; just the representation differs. I am
> assuming all this data is not used/interpreted on the fly, but mostly
> used offline for analysis. Otherwise it's already misuse, and we don't
> want to expose such a user ABI IMO.
>
> Just to summarize: though I agree these LPI stats are more accurate and
> a more representative summary, I think they may fade away as we move towards
> hardware-controlled lower power states. Since we already have cpuidle
> stats, I prefer to keep this LPI stats interface to userspace as simple
> and minimal as possible and yet helpful enough to get all the information.
Sounds good. I will remove summary_stats and make the other changes based on
the feedback, and post the next rev.

Thanks for the review and feedback!

--
Thanks,
Prashanth

Sudeep Holla April 27, 2017, 9:16 a.m. UTC | #9
On 27/04/17 00:21, Prakash, Prashanth wrote:
> Hi Sudeep,
> 
> On 4/25/2017 4:28 AM, Sudeep Holla wrote:
>>> With the hierarchy, it is possible to get information about the relationship between
>>> cpus and higher-order shared resources. If we flatten the hierarchy, we lose the
>>> information about those relationships, and it gets a little hard to clearly represent
>>> the idle states of shared resources and which cpus share those resources.
>>>
>> No, what I meant is to have the complete information broken down to the
>> lowest possible level, but just one file access to get that information.
>>
>> IIUC, you already had some format for the summary_stats file; just extend it
>> to include all the information you would get by reading the individual sysfs
>> files organized hierarchically. I don't think that should matter much, as you
>> would have all this information; just the representation differs. I am
>> assuming all this data is not used/interpreted on the fly, but mostly
>> used offline for analysis. Otherwise it's already misuse, and we don't
>> want to expose such a user ABI IMO.
>>
>> Just to summarize: though I agree these LPI stats are more accurate and
>> a more representative summary, I think they may fade away as we move towards
>> hardware-controlled lower power states. Since we already have cpuidle
>> stats, I prefer to keep this LPI stats interface to userspace as simple
>> and minimal as possible and yet helpful enough to get all the information.
>
> Sounds good. I will remove summary_stats and make the other changes based on
> the feedback, and post the next rev.

I am sorry if I was not clear earlier; I was in fact asking: what if
we just have a single summary_stats file with all the statistics broken down
to the minute details you want, instead of creating the hierarchy of
individual files, which you say has overhead to read?

Do you process the stats on the fly when the workload is running? I assume not.
Prakash, Prashanth April 27, 2017, 9:04 p.m. UTC | #10
On 4/27/2017 3:16 AM, Sudeep Holla wrote:
>
> On 27/04/17 00:21, Prakash, Prashanth wrote:
>> Hi Sudeep,
>>
>> On 4/25/2017 4:28 AM, Sudeep Holla wrote:
>>>> With the hierarchy, it is possible to get information about the relationship between
>>>> cpus and higher-order shared resources. If we flatten the hierarchy, we lose the
>>>> information about those relationships, and it gets a little hard to clearly represent
>>>> the idle states of shared resources and which cpus share those resources.
>>>>
>>> No, what I meant is to have the complete information broken down to the
>>> lowest possible level, but just one file access to get that information.
>>>
>>> IIUC, you already had some format for the summary_stats file; just extend it
>>> to include all the information you would get by reading the individual sysfs
>>> files organized hierarchically. I don't think that should matter much, as you
>>> would have all this information; just the representation differs. I am
>>> assuming all this data is not used/interpreted on the fly, but mostly
>>> used offline for analysis. Otherwise it's already misuse, and we don't
>>> want to expose such a user ABI IMO.
>>>
>>> Just to summarize: though I agree these LPI stats are more accurate and
>>> a more representative summary, I think they may fade away as we move towards
>>> hardware-controlled lower power states. Since we already have cpuidle
>>> stats, I prefer to keep this LPI stats interface to userspace as simple
>>> and minimal as possible and yet helpful enough to get all the information.
>> Sounds good. I will remove summary_stats and make the other changes based on
>> the feedback, and post the next rev.
> I am sorry if I was not clear earlier; I was in fact asking: what if
> we just have a single summary_stats file with all the statistics broken down
> to the minute details you want, instead of creating the hierarchy of
> individual files, which you say has overhead to read?
>
> Do you process the stats on the fly when the workload is running? I assume not.
I don't see any use-cases for on-the-fly analysis. The most we might want to
do is capture this data periodically and then post-process it to see how different
phases of a workload did. The overhead of reading multiple files comes down to
what the duration between reads of this sysfs can be without drastically impacting
the workload - so it is probably not such a big issue.

While it is possible to simplify by having all the data aggregated into a single
file at each level in the hierarchy, we end up trading one inefficiency for others:
- We are reducing the number of file accesses but increasing the amount of
data read (which user-space may or may not be interested in), including
re-reading the constant fields (desc, latency etc.) every time.
- Putting all the info in the same file increases the probability that we break
userspace parsing of this single file when we want to add a new piece of
information. While this was possible earlier, the probability was a little
lower as it was restricted to only time and usage.
- Another reason to avoid collapsing all the fields into a single file is to keep
it consistent with the cpuidle interface.

I was thinking about some of the above issues, and when you mentioned
simple and minimal, I interpreted it as: let's keep the interface simple by
removing summary_stats and a few other fields we discussed. Sorry, my bad.

--
Thanks,
Prashanth


Patch

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 0143135..a01368d 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -570,9 +570,19 @@  void __init acpi_early_processor_osc(void)
 static int acpi_processor_container_attach(struct acpi_device *dev,
 					   const struct acpi_device_id *id)
 {
+	if (dev->status.present && dev->status.functional &&
+		dev->status.enabled && dev->status.show_in_ui)
+		acpi_lpi_sysfs_init(dev->handle,
+				(struct acpi_lpi_sysfs_data **)&dev->driver_data);
 	return 1;
 }
 
+static void acpi_processor_container_detach(struct acpi_device *dev)
+{
+	if (dev->driver_data)
+		acpi_lpi_sysfs_exit((struct acpi_lpi_sysfs_data *)dev->driver_data);
+}
+
 static const struct acpi_device_id processor_container_ids[] = {
 	{ ACPI_PROCESSOR_CONTAINER_HID, },
 	{ }
@@ -581,6 +591,7 @@  static int acpi_processor_container_attach(struct acpi_device *dev,
 static struct acpi_scan_handler processor_container_handler = {
 	.ids = processor_container_ids,
 	.attach = acpi_processor_container_attach,
+	.detach = acpi_processor_container_detach,
 };
 
 /* The number of the unique processor IDs */
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 5c8aa9c..ae898c2 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -949,6 +949,24 @@  static int obj_get_integer(union acpi_object *obj, u32 *value)
 	return 0;
 }
 
+static int obj_get_generic_addr(union acpi_object *obj,
+				struct acpi_generic_address *addr)
+{
+	struct acpi_power_register *reg;
+
+	if (obj->type != ACPI_TYPE_BUFFER)
+		return -EINVAL;
+
+	reg = (struct acpi_power_register *)obj->buffer.pointer;
+	addr->space_id = reg->space_id;
+	addr->bit_width = reg->bit_width;
+	addr->bit_offset = reg->bit_offset;
+	addr->access_width = reg->access_size;
+	addr->address = reg->address;
+
+	return 0;
+}
+
 static int acpi_processor_evaluate_lpi(acpi_handle handle,
 				       struct acpi_lpi_states_array *info)
 {
@@ -1023,8 +1041,6 @@  static int acpi_processor_evaluate_lpi(acpi_handle handle,
 			continue;
 		}
 
-		/* elements[7,8] skipped for now i.e. Residency/Usage counter*/
-
 		obj = pkg_elem + 9;
 		if (obj->type == ACPI_TYPE_STRING)
 			strlcpy(lpi_state->desc, obj->string.pointer,
@@ -1052,6 +1068,10 @@  static int acpi_processor_evaluate_lpi(acpi_handle handle,
 
 		if (obj_get_integer(pkg_elem + 5, &lpi_state->enable_parent_state))
 			lpi_state->enable_parent_state = 0;
+
+		obj_get_generic_addr(pkg_elem + 7, &lpi_state->res_cntr);
+
+		obj_get_generic_addr(pkg_elem + 8, &lpi_state->usage_cntr);
 	}
 
 	acpi_handle_debug(handle, "Found %d power states\n", state_idx);
@@ -1208,6 +1228,14 @@  static int acpi_processor_get_lpi_info(struct acpi_processor *pr)
 	pr->flags.has_lpi = 1;
 	pr->flags.power = 1;
 
+	/*
+	 * Set up LPI sysfs at the processor device level - acpi_lpi_sysfs_exit
+	 * will not be called on CPU offline path, so need to check if sysfs is
+	 * initialized before calling init
+	 */
+	if (!pr->power.lpi_sysfs_data)
+		acpi_lpi_sysfs_init(pr->handle, &pr->power.lpi_sysfs_data);
+
 	return 0;
 }
 
@@ -1477,8 +1505,321 @@  int acpi_processor_power_exit(struct acpi_processor *pr)
 		acpi_processor_registered--;
 		if (acpi_processor_registered == 0)
 			cpuidle_unregister_driver(&acpi_idle_driver);
+
+		if (pr->power.lpi_sysfs_data)
+			acpi_lpi_sysfs_exit(pr->power.lpi_sysfs_data);
 	}
 
 	pr->flags.power_setup_done = 0;
 	return 0;
 }
+
+
+/*
+ * LPI sysfs support
+ * Exports two APIs that can be called as part of init and exit to setup the LPI
+ * sysfs entries either from processor or processor_container driver
+ */
+
+struct acpi_lpi_attr {
+	struct attribute attr;
+	ssize_t (*show)(struct kobject *kobj, struct attribute *attr,
+			char *buf);
+	ssize_t (*store)(struct kobject *kobj, struct attribute *attr,
+			const char *c, ssize_t count);
+};
+
+#define define_lpi_ro(_name) static struct acpi_lpi_attr _name =	\
+		__ATTR(_name, 0444, show_##_name, NULL)
+
+#define to_acpi_lpi_sysfs_state(k)				\
+	container_of(k, struct acpi_lpi_sysfs_state, kobj)
+
+#define to_acpi_lpi_state(k)			\
+	to_acpi_lpi_sysfs_state(k)->lpi_state
+
+static ssize_t show_desc(struct kobject *kobj, struct attribute *attr,
+			char *buf)
+{
+	struct acpi_lpi_state *lpi = to_acpi_lpi_state(kobj);
+
+	return scnprintf(buf, PAGE_SIZE, "%s\n", lpi->desc);
+}
+define_lpi_ro(desc);
+
+static int acpi_lpi_get_time(struct acpi_lpi_state *lpi, u64 *val)
+{
+	struct acpi_generic_address *reg;
+
+	if (!lpi)
+		return -EFAULT;
+
+	reg = &lpi->res_cntr;
+
+	/* Supporting only system memory */
+	if (reg->space_id != ACPI_ADR_SPACE_SYSTEM_MEMORY ||
+		!(lpi->flags & ACPI_LPI_STATE_FLAGS_ENABLED) ||
+		!reg->address || !lpi->res_cnt_freq)
+		return -EINVAL;
+
+	if (ACPI_FAILURE(acpi_read(val, reg)))
+		return -EFAULT;
+
+	*val = (*val * 1000000) / lpi->res_cnt_freq;
+	return 0;
+
+}
+
+/* shows residency in us */
+static ssize_t show_time(struct kobject *kobj, struct attribute *attr,
+			char *buf)
+{
+	struct acpi_lpi_state *lpi = to_acpi_lpi_state(kobj);
+	u64 val = 0;
+	int ret;
+
+	ret = acpi_lpi_get_time(lpi, &val);
+
+	if (ret == -EINVAL)
+		return scnprintf(buf, PAGE_SIZE, "<unsupported>\n");
+
+	if (ret)
+		return ret;
+
+	return scnprintf(buf, PAGE_SIZE, "%llu\n", val);
+}
+define_lpi_ro(time);
+
+static int acpi_lpi_get_usage(struct acpi_lpi_state *lpi, u64 *val)
+{
+	struct acpi_generic_address *reg;
+
+	if (!lpi)
+		return -EFAULT;
+
+	reg = &lpi->usage_cntr;
+
+	/* Supporting only system memory now (FFH not supported) */
+	if (reg->space_id != ACPI_ADR_SPACE_SYSTEM_MEMORY ||
+		!(lpi->flags & ACPI_LPI_STATE_FLAGS_ENABLED) ||
+		!reg->address)
+		return -EINVAL;
+
+	if (ACPI_FAILURE(acpi_read(val, reg)))
+		return -EFAULT;
+
+	return 0;
+}
+
+static ssize_t show_usage(struct kobject *kobj, struct attribute *attr,
+			char *buf)
+{
+	struct acpi_lpi_state *lpi = to_acpi_lpi_state(kobj);
+	u64 val = 0;
+	int ret;
+
+	ret = acpi_lpi_get_usage(lpi, &val);
+
+	if (ret == -EINVAL)
+		return scnprintf(buf, PAGE_SIZE, "<unsupported>\n");
+
+	if (ret)
+		return ret;
+
+	return scnprintf(buf, PAGE_SIZE, "%llu\n", val);
+}
+define_lpi_ro(usage);
+
+static ssize_t show_flags(struct kobject *kobj, struct attribute *attr,
+			char *buf)
+{
+	struct acpi_lpi_state *lpi = to_acpi_lpi_state(kobj);
+	bool enabled =  lpi->flags & ACPI_LPI_STATE_FLAGS_ENABLED;
+
+	return scnprintf(buf, PAGE_SIZE, "%s\n",
+			enabled ? "Enabled" : "Disabled");
+}
+define_lpi_ro(flags);
+
+static ssize_t show_arch_flags(struct kobject *kobj, struct attribute *attr,
+				char *buf)
+{
+	struct acpi_lpi_state *lpi = to_acpi_lpi_state(kobj);
+
+	return scnprintf(buf, PAGE_SIZE, "0x%x\n", lpi->arch_flags);
+}
+define_lpi_ro(arch_flags);
+
+static ssize_t show_min_residency(struct kobject *kobj, struct attribute *attr,
+				char *buf)
+{
+	struct acpi_lpi_state *lpi = to_acpi_lpi_state(kobj);
+
+	return scnprintf(buf, PAGE_SIZE, "%u\n", lpi->min_residency);
+}
+define_lpi_ro(min_residency);
+
+static ssize_t show_latency(struct kobject *kobj, struct attribute *attr,
+			char *buf)
+{
+	struct acpi_lpi_state *lpi = to_acpi_lpi_state(kobj);
+
+	return scnprintf(buf, PAGE_SIZE, "%u\n", lpi->wake_latency);
+}
+define_lpi_ro(latency);
+
+static struct attribute *acpi_lpi_state_attrs[] = {
+	&desc.attr,
+	&flags.attr,
+	&arch_flags.attr,
+	&min_residency.attr,
+	&latency.attr,
+	&time.attr,
+	&usage.attr,
+	NULL
+};
+
+static struct kobj_type lpi_state_ktype = {
+	.sysfs_ops = &kobj_sysfs_ops,
+	.default_attrs = acpi_lpi_state_attrs,
+};
+
+static ssize_t show_summary_stats(struct kobject *kobj, struct attribute *attr,
+				char *buf)
+{
+	struct acpi_lpi_sysfs_data *sysfs_data =
+		container_of(kobj, struct acpi_lpi_sysfs_data, kobj);
+	struct acpi_lpi_state *state;
+	u64 time, usage;
+	int i, ret_time, ret_usage, valid_states_found = 0;
+	ssize_t len;
+
+	len = scnprintf(buf, PAGE_SIZE, "%5s %20s %20s\n", "state",
+			"usage", "time(us)");
+
+	for (i = 0; i < sysfs_data->state_count; i++) {
+		state = sysfs_data->sysfs_states[i].lpi_state;
+
+		ret_time = acpi_lpi_get_time(state, &time);
+		ret_usage = acpi_lpi_get_usage(state, &usage);
+		if (ret_time || ret_usage)
+			continue;
+
+		len += scnprintf(buf + len, PAGE_SIZE - len,
+				"%5d %20llu %20llu\n", i, usage, time);
+
+		valid_states_found = 1;
+	}
+
+	if (valid_states_found)
+		return len;
+
+	return scnprintf(buf, PAGE_SIZE, "<unsupported>\n");
+}
+define_lpi_ro(summary_stats);
+
+static void acpi_lpi_sysfs_release(struct kobject *kobj)
+{
+	struct acpi_lpi_sysfs_data *sysfs_data =
+		container_of(kobj, struct acpi_lpi_sysfs_data, kobj);
+
+	kfree(sysfs_data->sysfs_states->lpi_state);
+	kfree(sysfs_data->sysfs_states);
+	kfree(sysfs_data);
+}
+
+static struct attribute *acpi_lpi_device_attrs[] = {
+	&summary_stats.attr,
+	NULL
+};
+
+static struct kobj_type lpi_device_ktype = {
+	.sysfs_ops = &kobj_sysfs_ops,
+	.default_attrs = acpi_lpi_device_attrs,
+	.release = acpi_lpi_sysfs_release,
+};
+
+int acpi_lpi_sysfs_init(acpi_handle h,
+			struct acpi_lpi_sysfs_data **lpi_sysfs_data)
+{
+	struct acpi_device *d;
+	struct acpi_lpi_states_array info;
+	struct acpi_lpi_sysfs_state *sysfs_state = NULL;
+	struct acpi_lpi_sysfs_data *data = NULL;
+	int ret, i;
+
+	if (!lpi_sysfs_data)
+		return -EINVAL;
+
+	ret = acpi_bus_get_device(h, &d);
+	if (ret)
+		return ret;
+
+	ret = acpi_processor_evaluate_lpi(h, &info);
+	if (ret)
+		return ret;
+
+	data = kzalloc(sizeof(struct acpi_lpi_sysfs_data), GFP_KERNEL);
+	if (!data) {
+		ret = -ENOMEM;
+		goto kfree_and_return;
+	}
+
+	sysfs_state = kcalloc(info.size, sizeof(struct acpi_lpi_sysfs_state),
+			GFP_KERNEL);
+	if (!sysfs_state) {
+		ret = -ENOMEM;
+		goto kfree_and_return;
+	}
+
+	ret = kobject_init_and_add(&data->kobj, &lpi_device_ktype, &d->dev.kobj,
+				"lpi");
+	if (ret)
+		goto kfree_and_return;
+
+	*lpi_sysfs_data = data;
+	data->state_count = info.size;
+	data->sysfs_states = sysfs_state;
+
+	for (i = 0; i < info.size; i++, sysfs_state++) {
+		sysfs_state->lpi_state = info.entries + i;
+		ret = kobject_init_and_add(&sysfs_state->kobj, &lpi_state_ktype,
+					&data->kobj, "state%d", i);
+		if (ret)
+			break;
+	}
+
+	if (ret) {
+		while (i > 0) {
+			i--;
+			sysfs_state = data->sysfs_states + i;
+			kobject_put(&sysfs_state->kobj);
+		}
+
+		kobject_put(&data->kobj);
+	}
+	return ret;
+
+kfree_and_return:
+	kfree(data);
+	kfree(sysfs_state);
+	kfree(info.entries);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(acpi_lpi_sysfs_init);
+
+int acpi_lpi_sysfs_exit(struct acpi_lpi_sysfs_data *sysfs_data)
+{
+	int i;
+
+	if (!sysfs_data)
+		return -EFAULT;
+
+	for (i = 0; i < sysfs_data->state_count; i++)
+		kobject_put(&sysfs_data->sysfs_states[i].kobj);
+
+	kobject_put(&sysfs_data->kobj);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(acpi_lpi_sysfs_exit);
diff --git a/include/acpi/processor.h b/include/acpi/processor.h
index c1ba00f..4baa14c 100644
--- a/include/acpi/processor.h
+++ b/include/acpi/processor.h
@@ -79,6 +79,19 @@  struct acpi_lpi_state {
 	u8 index;
 	u8 entry_method;
 	char desc[ACPI_CX_DESC_LEN];
+	struct acpi_generic_address res_cntr;
+	struct acpi_generic_address usage_cntr;
+};
+
+struct acpi_lpi_sysfs_state {
+	struct acpi_lpi_state *lpi_state;
+	struct kobject kobj;
+};
+
+struct acpi_lpi_sysfs_data {
+	u8 state_count;
+	struct kobject kobj;
+	struct acpi_lpi_sysfs_state *sysfs_states;
 };
 
 struct acpi_processor_power {
@@ -88,6 +101,7 @@  struct acpi_processor_power {
 		struct acpi_lpi_state lpi_states[ACPI_PROCESSOR_MAX_POWER];
 	};
 	int timer_broadcast_on_state;
+	struct acpi_lpi_sysfs_data *lpi_sysfs_data;
 };
 
 /* Performance Management */
@@ -393,6 +407,8 @@  static inline void acpi_processor_throttling_init(void) {}
 int acpi_processor_power_exit(struct acpi_processor *pr);
 int acpi_processor_power_state_has_changed(struct acpi_processor *pr);
 int acpi_processor_hotplug(struct acpi_processor *pr);
+int acpi_lpi_sysfs_init(acpi_handle h, struct acpi_lpi_sysfs_data **sysfs_data);
+int acpi_lpi_sysfs_exit(struct acpi_lpi_sysfs_data *sysfs_data);
 #else
 static inline int acpi_processor_power_init(struct acpi_processor *pr)
 {
@@ -413,6 +429,17 @@  static inline int acpi_processor_hotplug(struct acpi_processor *pr)
 {
 	return -ENODEV;
 }
+
+
+static inline int acpi_lpi_sysfs_init(acpi_handle h, struct acpi_lpi_sysfs_data **sysfs_data)
+{
+	return -ENODEV;
+}
+
+static inline int acpi_lpi_sysfs_exit(struct acpi_lpi_sysfs_data *sysfs_data)
+{
+	return -ENODEV;
+}
 #endif /* CONFIG_ACPI_PROCESSOR_IDLE */
 
 /* in processor_thermal.c */