diff mbox

[v6] cpufreq: Add SFI based cpufreq driver support

Message ID 20141125054226.GA20738@intel-desktop (mailing list archive)
State Changes Requested, archived
Delegated to: Len Brown
Headers show

Commit Message

Kasagar, Srinidhi Nov. 25, 2014, 5:42 a.m. UTC
This adds the SFI based cpu freq driver for some of the Intel's
Silvermont based Atom architectures like Z34xx and Z35xx.

Signed-off-by: Rudramuni, Vishwesh M <vishwesh.m.rudramuni@intel.com>
Signed-off-by: Srinidhi Kasagar <srinidhi.kasagar@intel.com>
---
v2:
(v2-v3: comments from Viresh)
* rephrase the Kconfig text
* merge the header file as its not being shared with anyone
* remove the unnecessary checks in sfi_cpufreq_target
* style changes in kzalloc
* fix the potential memory leak sfi_cpufreq_cpu_init
* move the cpufreq driver flag to the structure definition
* remove the redundant init of policy->cpu and policy->cur
* remove the unnecessary check while init the freq table
* merge the for loops in cpu_init
* use cpufreq_generic_attr rather than the custom one

v3:
* Add this change log
* remove the left over not used cached_freq var
* merge yet another for loop in cpu_init

v4:
* remove unnecessary pr_debug
* rephrase Kconfig text
* remove unused macro SFI_CPU_MAX
* Add Viresh's Acked-by

v5:
(comments from Len Brown)
* rename few private functions with sfi_cpufreq_ prefix
* move the PERF_CTL_MASK to msr-index.h
* make the driver.get current freq to refer PERF_CTRL
* remove sfi_processor_px, rather reuse sfi_freq_table_entry
* remove unnecessary memcpy, rather just point to the freq table
* consolidate get_performance_states with parse_freq
* remove extract_freq and make get_cur_freq to read from MSR directly
* move the state_count check while registering the perf states

v6:
* release the mutex in case of error in register_performance
(comments from Viresh)
* drop local perf pointer in ->target and ->init
* fix the order of freeing memories in the error path of ->init
* remove alloc_percpu, rather allocate it at runtime in ->init

 arch/x86/include/uapi/asm/msr-index.h |    1 +
 drivers/cpufreq/Kconfig.x86           |   10 ++
 drivers/cpufreq/Makefile              |    1 +
 drivers/cpufreq/sfi-cpufreq.c         |  230 +++++++++++++++++++++++++++++++++
 4 files changed, 242 insertions(+)
 create mode 100644 drivers/cpufreq/sfi-cpufreq.c

Comments

Viresh Kumar Nov. 25, 2014, 11:14 a.m. UTC | #1
On 25 November 2014 at 11:12, Srinidhi Kasagar
<srinidhi.kasagar@intel.com> wrote:
> This adds the SFI based cpu freq driver for some of the Intel's
> Silvermont based Atom architectures like Z34xx and Z35xx.
>
> Signed-off-by: Rudramuni, Vishwesh M <vishwesh.m.rudramuni@intel.com>
> Signed-off-by: Srinidhi Kasagar <srinidhi.kasagar@intel.com>

> v6:
> * release the mutex in case of error in register_performance
> (comments from Viresh)
> * drop local perf pointer in ->target and ->init
> * fix the order of freeing memories in the error path of ->init
> * remove alloc_percpu, rather allocate it at runtime in ->init

Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Len Brown Dec. 4, 2014, 3:21 a.m. UTC | #2
I like this trend:

drivers/cpufreq/sfi-cpufreq.c |  529 +++++++++++++++++++++++++++++++++++++++++
v1: drivers/cpufreq/sfi-cpufreq.c |  426
+++++++++++++++++++++++++++++++++++++++++
v2: drivers/cpufreq/sfi-cpufreq.c |  332
+++++++++++++++++++++++++++++++++++++++++
v3: drivers/cpufreq/sfi-cpufreq.c |  326
+++++++++++++++++++++++++++++++++++++++++
v4: drivers/cpufreq/sfi-cpufreq.c |  317
+++++++++++++++++++++++++++++++++++++++++
v5: drivers/cpufreq/sfi-cpufreq.c         |  238
+++++++++++++++++++++++++++++++++
v6: drivers/cpufreq/sfi-cpufreq.c         |  230
+++++++++++++++++++++++++++++++++



On Tue, Nov 25, 2014 at 12:42 AM, Srinidhi Kasagar
<srinidhi.kasagar@intel.com> wrote:

> +++ b/drivers/cpufreq/sfi-cpufreq.c

> +#define SFI_FREQ_MASK          0xff00
> +
> +static DEFINE_PER_CPU(struct sfi_cpufreq_data *, drv_data);
> +static DEFINE_MUTEX(performance_mutex);

I'm not sure that cpufreq is able to call driver init's in parallel,
so I'm not sure that the driver init needs to lock against other
invocations of itself.
But even if called in parallel, what data does performance_mutex
actually protect?


> +
> +static struct sfi_freq_table_entry *f_entry;
> +static int sfi_cpufreq_num;

The frequency table and its number of entries are global to the system
and un-changing.
So they probably never have to be re-initialized after the first cpu
goes through the init routine.

ie. it doesn't appear to be necessary to have a copy per cpu.

Finally, perhaps more consistent names would be something like:

static sfi_freq_table_entry *freq_table_entries;
static int num_freq_table_entries;

> +
> +struct sfi_cpufreq_performance {
> +       unsigned int state;

state is remembering the most recent next_perf_state.
currently its only (dubious) use is during a printk at initialization time,
when is it probably always zero from being allocated but not yet used...

If you don't use it for caching the state in get_cur_freq_on_cpu()
as discussed below, I recommend you delete it.

> +       unsigned int state_count;

no need to have this, if you also have system-wide
num_freq_table_entries, above.

> +       struct sfi_freq_table_entry *states;

no need to have this, if you also have system-wide freq_table_entries, above.

> +};
> +
> +struct sfi_cpufreq_data {
> +       struct sfi_cpufreq_performance *sfi_data;

as sfi_cpufreq_performance data structure is now down to 4 bytes,
or 0 bytes, if you delete 'state'.  So delete the struct and include what
is left, here -- if anything.

> +       struct cpufreq_frequency_table *freq_table;
> +};
> +
> +static int parse_freq(struct sfi_table_header *table)
> +{
> +       struct sfi_table_simple *sb;
> +
> +       sb = (struct sfi_table_simple *)table;
> +
> +       sfi_cpufreq_num = SFI_GET_NUM_ENTRIES(sb, struct sfi_freq_table_entry);
> +       f_entry = (struct sfi_freq_table_entry *)sb->pentry;

are we sure that this pointer will still be valid after
sfi_table_parse() returns?
sfi_table_parse() appears to do an sfi_table_put() at the end...

> +
> +       return 0;
> +}
> +
> +static int sfi_cpufreq_register_performance(struct sfi_cpufreq_performance *pr)
> +{
> +       int ret = 0;
> +       mutex_lock(&performance_mutex);
> +
> +       /* parse the freq table from SFI */
> +       sfi_table_parse(SFI_SIG_FREQ, NULL, NULL, parse_freq);
> +       pr->state_count = sfi_cpufreq_num;
> +       if (pr->state_count <= 1) {
> +               pr_err("No p-states discovered\n");
> +               ret = -ENODEV;
> +               goto err;
> +       }
> +       pr->states = f_entry;
> +err:
> +       mutex_unlock(&performance_mutex);
> +
> +       return ret;
> +}
> +
> +static unsigned int get_cur_freq_on_cpu(unsigned int cpu)
> +{
> +       u32 lo, hi, sfi_ctrl;
> +       struct cpufreq_frequency_table *pos;
> +       struct sfi_cpufreq_data *data = per_cpu(drv_data, cpu);
> +
> +       rdmsr_on_cpu(cpu, MSR_IA32_PERF_CTL, &lo, &hi);
> +       lo &= SFI_FREQ_MASK;
> +
> +       cpufreq_for_each_entry(pos, data->freq_table) {
> +               sfi_ctrl = data->sfi_data->states[pos->driver_data].ctrl_val
> +                       & SFI_FREQ_MASK;
> +               if (sfi_ctrl == lo)
> +                       return pos->frequency;
> +       }
> +       return data->freq_table[0].frequency;
> +}

Not obvious (to me) that this in-accurate return value is more useful
to the caller than simply returning data->freq_table[state].frequency.

Unless the caller is taking into account for hardware coordination
of multiple PERF_CTRL MSR's, then it is garbage-in/garbage-out,
not to mention that by the time this value is used, it may have changed...

I'm curious what breaks in your Android setup if you don't supply
a get_cur_freq() at all...

But that said, acpi-cpufreq.c is much worse in this department.

> +
> +static int sfi_cpufreq_target(struct cpufreq_policy *policy,
> +                               unsigned int index)
> +{
> +       struct sfi_cpufreq_data *data = per_cpu(drv_data, policy->cpu);
> +       unsigned int next_perf_state = 0; /* Index into perf table */
> +       u32 lo, hi;
> +
> +       next_perf_state = data->freq_table[index].driver_data;
> +
> +       rdmsr_on_cpu(policy->cpu, MSR_IA32_PERF_CTL, &lo, &hi);
> +       lo = (lo & ~INTEL_PERF_CTL_MASK) |
> +               ((u32) data->sfi_data->states[next_perf_state].ctrl_val &
> +                       INTEL_PERF_CTL_MASK);
> +       wrmsr_on_cpu(policy->cpu, MSR_IA32_PERF_CTL, lo, hi);
> +
> +       data->sfi_data->state = next_perf_state;
> +
> +       return 0;
> +}
> +
> +static int sfi_cpufreq_cpu_init(struct cpufreq_policy *policy)
> +{
> +       unsigned int i, result = 0, valid_states = 0;
> +       unsigned int cpu = policy->cpu;
> +       struct sfi_cpufreq_data *data;
> +
> +       data = kzalloc(sizeof(*data), GFP_KERNEL);
> +       if (!data)
> +               return -ENOMEM;
> +
> +       data->sfi_data = kzalloc(sizeof(*data->sfi_data), GFP_KERNEL);
> +       if (!data->sfi_data) {
> +               result = -ENOMEM;
> +               goto err_free1;
> +       }
> +
> +       per_cpu(drv_data, cpu) = data;
> +
> +       result = sfi_cpufreq_register_performance(data->sfi_data);
> +       if (result)
> +               goto err_free2;
> +
> +       policy->shared_type = CPUFREQ_SHARED_TYPE_HW;
> +
> +       data->freq_table = kzalloc(sizeof(*data->freq_table) *
> +                       (data->sfi_data->state_count+1), GFP_KERNEL);
> +       if (!data->freq_table) {
> +               result = -ENOMEM;
> +               goto err_free2;
> +       }
> +
> +       policy->cpuinfo.transition_latency = 0;
> +       for (i = 0; i < data->sfi_data->state_count; i++) {
> +               /* detect transition latency */
> +               if ((data->sfi_data->states[i].latency * 1000) >
> +                   policy->cpuinfo.transition_latency)
> +                       policy->cpuinfo.transition_latency =
> +                               data->sfi_data->states[i].latency * 1000;
> +
> +               /* initialize the freq table */
> +               data->freq_table[valid_states].driver_data = i;
> +               data->freq_table[valid_states].frequency =
> +                               data->sfi_data->states[i].freq_mhz * 1000;
> +               valid_states++;
> +
> +               pr_debug("     %cP%d: %d MHz, %d uS\n",
> +                       (i == data->sfi_data->state ? '*' : ' '), i,
> +                       (u32) data->sfi_data->states[i].freq_mhz,
> +                       (u32) data->sfi_data->states[i].latency);
> +       }
> +
> +       data->freq_table[valid_states].frequency = CPUFREQ_TABLE_END;
> +       data->sfi_data->state = 0;
> +
> +       result = cpufreq_table_validate_and_show(policy, data->freq_table);
> +       if (result)
> +               goto err_freqfree;
> +
> +       pr_debug("CPU%u - SFI performance management activated.\n", cpu);
> +
> +       return result;
> +
> +err_freqfree:
> +       kfree(data->freq_table);
> +err_free2:
> +       per_cpu(drv_data, cpu) = NULL;
> +       kfree(data->sfi_data);
> +err_free1:
> +       kfree(data);
> +
> +       return result;
> +}
> +
> +static int sfi_cpufreq_cpu_exit(struct cpufreq_policy *policy)
> +{
> +       struct sfi_cpufreq_data *data = per_cpu(drv_data, policy->cpu);
> +
> +       kfree(data->freq_table);
> +       per_cpu(drv_data, policy->cpu) = NULL;
> +       kfree(data->sfi_data);
> +       kfree(data);
> +
> +       return 0;
> +}
> +
> +static struct cpufreq_driver sfi_cpufreq_driver = {
> +       .flags          = CPUFREQ_CONST_LOOPS,
> +       .get            = get_cur_freq_on_cpu,
> +       .verify         = cpufreq_generic_frequency_table_verify,
> +       .target_index   = sfi_cpufreq_target,
> +       .init           = sfi_cpufreq_cpu_init,
> +       .exit           = sfi_cpufreq_cpu_exit,
> +       .name           = "sfi-cpufreq",
> +       .attr           = cpufreq_generic_attr,
> +};
> +
> +static int __init sfi_cpufreq_init(void)
> +{
> +       return cpufreq_register_driver(&sfi_cpufreq_driver);
> +}
> +late_initcall(sfi_cpufreq_init);
> +
> +static void __exit sfi_cpufreq_exit(void)
> +{
> +       cpufreq_unregister_driver(&sfi_cpufreq_driver);
> +}
> +module_exit(sfi_cpufreq_exit);
> +
> +MODULE_AUTHOR("Vishwesh M Rudramuni <vishwesh.m.rudramuni@intel.com>");
> +MODULE_DESCRIPTION("SFI P-States Driver");
> +MODULE_LICENSE("GPL");
> --
> 1.7.9.5
>
>
> --
Viresh Kumar Dec. 4, 2014, 3:35 a.m. UTC | #3
On Thu, Dec 4, 2014 at 8:51 AM, Len Brown <lenb@kernel.org> wrote:
> I like this trend:
>
> drivers/cpufreq/sfi-cpufreq.c |  529 +++++++++++++++++++++++++++++++++++++++++
> v1: drivers/cpufreq/sfi-cpufreq.c |  426
> +++++++++++++++++++++++++++++++++++++++++
> v2: drivers/cpufreq/sfi-cpufreq.c |  332
> +++++++++++++++++++++++++++++++++++++++++
> v3: drivers/cpufreq/sfi-cpufreq.c |  326
> +++++++++++++++++++++++++++++++++++++++++
> v4: drivers/cpufreq/sfi-cpufreq.c |  317
> +++++++++++++++++++++++++++++++++++++++++
> v5: drivers/cpufreq/sfi-cpufreq.c         |  238
> +++++++++++++++++++++++++++++++++
> v6: drivers/cpufreq/sfi-cpufreq.c         |  230
> +++++++++++++++++++++++++++++++++

Nice outcome of our reviews :)

> On Tue, Nov 25, 2014 at 12:42 AM, Srinidhi Kasagar
> <srinidhi.kasagar@intel.com> wrote:
>
>> +++ b/drivers/cpufreq/sfi-cpufreq.c
>
>> +#define SFI_FREQ_MASK          0xff00
>> +
>> +static DEFINE_PER_CPU(struct sfi_cpufreq_data *, drv_data);
>> +static DEFINE_MUTEX(performance_mutex);
>
> I'm not sure that cpufreq is able to call driver init's in parallel,
> so I'm not sure that the driver init needs to lock against other
> invocations of itself.

No. Its not possible.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kasagar, Srinidhi Dec. 9, 2014, 11:16 a.m. UTC | #4
On Wed, Dec 03, 2014 at 10:21:57PM -0500, Len Brown wrote:
> I like this trend:

..and the trend will continue in the next version
as well :) Thank you and Viresh..

> 
> drivers/cpufreq/sfi-cpufreq.c |  529 +++++++++++++++++++++++++++++++++++++++++
> v1: drivers/cpufreq/sfi-cpufreq.c |  426
> +++++++++++++++++++++++++++++++++++++++++
> v2: drivers/cpufreq/sfi-cpufreq.c |  332
> +++++++++++++++++++++++++++++++++++++++++
> v3: drivers/cpufreq/sfi-cpufreq.c |  326
> +++++++++++++++++++++++++++++++++++++++++
> v4: drivers/cpufreq/sfi-cpufreq.c |  317
> +++++++++++++++++++++++++++++++++++++++++
> v5: drivers/cpufreq/sfi-cpufreq.c         |  238
> +++++++++++++++++++++++++++++++++
> v6: drivers/cpufreq/sfi-cpufreq.c         |  230
> +++++++++++++++++++++++++++++++++
> 
> 
> 
> On Tue, Nov 25, 2014 at 12:42 AM, Srinidhi Kasagar
> <srinidhi.kasagar@intel.com> wrote:
> 
> > +++ b/drivers/cpufreq/sfi-cpufreq.c
> 
> > +#define SFI_FREQ_MASK          0xff00
> > +
> > +static DEFINE_PER_CPU(struct sfi_cpufreq_data *, drv_data);
> > +static DEFINE_MUTEX(performance_mutex);
> 
> I'm not sure that cpufreq is able to call driver init's in parallel,
> so I'm not sure that the driver init needs to lock against other
> invocations of itself.
> But even if called in parallel, what data does performance_mutex
> actually protect?

hmm..I can drop this mutex stuff then..

[...]

> > +static int parse_freq(struct sfi_table_header *table)
> > +{
> > +       struct sfi_table_simple *sb;
> > +
> > +       sb = (struct sfi_table_simple *)table;
> > +
> > +       sfi_cpufreq_num = SFI_GET_NUM_ENTRIES(sb, struct sfi_freq_table_entry);
> > +       f_entry = (struct sfi_freq_table_entry *)sb->pentry;
> 
> are we sure that this pointer will still be valid after
> sfi_table_parse() returns?
> sfi_table_parse() appears to do an sfi_table_put() at the end...

That's why there used to be a memcpy earlier. I checked the instances
of sfi_table_parse and they mostly keep memcpy if that's global
and in our case, it is. So, I will retain that memcpy..

[...]

> > +static unsigned int get_cur_freq_on_cpu(unsigned int cpu)
> > +{
> > +       u32 lo, hi, sfi_ctrl;
> > +       struct cpufreq_frequency_table *pos;
> > +       struct sfi_cpufreq_data *data = per_cpu(drv_data, cpu);
> > +
> > +       rdmsr_on_cpu(cpu, MSR_IA32_PERF_CTL, &lo, &hi);
> > +       lo &= SFI_FREQ_MASK;
> > +
> > +       cpufreq_for_each_entry(pos, data->freq_table) {
> > +               sfi_ctrl = data->sfi_data->states[pos->driver_data].ctrl_val
> > +                       & SFI_FREQ_MASK;
> > +               if (sfi_ctrl == lo)
> > +                       return pos->frequency;
> > +       }
> > +       return data->freq_table[0].frequency;
> > +}
> 
> Not obvious (to me) that this in-accurate return value is more useful
> to the caller than simply returning data->freq_table[state].frequency.
> 
> Unless the caller is taking into account for hardware coordination
> of multiple PERF_CTRL MSR's, then it is garbage-in/garbage-out,
> not to mention that by the time this value is used, it may have changed...
> 
> I'm curious what breaks in your Android setup if you don't supply
> a get_cur_freq() at all...

Hmm..I do not think there is any mandate that the driver
should provide ->get callbacks to the core. However i'm
not sure if any library tools like libcpupower needs this.

I don't mind dropping this ->get callback, the driver still
works and I'm happy with the scaling_setspeed.

Srinidhi
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h
index e21331ce368f..4f6dae67dd10 100644
--- a/arch/x86/include/uapi/asm/msr-index.h
+++ b/arch/x86/include/uapi/asm/msr-index.h
@@ -318,6 +318,7 @@ 
 
 #define MSR_IA32_PERF_STATUS		0x00000198
 #define MSR_IA32_PERF_CTL		0x00000199
+#define INTEL_PERF_CTL_MASK		0xffff
 #define MSR_AMD_PSTATE_DEF_BASE		0xc0010064
 #define MSR_AMD_PERF_STATUS		0xc0010063
 #define MSR_AMD_PERF_CTL		0xc0010062
diff --git a/drivers/cpufreq/Kconfig.x86 b/drivers/cpufreq/Kconfig.x86
index 89ae88f91895..dd2b6f541f33 100644
--- a/drivers/cpufreq/Kconfig.x86
+++ b/drivers/cpufreq/Kconfig.x86
@@ -57,6 +57,16 @@  config X86_ACPI_CPUFREQ_CPB
 	  By enabling this option the acpi_cpufreq driver provides the old
 	  entry in addition to the new boost ones, for compatibility reasons.
 
+config X86_SFI_CPUFREQ
+	tristate "SFI P-States driver"
+	depends on X86_INTEL_MID && SFI
+	help
+	  This adds a CPUFreq driver for some Silvermont based Intel Atom
+	  architectures like Z34xx and Z35xx which enumerate processor
+	  performance states through SFI.
+
+	  If in doubt, say N.
+
 config ELAN_CPUFREQ
 	tristate "AMD Elan SC400 and SC410"
 	depends on MELAN
diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
index db6d9a2fea4d..c3b51efd4a85 100644
--- a/drivers/cpufreq/Makefile
+++ b/drivers/cpufreq/Makefile
@@ -41,6 +41,7 @@  obj-$(CONFIG_X86_P4_CLOCKMOD)		+= p4-clockmod.o
 obj-$(CONFIG_X86_CPUFREQ_NFORCE2)	+= cpufreq-nforce2.o
 obj-$(CONFIG_X86_INTEL_PSTATE)		+= intel_pstate.o
 obj-$(CONFIG_X86_AMD_FREQ_SENSITIVITY)	+= amd_freq_sensitivity.o
+obj-$(CONFIG_X86_SFI_CPUFREQ)		+= sfi-cpufreq.o
 
 ##################################################################################
 # ARM SoC drivers
diff --git a/drivers/cpufreq/sfi-cpufreq.c b/drivers/cpufreq/sfi-cpufreq.c
new file mode 100644
index 000000000000..bbab318bde4e
--- /dev/null
+++ b/drivers/cpufreq/sfi-cpufreq.c
@@ -0,0 +1,230 @@ 
+/*
+ *  SFI Processor P-States Driver
+ *  Based on ACPI Processor P-States Driver
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License.
+ *
+ *  This program is distributed in the hope that it will be useful, but
+ *  WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ *  General Public License for more details.
+ *
+ *  Author: Vishwesh M Rudramuni <vishwesh.m.rudramuni@intel.com>
+ *  Author: Srinidhi Kasagar <srinidhi.kasagar@intel.com>
+ */
+
+#include <linux/cpufreq.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/sfi.h>
+#include <linux/slab.h>
+#include <linux/smp.h>
+
+#include <asm/msr.h>
+#include <asm/processor.h>
+
+#define SFI_FREQ_MASK		0xff00
+
+static DEFINE_PER_CPU(struct sfi_cpufreq_data *, drv_data);
+static DEFINE_MUTEX(performance_mutex);
+
+static struct sfi_freq_table_entry *f_entry;
+static int sfi_cpufreq_num;
+
+struct sfi_cpufreq_performance {
+	unsigned int state;
+	unsigned int state_count;
+	struct sfi_freq_table_entry *states;
+};
+
+struct sfi_cpufreq_data {
+	struct sfi_cpufreq_performance *sfi_data;
+	struct cpufreq_frequency_table *freq_table;
+};
+
+static int parse_freq(struct sfi_table_header *table)
+{
+	struct sfi_table_simple *sb;
+
+	sb = (struct sfi_table_simple *)table;
+
+	sfi_cpufreq_num = SFI_GET_NUM_ENTRIES(sb, struct sfi_freq_table_entry);
+	f_entry = (struct sfi_freq_table_entry *)sb->pentry;
+
+	return 0;
+}
+
+static int sfi_cpufreq_register_performance(struct sfi_cpufreq_performance *pr)
+{
+	int ret = 0;
+
+	mutex_lock(&performance_mutex);
+
+	/* parse the freq table from SFI */
+	sfi_table_parse(SFI_SIG_FREQ, NULL, NULL, parse_freq);
+
+	pr->state_count = sfi_cpufreq_num;
+	if (pr->state_count <= 1) {
+		pr_err("No p-states discovered\n");
+		ret = -ENODEV;
+		goto err;
+	}
+	pr->states = f_entry;
+err:
+	mutex_unlock(&performance_mutex);
+
+	return ret;
+}
+
+static unsigned int get_cur_freq_on_cpu(unsigned int cpu)
+{
+	u32 lo, hi, sfi_ctrl;
+	struct cpufreq_frequency_table *pos;
+	struct sfi_cpufreq_data *data = per_cpu(drv_data, cpu);
+
+	rdmsr_on_cpu(cpu, MSR_IA32_PERF_CTL, &lo, &hi);
+	lo &= SFI_FREQ_MASK;
+
+	cpufreq_for_each_entry(pos, data->freq_table) {
+		sfi_ctrl = data->sfi_data->states[pos->driver_data].ctrl_val
+			& SFI_FREQ_MASK;
+		if (sfi_ctrl == lo)
+			return pos->frequency;
+	}
+	return data->freq_table[0].frequency;
+}
+
+static int sfi_cpufreq_target(struct cpufreq_policy *policy,
+				unsigned int index)
+{
+	struct sfi_cpufreq_data *data = per_cpu(drv_data, policy->cpu);
+	unsigned int next_perf_state = 0; /* Index into perf table */
+	u32 lo, hi;
+
+	next_perf_state = data->freq_table[index].driver_data;
+
+	rdmsr_on_cpu(policy->cpu, MSR_IA32_PERF_CTL, &lo, &hi);
+	lo = (lo & ~INTEL_PERF_CTL_MASK) |
+		((u32) data->sfi_data->states[next_perf_state].ctrl_val &
+			INTEL_PERF_CTL_MASK);
+	wrmsr_on_cpu(policy->cpu, MSR_IA32_PERF_CTL, lo, hi);
+
+	data->sfi_data->state = next_perf_state;
+
+	return 0;
+}
+
+static int sfi_cpufreq_cpu_init(struct cpufreq_policy *policy)
+{
+	unsigned int i, result = 0, valid_states = 0;
+	unsigned int cpu = policy->cpu;
+	struct sfi_cpufreq_data *data;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+
+	data->sfi_data = kzalloc(sizeof(*data->sfi_data), GFP_KERNEL);
+	if (!data->sfi_data) {
+		result = -ENOMEM;
+		goto err_free1;
+	}
+
+	per_cpu(drv_data, cpu) = data;
+
+	result = sfi_cpufreq_register_performance(data->sfi_data);
+	if (result)
+		goto err_free2;
+
+	policy->shared_type = CPUFREQ_SHARED_TYPE_HW;
+
+	data->freq_table = kzalloc(sizeof(*data->freq_table) *
+			(data->sfi_data->state_count+1), GFP_KERNEL);
+	if (!data->freq_table) {
+		result = -ENOMEM;
+		goto err_free2;
+	}
+
+	policy->cpuinfo.transition_latency = 0;
+	for (i = 0; i < data->sfi_data->state_count; i++) {
+		/* detect transition latency */
+		if ((data->sfi_data->states[i].latency * 1000) >
+		    policy->cpuinfo.transition_latency)
+			policy->cpuinfo.transition_latency =
+				data->sfi_data->states[i].latency * 1000;
+
+		/* initialize the freq table */
+		data->freq_table[valid_states].driver_data = i;
+		data->freq_table[valid_states].frequency =
+				data->sfi_data->states[i].freq_mhz * 1000;
+		valid_states++;
+
+		pr_debug("     %cP%d: %d MHz, %d uS\n",
+			(i == data->sfi_data->state ? '*' : ' '), i,
+			(u32) data->sfi_data->states[i].freq_mhz,
+			(u32) data->sfi_data->states[i].latency);
+	}
+
+	data->freq_table[valid_states].frequency = CPUFREQ_TABLE_END;
+	data->sfi_data->state = 0;
+
+	result = cpufreq_table_validate_and_show(policy, data->freq_table);
+	if (result)
+		goto err_freqfree;
+
+	pr_debug("CPU%u - SFI performance management activated.\n", cpu);
+
+	return result;
+
+err_freqfree:
+	kfree(data->freq_table);
+err_free2:
+	per_cpu(drv_data, cpu) = NULL;
+	kfree(data->sfi_data);
+err_free1:
+	kfree(data);
+
+	return result;
+}
+
+static int sfi_cpufreq_cpu_exit(struct cpufreq_policy *policy)
+{
+	struct sfi_cpufreq_data *data = per_cpu(drv_data, policy->cpu);
+
+	kfree(data->freq_table);
+	per_cpu(drv_data, policy->cpu) = NULL;
+	kfree(data->sfi_data);
+	kfree(data);
+
+	return 0;
+}
+
+static struct cpufreq_driver sfi_cpufreq_driver = {
+	.flags		= CPUFREQ_CONST_LOOPS,
+	.get		= get_cur_freq_on_cpu,
+	.verify		= cpufreq_generic_frequency_table_verify,
+	.target_index	= sfi_cpufreq_target,
+	.init		= sfi_cpufreq_cpu_init,
+	.exit		= sfi_cpufreq_cpu_exit,
+	.name		= "sfi-cpufreq",
+	.attr		= cpufreq_generic_attr,
+};
+
+static int __init sfi_cpufreq_init(void)
+{
+	return cpufreq_register_driver(&sfi_cpufreq_driver);
+}
+late_initcall(sfi_cpufreq_init);
+
+static void __exit sfi_cpufreq_exit(void)
+{
+	cpufreq_unregister_driver(&sfi_cpufreq_driver);
+}
+module_exit(sfi_cpufreq_exit);
+
+MODULE_AUTHOR("Vishwesh M Rudramuni <vishwesh.m.rudramuni@intel.com>");
+MODULE_DESCRIPTION("SFI P-States Driver");
+MODULE_LICENSE("GPL");