diff mbox

3.18: lockdep problems in cpufreq

Message ID 002f01d0186b$2700b730$75022590$%brar@samsung.com (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

Yadwinder Singh Brar Dec. 15, 2014, 1:28 p.m. UTC
> -----Original Message-----
> From: Viresh Kumar [mailto:viresh.kumar@linaro.org]
> Sent: Monday, December 15, 2014 9:27 AM
> To: Rafael J. Wysocki; yadi.brar@samsung.com
> Cc: Russell King - ARM Linux; linux-arm-kernel@lists.infradead.org;
> linux-pm@vger.kernel.org; Eduardo Valentin
> Subject: Re: 3.18: lockdep problems in cpufreq
> 
> On 15 December 2014 at 03:53, Rafael J. Wysocki <rjw@rjwysocki.net>
> wrote:
> > On Sunday, December 14, 2014 09:36:55 PM Russell King - ARM Linux
> wrote:
> >> Here's a nice Christmas present one of my iMX6 machines gave me
> tonight.
> >> I haven't begun to look into it.
> 
> Is the culprit this one?
> 
> Fixes: 2dcd851fe4b4 ("thermal: cpu_cooling: Update always cpufreq
> policy with thermal constraints")
> 
> As this is what it changed:
> 
> @@ -316,21 +312,28 @@ static int cpufreq_thermal_notifier(struct
> notifier_block *nb,  {
>         struct cpufreq_policy *policy = data;
>         unsigned long max_freq = 0;
> +       struct cpufreq_cooling_device *cpufreq_dev;
> 
> -       if (event != CPUFREQ_ADJUST || notify_device == NOTIFY_INVALID)
> +       if (event != CPUFREQ_ADJUST)
>                 return 0;
> 
> -       if (cpumask_test_cpu(policy->cpu, &notify_device-
> >allowed_cpus))
> -               max_freq = notify_device->cpufreq_val;
> -       else
> -               return 0;
> +       mutex_lock(&cooling_cpufreq_lock);
> +       list_for_each_entry(cpufreq_dev, &cpufreq_dev_list, node) {
> +               if (!cpumask_test_cpu(policy->cpu,
> +                                       &cpufreq_dev->allowed_cpus))
> +                       continue;
> +
> +               if (!cpufreq_dev->cpufreq_val)
> +                       cpufreq_dev->cpufreq_val = get_cpu_frequency(
> +                                       cpumask_any(&cpufreq_dev-
> >allowed_cpus),
> +                                       cpufreq_dev->cpufreq_state);
> 
> -       /* Never exceed user_policy.max */
> -       if (max_freq > policy->user_policy.max)
> -               max_freq = policy->user_policy.max;
> +               max_freq = cpufreq_dev->cpufreq_val;
> 
> -       if (policy->max != max_freq)
> -               cpufreq_verify_within_limits(policy, 0, max_freq);
> +               if (policy->max != max_freq)
> +                       cpufreq_verify_within_limits(policy, 0,
> max_freq);
> +       }
> +       mutex_unlock(&cooling_cpufreq_lock);
> 
>         return 0;
>  }
> 
> 
> >> ======================================================
> >> [ INFO: possible circular locking dependency detected ] 3.18.0+
> #1453
> >> Not tainted
> >> -------------------------------------------------------
> >> rc.local/770 is trying to acquire lock:
> >>  (cooling_cpufreq_lock){+.+.+.}, at: [<c04abfc4>]
> >> cpufreq_thermal_notifier+0x34/0xfc
> >>
> >> but task is already holding lock:
> >>  ((cpufreq_policy_notifier_list).rwsem){++++.+}, at: [<c0042f04>]
> >> __blocking_notifier_call_chain+0x34/0x68
> >>
> >> which lock already depends on the new lock.
> >
> > Well, that's for Viresh.
> 
> Maybe not as the rework I have done is queued for this merge window.
> I was afraid really after reading the "offenders" discussion on IRC :)
> 
> Cc'd Yadwinder as well who wrote this patch.

Thanks :).

Unfortunately, I didn’t get any such warning though I tested
patch enabling CONFIG_PROVE_LOCKING before posting. It seems
Russell is trying to load imx_thermal as module and parallely
Changing cpufreq governor from userspace, which was not my
test case.

Anyways, after analyzing log and code,I think problem is not
in cpufreq_thermal_notifier which was modified in patch as
stated above. Actual problem is in __cpufreq_cooling_register
which is unnecessarily calling cpufreq_register_notifier()
inside section protected by cooling_cpufreq_lock.
Because cpufreq_policy_notifier_list).rwsem is already held
by store_scaling_governor when __cpufreq_cooling_register is
trying to cpufreq_policy_notifier_list while holding cooling_cpufreq_lock. 

So I think following can fix the problem:



Best Regards,
Yadwinder
  

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Russell King - ARM Linux Dec. 15, 2014, 1:46 p.m. UTC | #1
On Mon, Dec 15, 2014 at 06:58:41PM +0530, Yadwinder Singh Brar wrote:
> Unfortunately, I didn’t get any such warning though I tested
> patch enabling CONFIG_PROVE_LOCKING before posting. It seems
> Russell is trying to load imx_thermal as module and parallely
> Changing cpufreq governor from userspace, which was not my
> test case.

No.  Yes, imx_thermal is a module, which is loaded by udev very early
in the userspace boot.  Later on, in the rc.local, I poke the governor
from userspace.

This is evidenced by the bluetooth modules being loaded after imx_thermal:

Module                  Size  Used by
bnep                   10574  2
rfcomm                 33720  0
bluetooth             293598  10 bnep,rfcomm
nfsd                   90264  0
exportfs                3988  1 nfsd
hid_cypress             1763  0
snd_soc_fsl_spdif       9753  2
imx_pcm_dma             1137  1 snd_soc_fsl_spdif
imx_sdma               12885  2
imx2_wdt                3248  0
imx_thermal             5386  0
snd_soc_imx_spdif       1877  0

and the timestamp on the bluetooth message:

[   25.503739] Bluetooth: Core ver 2.19

vs the timestamp on the lockdep message:

[   29.499195] [ INFO: possible circular locking dependency detected ]

My rc.local does this:

    # Configure cpufreq
    echo 1 > /sys/devices/system/cpu/cpufreq/ondemand/io_is_busy
    echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

> Anyways, after analyzing log and code,I think problem is not
> in cpufreq_thermal_notifier which was modified in patch as
> stated above. Actual problem is in __cpufreq_cooling_register
> which is unnecessarily calling cpufreq_register_notifier()
> inside section protected by cooling_cpufreq_lock.
> Because cpufreq_policy_notifier_list).rwsem is already held
> by store_scaling_governor when __cpufreq_cooling_register is
> trying to cpufreq_policy_notifier_list while holding cooling_cpufreq_lock. 
> 
> So I think following can fix the problem:
> 
> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
> index ad09e51..622ea40 100644
> --- a/drivers/thermal/cpu_cooling.c
> +++ b/drivers/thermal/cpu_cooling.c
> @@ -484,15 +484,15 @@ __cpufreq_cooling_register(struct device_node *np,
>         cpufreq_dev->cpufreq_state = 0;
>         mutex_lock(&cooling_cpufreq_lock);
>  
> -       /* Register the notifier for first cpufreq cooling device */
> -       if (cpufreq_dev_count == 0)
> -               cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
> -                                         CPUFREQ_POLICY_NOTIFIER);
>         cpufreq_dev_count++;
>         list_add(&cpufreq_dev->node, &cpufreq_dev_list);
>  
>         mutex_unlock(&cooling_cpufreq_lock);
>  
> +       /* Register the notifier for first cpufreq cooling device */
> +       if (cpufreq_dev_count == 0)
> +               cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
> +                                         CPUFREQ_POLICY_NOTIFIER);

No, sorry, this patch is worse.

1. cpufreq_register_notifier() will never be called, even on the first
   caller to this code: at the point where it's tested for zero, it will
   be incremented to one by the previous code.

2. What happens if two threads come here?

   The first thread succeeds in grabbing cooling_cpufreq_lock.  The second
   thread stalls waiting for cooling_cpufreq_lock to be released.

   The first thread increments cpufreq_dev_count, adds to the list, and then
   releases the lock.  At this point, control may be passed to the second
   thread (since mutex_unlock() will wake it up.)  The second thread gets
   into the critical region, and increments cpufreq_dev_count again.

   By the time the first thread runs, cpufreq_dev_count may be two at this
   point.

Unfortunately, you do need some kind of synchronisation here.  If it's
not important when cpufreq_register_notifier() gets called, maybe this
can work:

	bool register;

	mutex_lock(&cooling_cpufreq_lock);
	register = cpufreq_dev_count++ == 0;
	list_add(&cpufreq_dev->node, &cpufreq_dev_list);
	mutex_unlock(&cooling_cpufreq_lock);

	if (register)
		cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
					  CPUFREQ_POLICY_NOTIFIER);

However, I suspect that may well be buggy if we have a cleanup path which
unregisters the notifier when cpufreq_dev_count is decremented to zero...
which we seem to have in cpufreq_cooling_unregister().

The answer may well be to have finer grained locking here:

- one lock to protect cpufreq_dev_list, which is only ever taken when
  adding or removing entries from it

- a second lock to protect cpufreq_dev_count and the calls to
  cpufreq_register_notifier() and cpufreq_unregister_notifier()

and you would /never/ take either of those two locks inside each other.
In other words:

	mutex_lock(&cooling_list_lock);
	list_add(&cpufreq_dev->node, &cpufreq_dev_list);
	mutex_unlock(&cooling_list_lock);

	mutex_lock(&cooling_cpufreq_lock);
	if (cpufreq_dev_count++ == 0)
		cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
					  CPUFREQ_POLICY_NOTIFIER);
	mutex_unlock(&cooling_cpufreq_lock);

and similar in the cleanup path.  The notifier itself would only ever
take the cooling_list_lock.
Viresh Kumar Dec. 15, 2014, 2:38 p.m. UTC | #2
On 15 December 2014 at 18:58, Yadwinder Singh Brar
<yadi.brar@samsung.com> wrote:
> Unfortunately, I didn’t get any such warning though I tested
> patch enabling CONFIG_PROVE_LOCKING before posting. It seems

Even I couldn't reproduce it :)

> Anyways, after analyzing log and code,I think problem is not
> in cpufreq_thermal_notifier which was modified in patch as
> stated above. Actual problem is in __cpufreq_cooling_register
> which is unnecessarily calling cpufreq_register_notifier()
> inside section protected by cooling_cpufreq_lock.

Agreed, and I was looking to reproduce first before writing
some crazy code.

> Because cpufreq_policy_notifier_list).rwsem is already held
> by store_scaling_governor when __cpufreq_cooling_register is
> trying to cpufreq_policy_notifier_list while holding cooling_cpufreq_lock.

Correct.

> So I think following can fix the problem:
>
> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
> index ad09e51..622ea40 100644
> --- a/drivers/thermal/cpu_cooling.c
> +++ b/drivers/thermal/cpu_cooling.c
> @@ -484,15 +484,15 @@ __cpufreq_cooling_register(struct device_node *np,
>         cpufreq_dev->cpufreq_state = 0;
>         mutex_lock(&cooling_cpufreq_lock);
>
> -       /* Register the notifier for first cpufreq cooling device */
> -       if (cpufreq_dev_count == 0)
> -               cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
> -                                         CPUFREQ_POLICY_NOTIFIER);
>         cpufreq_dev_count++;
>         list_add(&cpufreq_dev->node, &cpufreq_dev_list);
>
>         mutex_unlock(&cooling_cpufreq_lock);
>
> +       /* Register the notifier for first cpufreq cooling device */
> +       if (cpufreq_dev_count == 0)

This will always fail :), over that we need to access cpufreq_dev_count
from within the lock.

> +               cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
> +                                         CPUFREQ_POLICY_NOTIFIER);
>         return cool_dev;
>  }
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yadwinder Singh Brar Dec. 15, 2014, 2:54 p.m. UTC | #3
> -----Original Message-----
> From: Russell King - ARM Linux [mailto:linux@arm.linux.org.uk]
> Sent: Monday, December 15, 2014 7:17 PM
> To: Yadwinder Singh Brar
> Cc: 'Viresh Kumar'; 'Rafael J. Wysocki'; linux-arm-
> kernel@lists.infradead.org; linux-pm@vger.kernel.org; 'Eduardo
> Valentin'
> Subject: Re: 3.18: lockdep problems in cpufreq
> 
> On Mon, Dec 15, 2014 at 06:58:41PM +0530, Yadwinder Singh Brar wrote:
> > Unfortunately, I didn’t get any such warning though I tested patch
> > enabling CONFIG_PROVE_LOCKING before posting. It seems Russell is
> > trying to load imx_thermal as module and parallely Changing cpufreq
> > governor from userspace, which was not my test case.
> 
> No.  Yes, imx_thermal is a module, which is loaded by udev very early
> in the userspace boot.  Later on, in the rc.local, I poke the governor
> from userspace.
> 
> This is evidenced by the bluetooth modules being loaded after
> imx_thermal:
> 
> Module                  Size  Used by
> bnep                   10574  2
> rfcomm                 33720  0
> bluetooth             293598  10 bnep,rfcomm
> nfsd                   90264  0
> exportfs                3988  1 nfsd
> hid_cypress             1763  0
> snd_soc_fsl_spdif       9753  2
> imx_pcm_dma             1137  1 snd_soc_fsl_spdif
> imx_sdma               12885  2
> imx2_wdt                3248  0
> imx_thermal             5386  0
> snd_soc_imx_spdif       1877  0
> 
> and the timestamp on the bluetooth message:
> 
> [   25.503739] Bluetooth: Core ver 2.19
> 
> vs the timestamp on the lockdep message:
> 
> [   29.499195] [ INFO: possible circular locking dependency detected ]
> 
> My rc.local does this:
> 
>     # Configure cpufreq
>     echo 1 > /sys/devices/system/cpu/cpufreq/ondemand/io_is_busy
>     echo performance >
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
> 
> > Anyways, after analyzing log and code,I think problem is not in
> > cpufreq_thermal_notifier which was modified in patch as stated above.
> > Actual problem is in __cpufreq_cooling_register which is
> unnecessarily
> > calling cpufreq_register_notifier() inside section protected by
> > cooling_cpufreq_lock.
> > Because cpufreq_policy_notifier_list).rwsem is already held by
> > store_scaling_governor when __cpufreq_cooling_register is trying to
> > cpufreq_policy_notifier_list while holding cooling_cpufreq_lock.
> >
> > So I think following can fix the problem:
> >
> > diff --git a/drivers/thermal/cpu_cooling.c
> > b/drivers/thermal/cpu_cooling.c index ad09e51..622ea40 100644
> > --- a/drivers/thermal/cpu_cooling.c
> > +++ b/drivers/thermal/cpu_cooling.c
> > @@ -484,15 +484,15 @@ __cpufreq_cooling_register(struct device_node
> *np,
> >         cpufreq_dev->cpufreq_state = 0;
> >         mutex_lock(&cooling_cpufreq_lock);
> >
> > -       /* Register the notifier for first cpufreq cooling device */
> > -       if (cpufreq_dev_count == 0)
> > -
> cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
> > -                                         CPUFREQ_POLICY_NOTIFIER);
> >         cpufreq_dev_count++;
> >         list_add(&cpufreq_dev->node, &cpufreq_dev_list);
> >
> >         mutex_unlock(&cooling_cpufreq_lock);
> >
> > +       /* Register the notifier for first cpufreq cooling device */
> > +       if (cpufreq_dev_count == 0)
> > +
> cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
> > +                                         CPUFREQ_POLICY_NOTIFIER);
> 
> No, sorry, this patch is worse.
> 

Actually it was not patch :P , just moved cpufreq_register_notifier()
out of locking, as it is(C&P) to explain my point.

> 1. cpufreq_register_notifier() will never be called, even on the first
>    caller to this code: at the point where it's tested for zero, it
> will
>    be incremented to one by the previous code.
> 
> 2. What happens if two threads come here?
> 
>    The first thread succeeds in grabbing cooling_cpufreq_lock.  The
> second
>    thread stalls waiting for cooling_cpufreq_lock to be released.
> 
>    The first thread increments cpufreq_dev_count, adds to the list, and
> then
>    releases the lock.  At this point, control may be passed to the
> second
>    thread (since mutex_unlock() will wake it up.)  The second thread
> gets
>    into the critical region, and increments cpufreq_dev_count again.
> 
>    By the time the first thread runs, cpufreq_dev_count may be two at
> this
>    point.

Yes, may be.

> 
> Unfortunately, you do need some kind of synchronisation here.  If it's
> not important when cpufreq_register_notifier() gets called, maybe this
> can work:
> 
> 	bool register;
> 
> 	mutex_lock(&cooling_cpufreq_lock);
> 	register = cpufreq_dev_count++ == 0;
> 	list_add(&cpufreq_dev->node, &cpufreq_dev_list);
> 	mutex_unlock(&cooling_cpufreq_lock);
> 
> 	if (register)

register may be 0 in scenario you stated above in second point.
So this approach will not work.
 
> 		cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
> 					  CPUFREQ_POLICY_NOTIFIER);
> 
> However, I suspect that may well be buggy if we have a cleanup path
> which unregisters the notifier when cpufreq_dev_count is decremented to
> zero...
> which we seem to have in cpufreq_cooling_unregister().
> 
> The answer may well be to have finer grained locking here:
> 
> - one lock to protect cpufreq_dev_list, which is only ever taken when
>   adding or removing entries from it
> 
> - a second lock to protect cpufreq_dev_count and the calls to
>   cpufreq_register_notifier() and cpufreq_unregister_notifier()
> 
> and you would /never/ take either of those two locks inside each other.
> In other words:
> 
> 	mutex_lock(&cooling_list_lock);
> 	list_add(&cpufreq_dev->node, &cpufreq_dev_list);
> 	mutex_unlock(&cooling_list_lock);
> 
> 	mutex_lock(&cooling_cpufreq_lock);
> 	if (cpufreq_dev_count++ == 0)
> 		cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
> 					  CPUFREQ_POLICY_NOTIFIER);
> 	mutex_unlock(&cooling_cpufreq_lock);
> 
> and similar in the cleanup path.  The notifier itself would only ever
> take the cooling_list_lock.
> 

I agree with this approach, if its fine for others also, I can implement
and post patch.

Thanks,
Yadwinder

> --
> FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
> according to speedtest.net.

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Russell King - ARM Linux Dec. 15, 2014, 5:43 p.m. UTC | #4
On Mon, Dec 15, 2014 at 08:24:05PM +0530, Yadwinder Singh Brar wrote:
> > The answer may well be to have finer grained locking here:
> > 
> > - one lock to protect cpufreq_dev_list, which is only ever taken when
> >   adding or removing entries from it
> > 
> > - a second lock to protect cpufreq_dev_count and the calls to
> >   cpufreq_register_notifier() and cpufreq_unregister_notifier()
> > 
> > and you would /never/ take either of those two locks inside each other.
> > In other words:
> > 
> > 	mutex_lock(&cooling_list_lock);
> > 	list_add(&cpufreq_dev->node, &cpufreq_dev_list);
> > 	mutex_unlock(&cooling_list_lock);
> > 
> > 	mutex_lock(&cooling_cpufreq_lock);
> > 	if (cpufreq_dev_count++ == 0)
> > 		cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
> > 					  CPUFREQ_POLICY_NOTIFIER);
> > 	mutex_unlock(&cooling_cpufreq_lock);
> > 
> > and similar in the cleanup path.  The notifier itself would only ever
> > take the cooling_list_lock.
> > 
> 
> I agree with this approach, if its fine for others also, I can implement
> and post patch.

Yes, please do.
Rafael J. Wysocki Dec. 15, 2014, 9:41 p.m. UTC | #5
On Monday, December 15, 2014 05:43:36 PM Russell King - ARM Linux wrote:
> On Mon, Dec 15, 2014 at 08:24:05PM +0530, Yadwinder Singh Brar wrote:
> > > The answer may well be to have finer grained locking here:
> > > 
> > > - one lock to protect cpufreq_dev_list, which is only ever taken when
> > >   adding or removing entries from it
> > > 
> > > - a second lock to protect cpufreq_dev_count and the calls to
> > >   cpufreq_register_notifier() and cpufreq_unregister_notifier()
> > > 
> > > and you would /never/ take either of those two locks inside each other.
> > > In other words:
> > > 
> > > 	mutex_lock(&cooling_list_lock);
> > > 	list_add(&cpufreq_dev->node, &cpufreq_dev_list);
> > > 	mutex_unlock(&cooling_list_lock);
> > > 
> > > 	mutex_lock(&cooling_cpufreq_lock);
> > > 	if (cpufreq_dev_count++ == 0)
> > > 		cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
> > > 					  CPUFREQ_POLICY_NOTIFIER);
> > > 	mutex_unlock(&cooling_cpufreq_lock);
> > > 
> > > and similar in the cleanup path.  The notifier itself would only ever
> > > take the cooling_list_lock.
> > > 
> > 
> > I agree with this approach, if its fine for others also, I can implement
> > and post patch.
> 
> Yes, please do.

Indeed.  We have a regression here to fix.

Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar Dec. 16, 2014, 3:37 a.m. UTC | #6
On 15 December 2014 at 20:24, Yadwinder Singh Brar
<yadi.brar@samsung.com> wrote:
>> Unfortunately, you do need some kind of synchronisation here.  If it's
>> not important when cpufreq_register_notifier() gets called, maybe this
>> can work:
>>
>>       bool register;
>>
>>       mutex_lock(&cooling_cpufreq_lock);
>>       register = cpufreq_dev_count++ == 0;
>>       list_add(&cpufreq_dev->node, &cpufreq_dev_list);
>>       mutex_unlock(&cooling_cpufreq_lock);
>>
>>       if (register)
>
> register may be 0 in scenario you stated above in second point.
> So this approach will not work.

I didn't understood what you meant here. register will be zero only
for one of the threads.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
index ad09e51..622ea40 100644
--- a/drivers/thermal/cpu_cooling.c
+++ b/drivers/thermal/cpu_cooling.c
@@ -484,15 +484,15 @@  __cpufreq_cooling_register(struct device_node *np,
        cpufreq_dev->cpufreq_state = 0;
        mutex_lock(&cooling_cpufreq_lock);
 
-       /* Register the notifier for first cpufreq cooling device */
-       if (cpufreq_dev_count == 0)
-               cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
-                                         CPUFREQ_POLICY_NOTIFIER);
        cpufreq_dev_count++;
        list_add(&cpufreq_dev->node, &cpufreq_dev_list);
 
        mutex_unlock(&cooling_cpufreq_lock);
 
+       /* Register the notifier for first cpufreq cooling device */
+       if (cpufreq_dev_count == 0)
+               cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
+                                         CPUFREQ_POLICY_NOTIFIER);
        return cool_dev;
 }