From patchwork Wed Aug 12 07:41:49 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Russell King X-Patchwork-Id: 6997551 Return-Path: X-Original-To: patchwork-linux-arm@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id F27329F39D for ; Wed, 12 Aug 2015 07:44:32 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 9FA9F2060A for ; Wed, 12 Aug 2015 07:44:31 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.9]) (using TLSv1.2 with cipher AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 48E5A20609 for ; Wed, 12 Aug 2015 07:44:30 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.80.1 #2 (Red Hat Linux)) id 1ZPQfz-0002me-Nj; Wed, 12 Aug 2015 07:42:19 +0000 Received: from pandora.arm.linux.org.uk ([2001:4d48:ad52:3201:214:fdff:fe10:1be6]) by bombadil.infradead.org with esmtps (Exim 4.80.1 #2 (Red Hat Linux)) id 1ZPQfw-0002hj-48 for linux-arm-kernel@lists.infradead.org; Wed, 12 Aug 2015 07:42:18 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=arm.linux.org.uk; s=pandora-2014; h=Date:Sender:Message-Id:Content-Type:Content-Transfer-Encoding:MIME-Version:Subject:Cc:To:From:References:In-Reply-To; bh=NJBCre7FcAXVHmmfHqF5yCNd7/rqfFIjSb0Wb9uRELE=; b=h8s62LVgWHABGO3Wng2a2l6BZDk4QFy2dKnP8uQfwg7f9QgP0jaUoI9RTM1LF4s2uXNf1+8C1+MdpYt0iVS7aaAN/9rPHbJhgun/JP6eaVIWvXVw+4+akj/jrl385QtBlupjAVcHYC0H2o5sngqEMpGHSKGPHqH3WoLwkwIaiBw=; Received: from e0022681537dd.dyn.arm.linux.org.uk ([fd8f:7570:feb6:1:222:68ff:fe15:37dd]:60317 helo=rmk-PC.arm.linux.org.uk) by pandora.arm.linux.org.uk with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.82_1-5b7a7c0-XX) (envelope-from ) id 1ZPQfW-0004fJ-71; Wed, 12 Aug 2015 08:41:50 +0100 Received: from rmk by rmk-PC.arm.linux.org.uk with local (Exim 4.82_1-5b7a7c0-XX) (envelope-from ) id 1ZPQfV-0005nb-DN; Wed, 12 Aug 2015 08:41:49 +0100 In-Reply-To: <20150812073530.GA16445@linux> References: <20150812073530.GA16445@linux> From: Russell King To: Viresh Kumar Subject: [PATCH] thermal: cpu_cooling: fix lockdep problems in cpu_cooling MIME-Version: 1.0 Content-Disposition: inline Message-Id: Date: Wed, 12 Aug 2015 08:41:49 +0100 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20150812_004216_744623_A9C810F9 X-CRM114-Status: GOOD ( 19.82 ) X-Spam-Score: -2.2 (--) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Eduardo Valentin , "Rafael J. Wysocki" , linux-arm-kernel@lists.infradead.org Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org X-Spam-Status: No, score=-4.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_MED,RP_MATCHES_RCVD,T_DKIM_INVALID,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP A recent change to the cpu_cooling code introduced a AB-BA deadlock scenario between the cpufreq_policy_notifier_list rwsem and the cooling_cpufreq_lock. This is caused by cooling_cpufreq_lock being held before the registration/removal of the notifier block (an operation which takes the rwsem), and the notifier code itself which takes the locks in the reverse order: ====================================================== [ INFO: possible circular locking dependency detected ] 3.18.0+ #1453 Not tainted ------------------------------------------------------- rc.local/770 is trying to acquire lock: (cooling_cpufreq_lock){+.+.+.}, at: [] cpufreq_thermal_notifier+0x34/0xfc but task is already holding lock: ((cpufreq_policy_notifier_list).rwsem){++++.+}, at: [] __blocking_notifier_call_chain+0x34/0x68 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 ((cpufreq_policy_notifier_list).rwsem){++++.+}: [] down_write+0x44/0x9c [] blocking_notifier_chain_register+0x28/0xd8 [] cpufreq_register_notifier+0x68/0x90 [] __cpufreq_cooling_register.part.1+0x120/0x180 [] __cpufreq_cooling_register+0x98/0xa4 [] cpufreq_cooling_register+0x18/0x1c [] imx_thermal_probe+0x1c0/0x470 [imx_thermal] [] platform_drv_probe+0x50/0xac [] driver_probe_device+0x114/0x234 [] __driver_attach+0x9c/0xa0 [] bus_for_each_dev+0x5c/0x90 [] driver_attach+0x24/0x28 [] bus_add_driver+0xe0/0x1d8 [] driver_register+0x80/0xfc [] __platform_driver_register+0x50/0x64 [] 0xbf007018 [] do_one_initcall+0x88/0x1d8 [] load_module+0x1768/0x1ef8 [] SyS_init_module+0xe0/0xf4 [] ret_fast_syscall+0x0/0x48 -> #0 (cooling_cpufreq_lock){+.+.+.}: [] lock_acquire+0xb0/0x124 [] mutex_lock_nested+0x5c/0x3d8 [] cpufreq_thermal_notifier+0x34/0xfc [] notifier_call_chain+0x4c/0x8c [] __blocking_notifier_call_chain+0x50/0x68 [] blocking_notifier_call_chain+0x20/0x28 [] cpufreq_set_policy+0x7c/0x1d0 [] store_scaling_governor+0x74/0x9c [] store+0x90/0xc0 [] sysfs_kf_write+0x54/0x58 [] kernfs_fop_write+0xdc/0x190 [] vfs_write+0xac/0x1b4 [] SyS_write+0x44/0x90 [] ret_fast_syscall+0x0/0x48 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((cpufreq_policy_notifier_list).rwsem); lock(cooling_cpufreq_lock); lock((cpufreq_policy_notifier_list).rwsem); lock(cooling_cpufreq_lock); *** DEADLOCK *** 7 locks held by rc.local/770: #0: (sb_writers#6){.+.+.+}, at: [] vfs_write+0x18c/0x1b4 #1: (&of->mutex){+.+.+.}, at: [] kernfs_fop_write+0xa0/0x190 #2: (s_active#52){.+.+.+}, at: [] kernfs_fop_write+0xa8/0x190 #3: (cpu_hotplug.lock){++++++}, at: [] get_online_cpus+0x34/0x90 #4: (cpufreq_rwsem){.+.+.+}, at: [] store+0x58/0xc0 #5: (&policy->rwsem){+.+.+.}, at: [] store+0x70/0xc0 #6: ((cpufreq_policy_notifier_list).rwsem){++++.+}, at: [] __blocking_notifier_call_chain+0x34/0x68 stack backtrace: CPU: 0 PID: 770 Comm: rc.local Not tainted 3.18.0+ #1453 Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) Backtrace: [] (dump_backtrace) from [] (show_stack+0x18/0x1c) r6:c0b85a80 r5:c0b75630 r4:00000000 r3:00000000 [] (show_stack) from [] (dump_stack+0x7c/0x98) [] (dump_stack) from [] (print_circular_bug+0x28c/0x2d8) r4:c0b85a80 r3:d0071d40 [] (print_circular_bug) from [] (__lock_acquire+0x1acc/0x1bb0) r10:c0b50660 r8:c09e6d80 r7:d0071d40 r6:c11d0f0c r5:00000007 r4:d0072240 [] (__lock_acquire) from [] (lock_acquire+0xb0/0x124) r10:00000000 r9:c04abfc4 r8:00000000 r7:00000000 r6:00000000 r5:c0a06f0c r4:00000000 [] (lock_acquire) from [] (mutex_lock_nested+0x5c/0x3d8) r10:ec853800 r9:c0a06ed4 r8:d0071d40 r7:c0a06ed4 r6:c11d0f0c r5:00000000 r4:c04abfc4 [] (mutex_lock_nested) from [] (cpufreq_thermal_notifier+0x34/0xfc) r10:ec853800 r9:ec85380c r8:d00d7d3c r7:c0a06ed4 r6:d00d7d3c r5:00000000 r4:fffffffe [] (cpufreq_thermal_notifier) from [] (notifier_call_chain+0x4c/0x8c) r7:00000000 r6:00000000 r5:00000000 r4:fffffffe [] (notifier_call_chain) from [] (__blocking_notifier_call_chain+0x50/0x68) r8:c0a072a4 r7:00000000 r6:d00d7d3c r5:ffffffff r4:c0a06fc8 r3:ffffffff [] (__blocking_notifier_call_chain) from [] (blocking_notifier_call_chain+0x20/0x28) r7:ec98b540 r6:c13ebc80 r5:ed76e600 r4:d00d7d3c [] (blocking_notifier_call_chain) from [] (cpufreq_set_policy+0x7c/0x1d0) [] (cpufreq_set_policy) from [] (store_scaling_governor+0x74/0x9c) r7:ec98b540 r6:0000000c r5:ec98b540 r4:ed76e600 [] (store_scaling_governor) from [] (store+0x90/0xc0) r6:0000000c r5:ed76e6d4 r4:ed76e600 [] (store) from [] (sysfs_kf_write+0x54/0x58) r8:0000000c r7:d00d7f78 r6:ec98b540 r5:0000000c r4:ec853800 r3:0000000c [] (sysfs_kf_write) from [] (kernfs_fop_write+0xdc/0x190) r6:ec98b540 r5:00000000 r4:00000000 r3:c0175330 [] (kernfs_fop_write) from [] (vfs_write+0xac/0x1b4) r10:0162aa70 r9:d00d6000 r8:0000000c r7:d00d7f78 r6:0162aa70 r5:0000000c r4:eccde500 [] (vfs_write) from [] (SyS_write+0x44/0x90) r10:0162aa70 r8:0000000c r7:eccde500 r6:eccde500 r5:00000000 r4:00000000 [] (SyS_write) from [] (ret_fast_syscall+0x0/0x48) r10:00000000 r8:c000edc4 r7:00000004 r6:000216cc r5:0000000c r4:0162aa70 Solve this by moving to finer grained locking - use one mutex to protect the cpufreq_dev_list as a whole, and a separate lock to ensure correct ordering of cpufreq notifier registration and removal. I considered taking the cooling_list_lock within cooling_cpufreq_lock to protect the registration sequence as a whole, but that adds a dependency between these two locks which is best avoided (lest someone tries to take those two new locks in the reverse order.) In any case, it's safer to have an empty cpufreq_dev_list than to have unnecessary dependencies between locks. Fixes: 2dcd851fe4b4 ("thermal: cpu_cooling: Update always cpufreq policy with Reviewed-by: Viresh Kumar Signed-off-by: Russell King For some reason, despite Rafael saying that this should go in, despite it having been reviewed, Eduardo appears to be ignoring basic fixes to code under his control. Having waited more than a reasonable amount of time, it is now time to take some action against this blockage. Hence, I'm putting this into linux-next in the hope that it'll conflict and elicit a response for this important bug fix. Relevant message IDs: 20141214213655.GA11285@n2100.arm.linux.org.uk 7573578.UE8ufgjWuX@vostro.rjw.lan 20141215230922.GL11285@n2100.arm.linux.org.uk CAKohponhkk9BK6-7ScHeZa4R43iooWQFV8YoPLv6LjO40GNq=A@mail.gmail.com 20150518185645.GA28053@n2100.arm.linux.org.uk 2574268.XBqpdL2VLI@vostro.rjw.lan --- drivers/thermal/cpu_cooling.c | 22 ++++++++++++++++------ 1 file changed, 16 insertions(+), 6 deletions(-) diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c index 6509c61b9648..d1fd02eec58d 100644 --- a/drivers/thermal/cpu_cooling.c +++ b/drivers/thermal/cpu_cooling.c @@ -107,6 +107,9 @@ struct cpufreq_cooling_device { static DEFINE_IDR(cpufreq_idr); static DEFINE_MUTEX(cooling_cpufreq_lock); +static unsigned int cpufreq_dev_count; + +static DEFINE_MUTEX(cooling_list_lock); static LIST_HEAD(cpufreq_dev_list); /** @@ -221,7 +224,7 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb, switch (event) { case CPUFREQ_ADJUST: - mutex_lock(&cooling_cpufreq_lock); + mutex_lock(&cooling_list_lock); list_for_each_entry(cpufreq_dev, &cpufreq_dev_list, node) { if (!cpumask_test_cpu(policy->cpu, &cpufreq_dev->allowed_cpus)) @@ -233,7 +236,7 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb, cpufreq_verify_within_limits(policy, 0, max_freq); } - mutex_unlock(&cooling_cpufreq_lock); + mutex_unlock(&cooling_list_lock); break; default: return NOTIFY_DONE; @@ -865,12 +868,15 @@ __cpufreq_cooling_register(struct device_node *np, cpufreq_dev->cool_dev = cool_dev; mutex_lock(&cooling_cpufreq_lock); + mutex_lock(&cooling_list_lock); + list_add(&cpufreq_dev->node, &cpufreq_dev_list); + mutex_unlock(&cooling_list_lock); /* Register the notifier for first cpufreq cooling device */ - if (list_empty(&cpufreq_dev_list)) + if (cpufreq_dev_count == 0) cpufreq_register_notifier(&thermal_cpufreq_notifier_block, CPUFREQ_POLICY_NOTIFIER); - list_add(&cpufreq_dev->node, &cpufreq_dev_list); + cpufreq_dev_count++; mutex_unlock(&cooling_cpufreq_lock); @@ -1014,14 +1020,18 @@ void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev) cpufreq_dev = cdev->devdata; mutex_lock(&cooling_cpufreq_lock); - list_del(&cpufreq_dev->node); + cpufreq_dev_count--; /* Unregister the notifier for the last cpufreq cooling device */ - if (list_empty(&cpufreq_dev_list)) + if (cpufreq_dev_count == 0) cpufreq_unregister_notifier(&thermal_cpufreq_notifier_block, CPUFREQ_POLICY_NOTIFIER); mutex_unlock(&cooling_cpufreq_lock); + mutex_lock(&cooling_list_lock); + list_del(&cpufreq_dev->node); + mutex_unlock(&cooling_list_lock); + thermal_cooling_device_unregister(cpufreq_dev->cool_dev); release_idr(&cpufreq_idr, cpufreq_dev->id); kfree(cpufreq_dev->time_in_idle_timestamp);