From patchwork Tue Aug 27 18:47:29 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stephen Boyd X-Patchwork-Id: 2850271 Return-Path: X-Original-To: patchwork-linux-pm@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 17734BF546 for ; Tue, 27 Aug 2013 18:47:41 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id A0871204A7 for ; Tue, 27 Aug 2013 18:47:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id B2A13204A2 for ; Tue, 27 Aug 2013 18:47:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752584Ab3H0Sre (ORCPT ); Tue, 27 Aug 2013 14:47:34 -0400 Received: from smtp.codeaurora.org ([198.145.11.231]:57868 "EHLO smtp.codeaurora.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751743Ab3H0Srd (ORCPT ); Tue, 27 Aug 2013 14:47:33 -0400 Received: from smtp.codeaurora.org (localhost [127.0.0.1]) by smtp.codeaurora.org (Postfix) with ESMTP id 5209D13EE07; Tue, 27 Aug 2013 18:47:33 +0000 (UTC) Received: by smtp.codeaurora.org (Postfix, from userid 486) id 44F8B13F270; Tue, 27 Aug 2013 18:47:33 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Spam-Level: X-Spam-Status: No, score=-9.4 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 Received: from sboyd-linux.qualcomm.com (i-global252.qualcomm.com [199.106.103.252]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) (Authenticated sender: sboyd@smtp.codeaurora.org) by smtp.codeaurora.org (Postfix) with ESMTPSA id 6C42F13EF86; Tue, 27 Aug 2013 18:47:32 +0000 (UTC) From: Stephen Boyd To: Viresh Kumar Cc: linux-kernel@vger.kernel.org, cpufreq@vger.kernel.org, linux-pm@vger.kernel.org, "Rafael J . Wysocki" Subject: [PATCH] cpufreq: Fix timer/workqueue corruption due to double queueing Date: Tue, 27 Aug 2013 11:47:29 -0700 Message-Id: <1377629249-1177-1-git-send-email-sboyd@codeaurora.org> X-Mailer: git-send-email 1.8.4 In-Reply-To: X-Virus-Scanned: ClamAV using ClamSMTP Sender: linux-pm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When a CPU is hot removed we'll cancel all the delayed work items via gov_cancel_work(). Normally this will just cancel a delayed timer on each CPU that the policy is managing and the work won't run, but if the work is already running the workqueue code will wait for the work to finish before continuing to prevent the work items from re-queuing themselves like they normally do. This scheme will work most of the time, except for the case where the work function determines that it should adjust the delay for all other CPUs that the policy is managing. If this scenario occurs, the canceling CPU will cancel its own work but queue up the other CPUs works to run. For example: CPU0 CPU1 ---- ---- cpu_down() ... __cpufreq_remove_dev() cpufreq_governor_dbs() case CPUFREQ_GOV_STOP: gov_cancel_work(dbs_data, policy); cpu0 work is canceled timer is canceled cpu1 work is canceled od_dbs_timer() gov_queue_work(*, *, true); cpu0 work queued cpu1 work queued cpu2 work queued ... cpu1 work is canceled cpu2 work is canceled ... At the end of the GOV_STOP case cpu0 still has a work queued to run although the code is expecting all of the works to be canceled. __cpufreq_remove_dev() will then proceed to re-initialize all the other CPUs works except for the CPU that is going down. The CPUFREQ_GOV_START case in cpufreq_governor_dbs() will trample over the queued work and debugobjects will spit out a warning: WARNING: at lib/debugobjects.c:260 debug_print_object+0x94/0xbc() ODEBUG: init active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x10 Modules linked in: CPU: 0 PID: 1491 Comm: sh Tainted: G W 3.10.0 #19 [] (unwind_backtrace+0x0/0x11c) from [] (show_stack+0x10/0x14) [] (show_stack+0x10/0x14) from [] (warn_slowpath_common+0x4c/0x6c) [] (warn_slowpath_common+0x4c/0x6c) from [] (warn_slowpath_fmt+0x2c/0x3c) [] (warn_slowpath_fmt+0x2c/0x3c) from [] (debug_print_object+0x94/0xbc) [] (debug_print_object+0x94/0xbc) from [] (__debug_object_init+0x2d0/0x340) [] (__debug_object_init+0x2d0/0x340) from [] (init_timer_key+0x14/0xb0) [] (init_timer_key+0x14/0xb0) from [] (cpufreq_governor_dbs+0x3e8/0x5f8) [] (cpufreq_governor_dbs+0x3e8/0x5f8) from [] (__cpufreq_governor+0xdc/0x1a4) [] (__cpufreq_governor+0xdc/0x1a4) from [] (__cpufreq_remove_dev.isra.10+0x3b4/0x434) [] (__cpufreq_remove_dev.isra.10+0x3b4/0x434) from [] (cpufreq_cpu_callback+0x60/0x80) [] (cpufreq_cpu_callback+0x60/0x80) from [] (notifier_call_chain+0x38/0x68) [] (notifier_call_chain+0x38/0x68) from [] (__cpu_notify+0x28/0x40) [] (__cpu_notify+0x28/0x40) from [] (_cpu_down+0x7c/0x2c0) [] (_cpu_down+0x7c/0x2c0) from [] (cpu_down+0x24/0x40) [] (cpu_down+0x24/0x40) from [] (store_online+0x2c/0x74) [] (store_online+0x2c/0x74) from [] (dev_attr_store+0x18/0x24) [] (dev_attr_store+0x18/0x24) from [] (sysfs_write_file+0x100/0x148) [] (sysfs_write_file+0x100/0x148) from [] (vfs_write+0xcc/0x174) [] (vfs_write+0xcc/0x174) from [] (SyS_write+0x38/0x64) [] (SyS_write+0x38/0x64) from [] (ret_fast_syscall+0x0/0x30) Signed-off-by: Stephen Boyd Acked-by: Viresh Kumar --- On 08/27, Viresh Kumar wrote: > On 27 August 2013 04:15, Stephen Boyd wrote: > > +++ b/drivers/cpufreq/cpufreq_governor.c > > @@ -133,7 +133,7 @@ void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy, > > { > > int i; > > > > - if (!all_cpus) { > > + if (!all_cpus || !policy->governor_enabled) { > > __gov_queue_work(smp_processor_id(), dbs_data, delay); > > } else { > > for_each_cpu(i, policy->cpus) > > Shouldn't we simply do this instead at the top of this function? > > > + if (!policy->governor_enabled) > > + return; Sure that works just as well. Here's a patch. drivers/cpufreq/cpufreq_governor.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index 7b839a8..b9b20fd 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -133,6 +133,9 @@ void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy, { int i; + if (!policy->governor_enabled) + return; + if (!all_cpus) { __gov_queue_work(smp_processor_id(), dbs_data, delay); } else {