From patchwork Tue Oct 13 08:09:04 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Viresh Kumar
X-Patchwork-Id: 7381891
From: Viresh Kumar
To: Rafael Wysocki
Cc: linaro-kernel@lists.linaro.org, linux-pm@vger.kernel.org,
 Viresh Kumar, linux-kernel@vger.kernel.org (open list)
Subject: [PATCH V3 4/5] cpufreq: governor: Quit work-handlers early if
 governor is stopped
Date: Tue, 13 Oct 2015 13:39:04 +0530
Message-Id: <1e579d2bf8dbee09295725cda37bd92222fe61fb.1444723240.git.viresh.kumar@linaro.org>
X-Mailer: git-send-email 2.4.0
X-Mailing-List: linux-pm@vger.kernel.org

cpufreq_governor_lock is abused by using it outside of the cpufreq core,
i.e. in cpufreq governors. But we didn't have a better solution to the
problem (described later) at that time, and the following was the only
acceptable solution:

6f1e4efd882e ("cpufreq: Fix timer/workqueue corruption by protecting
reading governor_enabled")

The cpufreq governor core is fixed against possible races now and things
are in much better shape.

The original problem:

When a CPU is hot unplugged, we cancel delayed works for all
policy->cpus via gov_cancel_work(). If the work is already running on
any CPU, the workqueue code will wait for the work to finish, to prevent
the work items from re-queuing themselves.

This works most of the time, except for the case where the work handler
determines that it should adjust the delay for all other CPUs that the
policy is managing.
When this happens, the canceling CPU will cancel its own work but can
queue up the works on other policy->cpus.

For example, consider CPUs 0-4 in a policy and that we called
gov_cancel_work() for them. The workqueue core canceled the works for
CPUs 0-3 and is waiting for the handler to finish on CPU4. At that time,
the handler on CPU4 can restart the works on CPUs 0-3 again. This makes
CPUs 0-3 run works that the governor core believes are canceled.

To fix that in a different (non-hacky) way, set shared->policy to NULL
before trying to cancel the work. It should be updated within
timer_mutex, which will prevent the work-handlers from starting.

Once a work-handler finds that we are already trying to stop the
governor, it will exit early. And that will prevent queuing of works
again as well.

Signed-off-by: Viresh Kumar
---
 drivers/cpufreq/cpufreq_governor.c | 33 +++++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 750626d8fb03..931424ca96d9 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -171,10 +171,6 @@ void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
 {
 	int i;
 
-	mutex_lock(&cpufreq_governor_lock);
-	if (!policy->governor_enabled)
-		goto out_unlock;
-
 	if (!all_cpus) {
 		/*
 		 * Use raw_smp_processor_id() to avoid preemptible warnings.
@@ -188,9 +184,6 @@ void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
 		for_each_cpu(i, policy->cpus)
 			__gov_queue_work(i, dbs_data, delay);
 	}
-
-out_unlock:
-	mutex_unlock(&cpufreq_governor_lock);
 }
 EXPORT_SYMBOL_GPL(gov_queue_work);
 
@@ -229,13 +222,24 @@ static void dbs_timer(struct work_struct *work)
 	struct cpu_dbs_info *cdbs = container_of(work, struct cpu_dbs_info,
 						 dwork.work);
 	struct cpu_common_dbs_info *shared = cdbs->shared;
-	struct cpufreq_policy *policy = shared->policy;
-	struct dbs_data *dbs_data = policy->governor_data;
+	struct cpufreq_policy *policy;
+	struct dbs_data *dbs_data;
 	unsigned int sampling_rate, delay;
 	bool modify_all = true;
 
 	mutex_lock(&shared->timer_mutex);
 
+	policy = shared->policy;
+
+	/*
+	 * Governor might already be disabled and there is no point continuing
+	 * with the work-handler.
+	 */
+	if (!policy)
+		goto unlock;
+
+	dbs_data = policy->governor_data;
+
 	if (dbs_data->cdata->governor == GOV_CONSERVATIVE) {
 		struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
 
@@ -252,6 +256,7 @@ static void dbs_timer(struct work_struct *work)
 	delay = dbs_data->cdata->gov_dbs_timer(cdbs, dbs_data, modify_all);
 	gov_queue_work(dbs_data, policy, delay, modify_all);
 
+unlock:
 	mutex_unlock(&shared->timer_mutex);
 }
 
@@ -488,9 +493,17 @@ static int cpufreq_governor_stop(struct cpufreq_policy *policy,
 	if (!shared || !shared->policy)
 		return -EBUSY;
 
+	/*
+	 * Work-handler must see this updated, as it should not proceed any
+	 * further after governor is disabled. And so timer_mutex is taken while
+	 * updating this value.
+	 */
+	mutex_lock(&shared->timer_mutex);
+	shared->policy = NULL;
+	mutex_unlock(&shared->timer_mutex);
+
 	gov_cancel_work(dbs_data, policy);
 
-	shared->policy = NULL;
 	mutex_destroy(&shared->timer_mutex);
 	return 0;
 }
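
For readers who want to replay the CPU 0-4 interleaving from the commit
message outside the kernel, here is a minimal single-threaded sketch in
userspace C. It is not kernel code and makes several assumptions: struct
shared_model, handler_model() and pending_after_stop() are hypothetical
names invented for illustration, the "works" are plain booleans, and the
ordering that timer_mutex guarantees in the real patch is modeled simply
by the order of the statements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 5	/* CPUs 0-4 share one policy, as in the example */

/* Hypothetical, simplified stand-in for struct cpu_common_dbs_info. */
struct shared_model {
	const void *policy;	/* NULL once the governor is being stopped */
	bool pending[NR_CPUS];	/* true if a work is queued for that CPU */
};

/* Models a work handler that decides to re-arm the works of all CPUs. */
static void handler_model(struct shared_model *shared, bool check_stopped)
{
	/* Early exit modeled on the patch: bail out if policy is NULL. */
	if (check_stopped && !shared->policy)
		return;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		shared->pending[cpu] = true;
}

/*
 * Replays the interleaving from the commit message and returns how many
 * works are still pending once the stop path has finished.
 */
int pending_after_stop(bool check_stopped)
{
	static const int dummy_policy;	/* any non-NULL address will do */
	struct shared_model shared = { .policy = &dummy_policy };
	int cpu, left = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		shared.pending[cpu] = true;

	/* With the fix, the stop path clears policy before canceling. */
	if (check_stopped)
		shared.policy = NULL;

	/* Works on CPUs 0-3 are canceled; CPU4's handler is running. */
	for (cpu = 0; cpu < NR_CPUS - 1; cpu++)
		shared.pending[cpu] = false;

	/* The handler on CPU4 runs while the cancel waits for it. */
	handler_model(&shared, check_stopped);

	/* The cancel of CPU4's own work completes after the handler. */
	shared.pending[NR_CPUS - 1] = false;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		left += shared.pending[cpu];
	return left;
}
```

In this model pending_after_stop(false) returns 4, because the works on
CPUs 0-3 are re-armed behind the back of the cancel path, while
pending_after_stop(true) returns 0, because the handler exits early.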