From patchwork Wed Feb 8 12:39:00 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ulf Hansson X-Patchwork-Id: 9562441 X-Patchwork-Delegate: rjw@sisk.pl Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 395C460434 for ; Wed, 8 Feb 2017 12:41:19 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2ABA52848B for ; Wed, 8 Feb 2017 12:41:19 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1E32B284EC; Wed, 8 Feb 2017 12:41:19 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3C7BA2848B for ; Wed, 8 Feb 2017 12:41:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754136AbdBHMkm (ORCPT ); Wed, 8 Feb 2017 07:40:42 -0500 Received: from mail-lf0-f52.google.com ([209.85.215.52]:36302 "EHLO mail-lf0-f52.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753007AbdBHMki (ORCPT ); Wed, 8 Feb 2017 07:40:38 -0500 Received: by mail-lf0-f52.google.com with SMTP id z134so81130070lff.3 for ; Wed, 08 Feb 2017 04:39:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id; bh=bw+9yoHSLI/K7LxSLkiUsTVxbYXNwHFVPdUTXnYIaQc=; b=PEbDkNajUIjM8M2IdOTeOizFqK1BVqMEV/wQ0Ibj2ne+WQ6uKb1FwazBrd5uNxWFlL b8qSpyxLqXFEzXbuwSFjJpgbUZOuL/RjktSHZo93K9yKjQ9O5EK3TViQ6q3T+3EBR/6q 4dSfJd9ZnhEe3H5dFuAtw4fWdWqshOXBL/3Q0= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id; bh=bw+9yoHSLI/K7LxSLkiUsTVxbYXNwHFVPdUTXnYIaQc=; b=mrxC7pWCYu0L48WX06xNamXr5c8B3OZHqMsyPJ7kYlx8eHO3UX5Z09m46KoopES2Mv EArU21CybsQ6vcW3MJazu1dk2qpf1jN8fP8qEjUr6IdHT5Y55gmUlcI8wRlCUyBo8bbj PG+5pMGLywDzJ2USbP/BUCoYfh/tBfKBVuj+zoMf1Mh4Tg1FtPua/kifFC+oKhZhz7MW wVmBlL9xZy6aAqPAJarDxnYzTL3GxbSuLPmPgikJh2NSy9EeI4pfPQcZfoQ9CmSXl5n+ //rBDcBIJFZdfpD59WLBwFLis/dKOpneKvNHy+Lsp2dkJ8TiVc/iAqxBV2uFTeZ3+qUC NJGQ== X-Gm-Message-State: AIkVDXK1fvgzkVwLF3OhHxIAkrMEBdCCq2Oe0AM24l53lUmbQWohoVF6llYuMY4BDkP2WDVw X-Received: by 10.25.154.2 with SMTP id c2mr7416879lfe.71.1486557547258; Wed, 08 Feb 2017 04:39:07 -0800 (PST) Received: from localhost.localdomain (h-155-4-221-67.na.cust.bahnhof.se. [155.4.221.67]) by smtp.gmail.com with ESMTPSA id n8sm2370483lfi.30.2017.02.08.04.39.06 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Wed, 08 Feb 2017 04:39:06 -0800 (PST) From: Ulf Hansson To: "Rafael J. Wysocki" , Ulf Hansson , linux-pm@vger.kernel.org Cc: Len Brown , Pavel Machek , Kevin Hilman , Geert Uytterhoeven , Lina Iyer , Jon Hunter , Marek Szyprowski , Brian Norris Subject: [PATCH V2] PM / Domains: Fix asynchronous execution of *noirq() callbacks Date: Wed, 8 Feb 2017 13:39:00 +0100 Message-Id: <1486557540-30936-1-git-send-email-ulf.hansson@linaro.org> X-Mailer: git-send-email 1.9.1 Sender: linux-pm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP As the PM core may invoke the *noirq() callbacks asynchronously, the current lock-less approach in genpd doesn't work. The consequence is that we may find concurrent operations racing to power on/off the PM domain. As of now, no immediate errors has been reported, but it's probably only a matter time. Therefor let's fix the problem now before this becomes a real issue, by deploying the locking scheme to the relevant functions. Reported-by: Brian Norris Signed-off-by: Ulf Hansson Tested-by: Geert Uytterhoeven --- Changes in v2: A regressions reported by Geert Uytterhoeven for Renesas arm32 boards, which uses the genpd syscore API triggered a BUG. Described below. BUG: sleeping function called from invalid context at kernel/locking/mutex.c:232 in_atomic(): 0, irqs_disabled(): 128, pid: 1751, name: s2ram CPU: 0 PID: 1751 Comm: s2ram Not tainted 4.10.0-rc7-koelsch-05643-g27f4c73972a614fe #3354 Hardware name: Generic R8A7791 (Flattened Device Tree) [] (unwind_backtrace) from [] (show_stack+0x10/0x14) [] (show_stack) from [] (dump_stack+0x7c/0x9c) [] (dump_stack) from [] (___might_sleep+0x124/0x160) [] (___might_sleep) from [] (mutex_lock+0x18/0x60) [] (mutex_lock) from [] (genpd_syscore_switch+0x2c/0x7c) [] (genpd_syscore_switch) from [] (sh_cmt_clock_event_suspend+0x18/0x28) [] (sh_cmt_clock_event_suspend) from [] (clockevents_suspend+0x40/0x54) [] (clockevents_suspend) from [] (timekeeping_suspend+0x23c/0x278) [] (timekeeping_suspend) from [] (syscore_suspend+0x88/0x138) [] (syscore_suspend) from [] (suspend_devices_and_enter+0x290/0x470) [] (suspend_devices_and_enter) from [] (pm_suspend+0x228/0x280) [] (pm_suspend) from [] (state_store+0xac/0xcc) [] (state_store) from [] (kernfs_fop_write+0x160/0x19c) [] (kernfs_fop_write) from [] (__vfs_write+0x20/0x108) [] (__vfs_write) from [] (vfs_write+0xb8/0x144) [] (vfs_write) from [] (SyS_write+0x40/0x80) [] (SyS_write) from [] (ret_fast_syscall+0x0/0x34)" Therefore, in v2 I move back to the original lock-less behaviour for the genpd syscore APIs. --- drivers/base/power/domain.c | 68 ++++++++++++++++++++++++++------------------- 1 file changed, 39 insertions(+), 29 deletions(-) diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c index fd2e3e1..73ae3e7 100644 --- a/drivers/base/power/domain.c +++ b/drivers/base/power/domain.c @@ -729,16 +729,18 @@ static bool genpd_dev_active_wakeup(struct generic_pm_domain *genpd, /** * genpd_sync_power_off - Synchronously power off a PM domain and its masters. * @genpd: PM domain to power off, if possible. + * @use_lock: use the lock. + * @depth: nesting count for lockdep. * * Check if the given PM domain can be powered off (during system suspend or * hibernation) and do that if so. Also, in that case propagate to its masters. * * This function is only called in "noirq" and "syscore" stages of system power - * transitions, so it need not acquire locks (all of the "noirq" callbacks are - * executed sequentially, so it is guaranteed that it will never run twice in - * parallel). + * transitions. The "noirq" callbacks may be executed asynchronously, thus in + * these cases the lock must be held. */ -static void genpd_sync_power_off(struct generic_pm_domain *genpd) +static void genpd_sync_power_off(struct generic_pm_domain *genpd, bool use_lock, + unsigned int depth) { struct gpd_link *link; @@ -757,20 +759,29 @@ static void genpd_sync_power_off(struct generic_pm_domain *genpd) list_for_each_entry(link, &genpd->slave_links, slave_node) { genpd_sd_counter_dec(link->master); - genpd_sync_power_off(link->master); + + if (use_lock) + genpd_lock_nested(link->master, depth + 1); + + genpd_sync_power_off(link->master, use_lock, depth + 1); + + if (use_lock) + genpd_unlock(link->master); } } /** * genpd_sync_power_on - Synchronously power on a PM domain and its masters. * @genpd: PM domain to power on. + * @use_lock: use the lock. + * @depth: nesting count for lockdep. * * This function is only called in "noirq" and "syscore" stages of system power - * transitions, so it need not acquire locks (all of the "noirq" callbacks are - * executed sequentially, so it is guaranteed that it will never run twice in - * parallel). + * transitions. The "noirq" callbacks may be executed asynchronously, thus in + * these cases the lock must be held. */ -static void genpd_sync_power_on(struct generic_pm_domain *genpd) +static void genpd_sync_power_on(struct generic_pm_domain *genpd, bool use_lock, + unsigned int depth) { struct gpd_link *link; @@ -778,8 +789,15 @@ static void genpd_sync_power_on(struct generic_pm_domain *genpd) return; list_for_each_entry(link, &genpd->slave_links, slave_node) { - genpd_sync_power_on(link->master); genpd_sd_counter_inc(link->master); + + if (use_lock) + genpd_lock_nested(link->master, depth + 1); + + genpd_sync_power_on(link->master, use_lock, depth + 1); + + if (use_lock) + genpd_unlock(link->master); } _genpd_power_on(genpd, false); @@ -888,13 +906,10 @@ static int pm_genpd_suspend_noirq(struct device *dev) return ret; } - /* - * Since all of the "noirq" callbacks are executed sequentially, it is - * guaranteed that this function will never run twice in parallel for - * the same PM domain, so it is not necessary to use locking here. - */ + genpd_lock(genpd); genpd->suspended_count++; - genpd_sync_power_off(genpd); + genpd_sync_power_off(genpd, true, 0); + genpd_unlock(genpd); return 0; } @@ -919,13 +934,10 @@ static int pm_genpd_resume_noirq(struct device *dev) if (dev->power.wakeup_path && genpd_dev_active_wakeup(genpd, dev)) return 0; - /* - * Since all of the "noirq" callbacks are executed sequentially, it is - * guaranteed that this function will never run twice in parallel for - * the same PM domain, so it is not necessary to use locking here. - */ - genpd_sync_power_on(genpd); + genpd_lock(genpd); + genpd_sync_power_on(genpd, true, 0); genpd->suspended_count--; + genpd_unlock(genpd); if (genpd->dev_ops.stop && genpd->dev_ops.start) ret = pm_runtime_force_resume(dev); @@ -1002,13 +1014,10 @@ static int pm_genpd_restore_noirq(struct device *dev) return -EINVAL; /* - * Since all of the "noirq" callbacks are executed sequentially, it is - * guaranteed that this function will never run twice in parallel for - * the same PM domain, so it is not necessary to use locking here. - * * At this point suspended_count == 0 means we are being run for the * first time for the given domain in the present cycle. */ + genpd_lock(genpd); if (genpd->suspended_count++ == 0) /* * The boot kernel might put the domain into arbitrary state, @@ -1017,7 +1026,8 @@ static int pm_genpd_restore_noirq(struct device *dev) */ genpd->status = GPD_STATE_POWER_OFF; - genpd_sync_power_on(genpd); + genpd_sync_power_on(genpd, true, 0); + genpd_unlock(genpd); if (genpd->dev_ops.stop && genpd->dev_ops.start) ret = pm_runtime_force_resume(dev); @@ -1072,9 +1082,9 @@ static void genpd_syscore_switch(struct device *dev, bool suspend) if (suspend) { genpd->suspended_count++; - genpd_sync_power_off(genpd); + genpd_sync_power_off(genpd, false, 0); } else { - genpd_sync_power_on(genpd); + genpd_sync_power_on(genpd, false, 0); genpd->suspended_count--; } }