From patchwork Thu Sep 27 20:35:23 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Doug Anderson X-Patchwork-Id: 10618535 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4DAFA112B for ; Thu, 27 Sep 2018 20:36:09 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3C9AB2BA51 for ; Thu, 27 Sep 2018 20:36:09 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 307012BCD3; Thu, 27 Sep 2018 20:36:09 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id EC76F2BA51 for ; Thu, 27 Sep 2018 20:36:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727370AbeI1C4L (ORCPT ); Thu, 27 Sep 2018 22:56:11 -0400 Received: from mail-pg1-f195.google.com ([209.85.215.195]:40986 "EHLO mail-pg1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727361AbeI1C4L (ORCPT ); Thu, 27 Sep 2018 22:56:11 -0400 Received: by mail-pg1-f195.google.com with SMTP id z3-v6so2765948pgv.8 for ; Thu, 27 Sep 2018 13:36:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; h=from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=MFTLVJAKx+sexf/Xa3m9BU7+lRei3FyqM0RY7ixgKMQ=; b=SDXEFUWkZOlUfxvW+Dr5m/MSA8xOzZ+YmmJoigAGWr3cy2o9Imix6frbeBq7SJAw1O cqKVLp47uEDy08cOsEv8uZKiCvjFHBgpKHOVCoz38cuaFYdQsU3F3OF0yiXHBTSmePD+ nLkGWG9Tq2v3+/SWmv2C9tA9tV0vsu89mDA08= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=MFTLVJAKx+sexf/Xa3m9BU7+lRei3FyqM0RY7ixgKMQ=; b=Fi19hIn3/UB1uKcyrkQQdD+w25AM/HO932FEwr+9irxOcYGbNqBCBoHoG/vVWp0L4X ugusF3gPjiqkcYWWyxdbxwrp3O8kwGVU/6mwm0WAfzkdQ+tzjfrueBVm5prLv11P1PdR CGYrRf+hV+9WbgH0pt/PhIhmjOH5wJtNW8UlqWfk9UXGUFfukIvU2aX1ydS/EU6Aph4q 3wZdn2yBt68N6uADx0kfs7UXWiMy8UMxAgRvKthe3DcQ1/q35Ap5E8ppTlCQuw1dVYhm 1eE0+7JlJd/3RMLvxZUTwx4SwRbmVnx6akVkStY1LtTvYKQyjWDMShRHO5d2ZLItQ/f2 tqUg== X-Gm-Message-State: ABuFfoh3fU2q9BHJ+FjLRfBkAd3shFJF0I00e/gb36xXWmIDu77Gd334 kDJuvAlyiwYdIApY8rmnBxRcog== X-Google-Smtp-Source: ACcGV63PXyCg8g1tZFkGLNm9PtNbl1dxNsGv8TwPEieQ7U/33T1HzHL4vtr/Ahxo/omxwy6yIQZSCw== X-Received: by 2002:a62:22c7:: with SMTP id p68-v6mr13188086pfj.53.1538080565191; Thu, 27 Sep 2018 13:36:05 -0700 (PDT) Received: from tictac2.mtv.corp.google.com ([2620:15c:202:1:c8e0:70d7:4be7:a36]) by smtp.gmail.com with ESMTPSA id f67-v6sm6104968pfe.75.2018.09.27.13.36.03 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 27 Sep 2018 13:36:04 -0700 (PDT) From: Douglas Anderson To: rjw@rjwysocki.net Cc: Dilip Kota , dtor@chromium.org, swboyd@chromium.org, Douglas Anderson , linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Len Brown , Greg Kroah-Hartman , Pavel Machek Subject: [RFC PATCH] PM / core: skip suspend next time if resume returns an error Date: Thu, 27 Sep 2018 13:35:23 -0700 Message-Id: <20180927203523.60856-1-dianders@chromium.org> X-Mailer: git-send-email 2.19.0.605.g01d371f741-goog MIME-Version: 1.0 Sender: linux-pm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP In general Linux doesn't behave super great if you get an error while executing a device's resume handler. Nothing will come along later and and try again to resume the device (and all devices that depend on it), so pretty much you're left with a non-functioning device and that's not good. However, even though you'll end up with a non-functioning device we still don't consider resume failures to be fatal to the system. We'll keep chugging along and just hope that the device that failed to resume wasn't too critical. This establishes the precedent that we should at least try our best not to fully bork the system after a resume failure. I will argue that the best way to keep the system in the best shape is to assume that if a resume callback failed that it did as close to no-op as possible. Because of this we should consider the device still suspended and shouldn't try to suspend the device again next time around. Today that's not what happens. AKA if you have a device that fails resume every other time then you'll see these calls: 1. suspend 2. resume (no error) 3. suspend 4. resume (fails!) 5. suspend 6. resume (no error) I believe that a more correct thing to do is to remove step #5 from the above. To understand why, let's imagine a hypothetical device. In such a device let's say we have a regulator and clock to turn off. We'll say: hypothetical_suspend(...) { ret = regulator_disable(...); if (ret) return ret; ret = clk_disable(...); if (ret) regulator_enable(...); return ret; hypothetical_resume(...) { ret = clk_enable(...); if (ret) return ret; ret = regulator_enable(...); if (ret) clk_disable(...); return ret; Let's imagine that the regulator_enable() at resume fails sometimes. You'll see that on the next call to hypothetical_suspend() we'll try to disable a regulator that was never enabled. ...but if we change things to skip the next suspend callback then actually we might get the device functioning again after another suspend/resume cycle (especially if the resume failure was intermittent for some reason). Obviously this patch is pretty simplistic and certainly doesn't fix the world, but perhaps it moves us in the right direction? Signed-off-by: Douglas Anderson --- drivers/base/power/main.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c index 3f68e2919dc5..27c7d1f76cee 100644 --- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -999,7 +999,7 @@ static int device_resume(struct device *dev, pm_message_t state, bool async) End: error = dpm_run_callback(callback, dev, state, info); - dev->power.is_suspended = false; + dev->power.is_suspended = error; Unlock: device_unlock(dev); @@ -1750,6 +1750,9 @@ static int __device_suspend(struct device *dev, pm_message_t state, bool async) dpm_watchdog_set(&wd, dev); device_lock(dev); + if (dev->power.is_suspended) + goto End; + if (dev->pm_domain) { info = "power domain "; callback = pm_op(&dev->pm_domain->ops, state);