From patchwork Tue Apr 30 22:28:33 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zoran Markovic
X-Patchwork-Id: 2506441
From: Zoran Markovic
To: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org, Benoit Goby, Android Kernel Team,
 Colin Cross, Todd Poynor, San Mehat, John Stultz, Pavel Machek,
 "Rafael J. Wysocki", Len Brown, Greg Kroah-Hartman, Zoran Markovic
Subject: [RFC PATCH] drivers: power: Add watchdog timer to catch drivers
 which lockup during suspend.
Date: Tue, 30 Apr 2013 15:28:33 -0700
Message-Id: <1367360914-23389-1-git-send-email-zoran.markovic@linaro.org>
X-Mailer: git-send-email 1.7.9.5
Sender: linux-pm-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-pm@vger.kernel.org

From: Benoit Goby

Below is a patch from the Android kernel that detects a driver suspend
lockup and captures a dump in the kernel log. Please review and provide
comments.

Rather than hard-lock the kernel, dump the suspend thread stack and
BUG() when a driver takes too long to suspend. The timeout is set to
12 seconds so that it is longer than the usbhid 10 second timeout.

Exclude from the watchdog the time spent waiting for children that
are resumed asynchronously, and time every device, whether or not it
is resumed synchronously.

Cc: Android Kernel Team
Cc: Colin Cross
Cc: Todd Poynor
Cc: San Mehat
Cc: Benoit Goby
Cc: John Stultz
Cc: Pavel Machek
Cc: Rafael J. Wysocki
Cc: Len Brown
Cc: Greg Kroah-Hartman
Original-author: San Mehat
Signed-off-by: Benoit Goby
[zoran.markovic@linaro.org: Changed printk(KERN_EMERG,...) to
pr_emerg(...), tweaked commit message.]
Signed-off-by: Zoran Markovic
---
 drivers/base/power/main.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 15beb50..eb70c0e 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -29,6 +29,8 @@
 #include
 #include
 #include
+#include
+
 #include "../base.h"
 #include "power.h"
 
@@ -54,6 +56,12 @@ struct suspend_stats suspend_stats;
 static DEFINE_MUTEX(dpm_list_mtx);
 static pm_message_t pm_transition;
 
+static void dpm_drv_timeout(unsigned long data);
+struct dpm_drv_wd_data {
+	struct device *dev;
+	struct task_struct *tsk;
+};
+
 static int async_error;
 
 /**
@@ -663,6 +671,30 @@ static bool is_async(struct device *dev)
 }
 
 /**
+ * dpm_drv_timeout - Driver suspend / resume watchdog handler
+ * @data: pointer to the struct dpm_drv_wd_data of the device which timed out
+ *
+ * Called when a driver has timed out suspending or resuming.
+ * There's not much we can do here to recover so
+ * BUG() out for a crash-dump
+ *
+ */
+static void dpm_drv_timeout(unsigned long data)
+{
+	struct dpm_drv_wd_data *wd_data = (void *)data;
+	struct device *dev = wd_data->dev;
+	struct task_struct *tsk = wd_data->tsk;
+
+	pr_emerg("**** DPM device timeout: %s (%s)\n", dev_name(dev),
+		(dev->driver ? dev->driver->name : "no driver"));
+
+	pr_emerg("dpm suspend stack:\n");
+	show_stack(tsk, NULL);
+
+	BUG();
+}
+
+/**
  * dpm_resume - Execute "resume" callbacks for non-sysdev devices.
  * @state: PM transition of the system being carried out.
  *
@@ -1053,6 +1085,8 @@ static int __device_suspend(struct device *dev, pm_message_t state, bool async)
 	pm_callback_t callback = NULL;
 	char *info = NULL;
 	int error = 0;
+	struct timer_list timer;
+	struct dpm_drv_wd_data data;
 
 	dpm_wait_for_children(dev, async);
 
@@ -1076,6 +1110,14 @@ static int __device_suspend(struct device *dev, pm_message_t state, bool async)
 	if (dev->power.syscore)
 		goto Complete;
 
+	data.dev = dev;
+	data.tsk = get_current();
+	init_timer_on_stack(&timer);
+	timer.expires = jiffies + HZ * 12;
+	timer.function = dpm_drv_timeout;
+	timer.data = (unsigned long)&data;
+	add_timer(&timer);
+
 	device_lock(dev);
 
 	if (dev->pm_domain) {
@@ -1131,6 +1173,9 @@ static int __device_suspend(struct device *dev, pm_message_t state, bool async)
 
 	device_unlock(dev);
 
+	del_timer_sync(&timer);
+	destroy_timer_on_stack(&timer);
+
  Complete:
 	complete_all(&dev->power.completion);
 	if (error)
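
Note on the deadline arithmetic: the patch arms the watchdog with
"timer.expires = jiffies + HZ * 12". Below is a minimal userspace sketch
of that calculation and of a wraparound-safe expiry check in the style of
the kernel's time_after_eq(). It is an illustration only, not part of the
patch: SKETCH_HZ, SKETCH_WD_SECS, wd_deadline() and wd_expired() are
hypothetical names, and the tick rate of 100 is an assumption (the real
HZ is a kernel build-time setting).

```c
#include <stdbool.h>

/* Assumed tick rate, for illustration only. */
#define SKETCH_HZ	100UL
/* 12 seconds, as in the patch: longer than the 10 second usbhid timeout. */
#define SKETCH_WD_SECS	12UL

/* Deadline in ticks, mirroring "timer.expires = jiffies + HZ * 12". */
static unsigned long wd_deadline(unsigned long now)
{
	/* Unsigned arithmetic wraps around, just as the jiffies counter does. */
	return now + SKETCH_HZ * SKETCH_WD_SECS;
}

/*
 * Wraparound-safe "now >= deadline". The difference is reinterpreted as
 * a signed value, so the result stays correct when the tick counter has
 * wrapped between arming the timer and checking it.
 */
static bool wd_expired(unsigned long now, unsigned long deadline)
{
	return (long)(now - deadline) >= 0;
}
```

With these assumptions, a watchdog armed at tick 0 expires at tick 1200,
and one armed a few ticks before the counter wraps still expires 1200
ticks later rather than immediately.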