From patchwork Tue Feb 14 16:01:40 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sergey Senozhatsky X-Patchwork-Id: 9572159 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 5B23B6045F for ; Tue, 14 Feb 2017 16:02:33 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 46646283F9 for ; Tue, 14 Feb 2017 16:02:33 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 393F628426; Tue, 14 Feb 2017 16:02:33 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.3 required=2.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED, DKIM_SIGNED, FREEMAIL_FROM, RCVD_IN_DNSWL_HI, RCVD_IN_SORBS_SPAM, T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 69311283F9 for ; Tue, 14 Feb 2017 16:02:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752523AbdBNQCa (ORCPT ); Tue, 14 Feb 2017 11:02:30 -0500 Received: from mail-oi0-f65.google.com ([209.85.218.65]:36062 "EHLO mail-oi0-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751420AbdBNQC2 (ORCPT ); Tue, 14 Feb 2017 11:02:28 -0500 Received: by mail-oi0-f65.google.com with SMTP id u143so1250587oif.3; Tue, 14 Feb 2017 08:02:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=vlpdLzCdho6HsXDRt7R/hmwW9w/lOIsPt/csB6I0QRI=; b=sm0mC8oQHMjPoSweuTaOa0KRtTy2RoTSzclencf77fpM1wPlRR7kQtEWXILVntEgnJ awDvYJmlBWku75yBTi2SxLHM6h1nbLrRGd2cRjp16HNyqa0aD4ejdaxz/UOS7QWEDpNu khKSXhGL75JRiOi8v2hAfv88fGs1lQjEQuMggFHawo/9LhA4InNpcb12mT6/I+mhfKg7 05uwWgYieovIyt/bDAjji+ehFPLqP98SMsHY1mnMtTCsW/W32yLCqOTQzt60lCwnqe6j 9HnBFKTHjFn0ObyqAnhizWMGkulJruXdFMsQ4HacsMnnTilt1AXvPR3mwNhKip53z2qs oVVQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=vlpdLzCdho6HsXDRt7R/hmwW9w/lOIsPt/csB6I0QRI=; b=ExWCqPWPcfy+J/OGxzxd+4kgKeF1/lfsXarK9d/8auxZ3pynH9a4X9jGhNXzEvnG1z +Jhscv2NJsAKTq3+c1RNTGrpoSlWNW3L5wsmdYTCZ+KprhcLEj+QgSh7XJ62UhUah/ZW VP4OdXGjdqLE382fmSLyBp3tJ92DKvtAdEj2Bu+dBK7eDRMf3ppOPKu9wDBSy5JLlnII m3U3nBvf85eguxUItEoTzIaSrLtKRhEBWFE9ifJO1wG1fCM1lmxFde+EZFg3yNwjge1B /n+SFS8S9NciZjNBIFPtyVtgz/bzHD3ImFHGQGoVAmmvVBggfPNQCc2jPGKMzn2PWFG0 QbOw== X-Gm-Message-State: AMke39kE94jmtKVZyfxNSCoLI/eKZwDEi7RKwQZpYfDXTUE9j75efnSg0dK74582SKMX6A== X-Received: by 10.84.233.193 with SMTP id m1mr7705956pln.118.1487088142513; Tue, 14 Feb 2017 08:02:22 -0800 (PST) Received: from localhost ([121.137.63.184]) by smtp.gmail.com with ESMTPSA id t22sm2129536pfa.114.2017.02.14.08.02.21 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Tue, 14 Feb 2017 08:02:21 -0800 (PST) Date: Wed, 15 Feb 2017 01:01:40 +0900 From: Sergey Senozhatsky To: Tony Lindgren Cc: Sergey Senozhatsky , Petr Mladek , Steven Rostedt , Peter Zijlstra , Thomas Gleixner , linux-kernel@vger.kernel.org, Sergey Senozhatsky , "Rafael J. Wysocki" , linux-pm@vger.kernel.org Subject: Re: Regression in next with use printk_safe buffers in printk Message-ID: <20170214160140.GA401@tigerII.localdomain> References: <20170213185956.GM3897@atomide.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20170213185956.GM3897@atomide.com> User-Agent: Mutt/1.7.2 (2016-11-26) Sender: linux-pm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Hello, Cc Rafael, just in case On (02/13/17 10:59), Tony Lindgren wrote: > Looks like commit f975237b7682 ("printk: use printk_safe buffers in > printk") causes "possible circular locking dependency detected " for > me on the first suspend. thanks for the report. > Reverting the following four patches in next makes it go away: > > d9c23523ed98 ("printk: drop call_console_drivers() unused param") > de6fcbdb68b2 ("printk: convert the rest to printk-safe") > 8b1742c9c207 ("printk: remove zap_locks() function") > f975237b7682 ("printk: use printk_safe buffers in printk") these patches basically just enable locked where it previously was forcibly turned off. no timekeeping/pm/sched/etc code was modified. can you share the link where Peter pointed out that this might be caused by printk() changes? so the report is > kworker/0:0/3 is trying to acquire lock: > (tk_core){----..}, at: [] retrigger_next_event+0x4c/0x90 > > but task is already holding lock: > > (hrtimer_bases.lock){-.-...}, at: [] retrigger_next_event+0x38/0x90 > > which lock already depends on the new lock. some thoughts. so there is a: BAR -> try_to_wake_up()-> lock_hrtimer_base() ##hrtimer_bases.lock chain (from #5). and there is a: FOO -> printk()->try_to_wake_up() chain (from #2) note, that printk() patches that you have mentioned don't add anything new here. the call chain existed before, we just keep lockdep enabled now. what #1 does is timekeeping_resume() lock timekeeper_lock lock tk_core tk_debug_account_sleep_time() printk() << lockdep was disabled here before try_to_wake_up() lock_hrtimer_base() ##hrtimer_bases.lock so it's tk_core -> hrtimer_bases.lock and retrigger_next_event() does retrigger_next_event() lock hrtimer_bases ktime_get_update_offsets_now() lock tk_core which is hrtimer_bases.lock -> tk_core but I'm a bit confused by rt_b->rt_runtime_lock in this unsafe lock scenario (so it's not ABBA, but ABAD) > lock(hrtimer_bases.lock); > lock(&rt_b->rt_runtime_lock); > lock(hrtimer_bases.lock); > lock(tk_core); > > > Chain exists of: > > tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock I'm lacking some knowledge here, sorry. where does the tk_core --> &rt_b->rt_runtime_lock come from? By the way, as a side note, shouldn't tk_debug_account_sleep_time() do printk_deferred() instead of 'normal' printk()? printk() calls from under timekeeping seqlock are not safe, aren't they? and tk_debug_account_sleep_time() is under tk_core seq lock. IOW, replace pr_info() in tk_debug_account_sleep_time() with something like this printk_deferred(KERN_INFO "Suspended for %lld.%03lu seconds\n", (s64)t->tv_sec, t->tv_nsec / NSEC_PER_MSEC); --- --- I'll keep looking at the report more in the meantime. -ss > 8< --------------------- > [ 48.950592] ====================================================== > [ 48.950622] [ INFO: possible circular locking dependency detected ] > [ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted > [ 48.950622] ------------------------------------------------------- > [ 48.950622] kworker/0:0/3 is trying to acquire lock: > [ 48.950653] (tk_core){----..}, at: [] retrigger_next_event+0x4c/0x90 > [ 48.950683] > but task is already holding lock: > [ 48.950683] (hrtimer_bases.lock){-.-...}, at: [] retrigger_next_event+0x38/0x90 > [ 48.950714] > which lock already depends on the new lock. > > [ 48.950714] > the existing dependency chain (in reverse order) is: > [ 48.950714] > -> #5 (hrtimer_bases.lock){-.-...}: > [ 48.950744] _raw_spin_lock_irqsave+0x50/0x64 > [ 48.950775] lock_hrtimer_base+0x28/0x58 > [ 48.950775] hrtimer_start_range_ns+0x20/0x5c8 > [ 48.950775] __enqueue_rt_entity+0x320/0x360 > [ 48.950805] enqueue_rt_entity+0x2c/0x44 > [ 48.950805] enqueue_task_rt+0x24/0x94 > [ 48.950836] ttwu_do_activate+0x54/0xc0 > [ 48.950836] try_to_wake_up+0x248/0x5c8 > [ 48.950836] __setup_irq+0x420/0x5f0 > [ 48.950836] request_threaded_irq+0xdc/0x184 > [ 48.950866] devm_request_threaded_irq+0x58/0xa4 > [ 48.950866] omap_i2c_probe+0x530/0x6a0 > [ 48.950897] platform_drv_probe+0x50/0xb0 > [ 48.950897] driver_probe_device+0x1f8/0x2cc > [ 48.950897] __driver_attach+0xc0/0xc4 > [ 48.950927] bus_for_each_dev+0x6c/0xa0 > [ 48.950927] bus_add_driver+0x100/0x210 > [ 48.950927] driver_register+0x78/0xf4 > [ 48.950958] do_one_initcall+0x3c/0x16c > [ 48.950958] kernel_init_freeable+0x20c/0x2d8 > [ 48.950958] kernel_init+0x8/0x110 > [ 48.950988] ret_from_fork+0x14/0x24 > [ 48.950988] > -> #4 (&rt_b->rt_runtime_lock){-.-...}: > [ 48.951019] _raw_spin_lock+0x40/0x50 > [ 48.951019] rq_offline_rt+0x9c/0x2bc > [ 48.951019] set_rq_offline.part.2+0x2c/0x58 > [ 48.951049] rq_attach_root+0x134/0x144 > [ 48.951049] cpu_attach_domain+0x18c/0x6f4 > [ 48.951049] build_sched_domains+0xba4/0xd80 > [ 48.951080] sched_init_smp+0x68/0x10c > [ 48.951080] kernel_init_freeable+0x160/0x2d8 > [ 48.951080] kernel_init+0x8/0x110 > [ 48.951080] ret_from_fork+0x14/0x24 > [ 48.951110] > -> #3 (&rq->lock){-.-.-.}: > [ 48.951110] _raw_spin_lock+0x40/0x50 > [ 48.951141] task_fork_fair+0x30/0x124 > [ 48.951141] sched_fork+0x194/0x2e0 > [ 48.951141] copy_process.part.5+0x448/0x1a20 > [ 48.951171] _do_fork+0x98/0x7e8 > [ 48.951171] kernel_thread+0x2c/0x34 > [ 48.951171] rest_init+0x1c/0x18c > [ 48.951202] start_kernel+0x35c/0x3d4 > [ 48.951202] 0x8000807c > [ 48.951202] > -> #2 (&p->pi_lock){-.-.-.}: > [ 48.951232] _raw_spin_lock_irqsave+0x50/0x64 > [ 48.951232] try_to_wake_up+0x30/0x5c8 > [ 48.951232] up+0x4c/0x60 > [ 48.951263] __up_console_sem+0x2c/0x58 > [ 48.951263] console_unlock+0x3b4/0x650 > [ 48.951263] vprintk_emit+0x270/0x474 > [ 48.951293] vprintk_default+0x20/0x28 > [ 48.951293] printk+0x20/0x30 > [ 48.951324] kauditd_hold_skb+0x94/0xb8 > [ 48.951324] kauditd_thread+0x1a4/0x56c > [ 48.951324] kthread+0x104/0x148 > [ 48.951354] ret_from_fork+0x14/0x24 > [ 48.951354] > -> #1 ((console_sem).lock){-.....}: > [ 48.951385] _raw_spin_lock_irqsave+0x50/0x64 > [ 48.951385] down_trylock+0xc/0x2c > [ 48.951385] __down_trylock_console_sem+0x24/0x80 > [ 48.951385] console_trylock+0x10/0x8c > [ 48.951416] vprintk_emit+0x264/0x474 > [ 48.951416] vprintk_default+0x20/0x28 > [ 48.951416] printk+0x20/0x30 > [ 48.951446] tk_debug_account_sleep_time+0x5c/0x70 > [ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0 > [ 48.951446] timekeeping_resume+0x218/0x23c > [ 48.951477] syscore_resume+0x94/0x42c > [ 48.951477] suspend_enter+0x554/0x9b4 > [ 48.951477] suspend_devices_and_enter+0xd8/0x4b4 > [ 48.951507] enter_state+0x934/0xbd4 > [ 48.951507] pm_suspend+0x14/0x70 > [ 48.951507] state_store+0x68/0xc8 > [ 48.951538] kernfs_fop_write+0xf4/0x1f8 > [ 48.951538] __vfs_write+0x1c/0x114 > [ 48.951538] vfs_write+0xa0/0x168 > [ 48.951568] SyS_write+0x3c/0x90 > [ 48.951568] __sys_trace_return+0x0/0x10 > [ 48.951568] > -> #0 (tk_core){----..}: > [ 48.951599] lock_acquire+0xe0/0x294 > [ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4 > [ 48.951629] retrigger_next_event+0x4c/0x90 > [ 48.951629] on_each_cpu+0x40/0x7c > [ 48.951629] clock_was_set_work+0x14/0x20 > [ 48.951660] process_one_work+0x2b4/0x808 > [ 48.951660] worker_thread+0x3c/0x550 > [ 48.951660] kthread+0x104/0x148 > [ 48.951690] ret_from_fork+0x14/0x24 > [ 48.951690] > other info that might help us debug this: > > [ 48.951690] Chain exists of: > tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock > > [ 48.951721] Possible unsafe locking scenario: > > [ 48.951721] CPU0 CPU1 > [ 48.951721] ---- ---- > [ 48.951721] lock(hrtimer_bases.lock); > [ 48.951751] lock(&rt_b->rt_runtime_lock); > [ 48.951751] lock(hrtimer_bases.lock); > [ 48.951751] lock(tk_core); > [ 48.951782] > *** DEADLOCK *** > > [ 48.951782] 3 locks held by kworker/0:0/3: > [ 48.951782] #0: ("events"){.+.+.+}, at: [] process_one_work+0x1f8/0x808 > [ 48.951812] #1: (hrtimer_work){+.+...}, at: [] process_one_work+0x1f8/0x808 > [ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [] retrigger_next_event+0x38/0x90 > [ 48.951843] > stack backtrace: > [ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+ #101 > [ 48.951873] Hardware name: Generic OMAP36xx (Flattened Device Tree) > [ 48.951904] Workqueue: events clock_was_set_work > [ 48.951904] [] (unwind_backtrace) from [] (show_stack+0x10/0x14) > [ 48.951934] [] (show_stack) from [] (dump_stack+0xac/0xe0) > [ 48.951934] [] (dump_stack) from [] (print_circular_bug+0x1d0/0x308) > [ 48.951965] [] (print_circular_bug) from [] (validate_chain+0xf50/0x1324) > [ 48.951965] [] (validate_chain) from [] (__lock_acquire+0x468/0x7e8) > [ 48.951995] [] (__lock_acquire) from [] (lock_acquire+0xe0/0x294) > [ 48.951995] [] (lock_acquire) from [] (ktime_get_update_offsets_now+0x5c/0x1d4) > [ 48.952026] [] (ktime_get_update_offsets_now) from [] (retrigger_next_event+0x4c/0x90) > [ 48.952026] [] (retrigger_next_event) from [] (on_each_cpu+0x40/0x7c) > [ 48.952056] [] (on_each_cpu) from [] (clock_was_set_work+0x14/0x20) > [ 48.952056] [] (clock_was_set_work) from [] (process_one_work+0x2b4/0x808) > [ 48.952087] [] (process_one_work) from [] (worker_thread+0x3c/0x550) > [ 48.952087] [] (worker_thread) from [] (kthread+0x104/0x148) > [ 48.952087] [] (kthread) from [] (ret_from_fork+0x14/0x24) > [ 48.952911] PM: noirq resume of devices complete after 33.355 msecs > ... > -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/kernel/time/timekeeping_debug.c b/kernel/time/timekeeping_debug.c index ca9fb800336b..b8f7146c3538 100644 --- a/kernel/time/timekeeping_debug.c +++ b/kernel/time/timekeeping_debug.c @@ -75,7 +75,8 @@ void tk_debug_account_sleep_time(struct timespec64 *t) int bin = min(fls(t->tv_sec), NUM_BINS-1); sleep_time_bin[bin]++; - pr_info("Suspended for %lld.%03lu seconds\n", (s64)t->tv_sec, + printk_deferred(KERN_INFO "Suspended for %lld.%03lu seconds\n", + (s64)t->tv_sec, t->tv_nsec / NSEC_PER_MSEC); }