From patchwork Thu Jan 5 10:27:23 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tetsuo Handa X-Patchwork-Id: 9498841 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 76812606B5 for ; Thu, 5 Jan 2017 10:27:52 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4C3BB20246 for ; Thu, 5 Jan 2017 10:27:52 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 406E327F95; Thu, 5 Jan 2017 10:27:52 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 76B7120246 for ; Thu, 5 Jan 2017 10:27:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1761719AbdAEK1k (ORCPT ); Thu, 5 Jan 2017 05:27:40 -0500 Received: from www262.sakura.ne.jp ([202.181.97.72]:64954 "EHLO www262.sakura.ne.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1762500AbdAEK1h (ORCPT ); Thu, 5 Jan 2017 05:27:37 -0500 Received: from fsav104.sakura.ne.jp (fsav104.sakura.ne.jp [27.133.134.231]) by www262.sakura.ne.jp (8.14.5/8.14.5) with ESMTP id v05ARNlX018683; Thu, 5 Jan 2017 19:27:23 +0900 (JST) (envelope-from penguin-kernel@I-love.SAKURA.ne.jp) Received: from www262.sakura.ne.jp (202.181.97.72) by fsav104.sakura.ne.jp (F-Secure/fsigk_smtp/530/fsav104.sakura.ne.jp); Thu, 05 Jan 2017 19:27:23 +0900 (JST) X-Virus-Status: clean(F-Secure/fsigk_smtp/530/fsav104.sakura.ne.jp) Received: from AQUA (softbank126227147111.bbtec.net [126.227.147.111]) (authenticated bits=0) by www262.sakura.ne.jp (8.14.5/8.14.5) with ESMTP id v05ARN0G018679; Thu, 5 Jan 2017 19:27:23 +0900 (JST) (envelope-from penguin-kernel@I-love.SAKURA.ne.jp) To: sergey.senozhatsky@gmail.com Cc: pmladek@suse.com, jack@suse.com, tj@kernel.org, kyle@kernel.org, davej@codemonkey.org.uk, calvinowens@fb.com, gregkh@linuxfoundation.org, jslaby@suse.cz, linux-fbdev@vger.kernel.org, linux-kernel@vger.kernel.org, sergey.senozhatsky.work@gmail.com Subject: printk: reset may_schedule if console_trylock() from console_unlock() From: Tetsuo Handa Message-Id: <201701051927.IJF65159.OFFtSLQOOFHJMV@I-love.SAKURA.ne.jp> X-Mailer: Winbiff [Version 2.51 PL2] X-Accept-Language: ja,en,zh Date: Thu, 5 Jan 2017 19:27:23 +0900 Mime-Version: 1.0 Sender: linux-fbdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fbdev@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP (Excluding linux-mm and restarting as a new thread.) Sergey Senozhatsky wrote at http://lkml.kernel.org/r/20161226113407.GA515@tigerII.localdomain : > Cc Greg, Jiri, > > On (12/26/16 19:54), Tetsuo Handa wrote: > [..] > > > > (3) I got below warning. (Though not reproducible.) > > If fb_flashcursor() called console_trylock(), console_may_schedule is set to 1? > > hmmm... it takes an atomic/spin `printing_lock' lock in vt_console_print(), > then call console_conditional_schedule() from lf(), being under spin_lock. > `console_may_schedule' in console_conditional_schedule() still keeps the > value from console_trylock(), which was ok (console_may_schedule permits > rescheduling). but preemption got changed under console_trylock(), by > that spin_lock. > > console_trylock() used to always forbid rescheduling; but it got changed > like a year ago. Indeed, it seems to be an oversight in commit 6b97a20d3a7909da ("printk: set may_schedule for some of console_trylock() callers"). I understood what has happened. fb_flushcursor() was not the culprit. > > the other thing is... do we really need to console_conditional_schedule() > from fbcon_*()? console_unlock() does cond_resched() after every line it > prints. wouldn't that be enough? Every record, isn't it? How many bytes can a record write to consoles? CONSOLE_EXT_LOG_MAX bytes multiplied by MAX_CMDLINECONSOLES ? Might take longer than a second for each record? > > so may be we can drop some of console_conditional_schedule() > call sites in fbcon. or update console_conditional_schedule() > function to always return the current preemption value, not the > one we saw in console_trylock(). Replacing console_may_schedule with get_console_may_schedule() will avoid this bug. I noticed that we forgot to reset console_may_schedule to 0 because the "again:" label is located after the assignment line. > > ---------------------------------------- > > [ OK [ 255.862188] audit: type=1131 audit(1482733112.662:148): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' > > ] Stopped Create Static Device Nodes in /dev. > > > > [ 255.871468] BUG: sleeping function called from invalid context at kernel/printk/printk.c:2325 > > [ 255.871469] in_atomic(): 1, irqs_disabled(): 1, pid: 10079, name: plymouthd > > [ 255.871469] 6 locks held by plymouthd/10079: > > [ 255.871470] #0: (&tty->ldisc_sem){++++.+}, at: [] ldsem_down_read+0x32/0x40 > > [ 255.871472] #1: (&tty->atomic_write_lock){+.+.+.}, at: [] tty_write_lock+0x19/0x50 > > [ 255.871474] #2: (&tty->termios_rwsem){++++..}, at: [] n_tty_write+0x99/0x470 > > [ 255.871475] #3: (&ldata->output_lock){+.+...}, at: [] n_tty_write+0x130/0x470 > > [ 255.871477] #4: (console_lock){+.+.+.}, at: [] console_unlock+0x33e/0x6b0 > > [ 255.871479] #5: (printing_lock){......}, at: [] vt_console_print+0x75/0x3d0 > > [ 255.871481] irq event stamp: 15244 > > [ 255.871481] hardirqs last enabled at (15243): [] __down_trylock_console_sem+0x91/0xa0 > > [ 255.871482] hardirqs last disabled at (15244): [] console_unlock+0x74/0x6b0 > > [ 255.871482] softirqs last enabled at (14968): [] __do_softirq+0x344/0x580 > > [ 255.871482] softirqs last disabled at (14963): [] irq_exit+0xe3/0x120 > > [ 255.871483] CPU: 0 PID: 10079 Comm: plymouthd Not tainted 4.9.0-next-20161224+ #12 > > [ 255.871483] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 > > [ 255.871484] Call Trace: > > [ 255.871484] dump_stack+0x85/0xc9 > > [ 255.871485] ___might_sleep+0x14a/0x250 > > [ 255.871485] console_conditional_schedule+0x22/0x30 > > [ 255.871485] fbcon_redraw.isra.24+0xa3/0x1d0 > > [ 255.871486] ? fbcon_cursor+0x151/0x1c0 > > [ 255.871486] fbcon_scroll+0x11d/0xcb0 > > [ 255.871487] con_scroll+0x160/0x170 > > [ 255.871487] lf+0x9c/0xb0 > > [ 255.871487] vt_console_print+0x2b7/0x3d0 > > [ 255.871488] console_unlock+0x457/0x6b0 > > [ 255.871488] do_con_write.part.19+0x737/0x9e0 > > [ 255.871489] ? mark_held_locks+0x71/0x90 > > [ 255.871489] con_write+0x57/0x60 > > [ 255.871489] n_tty_write+0x1bf/0x470 > > [ 255.871490] ? prepare_to_wait_event+0x110/0x110 > > [ 255.871490] tty_write+0x157/0x2d0 > > [ 255.871491] ? n_tty_open+0xd0/0xd0 > > [ 255.871491] __vfs_write+0x32/0x140 > > [ 255.871491] ? trace_hardirqs_on+0xd/0x10 > > [ 255.871492] ? __audit_syscall_entry+0xaa/0xf0 > > [ 255.871492] vfs_write+0xc2/0x1f0 > > [ 255.871493] ? syscall_trace_enter+0x1cb/0x3e0 > > [ 255.871493] SyS_write+0x53/0xc0 > > [ 255.871493] do_syscall_64+0x67/0x1f0 > > [ 255.871494] entry_SYSCALL64_slow_path+0x25/0x25 > > [ 255.871494] RIP: 0033:0x7fb74cf8fc60 > > [ 255.871495] RSP: 002b:00007ffcaab3fe88 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 > > [ 255.871495] RAX: ffffffffffffffda RBX: 000055d3acaf7160 RCX: 00007fb74cf8fc60 > > [ 255.871496] RDX: 000000000000003f RSI: 000055d3acafd090 RDI: 0000000000000009 > > [ 255.871496] RBP: 000055d3acafc440 R08: 0000000000000070 R09: 0000000000000000 > > [ 255.871497] R10: 000000000000003f R11: 0000000000000246 R12: 000055d3acafc330 > > [ 255.871497] R13: 000000000000003f R14: 00007ffcaab3ffb0 R15: 0000000000000000 > > Stopping Create Static Device Nodes in /dev... What happened here was: (1) /usr/sbin/plymouthd requested write() syscall against tty. (2) write() syscall invoked console_unlock() via tty_write(). (3) Since write() syscall is sleepable context, console_may_schedule was 1 as of entering into console_unlock(). (4) console_unlock() set do_cond_resched to 1 and set console_may_schedule to 0 so that console drivers won't call cond_resched() from console_conditional_schedule() because printk_safe_enter(flags) in console_unlock() disables hardirqs. (5) /usr/sbin/plymouthd left the "for (;;)" loop in console_unlock(). (6) Since an audit log from /usr/lib/systemd/systemd was added to printk buffer via kauditd_printk_skb() because /usr/sbin/auditd terminated due to shutdown sequence, /usr/sbin/plymouthd found that console_seq != log_next_seq is true. (7) /usr/sbin/plymouthd calls console_trylock(), and console_trylock() returned 1 because nobody was holding console_sem. console_may_schedule was 1 as of returning from console_trylock() because write() syscall is sleepable context. (8) However, "goto again;" jumped to a line which is after setting console_may_schedule to 0. This is the cause of this problem. As a result, console_may_schedule remained 1. (9) console_conditional_schedule() was called from vt_console_print() via call_console_drivers(). And cond_resched() was by error called due to (8). The correct behavior I guess is ---------- ---------- optionally with ---------- @@ -2297,11 +2297,13 @@ void console_unlock(void) retry = console_seq != log_next_seq; raw_spin_unlock_irqrestore(&logbuf_lock, flags); - if (retry && console_trylock()) - goto again; - if (wake_klogd) wake_up_klogd(); + + if (retry && console_trylock()) { + wake_klogd = false; + goto again; + } } EXPORT_SYMBOL(console_unlock); ---------- so that current thread will act as if current thread again called console_unlock() immediately after returning from console_trylock(). --- To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 8b26964..ce53488 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2186,6 +2186,7 @@ void console_unlock(void) bool wake_klogd = false; bool do_cond_resched, retry; +again: if (console_suspended) { up_console_sem(); return; @@ -2204,7 +2205,6 @@ void console_unlock(void) do_cond_resched = console_may_schedule; console_may_schedule = 0; -again: /* * We released the console_sem lock, so we need to recheck if * cpu is online and (if not) is there at least one CON_ANYTIME