From patchwork Sat Sep 22 17:52:11 2012 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Vetter X-Patchwork-Id: 1494541 Return-Path: X-Original-To: patchwork-dri-devel@patchwork.kernel.org Delivered-To: patchwork-process-083081@patchwork2.kernel.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) by patchwork2.kernel.org (Postfix) with ESMTP id 6700DDF2D2 for ; Sat, 22 Sep 2012 17:52:41 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 431B29E81A for ; Sat, 22 Sep 2012 10:52:41 -0700 (PDT) X-Original-To: dri-devel@lists.freedesktop.org Delivered-To: dri-devel@lists.freedesktop.org Received: from mail-we0-f177.google.com (mail-we0-f177.google.com [74.125.82.177]) by gabe.freedesktop.org (Postfix) with ESMTP id AD2D59E74C for ; Sat, 22 Sep 2012 10:52:26 -0700 (PDT) Received: by weyr3 with SMTP id r3so2631746wey.36 for ; Sat, 22 Sep 2012 10:52:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=from:to:cc:subject:date:message-id:x-mailer:in-reply-to:references; bh=dG1BKZ7ApF0e0HzVJl+OQRNvHaW5jYt7q33ZNFvlKOU=; b=Z+dDzS37OiE+7MfGi8hPfzYfyuMZL9OVj47Ixhs9XKYJnEPrrRPGP3PvbkC+6LEsbn TMDdIrvcWgeX6XxxyzEptuCAqMZjRhaExRTqhLFFJ/rod3u0+vdgsscDlYa2GtyR69ge X50Oe7W/tvvLJLgj0SzGmwjMJRUzozTwKsIQs= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=from:to:cc:subject:date:message-id:x-mailer:in-reply-to:references :x-gm-message-state; bh=dG1BKZ7ApF0e0HzVJl+OQRNvHaW5jYt7q33ZNFvlKOU=; b=A0NAh6TymPO6EICGCcMyzbtjBp+D9CuJMK/n4EuJSRbEvxZE4jM0nhDee2ieMlO6i6 0kAuh1gIGJRB2AD2wDUJvApzFfYK2/dhaCAycjy10B6pLFqS3qzGz052QkeK4anmZmBK nDvGrNGg3cs3jLEusxvsYux8Zlxg1Lzo1sj4C1ISxuCgvC7aGDSzmrmMMjpplQe7Cz7Y 4MN6wgTvqYMEg+sAZfR3uMPg6utj++IS1qETHX6NIcEpX3GanYSN8ezrM1W8jhlaYFRU 6mbGAp8LBV03t3V4nnqnH5RVrnHvhpqbMxliG0Rh4g9UpZnEiFJga86onHnLNOEYzpk/ LEwQ== Received: by 10.216.245.135 with SMTP id o7mr4507476wer.40.1348336345658; Sat, 22 Sep 2012 10:52:25 -0700 (PDT) Received: from aaron.ffwll.local (178-83-130-250.dynamic.hispeed.ch. [178.83.130.250]) by mx.google.com with ESMTPS id dm3sm5445013wib.3.2012.09.22.10.52.23 (version=TLSv1/SSLv3 cipher=OTHER); Sat, 22 Sep 2012 10:52:24 -0700 (PDT) From: Daniel Vetter To: LKML Subject: [PATCH] console: implement lockdep support for console_lock Date: Sat, 22 Sep 2012 19:52:11 +0200 Message-Id: <1348336331-20957-1-git-send-email-daniel.vetter@ffwll.ch> X-Mailer: git-send-email 1.7.10.4 In-Reply-To: <87627b9453.fsf@intel.com> References: <87627b9453.fsf@intel.com> X-Gm-Message-State: ALoCoQn8f0sPKn2HyMyyQg77pOKf2e3tuIkqzzVb3NVZnBK3oYLNVd0rb/1gZwKz9pusioviG57u Cc: Peter Zijlstra , Daniel Vetter , Intel Graphics Development , DRI Development , Thomas Gleixner X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: dri-devel-bounces+patchwork-dri-devel=patchwork.kernel.org@lists.freedesktop.org Errors-To: dri-devel-bounces+patchwork-dri-devel=patchwork.kernel.org@lists.freedesktop.org Dave Airlie recently discovered a locking bug in the fbcon layer, where a timer_del_sync (for the blinking cursor) deadlocks with the timer itself, since both (want to) hold the console_lock: https://lkml.org/lkml/2012/8/21/36 Unfortunately the console_lock isn't a plain mutex and hence has no lockdep support. Which resulted in a few days wasted of tracking down this bug (complicated by the fact that printk doesn't show anything when the console is locked) instead of noticing the bug much earlier with the lockdep splat. Hence I've figured I need to fix that for the next deadlock involving console_lock - and with kms/drm growing ever more complex locking that'll eventually happen. Now the console_lock has rather funky semantics, so after a quick irc discussion with Thomas Gleixner and Dave Airlie I've quickly ditched the original idead of switching to a real mutex (since it won't work) and instead opted to annotate the console_lock with lockdep information manually. There are a few special cases: - The console_lock state is protected by the console_sem, and usually grabbed/dropped at _lock/_unlock time. But the suspend/resume code drops the semaphore without dropping the console_lock (see suspend_console/resume_console). But since the same thread that did the suspend will do the resume, we don't need to fix up anything. - In the printk code there's a special trylock, only used to kick off the logbuffer printk'ing in console_unlock. But all that happens while lockdep is disable (since printk does a few other evil tricks). So no issue there, either. - The console_lock can also be acquired form irq context (but only with a trylock). lockdep already handles that. This all leaves us with annotating the normal console_lock, _unlock and _trylock functions. And yes, it works - simply unloading a drm kms driver resulted in lockdep complaining about the deadlock in fbcon_deinit: ====================================================== [ INFO: possible circular locking dependency detected ] 3.6.0-rc2+ #552 Not tainted ------------------------------------------------------- kms-reload/3577 is trying to acquire lock: ((&info->queue)){+.+...}, at: [] wait_on_work+0x0/0xa7 but task is already holding lock: (console_lock){+.+.+.}, at: [] bind_con_driver+0x38/0x263 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (console_lock){+.+.+.}: [] lock_acquire+0x95/0x105 [] console_lock+0x59/0x5b [] fb_flashcursor+0x2e/0x12c [] process_one_work+0x1d9/0x3b4 [] worker_thread+0x1a7/0x24b [] kthread+0x7f/0x87 [] kernel_thread_helper+0x4/0x10 -> #0 ((&info->queue)){+.+...}: [] __lock_acquire+0x999/0xcf6 [] lock_acquire+0x95/0x105 [] wait_on_work+0x3b/0xa7 [] __cancel_work_timer+0xbf/0x102 [] cancel_work_sync+0xb/0xd [] fbcon_deinit+0x11c/0x1dc [] bind_con_driver+0x145/0x263 [] unbind_con_driver+0x14f/0x195 [] store_bind+0x1ad/0x1c1 [] dev_attr_store+0x13/0x1f [] sysfs_write_file+0xe9/0x121 [] vfs_write+0x9b/0xfd [] sys_write+0x3e/0x6b [] system_call_fastpath+0x16/0x1b other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(console_lock); lock((&info->queue)); lock(console_lock); lock((&info->queue)); *** DEADLOCK *** v2: Mark the lockdep_map static, noticed by Jani Nikula. Cc: Dave Airlie Cc: Thomas Gleixner Cc: Alan Cox Cc: Peter Zijlstra Signed-off-by: Daniel Vetter --- kernel/printk.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/kernel/printk.c b/kernel/printk.c index ed9af6a..e5c6dba 100644 --- a/kernel/printk.c +++ b/kernel/printk.c @@ -87,6 +87,12 @@ static DEFINE_SEMAPHORE(console_sem); struct console *console_drivers; EXPORT_SYMBOL_GPL(console_drivers); +#ifdef CONFIG_LOCKDEP +static struct lockdep_map console_lock_dep_map = { + .name = "console_lock" +}; +#endif + /* * This is used for debugging the mess that is the VT code by * keeping track if we have the console semaphore held. It's @@ -1916,6 +1922,7 @@ void console_lock(void) return; console_locked = 1; console_may_schedule = 1; + mutex_acquire(&console_lock_dep_map, 0, 0, _RET_IP_); } EXPORT_SYMBOL(console_lock); @@ -1937,6 +1944,7 @@ int console_trylock(void) } console_locked = 1; console_may_schedule = 0; + mutex_acquire(&console_lock_dep_map, 0, 1, _RET_IP_); return 1; } EXPORT_SYMBOL(console_trylock); @@ -2097,6 +2105,7 @@ skip: local_irq_restore(flags); } console_locked = 0; + mutex_release(&console_lock_dep_map, 1, _RET_IP_); /* Release the exclusive_console once it is used */ if (unlikely(exclusive_console))