From patchwork Thu Oct 22 19:38:53 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Brian Norris X-Patchwork-Id: 7467301 Return-Path: X-Original-To: patchwork-linux-omap@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork2.web.kernel.org (Postfix) with ESMTP id D2598BEEA4 for ; Thu, 22 Oct 2015 19:38:59 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 8FACF208F8 for ; Thu, 22 Oct 2015 19:38:58 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 345242051D for ; Thu, 22 Oct 2015 19:38:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757517AbbJVTi4 (ORCPT ); Thu, 22 Oct 2015 15:38:56 -0400 Received: from mail-pa0-f54.google.com ([209.85.220.54]:36417 "EHLO mail-pa0-f54.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757354AbbJVTi4 (ORCPT ); Thu, 22 Oct 2015 15:38:56 -0400 Received: by pacfv9 with SMTP id fv9so99188790pac.3 for ; Thu, 22 Oct 2015 12:38:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=ThEq/WuSURSTTLVAQR2/sflPZT8KrH/obnAs4nFLJxc=; b=Q4PEFL1cX57H4eeafNXesqQUPdZLc2+6A0dCNIS+8LXEWHpFRgPlL05mYFm8C+rJAg fa1Bo548GZ7+TM0ViadDQOf/6MmV1lyfad+hHVGrXNtCLmFguDlStid4u8MilU/sQK1l agJJSiRjWtNKdpSeqVRqwcU/uqOx10l2xTxblmMnHw02EOWP8IHolZko5s9XwYSbktQy TTPqOn6fDsHXPR9BxypKR2IPEvZRv/ZZPk+YvYw63SubkyljFBZwClmmDE3FdnRiIjHL +Il0h35YzbFTO1OFb3bqwdliMriDLeIEZQatpz3jpM0wj6Z3dkMS8HmwKkTdqcUejK1T 1YuA== X-Received: by 10.66.150.161 with SMTP id uj1mr19295608pab.148.1445542735453; Thu, 22 Oct 2015 12:38:55 -0700 (PDT) Received: from google.com ([2620:0:1000:1301:1c2f:bbf4:3a8a:26bc]) by smtp.gmail.com with ESMTPSA id kv9sm15213638pab.39.2015.10.22.12.38.54 (version=TLS1_2 cipher=AES128-SHA bits=128/128); Thu, 22 Oct 2015 12:38:54 -0700 (PDT) Date: Thu, 22 Oct 2015 12:38:53 -0700 From: Brian Norris To: Felipe Balbi Cc: David Woodhouse , linux-mtd@lists.infradead.org, linux-omap@vger.kernel.org, Tony Lindgren Subject: Re: Regression caused by 073db4a51ee43ccb827f54a4261c0583b028d5ab Message-ID: <20151022193853.GH13239@google.com> References: <87y4euenip.fsf@saruman.tx.rr.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <87y4euenip.fsf@saruman.tx.rr.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-omap-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-omap@vger.kernel.org X-Spam-Status: No, score=-6.8 required=5.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED, DKIM_SIGNED, FREEMAIL_FROM, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, T_DKIM_INVALID, UNPARSEABLE_RELAY autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Hi Felipe, First of all, this is only a quick response. I could easily be missing something obvious. On Thu, Oct 22, 2015 at 02:01:02PM -0500, Felipe Balbi wrote: > > Hi, > > I just noticed that commit 073db4a51ee4 (mtd: fix: avoid race condition > when accessing mtd->usecount) caused a regression at least when removing > m25p80. Wonder if you guys would know of a quick fix, other than > reverting $commit in HEAD (yes, that makes the problem go away, but > regresses on what $commit tried to fix, of course). > > More info about the regression follows, together with bisection log: Not all deadlocks alleged by lockdep are true deadlocks. Are you able to reproduce a real deadlock? (I know, it's not always easy to hit, so I'm not discounting this based on lack of evidence.) In particular, I think something like this warning was mentioned previously, and I looked into it a bit but didn't have the time to figure out for sure (and figure out how to squash the potential false positive). Hence, I'm responding now, even if I'm not 100% sure. More below. > # modprobe -r m25p80 > [ 53.419251] > [ 53.420838] ====================================================== > [ 53.427300] [ INFO: possible circular locking dependency detected ] > [ 53.433865] 4.3.0-rc6 #96 Not tainted > [ 53.437686] ------------------------------------------------------- > [ 53.444220] modprobe/372 is trying to acquire lock: > [ 53.449320] (&new->lock){+.+...}, at: [] del_mtd_blktrans_dev+0x80/0xdc > [ 53.457271] > [ 53.457271] but task is already holding lock: > [ 53.463372] (mtd_table_mutex){+.+.+.}, at: [] del_mtd_device+0x18/0x100 > [ 53.471321] > [ 53.471321] which lock already depends on the new lock. > [ 53.471321] > [ 53.479856] > [ 53.479856] the existing dependency chain (in reverse order) is: > [ 53.487660] > -> #1 (mtd_table_mutex){+.+.+.}: > [ 53.492331] [] blktrans_open+0x34/0x1a4 > [ 53.497879] [] __blkdev_get+0xc4/0x3b0 > [ 53.503364] [] blkdev_get+0x108/0x320 > [ 53.508743] [] do_dentry_open+0x218/0x314 > [ 53.514496] [] path_openat+0x4c0/0xf9c > [ 53.519959] [] do_filp_open+0x5c/0xc0 > [ 53.525336] [] do_sys_open+0xfc/0x1cc > [ 53.530716] [] ret_fast_syscall+0x0/0x1c > [ 53.536375] > -> #0 (&new->lock){+.+...}: > [ 53.540587] [] mutex_lock_nested+0x38/0x3cc > [ 53.546504] [] del_mtd_blktrans_dev+0x80/0xdc > [ 53.552606] [] blktrans_notify_remove+0x7c/0x84 > [ 53.558891] [] del_mtd_device+0x74/0x100 > [ 53.564544] [] del_mtd_partitions+0x80/0xc8 > [ 53.570451] [] mtd_device_unregister+0x24/0x48 > [ 53.576637] [] spi_drv_remove+0x1c/0x34 > [ 53.582207] [] __device_release_driver+0x88/0x114 > [ 53.588663] [] device_release_driver+0x20/0x2c > [ 53.594843] [] bus_remove_device+0xd8/0x108 > [ 53.600748] [] device_del+0x10c/0x210 > [ 53.606127] [] device_unregister+0xc/0x20 > [ 53.611849] [] __unregister+0x10/0x20 > [ 53.617211] [] device_for_each_child+0x50/0x7c > [ 53.623387] [] spi_unregister_master+0x58/0x8c > [ 53.629578] [] release_nodes+0x15c/0x1c8 > [ 53.635223] [] __device_release_driver+0x90/0x114 > [ 53.641689] [] driver_detach+0xb4/0xb8 > [ 53.647147] [] bus_remove_driver+0x4c/0xa0 > [ 53.652970] [] SyS_delete_module+0x11c/0x1e4 > [ 53.658976] [] ret_fast_syscall+0x0/0x1c Actually, the more I look at this, the more I think the warning is probably correct. I might have been thinking of a different false positive. Tentatively: I think the right fix might be to reverse the ordering of acquire/release of the mtd_table_mutex and dev->lock throughout mtd_blkdevs.c. See below. > [ 53.664621] > [ 53.664621] other info that might help us debug this: > [ 53.664621] > [ 53.672979] Possible unsafe locking scenario: > [ 53.672979] > [ 53.679169] CPU0 CPU1 > [ 53.683900] ---- ---- > [ 53.688633] lock(mtd_table_mutex); > [ 53.692383] lock(&new->lock); > [ 53.698306] lock(mtd_table_mutex); > [ 53.704658] lock(&new->lock); > [ 53.707946] > [ 53.707946] *** DEADLOCK *** > [ 53.707946] > [ 53.714123] 5 locks held by modprobe/372: > [ 53.718305] #0: (&dev->mutex){......}, at: [] driver_detach+0x44/0xb8 > [ 53.726147] #1: (&dev->mutex){......}, at: [] driver_detach+0x50/0xb8 > [ 53.733985] #2: (&dev->mutex){......}, at: [] device_release_driver+0x18/0x2c > [ 53.742541] #3: (mtd_partitions_mutex){+.+.+.}, at: [] del_mtd_partitions+0x1c/0xc8 > [ 53.751656] #4: (mtd_table_mutex){+.+.+.}, at: [] del_mtd_device+0x18/0x100 > [ 53.760048] > [ 53.760048] stack backtrace: > [ 53.764591] CPU: 0 PID: 372 Comm: modprobe Not tainted 4.3.0-rc6 #96 > [ 53.771217] Hardware name: Generic AM43 (Flattened Device Tree) > [ 53.777419] [] (unwind_backtrace) from [] (show_stack+0x10/0x14) > [ 53.785511] [] (show_stack) from [] (dump_stack+0x84/0x9c) > [ 53.793063] [] (dump_stack) from [] (print_circular_bug+0x1c8/0x30c) > [ 53.801500] [] (print_circular_bug) from [] (__lock_acquire+0x1a48/0x1cd8) > [ 53.810480] [] (__lock_acquire) from [] (lock_acquire+0xac/0x12c) > [ 53.818649] [] (lock_acquire) from [] (mutex_lock_nested+0x38/0x3cc) > [ 53.827103] [] (mutex_lock_nested) from [] (del_mtd_blktrans_dev+0x80/0xdc) > [ 53.836199] [] (del_mtd_blktrans_dev) from [] (blktrans_notify_remove+0x7c/0x84) > [ 53.845735] [] (blktrans_notify_remove) from [] (del_mtd_device+0x74/0x100) > [ 53.854833] [] (del_mtd_device) from [] (del_mtd_partitions+0x80/0xc8) > [ 53.863469] [] (del_mtd_partitions) from [] (mtd_device_unregister+0x24/0x48) > [ 53.872733] [] (mtd_device_unregister) from [] (spi_drv_remove+0x1c/0x34) > [ 53.881633] [] (spi_drv_remove) from [] (__device_release_driver+0x88/0x114) > [ 53.890788] [] (__device_release_driver) from [] (device_release_driver+0x20/0x2c) > [ 53.900483] [] (device_release_driver) from [] (bus_remove_device+0xd8/0x108) > [ 53.909735] [] (bus_remove_device) from [] (device_del+0x10c/0x210) > [ 53.918088] [] (device_del) from [] (device_unregister+0xc/0x20) > [ 53.926160] [] (device_unregister) from [] (__unregister+0x10/0x20) > [ 53.934526] [] (__unregister) from [] (device_for_each_child+0x50/0x7c) > [ 53.943261] [] (device_for_each_child) from [] (spi_unregister_master+0x58/0x8c) > [ 53.952805] [] (spi_unregister_master) from [] (release_nodes+0x15c/0x1c8) > [ 53.961809] [] (release_nodes) from [] (__device_release_driver+0x90/0x114) > [ 53.970883] [] (__device_release_driver) from [] (driver_detach+0xb4/0xb8) > [ 53.979864] [] (driver_detach) from [] (bus_remove_driver+0x4c/0xa0) > [ 53.988303] [] (bus_remove_driver) from [] (SyS_delete_module+0x11c/0x1e4) > [ 53.997285] [] (SyS_delete_module) from [] (ret_fast_syscall+0x0/0x1c) [...] Maybe this works? Not tested (not even compile tested). If so, then shame on me; I really don't even know why I picked the lock ordering I did in commit 073db4a51ee4 :( Signed-off-by: Brian Norris Tested-by: Felipe Balbi --- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/drivers/mtd/mtd_blkdevs.c b/drivers/mtd/mtd_blkdevs.c index 44dc965a2f7c..e7a02ed9fba8 100644 --- a/drivers/mtd/mtd_blkdevs.c +++ b/drivers/mtd/mtd_blkdevs.c @@ -192,8 +192,8 @@ static int blktrans_open(struct block_device *bdev, fmode_t mode) if (!dev) return -ERESTARTSYS; /* FIXME: busy loop! -arnd*/ - mutex_lock(&dev->lock); mutex_lock(&mtd_table_mutex); + mutex_lock(&dev->lock); if (dev->open) goto unlock; @@ -217,8 +217,8 @@ static int blktrans_open(struct block_device *bdev, fmode_t mode) unlock: dev->open++; - mutex_unlock(&mtd_table_mutex); mutex_unlock(&dev->lock); + mutex_unlock(&mtd_table_mutex); blktrans_dev_put(dev); return ret; @@ -228,8 +228,8 @@ error_release: error_put: module_put(dev->tr->owner); kref_put(&dev->ref, blktrans_dev_release); - mutex_unlock(&mtd_table_mutex); mutex_unlock(&dev->lock); + mutex_unlock(&mtd_table_mutex); blktrans_dev_put(dev); return ret; } @@ -241,8 +241,8 @@ static void blktrans_release(struct gendisk *disk, fmode_t mode) if (!dev) return; - mutex_lock(&dev->lock); mutex_lock(&mtd_table_mutex); + mutex_lock(&dev->lock); if (--dev->open) goto unlock; @@ -256,8 +256,8 @@ static void blktrans_release(struct gendisk *disk, fmode_t mode) __put_mtd_device(dev->mtd); } unlock: - mutex_unlock(&mtd_table_mutex); mutex_unlock(&dev->lock); + mutex_unlock(&mtd_table_mutex); blktrans_dev_put(dev); }