From patchwork Tue Aug 11 17:04:01 2015
X-Patchwork-Submitter: Oleg Nesterov
X-Patchwork-Id: 6993761
Date: Tue, 11 Aug 2015 19:04:01 +0200
From: Oleg Nesterov
To: Al Viro, Dave Chinner, Dave Hansen, Jan Kara
Cc: "Paul E. McKenney", Peter Zijlstra, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH v2 2/8] fix the broken lockdep logic in __sb_start_write()
Message-ID: <20150811170401.GA26904@redhat.com>
In-Reply-To: <20150811170343.GA26881@redhat.com>

1. wait_event(frozen < level) without rwsem_acquire_read() is just
   wrong from the lockdep perspective. If we are going to deadlock
   because the caller is buggy, lockdep cannot detect this problem.

2. __sb_start_write() can race with thaw_super() + freeze_super(),
   and after "goto retry" the 2nd acquire_freeze_lock() is wrong.

3. The "tell lockdep we are doing trylock" hack doesn't look nice.

   I think this is correct, but this logic should be more explicit.
   Yes, the recursive read_lock() is fine if we hold the lock on a
   higher level. But we do not need to fool lockdep. If we can not
   deadlock in this case then try-lock must not fail and we can use
   wait == F throughout this code.

Note: as Dave Chinner explains, the "trylock" hack and the fat comment
can probably be removed. But this needs a separate change and it will
be trivial: just kill __sb_start_write() and rename do_sb_start_write()
back to __sb_start_write().

Signed-off-by: Oleg Nesterov
Reviewed-by: Jan Kara
---
 fs/super.c |   73 ++++++++++++++++++++++++++++++++---------------------------
 1 files changed, 40 insertions(+), 33 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 928c20f..d0fdd49 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1158,38 +1158,11 @@ void __sb_end_write(struct super_block *sb, int level)
 }
 EXPORT_SYMBOL(__sb_end_write);
 
-#ifdef CONFIG_LOCKDEP
-/*
- * We want lockdep to tell us about possible deadlocks with freezing but
- * it's it bit tricky to properly instrument it. Getting a freeze protection
- * works as getting a read lock but there are subtle problems. XFS for example
- * gets freeze protection on internal level twice in some cases, which is OK
- * only because we already hold a freeze protection also on higher level. Due
- * to these cases we have to tell lockdep we are doing trylock when we
- * already hold a freeze protection for a higher freeze level.
- */
-static void acquire_freeze_lock(struct super_block *sb, int level, bool trylock,
+static int do_sb_start_write(struct super_block *sb, int level, bool wait,
 				unsigned long ip)
 {
-	int i;
-
-	if (!trylock) {
-		for (i = 0; i < level - 1; i++)
-			if (lock_is_held(&sb->s_writers.lock_map[i])) {
-				trylock = true;
-				break;
-			}
-	}
-	rwsem_acquire_read(&sb->s_writers.lock_map[level-1], 0, trylock, ip);
-}
-#endif
-
-/*
- * This is an internal function, please use sb_start_{write,pagefault,intwrite}
- * instead.
- */
-int __sb_start_write(struct super_block *sb, int level, bool wait)
-{
+	if (wait)
+		rwsem_acquire_read(&sb->s_writers.lock_map[level-1], 0, 0, ip);
 retry:
 	if (unlikely(sb->s_writers.frozen >= level)) {
 		if (!wait)
@@ -1198,9 +1171,6 @@ retry:
 				   sb->s_writers.frozen < level);
 	}
 
-#ifdef CONFIG_LOCKDEP
-	acquire_freeze_lock(sb, level, !wait, _RET_IP_);
-#endif
 	percpu_counter_inc(&sb->s_writers.counter[level-1]);
 	/*
 	 * Make sure counter is updated before we check for frozen.
@@ -1211,8 +1181,45 @@ retry:
 		__sb_end_write(sb, level);
 		goto retry;
 	}
+
+	if (!wait)
+		rwsem_acquire_read(&sb->s_writers.lock_map[level-1], 0, 1, ip);
 	return 1;
 }
+
+/*
+ * This is an internal function, please use sb_start_{write,pagefault,intwrite}
+ * instead.
+ */
+int __sb_start_write(struct super_block *sb, int level, bool wait)
+{
+	bool force_trylock = false;
+	int ret;
+
+#ifdef CONFIG_LOCKDEP
+	/*
+	 * We want lockdep to tell us about possible deadlocks with freezing
+	 * but it's it bit tricky to properly instrument it. Getting a freeze
+	 * protection works as getting a read lock but there are subtle
+	 * problems. XFS for example gets freeze protection on internal level
+	 * twice in some cases, which is OK only because we already hold a
+	 * freeze protection also on higher level. Due to these cases we have
+	 * to use wait == F (trylock mode) which must not fail.
+	 */
+	if (wait) {
+		int i;
+
+		for (i = 0; i < level - 1; i++)
+			if (lock_is_held(&sb->s_writers.lock_map[i])) {
+				force_trylock = true;
+				break;
+			}
+	}
+#endif
+	ret = do_sb_start_write(sb, level, wait && !force_trylock, _RET_IP_);
+	WARN_ON(force_trylock & !ret);
+	return ret;
+}
 EXPORT_SYMBOL(__sb_start_write);
 
 /**
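
For anyone who wants to poke at the control flow outside the kernel, below
is a minimal user-space sketch of the decision this patch makes explicit.
It is not kernel code and is not part of the patch. Everything prefixed
with model_ or MODEL_ is hypothetical: the two counters stand in for
rwsem_acquire_read() with trylock == 0 and trylock == 1, held[] stands in
for lock_is_held(), and the wait_event() sleep is replaced by a
MODEL_WOULD_SLEEP return value. The percpu counter, the memory barrier and
the "goto retry" race are left out on purpose.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_LEVELS		3	/* assumed: write, pagefault, fs */
#define MODEL_WOULD_SLEEP	(-1)	/* stands in for sleeping in wait_event() */

/* Hypothetical stand-in for the s_writers state of a super_block. */
struct model_sb {
	int frozen;			/* current freeze level, 0 = not frozen */
	bool held[MODEL_LEVELS];	/* stand-in for lock_is_held(lock_map[i]) */
	int blocking_annotations;	/* models rwsem_acquire_read(.., trylock=0) */
	int trylock_annotations;	/* models rwsem_acquire_read(.., trylock=1) */
};

/* Models do_sb_start_write(): where the lockdep annotation happens. */
static int model_do_sb_start_write(struct model_sb *sb, int level, bool wait)
{
	/* Blocking mode: annotate first, so a sleep below is visible. */
	if (wait)
		sb->blocking_annotations++;

	if (sb->frozen >= level) {
		if (!wait)
			return 0;		/* trylock failed, nothing annotated */
		return MODEL_WOULD_SLEEP;	/* the kernel would sleep here */
	}

	/* Trylock mode: annotate only once the acquisition has succeeded. */
	if (!wait)
		sb->trylock_annotations++;
	sb->held[level - 1] = true;
	return 1;
}

/* Models __sb_start_write(): downgrade to trylock when nested. */
static int model_sb_start_write(struct model_sb *sb, int level, bool wait)
{
	bool force_trylock = false;
	int i;

	assert(level >= 1 && level <= MODEL_LEVELS);

	if (wait) {
		for (i = 0; i < level - 1; i++)
			if (sb->held[i]) {
				force_trylock = true;
				break;
			}
	}
	return model_do_sb_start_write(sb, level, wait && !force_trylock);
}
```

Calling model_sb_start_write(&sb, 1, true) on a zeroed struct model_sb and
then model_sb_start_write(&sb, 3, true) shows the nested case: the second
call goes through the trylock path and bumps only trylock_annotations,
which is the behaviour the WARN_ON() in the patch relies on never failing.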