Date: Tue, 11 Aug 2015 15:16:26 +0200
From: Oleg Nesterov <oleg@redhat.com>
To: Dave Chinner
Cc: Jan Kara, Al Viro, Dave Hansen, "Paul E. McKenney", Peter Zijlstra,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/4] change sb_writers to use percpu_rw_semaphore
Message-ID: <20150811131626.GA19780@redhat.com>
References: <20150722211513.GA19986@redhat.com>
	<20150807195552.GB28529@redhat.com>
	<20150810145942.GF3768@quack.suse.cz>
	<20150810224154.GK3902@dastard>
In-Reply-To: <20150810224154.GK3902@dastard>

On 08/11, Dave Chinner wrote:
>
> On Mon, Aug 10, 2015 at 04:59:42PM +0200, Jan Kara wrote:
> >
> > One would like to construct the lock chain as:
> >
> >   CPU0 (chown foo dir)        CPU1 (readdir dir)        CPU2 (page fault)
> >   process Y                   process X, thread 0       process X, thread 1
> >
> >                               get ILOCK for dir
> >   gets freeze protection
> >   starts transaction in
> >     xfs_setattr_nonsize
> >   waits to get ILOCK on 'dir'
> >                                                         get mmap_sem for X
> >                               wait for mmap_sem for
> >                                 process X in filldir()
> >                                                         wait for freeze protection
> >                                                           in xfs_page_mkwrite
> >
> > and CPU3 then being in freeze_super() blocking CPU2 and waiting for CPU0 to
> > finish its freeze-protected section. But this cannot happen. The reason is
> > that we block writers level-by-level and thus while there are writers at
> > level X, we do not block writers at level X+1.
> > So in this particular case freeze_super() will block waiting for CPU0 to
> > finish its freeze-protected section while CPU2 is free to continue.
> >
> > In general we have a chain like
> >
> >   freeze L0 -> freeze L1 -> freeze L2 -> ILOCK -> mmap_sem --\
> >      ^                                                       |
> >      \-------------------------------------------------------/
> >
> > But since ILOCK is always acquired with freeze protection at L0 and we can
> > block at L1 only after there are no writers at L0, this loop can never
> > happen.
> >
> > Note that if we use the property of freezing that a lock at level X+1 cannot
> > block when we hold a lock at level X, we can as well simplify the dependency
> > graph and track in it only the lowest level of freeze lock that is
> > currently acquired (since the levels above it cannot block and do not in
> > any way influence blocking of other processes either and thus are
> > irrelevant for the purpose of deadlock detection). Then the dependency
> > graph we'd get would be:
> >
> >   freeze L0 -> ILOCK -> mmap_sem -> freeze L1
> >
> > and we have a nice acyclic graph we like to see... So probably we have to
> > hack the lockdep instrumentation some more and just not tell lockdep
> > about freeze locks at higher levels if we already hold a lock at a lower
> > level. Thoughts?
>
> The XFS directory ilock->filldir->might_fault locking path has been
> generating false positives in quite a lot of places because of
> things we do on one side of the mmap_sem in filesystem paths vs
> things we do on the other side of the mmap_sem in the page fault
> path.

OK. Dave, Jan, thanks a lot.

I was also confused because I didn't know that the "Chain exists of" part of
print_circular_bug() only prints a _partial_ chain, and I have to admit that
I do not even understand which part it actually shows...

I'll drop

	move rwsem_release() from sb_wait_write() to freeze_super()
	change thaw_super() to re-acquire s_writers.lock_map

from the previous series and resend everything. Let's change sb_writers to
use percpu_rw_semaphore first, then try to improve the lockdep annotations.

See the interdiff below. With this change I have

	TEST_DEV=/dev/loop0 TEST_DIR=TEST SCRATCH_DEV=/dev/loop1 SCRATCH_MNT=SCRATCH \
		./check `grep -il freeze tests/*/???`
	...
	Ran: generic/068 generic/085 generic/280 generic/311 xfs/011 xfs/119 xfs/297
	Passed all 7 tests

anything else I should test?

Oleg.

The percpu_rwsem_release() added to sb_wait_write() below needs a comment to
explain that this is not what we want.
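
Something along these lines, perhaps: the function as it reads with the
interdiff below applied, plus a suggested comment that mostly recycles the
explanation carried by the old sb_freeze_release(); the wording is only a
suggestion:

	static void sb_wait_write(struct super_block *sb, int level)
	{
		percpu_down_write(sb->s_writers.rw_sem + level-1);
		/*
		 * The rwsem stays write-locked until thaw_super(), which may
		 * run in a different task.  Drop the lockdep annotation here
		 * only to avoid the warning from lockdep_sys_exit() when
		 * freeze_super() returns to user space with the locks still
		 * held.  This is not what we want; the plan is to improve the
		 * lockdep annotations later.
		 */
		percpu_rwsem_release(sb->s_writers.rw_sem + level-1, 0, _THIS_IP_);
	}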
--- a/fs/super.c
+++ b/fs/super.c
@@ -1215,27 +1215,15 @@ EXPORT_SYMBOL(__sb_start_write);
 
 static void sb_wait_write(struct super_block *sb, int level)
 {
 	percpu_down_write(sb->s_writers.rw_sem + level-1);
+	percpu_rwsem_release(sb->s_writers.rw_sem + level-1, 0, _THIS_IP_);
 }
 
-static void sb_freeze_release(struct super_block *sb)
-{
-	int level;
-	/* Avoid the warning from lockdep_sys_exit() */
-	for (level = 0; level < SB_FREEZE_LEVELS; ++level)
-		percpu_rwsem_release(sb->s_writers.rw_sem + level, 0, _THIS_IP_);
-}
-
-static void sb_freeze_acquire(struct super_block *sb)
+static void sb_freeze_unlock(struct super_block *sb)
 {
 	int level;
 
 	for (level = 0; level < SB_FREEZE_LEVELS; ++level)
 		percpu_rwsem_acquire(sb->s_writers.rw_sem + level, 0, _THIS_IP_);
-}
-
-static void sb_freeze_unlock(struct super_block *sb)
-{
-	int level;
 
 	for (level = SB_FREEZE_LEVELS; --level >= 0; )
 		percpu_up_write(sb->s_writers.rw_sem + level);
@@ -1331,7 +1319,6 @@ int freeze_super(struct super_block *sb)
 	 * sees write activity when frozen is set to SB_FREEZE_COMPLETE.
 	 */
 	sb->s_writers.frozen = SB_FREEZE_COMPLETE;
-	sb_freeze_release(sb);
 	up_write(&sb->s_umount);
 	return 0;
 }
@@ -1358,14 +1345,11 @@ int thaw_super(struct super_block *sb)
 		goto out;
 	}
 
-	sb_freeze_acquire(sb);
-
 	if (sb->s_op->unfreeze_fs) {
 		error = sb->s_op->unfreeze_fs(sb);
 		if (error) {
 			printk(KERN_ERR "VFS:Filesystem thaw failed\n");
-			sb_freeze_release(sb);
 			up_write(&sb->s_umount);
 			return error;
 		}
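
For context, the "block writers level-by-level" behaviour Jan describes above
comes from freeze_super() taking the three levels in turn. Roughly, as a
simplified sketch (the real code in fs/super.c also handles the error paths,
sync_filesystem() and the ->freeze_fs() callback; the _sketch name is mine):

	static int freeze_super_sketch(struct super_block *sb)
	{
		/* Stop new SB_FREEZE_WRITE writers, wait for the current ones. */
		sb->s_writers.frozen = SB_FREEZE_WRITE;
		sb_wait_write(sb, SB_FREEZE_WRITE);

		/*
		 * While we are still waiting above for SB_FREEZE_WRITE holders,
		 * page faults are not blocked yet; this is what breaks the
		 * would-be cycle in the scenario quoted at the top.
		 */
		sb->s_writers.frozen = SB_FREEZE_PAGEFAULT;
		sb_wait_write(sb, SB_FREEZE_PAGEFAULT);

		/* Finally quiesce the internal filesystem writers. */
		sb->s_writers.frozen = SB_FREEZE_FS;
		sb_wait_write(sb, SB_FREEZE_FS);

		sb->s_writers.frozen = SB_FREEZE_COMPLETE;
		return 0;
	}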