From patchwork Fri Aug 7 19:55:52 2015
From: Oleg Nesterov
Date: Fri, 7 Aug 2015 21:55:52 +0200
To: Al Viro, Dave Chinner, Dave Hansen, Jan Kara
Cc: "Paul E. McKenney", Peter Zijlstra, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/4] change sb_writers to use percpu_rw_semaphore
Message-ID: <20150807195552.GB28529@redhat.com>
References: <20150722211513.GA19986@redhat.com>
In-Reply-To: <20150722211513.GA19986@redhat.com>
List-ID: linux-fsdevel@vger.kernel.org

Jan, Dave, please help. I'll try to re-check/re-test, but so far I think
that this and the previous series are correct. However, 3/4 from the
previous series (attached at the end just in case) uncovers (I think)
some problems in xfs locking. What should I do now?

On 07/22, Oleg Nesterov wrote:
>
> Testing. Well, so far I only verified that ioctl(FIFREEZE) +
> ioctl(FITHAW) seems to work "as expected" on my testing machine
> with ext3. So probably this needs more testing.
Finally I got the testing machine and ran xfstests. I did

	dd if=/dev/zero of=TEST.img bs=1MiB count=4000
	dd if=/dev/zero of=SCRATCH.img bs=1MiB count=4000
	losetup --find --show TEST.img
	losetup --find --show SCRATCH.img
	mkfs.xfs -f /dev/loop0
	mkfs.xfs -f /dev/loop1
	mkdir -p TEST SCRATCH
	mount /dev/loop0 TEST
	mount /dev/loop1 SCRATCH
	TEST_DEV=/dev/loop0 TEST_DIR=TEST SCRATCH_DEV=/dev/loop1 SCRATCH_MNT=SCRATCH \
		./check `grep -il freeze tests/*/???`

several times, and every time the result looks like below:

	FSTYP         -- xfs (non-debug)
	PLATFORM      -- Linux/x86_64 intel-canoepass-10 4.1.0+
	MKFS_OPTIONS  -- -f -bsize=4096 /dev/loop1
	MOUNT_OPTIONS -- -o context=system_u:object_r:nfs_t:s0 /dev/loop1 SCRATCH

	generic/068 59s ... 61s
	generic/085 23s ... 26s
	generic/280 2s ... 3s
	generic/311 169s ... 167s
	xfs/011 21s ... 20s
	xfs/119 32s ... 21s
	xfs/297 455s ... 301s
	Ran: generic/068 generic/085 generic/280 generic/311 xfs/011 xfs/119 xfs/297
	Passed all 7 tests

But with CONFIG_LOCKDEP=y generic/068 triggers the warning:

[   66.092831] ======================================================
[   66.099726] [ INFO: possible circular locking dependency detected ]
[   66.106719] 4.1.0+ #2 Not tainted
[   66.110414] -------------------------------------------------------
[   66.117405] fsstress/2210 is trying to acquire lock:
[   66.122944]  (&mm->mmap_sem){++++++}, at: [] might_fault+0x42/0xa0
[   66.131637] but task is already holding lock:
[   66.138146]  (&xfs_dir_ilock_class){++++..}, at: [] xfs_ilock+0xc2/0x170 [xfs]
[   66.148022] which lock already depends on the new lock.
[   66.157141] the existing dependency chain (in reverse order) is:
[   66.165490] -> #4 (&xfs_dir_ilock_class){++++..}:
[   66.170974]        [] lock_acquire+0xbe/0x150
[   66.177596]        [] down_write_nested+0x3c/0x70
[   66.184605]        [] xfs_ilock+0x126/0x170 [xfs]
[   66.191645]        [] xfs_setattr_nonsize+0x3ba/0x5d0 [xfs]
[   66.199638]        [] xfs_vn_setattr+0x9a/0xd0 [xfs]
[   66.206950]        [] notify_change+0x271/0x3a0
[   66.213762]        [] chown_common.isra.11+0x15b/0x200
[   66.221251]        [] SyS_lchown+0xa6/0xf0
[   66.227576]        [] system_call_fastpath+0x12/0x76
[   66.234878] -> #3 (sb_internal#2){++++++}:
[   66.239698]        [] lock_acquire+0xbe/0x150
[   66.246316]        [] down_write+0x36/0x70
[   66.252644]        [] percpu_down_write+0x39/0x110
[   66.259760]        [] freeze_super+0xbd/0x190
[   66.266369]        [] do_vfs_ioctl+0x3f8/0x520
[   66.273082]        [] SyS_ioctl+0x81/0xa0
[   66.279311]        [] system_call_fastpath+0x12/0x76
[   66.286610] -> #2 (sb_pagefaults#2){++++++}:
[   66.291623]        [] lock_acquire+0xbe/0x150
[   66.298237]        [] percpu_down_read+0x51/0xa0
[   66.305144]        [] __sb_start_write+0xdb/0x120
[   66.312140]        [] block_page_mkwrite+0x3a/0xb0
[   66.319245]        [] xfs_filemap_page_mkwrite+0x5e/0xb0 [xfs]
[   66.327533]        [] do_page_mkwrite+0x58/0x100
[   66.334433]        [] handle_mm_fault+0x537/0x17c0
[   66.341533]        [] __do_page_fault+0x19c/0x450
[   66.348542]        [] do_page_fault+0x2f/0x80
[   66.355172]        [] page_fault+0x28/0x30
[   66.361493] -> #1 (&(&ip->i_mmaplock)->mr_lock){++++++}:
[   66.367659]        [] lock_acquire+0xbe/0x150
[   66.374275]        [] down_read_nested+0x3f/0x60
[   66.381185]        [] xfs_ilock+0x168/0x170 [xfs]
[   66.388212]        [] xfs_filemap_fault+0x4c/0xb0 [xfs]
[   66.395817]        [] __do_fault+0x4e/0x100
[   66.402239]        [] handle_mm_fault+0x4f4/0x17c0
[   66.409340]        [] __do_page_fault+0x19c/0x450
[   66.416344]        [] do_page_fault+0x2f/0x80
[   66.422959]        [] page_fault+0x28/0x30
[   66.429283] -> #0 (&mm->mmap_sem){++++++}:
[   66.434093]        [] __lock_acquire+0x20d7/0x2100
[   66.441194]        [] lock_acquire+0xbe/0x150
[   66.447808]        [] might_fault+0x6f/0xa0
[   66.454235]        [] filldir+0x9a/0x130
[   66.460373]        [] xfs_dir2_sf_getdents.isra.10+0x1bd/0x220 [xfs]
[   66.469233]        [] xfs_readdir+0x17e/0x1a0 [xfs]
[   66.476449]        [] xfs_file_readdir+0x2b/0x30 [xfs]
[   66.483954]        [] iterate_dir+0x9a/0x140
[   66.490473]        [] SyS_getdents+0x91/0x120
[   66.497091]        [] system_call_fastpath+0x12/0x76
[   66.504381] other info that might help us debug this:
[   66.513316] Chain exists of:
  &mm->mmap_sem --> sb_internal#2 --> &xfs_dir_ilock_class
[   66.522619]  Possible unsafe locking scenario:
[   66.529222]        CPU0                    CPU1
[   66.534275]        ----                    ----
[   66.539327]   lock(&xfs_dir_ilock_class);
[   66.543823]                                lock(sb_internal#2);
[   66.550465]                                lock(&xfs_dir_ilock_class);
[   66.557767]   lock(&mm->mmap_sem);
[   66.561580]  *** DEADLOCK ***
[   66.568186] 2 locks held by fsstress/2210:
[   66.572753]  #0:  (&type->i_mutex_dir_key#5){+.+.+.}, at: [] iterate_dir+0x61/0x140
[   66.583103]  #1:  (&xfs_dir_ilock_class){++++..}, at: [] xfs_ilock+0xc2/0x170 [xfs]
[   66.593457] stack backtrace:
[   66.598321] CPU: 26 PID: 2210 Comm: fsstress Not tainted 4.1.0+ #2
[   66.605215] Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.R3.27.D685.1305151734 05/15/2013
[   66.616372]  0000000000000000 0000000048bb9c77 ffff8817fa127ad8 ffffffff8181d9dd
[   66.624663]  0000000000000000 ffffffff8288f500 ffff8817fa127b28 ffffffff810f7b26
[   66.632955]  0000000000000002 ffff8817fa127b98 ffff8817fa127b28 ffff881803f06480
[   66.641249] Call Trace:
[   66.643983]  [] dump_stack+0x45/0x57
[   66.649713]  [] print_circular_bug+0x206/0x280
[   66.656413]  [] __lock_acquire+0x20d7/0x2100
[   66.662919]  [] ? __lock_acquire+0xa27/0x2100
[   66.669523]  [] lock_acquire+0xbe/0x150
[   66.675543]  [] ? might_fault+0x42/0xa0
[   66.681566]  [] might_fault+0x6f/0xa0
[   66.687394]  [] ? might_fault+0x42/0xa0
[   66.693418]  [] filldir+0x9a/0x130
[   66.698968]  [] ? xfs_ilock_data_map_shared+0x34/0x40 [xfs]
[   66.706941]  [] xfs_dir2_sf_getdents.isra.10+0x1bd/0x220 [xfs]
[   66.715203]  [] ? xfs_ilock+0xc2/0x170 [xfs]
[   66.721712]  [] xfs_readdir+0x17e/0x1a0 [xfs]
[   66.728316]  [] ? mutex_lock_killable_nested+0x3ad/0x480
[   66.735998]  [] xfs_file_readdir+0x2b/0x30 [xfs]
[   66.742894]  [] iterate_dir+0x9a/0x140
[   66.748819]  [] ? __fget_light+0x6c/0xa0
[   66.754938]  [] SyS_getdents+0x91/0x120
[   66.760958]  [] ? fillonedir+0xf0/0xf0
[   66.766884]  [] system_call_fastpath+0x12/0x76

The chain reported by lockdep is not exactly the same every time, but
similar. Once again, I'll recheck, but the patch below still looks
correct to me, and I think that it is actually a fix. Before this patch
freeze_super() calls sync_filesystem() and s_op->freeze_fs(sb) without
->s_writers locks (as seen by lockdep), and this is wrong. After this
patch lockdep knows about the ->s_writers locks we hold, which is what
we want, but apparently this way lockdep can notice the new
dependencies.

Oleg.

------------------------------------------------------------------------------
Subject: [PATCH 3/4] move rwsem_release() from sb_wait_write() to freeze_super()

Move the "fool the lockdep" code from sb_wait_write() into the new
simple helper, sb_freeze_release(), called once before return from
freeze_super(). This is preparation, but imo this makes sense in any
case. This way we do not hide from lockdep the "write" locks we hold
when we call s_op->freeze_fs(sb).
Signed-off-by: Oleg Nesterov
---
 fs/super.c | 16 ++++++++++------
 1 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index d0fdd49..e7ea9f1 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1236,16 +1236,10 @@ static void sb_wait_write(struct super_block *sb, int level)
 {
 	s64 writers;
 
-	/*
-	 * We just cycle-through lockdep here so that it does not complain
-	 * about returning with lock to userspace
-	 */
 	rwsem_acquire(&sb->s_writers.lock_map[level-1], 0, 0, _THIS_IP_);
-	rwsem_release(&sb->s_writers.lock_map[level-1], 1, _THIS_IP_);
 
 	do {
 		DEFINE_WAIT(wait);
-
 		/*
 		 * We use a barrier in prepare_to_wait() to separate setting
 		 * of frozen and checking of the counter
@@ -1261,6 +1255,14 @@ static void sb_wait_write(struct super_block *sb, int level)
 	} while (writers);
 }
 
+static void sb_freeze_release(struct super_block *sb)
+{
+	int level;
+	/* Avoid the warning from lockdep_sys_exit() */
+	for (level = 0; level < SB_FREEZE_LEVELS; ++level)
+		rwsem_release(sb->s_writers.lock_map + level, 1, _THIS_IP_);
+}
+
 /**
  * freeze_super - lock the filesystem and force it into a consistent state
  * @sb: the super to lock
@@ -1349,6 +1351,7 @@ int freeze_super(struct super_block *sb)
 		sb->s_writers.frozen = SB_UNFROZEN;
 		smp_wmb();
 		wake_up(&sb->s_writers.wait_unfrozen);
+		sb_freeze_release(sb);
 		deactivate_locked_super(sb);
 		return ret;
 	}
@@ -1358,6 +1361,7 @@ int freeze_super(struct super_block *sb)
 	 * sees write activity when frozen is set to SB_FREEZE_COMPLETE.
 	 */
 	sb->s_writers.frozen = SB_FREEZE_COMPLETE;
+	sb_freeze_release(sb);
 	up_write(&sb->s_umount);
 	return 0;
 }