From patchwork Thu Jun 30 03:30:38 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiaoguang Wang X-Patchwork-Id: 9206557 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id DBEAD6075F for ; Thu, 30 Jun 2016 03:32:31 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C4A4228658 for ; Thu, 30 Jun 2016 03:32:31 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id B8C292866B; Thu, 30 Jun 2016 03:32:31 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1EBD128658 for ; Thu, 30 Jun 2016 03:32:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751646AbcF3Dc1 (ORCPT ); Wed, 29 Jun 2016 23:32:27 -0400 Received: from cn.fujitsu.com ([59.151.112.132]:29884 "EHLO heian.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1751486AbcF3Dc0 (ORCPT ); Wed, 29 Jun 2016 23:32:26 -0400 X-IronPort-AV: E=Sophos;i="5.22,518,1449504000"; d="scan'208";a="8210855" Received: from unknown (HELO cn.fujitsu.com) ([10.167.33.5]) by heian.cn.fujitsu.com with ESMTP; 30 Jun 2016 11:32:21 +0800 Received: from G08CNEXCHPEKD01.g08.fujitsu.local (unknown [10.167.33.80]) by cn.fujitsu.com (Postfix) with ESMTP id EE49141C0BB0; Thu, 30 Jun 2016 11:32:16 +0800 (CST) Received: from localhost.localdomain (10.167.226.107) by G08CNEXCHPEKD01.g08.fujitsu.local (10.167.33.89) with Microsoft SMTP Server (TLS) id 14.3.279.2; Thu, 30 Jun 2016 11:32:16 +0800 Subject: Re: [PATCH 1/2] btrfs: fix fsfreeze hang caused by delayed iputs deal To: References: <20160629051510.17222-1-wangxg.fnst@cn.fujitsu.com> CC: "linux-btrfs@vger.kernel.org" From: Wang Xiaoguang Message-ID: <5774925E.6060106@cn.fujitsu.com> Date: Thu, 30 Jun 2016 11:30:38 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0 MIME-Version: 1.0 In-Reply-To: X-Originating-IP: [10.167.226.107] X-yoursite-MailScanner-ID: EE49141C0BB0.A9F83 X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: wangxg.fnst@cn.fujitsu.com Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP hello, On 06/29/2016 07:16 PM, Filipe Manana wrote: > On Wed, Jun 29, 2016 at 6:15 AM, Wang Xiaoguang > wrote: >> When running fstests generic/068, sometimes we got below WARNING: >> xfs_io D ffff8800331dbb20 0 6697 6693 0x00000080 >> ffff8800331dbb20 ffff88007acfc140 ffff880034d895c0 ffff8800331dc000 >> ffff880032d243e8 fffffffeffffffff ffff880032d24400 0000000000000001 >> ffff8800331dbb38 ffffffff816a9045 ffff880034d895c0 ffff8800331dbba8 >> Call Trace: >> [] schedule+0x35/0x80 >> [] rwsem_down_read_failed+0xf2/0x140 >> [] ? __filemap_fdatawrite_range+0xd1/0x100 >> [] call_rwsem_down_read_failed+0x18/0x30 >> [] ? btrfs_alloc_block_rsv+0x2c/0xb0 [btrfs] >> [] percpu_down_read+0x35/0x50 >> [] __sb_start_write+0x2c/0x40 >> [] start_transaction+0x2a5/0x4d0 [btrfs] >> [] btrfs_join_transaction+0x17/0x20 [btrfs] >> [] btrfs_evict_inode+0x3c4/0x5d0 [btrfs] >> [] evict+0xba/0x1a0 >> [] iput+0x196/0x200 >> [] btrfs_run_delayed_iputs+0x70/0xc0 [btrfs] >> [] btrfs_commit_transaction+0x928/0xa80 [btrfs] >> [] btrfs_freeze+0x30/0x40 [btrfs] >> [] freeze_super+0xf0/0x190 >> [] do_vfs_ioctl+0x4a5/0x5c0 >> [] ? do_audit_syscall_entry+0x66/0x70 >> [] ? syscall_trace_enter_phase1+0x11f/0x140 >> [] SyS_ioctl+0x79/0x90 >> [] do_syscall_64+0x62/0x110 >> [] entry_SYSCALL64_slow_path+0x25/0x25 >> >> From this warning, freeze_super() already holds SB_FREEZE_FS, but >> btrfs_freeze() will call btrfs_commit_transaction() again, if >> btrfs_commit_transaction() finds that it has delayed iputs to handle, >> it'll start_transaction(), which will try to get SB_FREEZE_FS lock >> again, then deadlock occurs. >> >> The root cause is that in btrfs, sync_filesystem(sb) does not make >> sure all metadata is updated. See below race window in freeze_super(): >> sync_filesystem(sb); >> | >> | race window >> | In this period, cleaner_kthread() may be scheduled to >> | run, and it call btrfs_delete_unused_bgs() which will >> | add some delayed iputs. >> | >> sb->s_writers.frozen = SB_FREEZE_FS; >> sb_wait_write(sb, SB_FREEZE_FS); >> if (sb->s_op->freeze_fs) { >> /* freeze_fs will call btrfs_commit_transaction() */ >> ret = sb->s_op->freeze_fs(sb); >> > This pseudo diagram also doesn't well the problem. > You should add two timelines for 2 different CPUs/tasks where we see > one locking SB_FREEZE_FS and calling btrfs_freeze() > while the task in the other CPU calls btrfs_commit_transaction(), runs > delayed iputs and then starts/joins a transaction in the > eviction handler. OK, got it. > > > >> So if btrfs is doing freeze job, we should block >> btrfs_delete_unused_bgs(), to avoid add delayed iputs. > Nop, this isn't a real solution. > > You can have many other parts adding inodes to the delayed iputs list, > not just btrfs_delete_unused_bgs() (doing it indirectly for the free > space cache inodes). > For example, when an ordered extent completes we add its inode to the > delayed iputs list at btrfs_put_ordered_extent(), called through > btrfs_finish_ordered_io(). Yes, I know. Before sending this patch, I had searched related codes which call btrfs_add_delayed_iput(). I had thought sync_filesystem() in freeze_super() would ensure that all other btrfs_add_delayed_iput()s are called before transaction starts except the one in cleaner_kthread(). I also have one question, for compressed buffered write, even sync_filesystem() returns, we can not ensure the data is written to disk, right? It seems that btrfs_writepages() can return directly for compressed data, but with page still locked and no WRITEBACK flag set, so later filemap_fdatawait() won't work. btrfs_fdatawrite_range() is used to resolve this issue, but it seems that sync_filesystem() won't make btrfs_fdatawrite_range() called. Also would you be OK with this solution: return ret; Regards, Xiaoguang Wang > > thanks > > > >> Signed-off-by: Wang Xiaoguang >> --- >> fs/btrfs/disk-io.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >> index 863bf7a..fdbe0df 100644 >> --- a/fs/btrfs/disk-io.c >> +++ b/fs/btrfs/disk-io.c >> @@ -1846,8 +1846,11 @@ static int cleaner_kthread(void *arg) >> * after acquiring fs_info->delete_unused_bgs_mutex. So we >> * can't hold, nor need to, fs_info->cleaner_mutex when deleting >> * unused block groups. >> + * > Unneeded and unrelated change, please remove it. > >> */ >> + __sb_start_write(root->fs_info->sb, SB_FREEZE_WRITE, true); >> btrfs_delete_unused_bgs(root->fs_info); >> + __sb_end_write(root->fs_info->sb, SB_FREEZE_WRITE); > A comment explaining why we do this would be valuable, since we never > do these calls except when starting/committing transactions. > >> sleep: >> if (!again) { >> set_current_state(TASK_INTERRUPTIBLE); >> -- >> 2.9.0 >> >> >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > --- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 948aa18..89a996c6 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -2278,7 +2278,8 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans, kmem_cache_free(btrfs_trans_handle_cachep, trans); if (current != root->fs_info->transaction_kthread && - current != root->fs_info->cleaner_kthread) + current != root->fs_info->cleaner_kthread && + root->fs_info->sb->s_writers.frozen != SB_FREEZE_FS) btrfs_run_delayed_iputs(root);