From patchwork Sat Apr 25 17:29:16 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 6275011 Return-Path: X-Original-To: patchwork-linux-btrfs@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id D56C99F389 for ; Sat, 25 Apr 2015 17:30:39 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id D26AB2024D for ; Sat, 25 Apr 2015 17:30:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 70F3D2011D for ; Sat, 25 Apr 2015 17:30:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S966563AbbDYRa2 (ORCPT ); Sat, 25 Apr 2015 13:30:28 -0400 Received: from victor.provo.novell.com ([137.65.250.26]:34488 "EHLO prv3-mh.provo.novell.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S965095AbbDYRa1 (ORCPT ); Sat, 25 Apr 2015 13:30:27 -0400 Received: from debian3.lan (prv-ext-foundry1int.gns.novell.com [137.65.251.240]) by prv3-mh.provo.novell.com with ESMTP (NOT encrypted); Sat, 25 Apr 2015 11:30:25 -0600 From: Filipe Manana To: linux-btrfs@vger.kernel.org Cc: clm@fb.com, Filipe Manana Subject: [PATCH] Btrfs: fix race between start dirty bg cache writeout and bg deletion Date: Sat, 25 Apr 2015 18:29:16 +0100 Message-Id: <1429982956-22284-1-git-send-email-fdmanana@suse.com> X-Mailer: git-send-email 2.1.3 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, T_RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP While running xfstests I ran into the following: [20892.242791] ------------[ cut here ]------------ [20892.243776] WARNING: CPU: 0 PID: 13299 at fs/btrfs/super.c:260 __btrfs_abort_transaction+0x52/0x114 [btrfs]() [20892.245874] BTRFS: Transaction aborted (error -2) [20892.247329] Modules linked in: btrfs dm_snapshot dm_bufio dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse$ [20892.258488] CPU: 0 PID: 13299 Comm: fsstress Tainted: G W 4.0.0-rc5-btrfs-next-9+ #2 [20892.262011] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 [20892.264738] 0000000000000009 ffff880427f8bc18 ffffffff8142fa46 ffffffff8108b6a2 [20892.266244] ffff880427f8bc68 ffff880427f8bc58 ffffffff81045ea5 ffff880427f8bc48 [20892.267761] ffffffffa0509a6d 00000000fffffffe ffff8803545d6f40 ffffffffa05a15a0 [20892.269378] Call Trace: [20892.269915] [] dump_stack+0x4f/0x7b [20892.271097] [] ? console_unlock+0x361/0x3ad [20892.272173] [] warn_slowpath_common+0xa1/0xbb [20892.273386] [] ? __btrfs_abort_transaction+0x52/0x114 [btrfs] [20892.274857] [] warn_slowpath_fmt+0x46/0x48 [20892.275851] [] __btrfs_abort_transaction+0x52/0x114 [btrfs] [20892.277341] [] write_one_cache_group+0x68/0xaf [btrfs] [20892.278628] [] btrfs_start_dirty_block_groups+0x18d/0x29b [btrfs] [20892.280191] [] btrfs_commit_transaction+0x130/0x9c9 [btrfs] [20892.281781] [] ? trace_hardirqs_on+0xd/0xf [20892.282873] [] btrfs_sync_file+0x313/0x387 [btrfs] [20892.284111] [] vfs_fsync_range+0x95/0xa4 [20892.285203] [] ? time_hardirqs_on+0x15/0x28 [20892.286290] [] ? trace_hardirqs_on_thunk+0x3a/0x3f [20892.287469] [] vfs_fsync+0x1c/0x1e [20892.288412] [] do_fsync+0x34/0x4e [20892.289348] [] SyS_fsync+0x10/0x14 [20892.290255] [] system_call_fastpath+0x12/0x17 [20892.291316] ---[ end trace 597f77e664245373 ]--- [20892.293955] BTRFS: error (device sdg) in write_one_cache_group:3184: errno=-2 No such entry [20892.297390] BTRFS info (device sdg): forced readonly This happens because in btrfs_start_dirty_block_groups() we splice the transaction's list of dirty block groups into a local list and then we keep extracting the first element of the list without holding the cache_write_mutex mutex. This means that before we acquire that mutex the first block group on the list might be removed by a conurrent task running btrfs_remove_block_group(). So make sure we extract the first element (and test the list emptyness) while holding that mutex. Fixes: 1bbc621ef284 ("Btrfs: allow block group cache writeout outside critical section in commit") Signed-off-by: Filipe Manana --- fs/btrfs/extent-tree.c | 44 +++++++++++++++++++++++++++----------------- 1 file changed, 27 insertions(+), 17 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 5edc7ee..d5f3ef0 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3407,17 +3407,14 @@ int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans, int loops = 0; spin_lock(&cur_trans->dirty_bgs_lock); - if (!list_empty(&cur_trans->dirty_bgs)) { - list_splice_init(&cur_trans->dirty_bgs, &dirty); + if (list_empty(&cur_trans->dirty_bgs)) { + spin_unlock(&cur_trans->dirty_bgs_lock); + return 0; } + list_splice_init(&cur_trans->dirty_bgs, &dirty); spin_unlock(&cur_trans->dirty_bgs_lock); again: - if (list_empty(&dirty)) { - btrfs_free_path(path); - return 0; - } - /* * make sure all the block groups on our dirty list actually * exist @@ -3430,18 +3427,16 @@ again: return -ENOMEM; } + /* + * cache_write_mutex is here only to save us from balance or automatic + * removal of empty block groups deleting this block group while we are + * writing out the cache + */ + mutex_lock(&trans->transaction->cache_write_mutex); while (!list_empty(&dirty)) { cache = list_first_entry(&dirty, struct btrfs_block_group_cache, dirty_list); - - /* - * cache_write_mutex is here only to save us from balance - * deleting this block group while we are writing out the - * cache - */ - mutex_lock(&trans->transaction->cache_write_mutex); - /* * this can happen if something re-dirties a block * group that is already under IO. Just wait for it to @@ -3494,7 +3489,6 @@ again: } if (!ret) ret = write_one_cache_group(trans, root, path, cache); - mutex_unlock(&trans->transaction->cache_write_mutex); /* if its not on the io list, we need to put the block group */ if (should_put) @@ -3502,7 +3496,16 @@ again: if (ret) break; + + /* + * Avoid blocking other tasks for too long. It might even save + * us from writing caches for block groups that are going to be + * removed. + */ + mutex_unlock(&trans->transaction->cache_write_mutex); + mutex_lock(&trans->transaction->cache_write_mutex); } + mutex_unlock(&trans->transaction->cache_write_mutex); /* * go through delayed refs for all the stuff we've just kicked off @@ -3513,8 +3516,15 @@ again: loops++; spin_lock(&cur_trans->dirty_bgs_lock); list_splice_init(&cur_trans->dirty_bgs, &dirty); + /* + * dirty_bgs_lock protects us from concurrent block group + * deletes too (not just cache_write_mutex). + */ + if (!list_empty(&dirty)) { + spin_unlock(&cur_trans->dirty_bgs_lock); + goto again; + } spin_unlock(&cur_trans->dirty_bgs_lock); - goto again; } btrfs_free_path(path);