From patchwork Thu Apr 10 03:48:46 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Liu Bo
X-Patchwork-Id: 3959041
From: Liu Bo
To: linux-btrfs@vger.kernel.org
Cc: Marcel Ritter, Christian Robert, Konstantinos Skarlatos, David Sterba,
    Martin Steigerwald, Josef Bacik, Chris Mason
Subject: [PATCH v10 16/16] Btrfs: fix dedup enospc problem
Date: Thu, 10 Apr 2014 11:48:46 +0800
Message-Id: <1397101727-20806-17-git-send-email-bo.li.liu@oracle.com>
X-Mailer: git-send-email 1.8.1.4
In-Reply-To: <1397101727-20806-1-git-send-email-bo.li.liu@oracle.com>
References: <1397101727-20806-1-git-send-email-bo.li.liu@oracle.com>

With dedup enabled, btrfs produces a large number of delayed refs, and
processing them can easily consume all of the space reserved in
global_block_rsv, so we end up with a transaction abort due to ENOSPC.

I tried several ways to reserve more space for global_block_rsv in the hope
that it would be enough for flushing the delayed refs, but they all failed
and the code became very messy. I found that under high delayed-ref
pressure, the throttling done in end_transaction is of little use, since it
does not block the insertion of new delayed refs. So this moves the
throttling to the very start stage, i.e. start_transaction.

The throttle code assumes the worst case, namely that every delayed ref
will update the btree. When we reach the limit where the delayed refs may
use up all of the reserved space in global_block_rsv, we kick
transaction_kthread to commit the transaction, which processes these
delayed refs, refreshes global_block_rsv's space, and gets the pinned space
back as well. That way we get rid of the annoying ENOSPC problem.
However, this introduces a new restriction: it cannot be used together with
the "flushoncommit" mount option, since that can cause an ABBA deadlock
between committing the transaction and flushing ordered extents.

Signed-off-by: Liu Bo
---
 fs/btrfs/extent-tree.c  | 50 ++++++++++++++++++++++++++++++++++++++-----------
 fs/btrfs/ordered-data.c |  6 ++++++
 fs/btrfs/transaction.c  | 41 ++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/transaction.h  |  1 +
 4 files changed, 87 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 6f8b012..ec6f42d 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2695,24 +2695,52 @@ static inline u64 heads_to_leaves(struct btrfs_root *root, u64 heads)
 int btrfs_check_space_for_delayed_refs(struct btrfs_trans_handle *trans,
 				       struct btrfs_root *root)
 {
+	struct btrfs_delayed_ref_root *delayed_refs;
 	struct btrfs_block_rsv *global_rsv;
-	u64 num_heads = trans->transaction->delayed_refs.num_heads_ready;
+	u64 num_heads;
+	u64 num_entries;
 	u64 num_bytes;
 	int ret = 0;
 
-	num_bytes = btrfs_calc_trans_metadata_size(root, 1);
-	num_heads = heads_to_leaves(root, num_heads);
-	if (num_heads > 1)
-		num_bytes += (num_heads - 1) * root->leafsize;
-	num_bytes <<= 1;
 	global_rsv = &root->fs_info->global_block_rsv;
 
-	/*
-	 * If we can't allocate any more chunks lets make sure we have _lots_ of
-	 * wiggle room since running delayed refs can create more delayed refs.
-	 */
-	if (global_rsv->space_info->full)
+	if (trans) {
+		num_heads = trans->transaction->delayed_refs.num_heads_ready;
+		num_bytes = btrfs_calc_trans_metadata_size(root, 1);
+		num_heads = heads_to_leaves(root, num_heads);
+		if (num_heads > 1)
+			num_bytes += (num_heads - 1) * root->leafsize;
 		num_bytes <<= 1;
+		/*
+		 * If we can't allocate any more chunks lets make sure we have
+		 * _lots_ of wiggle room since running delayed refs can create
+		 * more delayed refs.
+		 */
+		if (global_rsv->space_info->full)
+			num_bytes <<= 1;
+	} else {
+		if (root->fs_info->dedup_bs == 0)
+			return 0;
+
+		/* dedup enabled */
+		spin_lock(&root->fs_info->trans_lock);
+		if (!root->fs_info->running_transaction) {
+			spin_unlock(&root->fs_info->trans_lock);
+			return 0;
+		}
+
+		delayed_refs =
+			&root->fs_info->running_transaction->delayed_refs;
+
+		num_entries = atomic_read(&delayed_refs->num_entries);
+		num_heads = delayed_refs->num_heads;
+
+		spin_unlock(&root->fs_info->trans_lock);
+
+		/* The worst case */
+		num_bytes = (num_entries - num_heads) *
+			btrfs_calc_trans_metadata_size(root, 1);
+	}
 
 	spin_lock(&global_rsv->lock);
 	if (global_rsv->reserved <= num_bytes)
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index c520e13..72c0caa 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -747,6 +747,12 @@ int btrfs_run_ordered_operations(struct btrfs_trans_handle *trans,
 				 &cur_trans->ordered_operations);
 		spin_unlock(&root->fs_info->ordered_root_lock);
 
+		if (cur_trans->blocked) {
+			cur_trans->blocked = 0;
+			if (waitqueue_active(&cur_trans->commit_wait))
+				wake_up(&cur_trans->commit_wait);
+		}
+
 		work = btrfs_alloc_delalloc_work(inode, wait, 1);
 		if (!work) {
 			spin_lock(&root->fs_info->ordered_root_lock);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index a04707f..9937eb2 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -215,6 +215,7 @@ loop:
 	cur_trans->transid = fs_info->generation;
 	fs_info->running_transaction = cur_trans;
 	cur_trans->aborted = 0;
+	cur_trans->blocked = 1;
 	spin_unlock(&fs_info->trans_lock);
 
 	return 0;
@@ -329,6 +330,27 @@ static void wait_current_trans(struct btrfs_root *root)
 		wait_event(root->fs_info->transaction_wait,
 			   cur_trans->state >= TRANS_STATE_UNBLOCKED ||
 			   cur_trans->aborted);
+
+		btrfs_put_transaction(cur_trans);
+	} else {
+		spin_unlock(&root->fs_info->trans_lock);
+	}
+}
+
+static noinline void wait_current_trans_for_commit(struct btrfs_root *root)
+{
+	struct btrfs_transaction *cur_trans;
+
+	spin_lock(&root->fs_info->trans_lock);
+	cur_trans = root->fs_info->running_transaction;
+	if (cur_trans && is_transaction_blocked(cur_trans)) {
+		atomic_inc(&cur_trans->use_count);
+		spin_unlock(&root->fs_info->trans_lock);
+
+		wait_event(cur_trans->commit_wait,
+			   cur_trans->state >= TRANS_STATE_COMPLETED ||
+			   cur_trans->aborted || cur_trans->blocked == 0);
 		btrfs_put_transaction(cur_trans);
 	} else {
 		spin_unlock(&root->fs_info->trans_lock);
@@ -436,6 +458,25 @@ again:
 	if (may_wait_transaction(root, type))
 		wait_current_trans(root);
 
+	/*
+	 * In the case of dedupe, we need to throttle delayed refs at the
+	 * very start stage, otherwise we'd run into ENOSPC because more
+	 * delayed refs are added while processing delayed refs.
+	 */
+	if (root->fs_info->dedup_bs > 0 && type == TRANS_JOIN &&
+	    btrfs_check_space_for_delayed_refs(NULL, root)) {
+		struct btrfs_transaction *cur_trans;
+
+		spin_lock(&root->fs_info->trans_lock);
+		cur_trans = root->fs_info->running_transaction;
+		if (cur_trans && cur_trans->state == TRANS_STATE_RUNNING)
+			cur_trans->state = TRANS_STATE_BLOCKED;
+		spin_unlock(&root->fs_info->trans_lock);
+
+		wake_up_process(root->fs_info->transaction_kthread);
+		wait_current_trans_for_commit(root);
+	}
+
 	do {
 		ret = join_transaction(root, type);
 		if (ret == -EBUSY) {
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 6ac037e..ac58d43 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -59,6 +59,7 @@ struct btrfs_transaction {
 	struct list_head pending_chunks;
 	struct btrfs_delayed_ref_root delayed_refs;
 	int aborted;
+	int blocked;
 };
 
 #define __TRANS_FREEZABLE	(1U << 0)