From patchwork Wed Jan 30 14:50:48 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikolay Borisov X-Patchwork-Id: 10788731 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E0C4917E9 for ; Wed, 30 Jan 2019 14:51:08 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D033A2F421 for ; Wed, 30 Jan 2019 14:51:08 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id CE9302DFB8; Wed, 30 Jan 2019 14:51:08 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 69ECD2F422 for ; Wed, 30 Jan 2019 14:51:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731426AbfA3OvH (ORCPT ); Wed, 30 Jan 2019 09:51:07 -0500 Received: from mx2.suse.de ([195.135.220.15]:38116 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1731400AbfA3OvG (ORCPT ); Wed, 30 Jan 2019 09:51:06 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id A8358B0B2 for ; Wed, 30 Jan 2019 14:51:05 +0000 (UTC) From: Nikolay Borisov To: linux-btrfs@vger.kernel.org Cc: jeffm@suse.com, Nikolay Borisov Subject: [PATCH 01/15] btrfs: Honour FITRIM range constraints during free space trim Date: Wed, 30 Jan 2019 16:50:48 +0200 Message-Id: <20190130145102.4708-2-nborisov@suse.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190130145102.4708-1-nborisov@suse.com> References: <20190130145102.4708-1-nborisov@suse.com> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Up until know trimming the freespace was done irrespective of what the arguments of the FITRIM ioctl were. For example fstrim's -o/-l arguments will be entirely ignored. Fix it by correctly handling those paramter. This requires breaking if the found freespace extent is after the end of the passed range as well as completing trim after trimming fstrim_range::len bytes. Fixes: 499f377f49f0 ("btrfs: iterate over unused chunk space in FITRIM") Signed-off-by: Nikolay Borisov --- fs/btrfs/extent-tree.c | 25 +++++++++++++++++++------ 1 file changed, 19 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 1fdba38761f7..fc3f6acc3c9b 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -11190,9 +11190,9 @@ int btrfs_error_unpin_extent_range(struct btrfs_fs_info *fs_info, * held back allocations. */ static int btrfs_trim_free_extents(struct btrfs_device *device, - u64 minlen, u64 *trimmed) + struct fstrim_range *range, u64 *trimmed) { - u64 start = 0, len = 0; + u64 start = range->start, len = 0; int ret; *trimmed = 0; @@ -11235,8 +11235,8 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, if (!trans) up_read(&fs_info->commit_root_sem); - ret = find_free_dev_extent_start(trans, device, minlen, start, - &start, &len); + ret = find_free_dev_extent_start(trans, device, range->minlen, + start, &start, &len); if (trans) { up_read(&fs_info->commit_root_sem); btrfs_put_transaction(trans); @@ -11249,6 +11249,16 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, break; } + /* If we are out of the passed range break */ + if (start > range->start + range->len - 1) { + mutex_unlock(&fs_info->chunk_mutex); + ret = 0; + break; + } + + start = max(range->start, start); + len = min(range->len, len); + ret = btrfs_issue_discard(device->bdev, start, len, &bytes); mutex_unlock(&fs_info->chunk_mutex); @@ -11258,6 +11268,10 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, start += len; *trimmed += bytes; + /* We've trimmed enough */ + if (*trimmed >= range->len) + break; + if (fatal_signal_pending(current)) { ret = -ERESTARTSYS; break; @@ -11341,8 +11355,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range) mutex_lock(&fs_info->fs_devices->device_list_mutex); devices = &fs_info->fs_devices->devices; list_for_each_entry(device, devices, dev_list) { - ret = btrfs_trim_free_extents(device, range->minlen, - &group_trimmed); + ret = btrfs_trim_free_extents(device, range, &group_trimmed); if (ret) { dev_failed++; dev_ret = ret; From patchwork Wed Jan 30 14:50:49 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikolay Borisov X-Patchwork-Id: 10788759 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A76C96C2 for ; Wed, 30 Jan 2019 14:51:24 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 972122F437 for ; Wed, 30 Jan 2019 14:51:24 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 959D32F43B; Wed, 30 Jan 2019 14:51:24 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3C1192F434 for ; Wed, 30 Jan 2019 14:51:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731398AbfA3OvG (ORCPT ); Wed, 30 Jan 2019 09:51:06 -0500 Received: from mx2.suse.de ([195.135.220.15]:38094 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1731387AbfA3OvF (ORCPT ); Wed, 30 Jan 2019 09:51:05 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id C244DB07B for ; Wed, 30 Jan 2019 14:51:04 +0000 (UTC) From: Nikolay Borisov To: linux-btrfs@vger.kernel.org Cc: jeffm@suse.com, Nikolay Borisov Subject: [PATCH 02/15] btrfs: Make WARN_ON in a canonical form Date: Wed, 30 Jan 2019 16:50:49 +0200 Message-Id: <20190130145102.4708-3-nborisov@suse.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190130145102.4708-1-nborisov@suse.com> References: <20190130145102.4708-1-nborisov@suse.com> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP There is no point in using a construct like 'if (!condition) WARN_ON(1)'. Use WARN_ON(!condition) directly. No functional changes. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Reviewed-by: Johannes Thumshirn --- fs/btrfs/extent-tree.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index fc3f6acc3c9b..79d0bc813a72 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -10798,13 +10798,10 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, } spin_lock(&trans->transaction->dirty_bgs_lock); - if (!list_empty(&block_group->dirty_list)) { - WARN_ON(1); - } - if (!list_empty(&block_group->io_list)) { - WARN_ON(1); - } + WARN_ON(!list_empty(&block_group->dirty_list)); + WARN_ON(!list_empty(&block_group->io_list)); spin_unlock(&trans->transaction->dirty_bgs_lock); + btrfs_remove_free_space_cache(block_group); spin_lock(&block_group->space_info->lock); From patchwork Wed Jan 30 14:50:50 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikolay Borisov X-Patchwork-Id: 10788729 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 34FEC746 for ; Wed, 30 Jan 2019 14:51:08 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1D8052F3ED for ; Wed, 30 Jan 2019 14:51:08 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1B7E12F406; Wed, 30 Jan 2019 14:51:08 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 990E42F421 for ; Wed, 30 Jan 2019 14:51:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731416AbfA3OvG (ORCPT ); Wed, 30 Jan 2019 09:51:06 -0500 Received: from mx2.suse.de ([195.135.220.15]:38100 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1731391AbfA3OvG (ORCPT ); Wed, 30 Jan 2019 09:51:06 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 1A1E3B08F for ; Wed, 30 Jan 2019 14:51:05 +0000 (UTC) From: Nikolay Borisov To: linux-btrfs@vger.kernel.org Cc: jeffm@suse.com, Nikolay Borisov Subject: [PATCH 03/15] btrfs: Remove EXTENT_FIRST_DELALLOC bit Date: Wed, 30 Jan 2019 16:50:50 +0200 Message-Id: <20190130145102.4708-4-nborisov@suse.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190130145102.4708-1-nborisov@suse.com> References: <20190130145102.4708-1-nborisov@suse.com> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP With the refactoring introduced in 8b62f87bad9c ("Btrfs: reworki outstanding_extents") this flag became unused. Remove it and renumber the following flags accordingly. No functional changes. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Reviewed-by: Johannes Thumshirn --- fs/btrfs/extent_io.c | 2 -- fs/btrfs/extent_io.h | 15 +++++++-------- 2 files changed, 7 insertions(+), 10 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 71db9f8bcdaa..01f596d42f59 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -585,7 +585,6 @@ int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, if (delete) bits |= ~EXTENT_CTLBITS; - bits |= EXTENT_FIRST_DELALLOC; if (bits & (EXTENT_IOBITS | EXTENT_BOUNDARY)) clear = 1; @@ -850,7 +849,6 @@ __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, btrfs_debug_check_extent_io_range(tree, start, end); - bits |= EXTENT_FIRST_DELALLOC; again: if (!prealloc && gfpflags_allow_blocking(mask)) { /* diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index 9673be3f3d1f..08749e0b9c32 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -18,17 +18,16 @@ #define EXTENT_BOUNDARY (1U << 9) #define EXTENT_NODATASUM (1U << 10) #define EXTENT_CLEAR_META_RESV (1U << 11) -#define EXTENT_FIRST_DELALLOC (1U << 12) -#define EXTENT_NEED_WAIT (1U << 13) -#define EXTENT_DAMAGED (1U << 14) -#define EXTENT_NORESERVE (1U << 15) -#define EXTENT_QGROUP_RESERVED (1U << 16) -#define EXTENT_CLEAR_DATA_RESV (1U << 17) -#define EXTENT_DELALLOC_NEW (1U << 18) +#define EXTENT_NEED_WAIT (1U << 12) +#define EXTENT_DAMAGED (1U << 13) +#define EXTENT_NORESERVE (1U << 14) +#define EXTENT_QGROUP_RESERVED (1U << 15) +#define EXTENT_CLEAR_DATA_RESV (1U << 16) +#define EXTENT_DELALLOC_NEW (1U << 17) #define EXTENT_IOBITS (EXTENT_LOCKED | EXTENT_WRITEBACK) #define EXTENT_DO_ACCOUNTING (EXTENT_CLEAR_META_RESV | \ EXTENT_CLEAR_DATA_RESV) -#define EXTENT_CTLBITS (EXTENT_DO_ACCOUNTING | EXTENT_FIRST_DELALLOC) +#define EXTENT_CTLBITS (EXTENT_DO_ACCOUNTING) /* * flags for bio submission. The high bits indicate the compression From patchwork Wed Jan 30 14:50:51 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikolay Borisov X-Patchwork-Id: 10788755 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 59D336C2 for ; Wed, 30 Jan 2019 14:51:22 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 45CE32F421 for ; Wed, 30 Jan 2019 14:51:22 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 43CF02F404; Wed, 30 Jan 2019 14:51:22 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 651F52F434 for ; Wed, 30 Jan 2019 14:51:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731475AbfA3OvU (ORCPT ); Wed, 30 Jan 2019 09:51:20 -0500 Received: from mx2.suse.de ([195.135.220.15]:38106 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1730350AbfA3OvH (ORCPT ); Wed, 30 Jan 2019 09:51:07 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 5A96CB0B0 for ; Wed, 30 Jan 2019 14:51:05 +0000 (UTC) From: Nikolay Borisov To: linux-btrfs@vger.kernel.org Cc: jeffm@suse.com Subject: [PATCH 04/15] btrfs: combine device update operations during transaction commit Date: Wed, 30 Jan 2019 16:50:51 +0200 Message-Id: <20190130145102.4708-5-nborisov@suse.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190130145102.4708-1-nborisov@suse.com> References: <20190130145102.4708-1-nborisov@suse.com> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Jeff Mahoney We currently overload the pending_chunks list to handle updating btrfs_device->commit_bytes used. We don't actually care about the extent mapping or even the device mapping for the chunk - we just need the device, and we can end up processing it multiple times. The fs_devices->resized_list does more or less the same thing, but with the disk size. They are called consecutively during commit and have more or less the same purpose. We can combine the two lists into a single list that attaches to the transaction and contains a list of devices that need updating. Since we always add the device to a list when we change bytes_used or disk_total_size, there's no harm in copying both values at once. Signed-off-by: Jeff Mahoney --- fs/btrfs/dev-replace.c | 2 +- fs/btrfs/disk-io.c | 9 +++++ fs/btrfs/transaction.c | 7 ++-- fs/btrfs/transaction.h | 1 + fs/btrfs/volumes.c | 84 ++++++++++++++++++------------------------ fs/btrfs/volumes.h | 13 ++----- 6 files changed, 54 insertions(+), 62 deletions(-) diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index 13863354ff9d..f335cbf3bf1f 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -662,7 +662,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info, btrfs_device_set_disk_total_bytes(tgt_device, src_device->disk_total_bytes); btrfs_device_set_bytes_used(tgt_device, src_device->bytes_used); - ASSERT(list_empty(&src_device->resized_list)); + ASSERT(list_empty(&src_device->post_commit_list)); tgt_device->commit_total_bytes = src_device->commit_total_bytes; tgt_device->commit_bytes_used = src_device->bytes_used; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 888d72dda794..25dab68070dc 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -4498,6 +4498,15 @@ void btrfs_cleanup_one_transaction(struct btrfs_transaction *cur_trans, ASSERT(list_empty(&cur_trans->dirty_bgs)); ASSERT(list_empty(&cur_trans->io_bgs)); + while (!list_empty(&cur_trans->dev_update_list)) { + struct btrfs_device *dev; + + dev = list_first_entry(&cur_trans->dev_update_list, + struct btrfs_device, + post_commit_list); + list_del_init(&dev->post_commit_list); + } + btrfs_destroy_delayed_refs(cur_trans, fs_info); cur_trans->state = TRANS_STATE_COMMIT_START; diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 84ffb381098d..8d340957f7d6 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -75,6 +75,7 @@ void btrfs_put_transaction(struct btrfs_transaction *transaction) btrfs_put_block_group_trimming(cache); btrfs_put_block_group(cache); } + BUG_ON(!list_empty(&transaction->dev_update_list)); kfree(transaction); } } @@ -263,6 +264,7 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info, INIT_LIST_HEAD(&cur_trans->pending_snapshots); INIT_LIST_HEAD(&cur_trans->pending_chunks); + INIT_LIST_HEAD(&cur_trans->dev_update_list); INIT_LIST_HEAD(&cur_trans->switch_commits); INIT_LIST_HEAD(&cur_trans->dirty_bgs); INIT_LIST_HEAD(&cur_trans->io_bgs); @@ -549,7 +551,7 @@ start_transaction(struct btrfs_root *root, unsigned int num_items, * and then we deadlock with somebody doing a freeze. * * If we are ATTACH, it means we just want to catch the current - * transaction and commit it, so we needn't do sb_start_intwrite(). + * transaction and commit it, so we needn't do sb_start_intwrite(). */ if (type & __TRANS_FREEZABLE) sb_start_intwrite(fs_info->sb); @@ -2198,8 +2200,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans) memcpy(fs_info->super_for_commit, fs_info->super_copy, sizeof(*fs_info->super_copy)); - btrfs_update_commit_device_size(fs_info); - btrfs_update_commit_device_bytes_used(cur_trans); + btrfs_commit_device_sizes(cur_trans); clear_bit(BTRFS_FS_LOG1_ERR, &fs_info->flags); clear_bit(BTRFS_FS_LOG2_ERR, &fs_info->flags); diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h index f1ba78949d1b..e0a04fa4de66 100644 --- a/fs/btrfs/transaction.h +++ b/fs/btrfs/transaction.h @@ -52,6 +52,7 @@ struct btrfs_transaction { wait_queue_head_t commit_wait; struct list_head pending_snapshots; struct list_head pending_chunks; + struct list_head dev_update_list; struct list_head switch_commits; struct list_head dirty_bgs; diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 0572831947ab..4f645e1b5539 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -318,7 +318,6 @@ static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid, mutex_init(&fs_devs->device_list_mutex); INIT_LIST_HEAD(&fs_devs->devices); - INIT_LIST_HEAD(&fs_devs->resized_devices); INIT_LIST_HEAD(&fs_devs->alloc_list); INIT_LIST_HEAD(&fs_devs->fs_list); if (fsid) @@ -334,6 +333,7 @@ static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid, void btrfs_free_device(struct btrfs_device *device) { + BUG_ON(!list_empty(&device->post_commit_list)); rcu_string_free(device->name); bio_put(device->flush_bio); kfree(device); @@ -402,7 +402,7 @@ static struct btrfs_device *__alloc_device(void) INIT_LIST_HEAD(&dev->dev_list); INIT_LIST_HEAD(&dev->dev_alloc_list); - INIT_LIST_HEAD(&dev->resized_list); + INIT_LIST_HEAD(&dev->post_commit_list); spin_lock_init(&dev->io_lock); @@ -2870,9 +2870,9 @@ int btrfs_grow_device(struct btrfs_trans_handle *trans, btrfs_device_set_total_bytes(device, new_size); btrfs_device_set_disk_total_bytes(device, new_size); btrfs_clear_space_info_full(device->fs_info); - if (list_empty(&device->resized_list)) - list_add_tail(&device->resized_list, - &fs_devices->resized_devices); + if (list_empty(&device->post_commit_list)) + list_add_tail(&device->post_commit_list, + &trans->transaction->dev_update_list); mutex_unlock(&fs_info->chunk_mutex); return btrfs_update_device(trans, device); @@ -4861,9 +4861,9 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size) } btrfs_device_set_disk_total_bytes(device, new_size); - if (list_empty(&device->resized_list)) - list_add_tail(&device->resized_list, - &fs_info->fs_devices->resized_devices); + if (list_empty(&device->post_commit_list)) + list_add_tail(&device->post_commit_list, + &trans->transaction->dev_update_list); WARN_ON(diff > old_total); btrfs_set_super_total_bytes(super_copy, @@ -5212,9 +5212,14 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, if (ret) goto error_del_extent; - for (i = 0; i < map->num_stripes; i++) - btrfs_device_set_bytes_used(map->stripes[i].dev, - map->stripes[i].dev->bytes_used + stripe_size); + for (i = 0; i < map->num_stripes; i++) { + struct btrfs_device *dev = map->stripes[i].dev; + + btrfs_device_set_bytes_used(dev, dev->bytes_used + stripe_size); + if (list_empty(&dev->post_commit_list)) + list_add_tail(&dev->post_commit_list, + &trans->transaction->dev_update_list); + } atomic64_sub(stripe_size * map->num_stripes, &info->free_chunk_space); @@ -7664,51 +7669,34 @@ void btrfs_scratch_superblocks(struct block_device *bdev, const char *device_pat } /* - * Update the size of all devices, which is used for writing out the - * super blocks. + * Update the size and bytes used for each device where it changed. + * This is delayed since we would otherwise get errors while writing + * out the superblocks. + * + * Must be invoked during transaction commit. */ -void btrfs_update_commit_device_size(struct btrfs_fs_info *fs_info) +void btrfs_commit_device_sizes(struct btrfs_transaction *trans) { - struct btrfs_fs_devices *fs_devices = fs_info->fs_devices; struct btrfs_device *curr, *next; - if (list_empty(&fs_devices->resized_devices)) - return; - - mutex_lock(&fs_devices->device_list_mutex); - mutex_lock(&fs_info->chunk_mutex); - list_for_each_entry_safe(curr, next, &fs_devices->resized_devices, - resized_list) { - list_del_init(&curr->resized_list); - curr->commit_total_bytes = curr->disk_total_bytes; - } - mutex_unlock(&fs_info->chunk_mutex); - mutex_unlock(&fs_devices->device_list_mutex); -} - -/* Must be invoked during the transaction commit */ -void btrfs_update_commit_device_bytes_used(struct btrfs_transaction *trans) -{ - struct btrfs_fs_info *fs_info = trans->fs_info; - struct extent_map *em; - struct map_lookup *map; - struct btrfs_device *dev; - int i; + BUG_ON(trans->state != TRANS_STATE_COMMIT_DOING); - if (list_empty(&trans->pending_chunks)) + if (list_empty(&trans->dev_update_list)) return; - /* In order to kick the device replace finish process */ - mutex_lock(&fs_info->chunk_mutex); - list_for_each_entry(em, &trans->pending_chunks, list) { - map = em->map_lookup; - - for (i = 0; i < map->num_stripes; i++) { - dev = map->stripes[i].dev; - dev->commit_bytes_used = dev->bytes_used; - } + /* + * We don't need the device_list_mutex here. This list is owned + * by the transaction and the transaction must complete before + * the device is released. + */ + mutex_lock(&trans->fs_info->chunk_mutex); + list_for_each_entry_safe(curr, next, &trans->dev_update_list, + post_commit_list) { + list_del_init(&curr->post_commit_list); + curr->commit_total_bytes = curr->disk_total_bytes; + curr->commit_bytes_used = curr->bytes_used; } - mutex_unlock(&fs_info->chunk_mutex); + mutex_unlock(&trans->fs_info->chunk_mutex); } void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info) diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 656ea8d85770..b61935286e17 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -45,6 +45,7 @@ struct btrfs_pending_bios { struct btrfs_device { struct list_head dev_list; struct list_head dev_alloc_list; + struct list_head post_commit_list; /* chunk mutex */ struct btrfs_fs_devices *fs_devices; struct btrfs_fs_info *fs_info; @@ -102,18 +103,12 @@ struct btrfs_device { * size of the device on the current transaction * * This variant is update when committing the transaction, - * and protected by device_list_mutex + * and protected by chunk mutex */ u64 commit_total_bytes; /* bytes used on the current transaction */ u64 commit_bytes_used; - /* - * used to manage the device which is resized - * - * It is protected by chunk_lock. - */ - struct list_head resized_list; /* for sending down flush barriers */ struct bio *flush_bio; @@ -235,7 +230,6 @@ struct btrfs_fs_devices { struct mutex device_list_mutex; struct list_head devices; - struct list_head resized_devices; /* devices not currently being allocated */ struct list_head alloc_list; @@ -557,8 +551,7 @@ static inline enum btrfs_raid_types btrfs_bg_flags_to_raid_index(u64 flags) const char *get_raid_name(enum btrfs_raid_types type); -void btrfs_update_commit_device_size(struct btrfs_fs_info *fs_info); -void btrfs_update_commit_device_bytes_used(struct btrfs_transaction *trans); +void btrfs_commit_device_sizes(struct btrfs_transaction *trans); struct list_head *btrfs_get_fs_uuids(void); void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info); From patchwork Wed Jan 30 14:50:52 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikolay Borisov X-Patchwork-Id: 10788739 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B069F746 for ; Wed, 30 Jan 2019 14:51:15 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9FA752F3A7 for ; Wed, 30 Jan 2019 14:51:15 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9DB422F3A8; Wed, 30 Jan 2019 14:51:15 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2C5472F419 for ; Wed, 30 Jan 2019 14:51:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731448AbfA3OvJ (ORCPT ); Wed, 30 Jan 2019 09:51:09 -0500 Received: from mx2.suse.de ([195.135.220.15]:38122 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1731409AbfA3OvH (ORCPT ); Wed, 30 Jan 2019 09:51:07 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id AB0C1B0BA for ; Wed, 30 Jan 2019 14:51:05 +0000 (UTC) From: Nikolay Borisov To: linux-btrfs@vger.kernel.org Cc: jeffm@suse.com, Nikolay Borisov Subject: [PATCH 05/15] btrfs: Handle pending/pinned chunks before blockgroup relocation during device shrink Date: Wed, 30 Jan 2019 16:50:52 +0200 Message-Id: <20190130145102.4708-6-nborisov@suse.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190130145102.4708-1-nborisov@suse.com> References: <20190130145102.4708-1-nborisov@suse.com> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP During device shrink pinned/pending chunks (i.e those which have been deleted/created respectively, in the current transaction and haven't touched disk) need to be accounted when doing device shrink. Presently this happens after the main relocation loop in btrfs_shrink_device, which could lead to making another go in the body of the function. Since there is no hard requirement to perform pinned/pending chunks handling after the relocation loop, move the code before it. This leads to simplifying the code flow around - i.e no need to use 'goto again'. Signed-off-by: Nikolay Borisov Reviewed-by: Johannes Thumshirn --- fs/btrfs/volumes.c | 54 ++++++++++++++++++---------------------------- 1 file changed, 21 insertions(+), 33 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 4f645e1b5539..be91432697ab 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -4712,15 +4712,15 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size) int slot; int failed = 0; bool retried = false; - bool checked_pending_chunks = false; struct extent_buffer *l; struct btrfs_key key; struct btrfs_super_block *super_copy = fs_info->super_copy; u64 old_total = btrfs_super_total_bytes(super_copy); u64 old_size = btrfs_device_get_total_bytes(device); u64 diff; + u64 start; - new_size = round_down(new_size, fs_info->sectorsize); + start = new_size = round_down(new_size, fs_info->sectorsize); diff = round_down(old_size - new_size, fs_info->sectorsize); if (test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state)) @@ -4732,6 +4732,10 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size) path->reada = READA_BACK; + trans = btrfs_start_transaction(root, 0); + if (IS_ERR(trans)) + return PTR_ERR(trans); + mutex_lock(&fs_info->chunk_mutex); btrfs_device_set_total_bytes(device, new_size); @@ -4739,7 +4743,21 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size) device->fs_devices->total_rw_bytes -= diff; atomic64_sub(diff, &fs_info->free_chunk_space); } - mutex_unlock(&fs_info->chunk_mutex); + + /* + * Once the device's size has been set to the new size, ensure all + * in-memory chunks are synced to disk so that the loop below sees them + * and relocates them accordingly. + */ + if (contains_pending_extent(trans->transaction, device, &start, diff)) { + mutex_unlock(&fs_info->chunk_mutex); + ret = btrfs_commit_transaction(trans); + if (ret) + goto done; + } else { + mutex_unlock(&fs_info->chunk_mutex); + btrfs_end_transaction(trans); + } again: key.objectid = device->devid; @@ -4830,36 +4848,6 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size) } mutex_lock(&fs_info->chunk_mutex); - - /* - * We checked in the above loop all device extents that were already in - * the device tree. However before we have updated the device's - * total_bytes to the new size, we might have had chunk allocations that - * have not complete yet (new block groups attached to transaction - * handles), and therefore their device extents were not yet in the - * device tree and we missed them in the loop above. So if we have any - * pending chunk using a device extent that overlaps the device range - * that we can not use anymore, commit the current transaction and - * repeat the search on the device tree - this way we guarantee we will - * not have chunks using device extents that end beyond 'new_size'. - */ - if (!checked_pending_chunks) { - u64 start = new_size; - u64 len = old_size - new_size; - - if (contains_pending_extent(trans->transaction, device, - &start, len)) { - mutex_unlock(&fs_info->chunk_mutex); - checked_pending_chunks = true; - failed = 0; - retried = false; - ret = btrfs_commit_transaction(trans); - if (ret) - goto done; - goto again; - } - } - btrfs_device_set_disk_total_bytes(device, new_size); if (list_empty(&device->post_commit_list)) list_add_tail(&device->post_commit_list, From patchwork Wed Jan 30 14:50:53 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikolay Borisov X-Patchwork-Id: 10788741 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F3C646C2 for ; Wed, 30 Jan 2019 14:51:15 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E30B92F421 for ; Wed, 30 Jan 2019 14:51:15 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id E16DF2F423; Wed, 30 Jan 2019 14:51:15 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4D3DC2F42B for ; Wed, 30 Jan 2019 14:51:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731437AbfA3OvJ (ORCPT ); Wed, 30 Jan 2019 09:51:09 -0500 Received: from mx2.suse.de ([195.135.220.15]:38124 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1731408AbfA3OvH (ORCPT ); Wed, 30 Jan 2019 09:51:07 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id EBBB6B0C4 for ; Wed, 30 Jan 2019 14:51:05 +0000 (UTC) From: Nikolay Borisov To: linux-btrfs@vger.kernel.org Cc: jeffm@suse.com, Nikolay Borisov Subject: [PATCH 06/15] btrfs: Rename and export clear_btree_io_tree Date: Wed, 30 Jan 2019 16:50:53 +0200 Message-Id: <20190130145102.4708-7-nborisov@suse.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190130145102.4708-1-nborisov@suse.com> References: <20190130145102.4708-1-nborisov@suse.com> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This function is going to be used to clear out the device extent allocation information. Give it a more generic name and export it. This is in preparation to replacing the pending/pinned chunk lists with an extent tree. No functional changes. Signed-off-by: Nikolay Borisov Reviewed-by: Johannes Thumshirn --- fs/btrfs/extent_io.c | 28 ++++++++++++++++++++++++++++ fs/btrfs/extent_io.h | 1 + fs/btrfs/transaction.c | 37 ++++--------------------------------- 3 files changed, 33 insertions(+), 33 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 01f596d42f59..ab1c81a1918e 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -210,6 +210,34 @@ void extent_io_tree_init(struct extent_io_tree *tree, tree->private_data = private_data; } +void extent_io_tree_release(struct extent_io_tree *tree) +{ + spin_lock(&tree->lock); + /* + * Do a single barrier for the waitqueue_active check here, the state + * of the waitqueue should not change once clear_btree_io_tree is + * called. + */ + smp_mb(); + while (!RB_EMPTY_ROOT(&tree->state)) { + struct rb_node *node; + struct extent_state *state; + + node = rb_first(&tree->state); + state = rb_entry(node, struct extent_state, rb_node); + rb_erase(&state->rb_node, &tree->state); + RB_CLEAR_NODE(&state->rb_node); + /* + * btree io trees aren't supposed to have tasks waiting for + * changes in the flags of extent states ever. + */ + ASSERT(!waitqueue_active(&state->wq)); + free_extent_state(state); + + cond_resched_lock(&tree->lock); + } + spin_unlock(&tree->lock); +} static struct extent_state *alloc_extent_state(gfp_t mask) { struct extent_state *state; diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index 08749e0b9c32..d7beb2b3bc7d 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -240,6 +240,7 @@ typedef struct extent_map *(get_extent_t)(struct btrfs_inode *inode, int create); void extent_io_tree_init(struct extent_io_tree *tree, void *private_data); +void extent_io_tree_release(struct extent_io_tree *tree); int try_release_extent_mapping(struct page *page, gfp_t mask); int try_release_extent_buffer(struct page *page); int lock_extent_bits(struct extent_io_tree *tree, u64 start, u64 end, diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 8d340957f7d6..0f5c8d9b50d7 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -80,35 +80,6 @@ void btrfs_put_transaction(struct btrfs_transaction *transaction) } } -static void clear_btree_io_tree(struct extent_io_tree *tree) -{ - spin_lock(&tree->lock); - /* - * Do a single barrier for the waitqueue_active check here, the state - * of the waitqueue should not change once clear_btree_io_tree is - * called. - */ - smp_mb(); - while (!RB_EMPTY_ROOT(&tree->state)) { - struct rb_node *node; - struct extent_state *state; - - node = rb_first(&tree->state); - state = rb_entry(node, struct extent_state, rb_node); - rb_erase(&state->rb_node, &tree->state); - RB_CLEAR_NODE(&state->rb_node); - /* - * btree io trees aren't supposed to have tasks waiting for - * changes in the flags of extent states ever. - */ - ASSERT(!waitqueue_active(&state->wq)); - free_extent_state(state); - - cond_resched_lock(&tree->lock); - } - spin_unlock(&tree->lock); -} - static noinline void switch_commit_roots(struct btrfs_transaction *trans) { struct btrfs_fs_info *fs_info = trans->fs_info; @@ -122,7 +93,7 @@ static noinline void switch_commit_roots(struct btrfs_transaction *trans) root->commit_root = btrfs_root_node(root); if (is_fstree(root->root_key.objectid)) btrfs_unpin_free_ino(root); - clear_btree_io_tree(&root->dirty_log_pages); + extent_io_tree_release(&root->dirty_log_pages); } /* We can free old roots now. */ @@ -938,7 +909,7 @@ int btrfs_write_marked_extents(struct btrfs_fs_info *fs_info, * superblock that points to btree nodes/leafs for which * writeback hasn't finished yet (and without errors). * We cleanup any entries left in the io tree when committing - * the transaction (through clear_btree_io_tree()). + * the transaction (through extent_io_tree_release()). */ if (err == -ENOMEM) { err = 0; @@ -983,7 +954,7 @@ static int __btrfs_wait_marked_extents(struct btrfs_fs_info *fs_info, * left in the io tree. For a log commit, we don't remove them * after committing the log because the tree can be accessed * concurrently - we do it only at transaction commit time when - * it's safe to do it (through clear_btree_io_tree()). + * it's safe to do it (through extent_io_tree_release()). */ err = clear_extent_bit(dirty_pages, start, end, EXTENT_NEED_WAIT, 0, 0, &cached_state); @@ -1061,7 +1032,7 @@ static int btrfs_write_and_wait_transaction(struct btrfs_trans_handle *trans) blk_finish_plug(&plug); ret2 = btrfs_wait_extents(fs_info, dirty_pages); - clear_btree_io_tree(&trans->transaction->dirty_pages); + extent_io_tree_release(&trans->transaction->dirty_pages); if (ret) return ret; From patchwork Wed Jan 30 14:50:54 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikolay Borisov X-Patchwork-Id: 10788753 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 40515746 for ; Wed, 30 Jan 2019 14:51:22 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 286402F404 for ; Wed, 30 Jan 2019 14:51:22 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 260982F43C; Wed, 30 Jan 2019 14:51:22 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B95A62F43D for ; Wed, 30 Jan 2019 14:51:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731476AbfA3OvV (ORCPT ); Wed, 30 Jan 2019 09:51:21 -0500 Received: from mx2.suse.de ([195.135.220.15]:38130 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1731391AbfA3OvH (ORCPT ); Wed, 30 Jan 2019 09:51:07 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 4303BB06A for ; Wed, 30 Jan 2019 14:51:06 +0000 (UTC) From: Nikolay Borisov To: linux-btrfs@vger.kernel.org Cc: jeffm@suse.com, Nikolay Borisov Subject: [PATCH 07/15] btrfs: Populate ->orig_block_len during read_one_chunk Date: Wed, 30 Jan 2019 16:50:54 +0200 Message-Id: <20190130145102.4708-8-nborisov@suse.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190130145102.4708-1-nborisov@suse.com> References: <20190130145102.4708-1-nborisov@suse.com> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Chunks read from disk currently don't get their ->orig_block_len member set, in contrast when a new chunk is allocated, the respective extent_map's ->orig_block_len is assigned the size of the stripe of this chunk. Let's apply the same strategy for chunks which are read from disk, not only does this codify the invariant that ->orig_block_len always contains the size of the stripe for a chunk (when the em belongs to the mapping tree). But it's also a preparatory patch for further work around tracking chunk allocation in an extent tree rather than pinned/pending lists. Signed-off-by: Nikolay Borisov --- fs/btrfs/volumes.c | 41 ++++++++++++++++++++++------------------- 1 file changed, 22 insertions(+), 19 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index be91432697ab..e678cad34bd3 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -6801,6 +6801,26 @@ static void btrfs_report_missing_device(struct btrfs_fs_info *fs_info, devid, uuid); } +static u64 calc_stripe_length(u64 type, u64 chunk_len, int num_stripes) +{ + int index = btrfs_bg_flags_to_raid_index(type); + int ncopies = btrfs_raid_array[index].ncopies; + int data_stripes; + + switch (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) { + case BTRFS_BLOCK_GROUP_RAID5: + data_stripes = num_stripes - 1; + break; + case BTRFS_BLOCK_GROUP_RAID6: + data_stripes = num_stripes - 2; + break; + default: + data_stripes = num_stripes / ncopies; + break; + } + return div_u64(chunk_len, data_stripes); +} + static int read_one_chunk(struct btrfs_fs_info *fs_info, struct btrfs_key *key, struct extent_buffer *leaf, struct btrfs_chunk *chunk) @@ -6860,6 +6880,8 @@ static int read_one_chunk(struct btrfs_fs_info *fs_info, struct btrfs_key *key, map->type = btrfs_chunk_type(leaf, chunk); map->sub_stripes = btrfs_chunk_sub_stripes(leaf, chunk); map->verified_stripes = 0; + em->orig_block_len = calc_stripe_length(map->type, em->len, + map->num_stripes); for (i = 0; i < num_stripes; i++) { map->stripes[i].physical = btrfs_stripe_offset_nr(leaf, chunk, i); @@ -7717,25 +7739,6 @@ int btrfs_bg_type_to_factor(u64 flags) } -static u64 calc_stripe_length(u64 type, u64 chunk_len, int num_stripes) -{ - int index = btrfs_bg_flags_to_raid_index(type); - int ncopies = btrfs_raid_array[index].ncopies; - int data_stripes; - - switch (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) { - case BTRFS_BLOCK_GROUP_RAID5: - data_stripes = num_stripes - 1; - break; - case BTRFS_BLOCK_GROUP_RAID6: - data_stripes = num_stripes - 2; - break; - default: - data_stripes = num_stripes / ncopies; - break; - } - return div_u64(chunk_len, data_stripes); -} static int verify_one_dev_extent(struct btrfs_fs_info *fs_info, u64 chunk_offset, u64 devid, From patchwork Wed Jan 30 14:50:55 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikolay Borisov X-Patchwork-Id: 10788747 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 90F1A6C2 for ; Wed, 30 Jan 2019 14:51:19 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 801D82F3BD for ; Wed, 30 Jan 2019 14:51:19 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7E9812F43C; Wed, 30 Jan 2019 14:51:19 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7617B2F421 for ; Wed, 30 Jan 2019 14:51:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731447AbfA3OvJ (ORCPT ); Wed, 30 Jan 2019 09:51:09 -0500 Received: from mx2.suse.de ([195.135.220.15]:38136 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1731424AbfA3OvI (ORCPT ); Wed, 30 Jan 2019 09:51:08 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 8BD63B079 for ; Wed, 30 Jan 2019 14:51:06 +0000 (UTC) From: Nikolay Borisov To: linux-btrfs@vger.kernel.org Cc: jeffm@suse.com, Nikolay Borisov Subject: [PATCH 08/15] btrfs: Introduce new bits for device allocation tree Date: Wed, 30 Jan 2019 16:50:55 +0200 Message-Id: <20190130145102.4708-9-nborisov@suse.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190130145102.4708-1-nborisov@suse.com> References: <20190130145102.4708-1-nborisov@suse.com> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Rather than hijacking the existing defines let's just define new bits, with more descriptive names. Instead of using yet more (currently at 18) bits for the new flags, use the fact those flags will be specific to the device allocation tree so define them using existing EXTENT_* flags. Signed-off-by: Nikolay Borisov --- fs/btrfs/extent_io.h | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index d7beb2b3bc7d..af7e00a3678c 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -29,6 +29,10 @@ EXTENT_CLEAR_DATA_RESV) #define EXTENT_CTLBITS (EXTENT_DO_ACCOUNTING) + +/* Redefined bits above which are used only in the device allocation tree */ +#define CHUNK_ALLOCATED EXTENT_DIRTY + /* * flags for bio submission. The high bits indicate the compression * type for this bio From patchwork Wed Jan 30 14:50:56 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikolay Borisov X-Patchwork-Id: 10788751 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9B85017E9 for ; Wed, 30 Jan 2019 14:51:21 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 860082F3B6 for ; Wed, 30 Jan 2019 14:51:21 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 83A9E2F433; Wed, 30 Jan 2019 14:51:21 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 503582F3B6 for ; Wed, 30 Jan 2019 14:51:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731468AbfA3OvT (ORCPT ); Wed, 30 Jan 2019 09:51:19 -0500 Received: from mx2.suse.de ([195.135.220.15]:38142 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1731427AbfA3OvJ (ORCPT ); Wed, 30 Jan 2019 09:51:09 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id E321CB07B for ; Wed, 30 Jan 2019 14:51:06 +0000 (UTC) From: Nikolay Borisov To: linux-btrfs@vger.kernel.org Cc: jeffm@suse.com, Nikolay Borisov Subject: [PATCH 09/15] btrfs: replace pending/pinned chunks lists with io tree Date: Wed, 30 Jan 2019 16:50:56 +0200 Message-Id: <20190130145102.4708-10-nborisov@suse.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190130145102.4708-1-nborisov@suse.com> References: <20190130145102.4708-1-nborisov@suse.com> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Jeff Mahoney The pending chunks list contains chunks that are allocated in the current transaction but haven't been created yet. The pinned chunks list contains chunks that are being released in the current transaction. Both describe chunks that are not reflected on disk as in use but are unavailable just the same. The pending chunks list is anchored by the transaction handle, which means that we need to hold a reference to a transaction when working with the list. We use these lists to ensure that we don't end up discarding chunks that are allocated or released in the current transaction. What we r The way we use them is by iterating over both lists to perform comparisons on the stripes they describe for each device. This is backwards and requires that we keep a transaction handle open while we're trimming. This patchset adds an extent_io_tree to btrfs_device that maintains the allocation state of the device. Extents are set dirty when chunks are first allocated -- when the extent maps are added to the mapping tree. They're cleared when last removed -- when the extent maps are removed from the mapping tree. This matches the lifespan of the pending and pinned chunks list and allows us to do trims on unallocated space safely without pinning the transaction for what may be a lengthy operation. We can also use this io tree to mark which chunks have already been trimmed so we don't repeat the operation. Signed-off-by: Jeff Mahoney Signed-off-by: Nikolay Borisov --- fs/btrfs/ctree.h | 6 --- fs/btrfs/disk-io.c | 11 ----- fs/btrfs/extent-tree.c | 28 ----------- fs/btrfs/extent_io.c | 2 +- fs/btrfs/extent_io.h | 6 ++- fs/btrfs/extent_map.c | 36 ++++++++++++++ fs/btrfs/extent_map.h | 1 - fs/btrfs/free-space-cache.c | 4 -- fs/btrfs/transaction.c | 9 ---- fs/btrfs/transaction.h | 1 - fs/btrfs/volumes.c | 96 +++++++++++++------------------------ fs/btrfs/volumes.h | 2 + 12 files changed, 76 insertions(+), 126 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index fecc64d8e285..a08791dc5cc2 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1148,12 +1148,6 @@ struct btrfs_fs_info { struct mutex unused_bg_unpin_mutex; struct mutex delete_unused_bgs_mutex; - /* - * Chunks that can't be freed yet (under a trim/discard operation) - * and will be latter freed. Protected by fs_info->chunk_mutex. - */ - struct list_head pinned_chunks; - /* Cached block sizes */ u32 nodesize; u32 sectorsize; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 25dab68070dc..8ef7b8f9e4be 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2774,8 +2774,6 @@ int open_ctree(struct super_block *sb, init_waitqueue_head(&fs_info->transaction_blocked_wait); init_waitqueue_head(&fs_info->async_submit_wait); - INIT_LIST_HEAD(&fs_info->pinned_chunks); - /* Usable values until the real ones are cached from the superblock */ fs_info->nodesize = 4096; fs_info->sectorsize = 4096; @@ -4050,15 +4048,6 @@ void close_ctree(struct btrfs_fs_info *fs_info) btrfs_free_stripe_hash_table(fs_info); btrfs_free_ref_cache(fs_info); - - while (!list_empty(&fs_info->pinned_chunks)) { - struct extent_map *em; - - em = list_first_entry(&fs_info->pinned_chunks, - struct extent_map, list); - list_del_init(&em->list); - free_extent_map(em); - } } int btrfs_buffer_uptodate(struct extent_buffer *buf, u64 parent_transid, diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 79d0bc813a72..a5613fe35b90 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -10824,10 +10824,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, memcpy(&key, &block_group->key, sizeof(key)); mutex_lock(&fs_info->chunk_mutex); - if (!list_empty(&em->list)) { - /* We're in the transaction->pending_chunks list. */ - free_extent_map(em); - } spin_lock(&block_group->lock); block_group->removed = 1; /* @@ -10854,25 +10850,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, * the transaction commit has completed. */ remove_em = (atomic_read(&block_group->trimming) == 0); - /* - * Make sure a trimmer task always sees the em in the pinned_chunks list - * if it sees block_group->removed == 1 (needs to lock block_group->lock - * before checking block_group->removed). - */ - if (!remove_em) { - /* - * Our em might be in trans->transaction->pending_chunks which - * is protected by fs_info->chunk_mutex ([lock|unlock]_chunks), - * and so is the fs_info->pinned_chunks list. - * - * So at this point we must be holding the chunk_mutex to avoid - * any races with chunk allocation (more specifically at - * volumes.c:contains_pending_extent()), to ensure it always - * sees the em, either in the pending_chunks list or in the - * pinned_chunks list. - */ - list_move_tail(&em->list, &fs_info->pinned_chunks); - } spin_unlock(&block_group->lock); if (remove_em) { @@ -10880,11 +10857,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, em_tree = &fs_info->mapping_tree.map_tree; write_lock(&em_tree->lock); - /* - * The em might be in the pending_chunks list, so make sure the - * chunk mutex is locked, since remove_extent_mapping() will - * delete us from that list. - */ remove_extent_mapping(em_tree, em); write_unlock(&em_tree->lock); /* once for the tree */ diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index ab1c81a1918e..e788d4aa53d1 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -860,7 +860,7 @@ static void cache_state(struct extent_state *state, * [start, end] is inclusive This takes the tree lock. */ -static int __must_check +int __must_check __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, unsigned bits, unsigned exclusive_bits, u64 *failed_start, struct extent_state **cached_state, diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index af7e00a3678c..d4227e40c8ee 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -314,7 +314,11 @@ int set_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end, int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, unsigned bits, u64 *failed_start, struct extent_state **cached_state, gfp_t mask); - +int +__set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, + unsigned bits, unsigned exclusive_bits, + u64 *failed_start, struct extent_state **cached_state, + gfp_t mask, struct extent_changeset *changeset); static inline int set_extent_bits(struct extent_io_tree *tree, u64 start, u64 end, unsigned bits) { diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c index 928f729c55ba..0820f6fcf3a6 100644 --- a/fs/btrfs/extent_map.c +++ b/fs/btrfs/extent_map.c @@ -4,6 +4,7 @@ #include #include #include "ctree.h" +#include "volumes.h" #include "extent_map.h" #include "compression.h" @@ -336,6 +337,37 @@ static inline void setup_extent_mapping(struct extent_map_tree *tree, else try_merge_map(tree, em); } +static void extent_map_device_set_bits(struct extent_map *em, unsigned bits) +{ + struct map_lookup *map = em->map_lookup; + u64 stripe_size = em->orig_block_len; + int i; + + for (i = 0; i < map->num_stripes; i++) { + struct btrfs_bio_stripe *stripe = &map->stripes[i]; + struct btrfs_device *device = stripe->dev; + + __set_extent_bit(&device->alloc_state, stripe->physical, + stripe->physical + stripe_size - 1, bits, + 0, NULL, NULL, GFP_NOWAIT, NULL); + } +} + +static void extent_map_device_clear_bits(struct extent_map *em, unsigned bits) +{ + struct map_lookup *map = em->map_lookup; + u64 stripe_size = em->orig_block_len; + int i; + + for (i = 0; i < map->num_stripes; i++) { + struct btrfs_bio_stripe *stripe = &map->stripes[i]; + struct btrfs_device *device = stripe->dev; + + __clear_extent_bit(&device->alloc_state, stripe->physical, + stripe->physical + stripe_size - 1, bits, + 0, 0, NULL, GFP_NOWAIT, NULL); + } +} /** * add_extent_mapping - add new extent map to the extent tree @@ -357,6 +389,8 @@ int add_extent_mapping(struct extent_map_tree *tree, goto out; setup_extent_mapping(tree, em, modified); + if (test_bit(EXTENT_FLAG_FS_MAPPING, &em->flags)) + extent_map_device_set_bits(em, CHUNK_ALLOCATED); out: return ret; } @@ -438,6 +472,8 @@ void remove_extent_mapping(struct extent_map_tree *tree, struct extent_map *em) rb_erase_cached(&em->rb_node, &tree->map); if (!test_bit(EXTENT_FLAG_LOGGING, &em->flags)) list_del_init(&em->list); + if (test_bit(EXTENT_FLAG_FS_MAPPING, &em->flags)) + extent_map_device_clear_bits(em, CHUNK_ALLOCATED); RB_CLEAR_NODE(&em->rb_node); } diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h index 473f039fcd7c..72b46833f236 100644 --- a/fs/btrfs/extent_map.h +++ b/fs/btrfs/extent_map.h @@ -91,7 +91,6 @@ void replace_extent_mapping(struct extent_map_tree *tree, struct extent_map *cur, struct extent_map *new, int modified); - struct extent_map *alloc_extent_map(void); void free_extent_map(struct extent_map *em); int __init extent_map_init(void); diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index 74aa552f4793..207fb50dcc7a 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -3366,10 +3366,6 @@ void btrfs_put_block_group_trimming(struct btrfs_block_group_cache *block_group) em = lookup_extent_mapping(em_tree, block_group->key.objectid, 1); BUG_ON(!em); /* logic error, can't happen */ - /* - * remove_extent_mapping() will delete us from the pinned_chunks - * list, which is protected by the chunk mutex. - */ remove_extent_mapping(em_tree, em); write_unlock(&em_tree->lock); mutex_unlock(&fs_info->chunk_mutex); diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 0f5c8d9b50d7..32af91b0a132 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -50,14 +50,6 @@ void btrfs_put_transaction(struct btrfs_transaction *transaction) btrfs_err(transaction->fs_info, "pending csums is %llu", transaction->delayed_refs.pending_csums); - while (!list_empty(&transaction->pending_chunks)) { - struct extent_map *em; - - em = list_first_entry(&transaction->pending_chunks, - struct extent_map, list); - list_del_init(&em->list); - free_extent_map(em); - } /* * If any block groups are found in ->deleted_bgs then it's * because the transaction was aborted and a commit did not @@ -234,7 +226,6 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info, spin_lock_init(&cur_trans->delayed_refs.lock); INIT_LIST_HEAD(&cur_trans->pending_snapshots); - INIT_LIST_HEAD(&cur_trans->pending_chunks); INIT_LIST_HEAD(&cur_trans->dev_update_list); INIT_LIST_HEAD(&cur_trans->switch_commits); INIT_LIST_HEAD(&cur_trans->dirty_bgs); diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h index e0a04fa4de66..134d1a0bd92f 100644 --- a/fs/btrfs/transaction.h +++ b/fs/btrfs/transaction.h @@ -51,7 +51,6 @@ struct btrfs_transaction { wait_queue_head_t writer_wait; wait_queue_head_t commit_wait; struct list_head pending_snapshots; - struct list_head pending_chunks; struct list_head dev_update_list; struct list_head switch_commits; struct list_head dirty_bgs; diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index e678cad34bd3..25a3c1bdfc99 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -335,6 +335,8 @@ void btrfs_free_device(struct btrfs_device *device) { BUG_ON(!list_empty(&device->post_commit_list)); rcu_string_free(device->name); + if (!in_softirq()) + extent_io_tree_release(&device->alloc_state); bio_put(device->flush_bio); kfree(device); } @@ -411,6 +413,7 @@ static struct btrfs_device *__alloc_device(void) btrfs_device_data_ordered_init(dev); INIT_RADIX_TREE(&dev->reada_zones, GFP_NOFS & ~__GFP_DIRECT_RECLAIM); INIT_RADIX_TREE(&dev->reada_extents, GFP_NOFS & ~__GFP_DIRECT_RECLAIM); + extent_io_tree_init(&dev->alloc_state, NULL); return dev; } @@ -1270,6 +1273,9 @@ static void btrfs_close_one_device(struct btrfs_device *device) if (test_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state)) fs_devices->missing_devices--; + /* Remove alloc state now since it cannot be done in RCU context */ + extent_io_tree_release(&device->alloc_state); + btrfs_close_bdev(device); new_device = btrfs_alloc_device(NULL, &device->devid, @@ -1495,58 +1501,29 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, fmode_t flags, return device; } -static int contains_pending_extent(struct btrfs_transaction *transaction, - struct btrfs_device *device, - u64 *start, u64 len) -{ - struct btrfs_fs_info *fs_info = device->fs_info; - struct extent_map *em; - struct list_head *search_list = &fs_info->pinned_chunks; - int ret = 0; - u64 physical_start = *start; - - if (transaction) - search_list = &transaction->pending_chunks; -again: - list_for_each_entry(em, search_list, list) { - struct map_lookup *map; - int i; - - map = em->map_lookup; - for (i = 0; i < map->num_stripes; i++) { - u64 end; - - if (map->stripes[i].dev != device) - continue; - if (map->stripes[i].physical >= physical_start + len || - map->stripes[i].physical + em->orig_block_len <= - physical_start) - continue; - /* - * Make sure that while processing the pinned list we do - * not override our *start with a lower value, because - * we can have pinned chunks that fall within this - * device hole and that have lower physical addresses - * than the pending chunks we processed before. If we - * do not take this special care we can end up getting - * 2 pending chunks that start at the same physical - * device offsets because the end offset of a pinned - * chunk can be equal to the start offset of some - * pending chunk. - */ - end = map->stripes[i].physical + em->orig_block_len; - if (end > *start) { - *start = end; - ret = 1; - } +/* + * Tries to find a chunk that intersects [start, start +len] range and when one + * such is found, records the end of it in *start + */ +#define in_range(b, first, len) ((b) >= (first) && (b) < (first) + (len)) +static bool contains_pending_extent(struct btrfs_device *device, u64 *start, + u64 len) +{ + u64 physical_start, physical_end; + lockdep_assert_held(&device->fs_info->chunk_mutex); + + if (!find_first_extent_bit(&device->alloc_state, *start, + &physical_start, &physical_end, + CHUNK_ALLOCATED, NULL)) { + + if (in_range(physical_start, *start, len) || + in_range(*start, physical_start, + physical_end - physical_start)) { + *start = physical_end + 1; + return true; } } - if (search_list != &fs_info->pinned_chunks) { - search_list = &fs_info->pinned_chunks; - goto again; - } - - return ret; + return false; } @@ -1657,15 +1634,12 @@ int find_free_dev_extent_start(struct btrfs_transaction *transaction, * Have to check before we set max_hole_start, otherwise * we could end up sending back this offset anyway. */ - if (contains_pending_extent(transaction, device, - &search_start, + if (contains_pending_extent(device, &search_start, hole_size)) { - if (key.offset >= search_start) { + if (key.offset >= search_start) hole_size = key.offset - search_start; - } else { - WARN_ON_ONCE(1); + else hole_size = 0; - } } if (hole_size > max_hole_size) { @@ -1706,8 +1680,7 @@ int find_free_dev_extent_start(struct btrfs_transaction *transaction, if (search_end > search_start) { hole_size = search_end - search_start; - if (contains_pending_extent(transaction, device, &search_start, - hole_size)) { + if (contains_pending_extent(device, &search_start, hole_size)) { btrfs_release_path(path); goto again; } @@ -4749,7 +4722,7 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size) * in-memory chunks are synced to disk so that the loop below sees them * and relocates them accordingly. */ - if (contains_pending_extent(trans->transaction, device, &start, diff)) { + if (contains_pending_extent(device, &start, diff)) { mutex_unlock(&fs_info->chunk_mutex); ret = btrfs_commit_transaction(trans); if (ret) @@ -5191,9 +5164,6 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, free_extent_map(em); goto error; } - - list_add_tail(&em->list, &trans->transaction->pending_chunks); - refcount_inc(&em->refs); write_unlock(&em_tree->lock); ret = btrfs_make_block_group(trans, 0, type, start, chunk_size); @@ -5226,8 +5196,6 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, free_extent_map(em); /* One for the tree reference */ free_extent_map(em); - /* One for the pending_chunks list reference */ - free_extent_map(em); error: kfree(devices_info); return ret; diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index b61935286e17..8fbeba92025d 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -134,6 +134,8 @@ struct btrfs_device { /* Counter to record the change of device stats */ atomic_t dev_stats_ccnt; atomic_t dev_stat_values[BTRFS_DEV_STAT_VALUES_MAX]; + + struct extent_io_tree alloc_state; }; /* From patchwork Wed Jan 30 14:50:57 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikolay Borisov X-Patchwork-Id: 10788749 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1AE0917E9 for ; Wed, 30 Jan 2019 14:51:20 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 07C192F420 for ; Wed, 30 Jan 2019 14:51:20 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 065622F422; Wed, 30 Jan 2019 14:51:20 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 75A712F43B for ; Wed, 30 Jan 2019 14:51:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731473AbfA3OvS (ORCPT ); Wed, 30 Jan 2019 09:51:18 -0500 Received: from mx2.suse.de ([195.135.220.15]:38148 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1731431AbfA3OvJ (ORCPT ); Wed, 30 Jan 2019 09:51:09 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 38157B08F for ; Wed, 30 Jan 2019 14:51:07 +0000 (UTC) From: Nikolay Borisov To: linux-btrfs@vger.kernel.org Cc: jeffm@suse.com, Nikolay Borisov Subject: [PATCH 10/15] btrfs: Remove 'trans' argument from find_free_dev_extent(_start) Date: Wed, 30 Jan 2019 16:50:57 +0200 Message-Id: <20190130145102.4708-11-nborisov@suse.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190130145102.4708-1-nborisov@suse.com> References: <20190130145102.4708-1-nborisov@suse.com> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Now that those function no longer require a handle to transaction to inspect pending/pinned chunks the argument can be removed. At the same time also remove any surrounding code which acquired the handle. Signed-off-by: Nikolay Borisov Reviewed-by: Johannes Thumshirn --- fs/btrfs/extent-tree.c | 36 +++--------------------------------- fs/btrfs/volumes.c | 11 ++++------- fs/btrfs/volumes.h | 8 +++----- 3 files changed, 10 insertions(+), 45 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index a5613fe35b90..86e6d19871f9 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -9787,12 +9787,10 @@ void btrfs_dec_block_group_ro(struct btrfs_block_group_cache *cache) */ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr) { - struct btrfs_root *root = fs_info->extent_root; struct btrfs_block_group_cache *block_group; struct btrfs_space_info *space_info; struct btrfs_fs_devices *fs_devices = fs_info->fs_devices; struct btrfs_device *device; - struct btrfs_trans_handle *trans; u64 min_free; u64 dev_min = 1; u64 dev_nr = 0; @@ -9891,13 +9889,6 @@ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr) min_free = div64_u64(min_free, dev_min); } - /* We need to do this so that we can look at pending chunks */ - trans = btrfs_join_transaction(root); - if (IS_ERR(trans)) { - ret = PTR_ERR(trans); - goto out; - } - mutex_lock(&fs_info->chunk_mutex); list_for_each_entry(device, &fs_devices->alloc_list, dev_alloc_list) { u64 dev_offset; @@ -9908,7 +9899,7 @@ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr) */ if (device->total_bytes > device->bytes_used + min_free && !test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state)) { - ret = find_free_dev_extent(trans, device, min_free, + ret = find_free_dev_extent(device, min_free, &dev_offset, NULL); if (!ret) dev_nr++; @@ -9924,7 +9915,6 @@ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr) "no space to allocate a new chunk for block group %llu", block_group->key.objectid); mutex_unlock(&fs_info->chunk_mutex); - btrfs_end_transaction(trans); out: btrfs_put_block_group(block_group); return ret; @@ -11182,34 +11172,14 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, while (1) { struct btrfs_fs_info *fs_info = device->fs_info; - struct btrfs_transaction *trans; u64 bytes; ret = mutex_lock_interruptible(&fs_info->chunk_mutex); if (ret) break; - ret = down_read_killable(&fs_info->commit_root_sem); - if (ret) { - mutex_unlock(&fs_info->chunk_mutex); - break; - } - - spin_lock(&fs_info->trans_lock); - trans = fs_info->running_transaction; - if (trans) - refcount_inc(&trans->use_count); - spin_unlock(&fs_info->trans_lock); - - if (!trans) - up_read(&fs_info->commit_root_sem); - - ret = find_free_dev_extent_start(trans, device, range->minlen, - start, &start, &len); - if (trans) { - up_read(&fs_info->commit_root_sem); - btrfs_put_transaction(trans); - } + ret = find_free_dev_extent_start(device, range->minlen, start, + &start, &len); if (ret) { mutex_unlock(&fs_info->chunk_mutex); diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 25a3c1bdfc99..cb01d40993dd 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1548,8 +1548,7 @@ static bool contains_pending_extent(struct btrfs_device *device, u64 *start, * But if we don't find suitable free space, it is used to store the size of * the max free space. */ -int find_free_dev_extent_start(struct btrfs_transaction *transaction, - struct btrfs_device *device, u64 num_bytes, +int find_free_dev_extent_start(struct btrfs_device *device, u64 num_bytes, u64 search_start, u64 *start, u64 *len) { struct btrfs_fs_info *fs_info = device->fs_info; @@ -1705,13 +1704,11 @@ int find_free_dev_extent_start(struct btrfs_transaction *transaction, return ret; } -int find_free_dev_extent(struct btrfs_trans_handle *trans, - struct btrfs_device *device, u64 num_bytes, +int find_free_dev_extent(struct btrfs_device *device, u64 num_bytes, u64 *start, u64 *len) { /* FIXME use last free of some kind */ - return find_free_dev_extent_start(trans->transaction, device, - num_bytes, 0, start, len); + return find_free_dev_extent_start(device, num_bytes, 0, start, len); } static int btrfs_free_dev_extent(struct btrfs_trans_handle *trans, @@ -5030,7 +5027,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, if (total_avail == 0) continue; - ret = find_free_dev_extent(trans, device, + ret = find_free_dev_extent(device, max_stripe_size * dev_stripes, &dev_offset, &max_avail); if (ret && ret != -ENOSPC) diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 8fbeba92025d..91f0fc889678 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -444,11 +444,9 @@ int btrfs_cancel_balance(struct btrfs_fs_info *fs_info); int btrfs_create_uuid_tree(struct btrfs_fs_info *fs_info); int btrfs_check_uuid_tree(struct btrfs_fs_info *fs_info); int btrfs_chunk_readonly(struct btrfs_fs_info *fs_info, u64 chunk_offset); -int find_free_dev_extent_start(struct btrfs_transaction *transaction, - struct btrfs_device *device, u64 num_bytes, - u64 search_start, u64 *start, u64 *max_avail); -int find_free_dev_extent(struct btrfs_trans_handle *trans, - struct btrfs_device *device, u64 num_bytes, +int find_free_dev_extent_start(struct btrfs_device *device, u64 num_bytes, + u64 search_start, u64 *start, u64 *max_avail); +int find_free_dev_extent(struct btrfs_device *device, u64 num_bytes, u64 *start, u64 *max_avail); void btrfs_dev_stat_inc_and_print(struct btrfs_device *dev, int index); int btrfs_get_dev_stats(struct btrfs_fs_info *fs_info, From patchwork Wed Jan 30 14:50:58 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikolay Borisov X-Patchwork-Id: 10788733 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A4FB16C2 for ; Wed, 30 Jan 2019 14:51:11 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9269F2F3A7 for ; Wed, 30 Jan 2019 14:51:11 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9071C2F42D; Wed, 30 Jan 2019 14:51:11 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 24D662F423 for ; Wed, 30 Jan 2019 14:51:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731458AbfA3OvK (ORCPT ); Wed, 30 Jan 2019 09:51:10 -0500 Received: from mx2.suse.de ([195.135.220.15]:38130 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1731436AbfA3OvI (ORCPT ); Wed, 30 Jan 2019 09:51:08 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 82C19B0B2 for ; Wed, 30 Jan 2019 14:51:07 +0000 (UTC) From: Nikolay Borisov To: linux-btrfs@vger.kernel.org Cc: jeffm@suse.com, Nikolay Borisov Subject: [PATCH 11/15] btrfs: Factor out in_range macro Date: Wed, 30 Jan 2019 16:50:58 +0200 Message-Id: <20190130145102.4708-12-nborisov@suse.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190130145102.4708-1-nborisov@suse.com> References: <20190130145102.4708-1-nborisov@suse.com> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This is used in more than one places so let's factor it out in ctree.h. No functional changes. Signed-off-by: Nikolay Borisov --- fs/btrfs/ctree.h | 2 ++ fs/btrfs/extent-tree.c | 1 - fs/btrfs/volumes.c | 1 - 3 files changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index a08791dc5cc2..648c3504ffcc 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3779,6 +3779,8 @@ static inline int btrfs_defrag_cancelled(struct btrfs_fs_info *fs_info) return signal_pending(current); } +#define in_range(b, first, len) ((b) >= (first) && (b) < (first) + (len)) + /* Sanity test specific functions */ #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS void btrfs_test_inode_set_ops(struct inode *inode); diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 86e6d19871f9..610bed028511 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -1905,7 +1905,6 @@ static int remove_extent_backref(struct btrfs_trans_handle *trans, return ret; } -#define in_range(b, first, len) ((b) >= (first) && (b) < (first) + (len)) static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len, u64 *discarded_bytes) { diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index cb01d40993dd..0657ad510664 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1505,7 +1505,6 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, fmode_t flags, * Tries to find a chunk that intersects [start, start +len] range and when one * such is found, records the end of it in *start */ -#define in_range(b, first, len) ((b) >= (first) && (b) < (first) + (len)) static bool contains_pending_extent(struct btrfs_device *device, u64 *start, u64 len) { From patchwork Wed Jan 30 14:50:59 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikolay Borisov X-Patchwork-Id: 10788735 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4A8F26C2 for ; Wed, 30 Jan 2019 14:51:13 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 351552F419 for ; Wed, 30 Jan 2019 14:51:13 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 32AE82F436; Wed, 30 Jan 2019 14:51:13 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A931D2F419 for ; Wed, 30 Jan 2019 14:51:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731461AbfA3OvM (ORCPT ); Wed, 30 Jan 2019 09:51:12 -0500 Received: from mx2.suse.de ([195.135.220.15]:38158 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1731439AbfA3OvK (ORCPT ); Wed, 30 Jan 2019 09:51:10 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id CF16CB06A for ; Wed, 30 Jan 2019 14:51:07 +0000 (UTC) From: Nikolay Borisov To: linux-btrfs@vger.kernel.org Cc: jeffm@suse.com, Nikolay Borisov Subject: [PATCH 12/15] btrfs: Optimize unallocated chunks discard Date: Wed, 30 Jan 2019 16:50:59 +0200 Message-Id: <20190130145102.4708-13-nborisov@suse.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190130145102.4708-1-nborisov@suse.com> References: <20190130145102.4708-1-nborisov@suse.com> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Currently unallocated chunks are always trimmed. For example 2 consecutive trims on large storage would trim freespace twice irrespective of whether the space was actually allocated or not between those trims. Optimise this behavior by exploiting the newly introduced alloc_state tree of btrfs_device. A new CHUNK_TRIMMED bit is used to mark those unallocated chunks which have been trimmed and have not been allocated afterwards. On chunk allocation the respective underlying devices' physical space will have its CHUNK_TRIMMED flag cleared. This avoids submitting discards for space which hasn't been changed since the last time discard was issued. Signed-off-by: Nikolay Borisov --- fs/btrfs/extent-tree.c | 57 +++++++++++++++++++++++++++++++++++++++++- fs/btrfs/extent_io.h | 8 +++++- fs/btrfs/extent_map.c | 4 ++- 3 files changed, 66 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 610bed028511..f5005ef39f98 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -11127,6 +11127,54 @@ int btrfs_error_unpin_extent_range(struct btrfs_fs_info *fs_info, return unpin_extent_range(fs_info, start, end, false); } +static bool should_skip_trim(struct btrfs_device *device, u64 *start, u64 *len) +{ + u64 trimmed_start = 0, trimmed_end = 0; + u64 end = *start + *len - 1; + + if (!find_first_extent_bit(&device->alloc_state, *start, &trimmed_start, + &trimmed_end, CHUNK_TRIMMED, NULL)) { + u64 trimmed_len = trimmed_end - trimmed_start + 1; + + if (*start < trimmed_start) { + if (in_range(end, trimmed_start, trimmed_len) || + end > trimmed_end) { + /* + * start|------|end + * ts|--|trimmed_len + * OR + * start|-----|end + * ts|-----|trimmed_len + */ + *len = trimmed_start - *start; + return false; + } else if (end < trimmed_start) { + /* + * start|------|end + * ts|--|trimmed_len + */ + return false; + } + } else if (in_range(*start, trimmed_start, trimmed_len)) { + if (in_range(end, trimmed_start, trimmed_len)) { + /* + * start|------|end + * ts|----------|trimmed_len + */ + return true; + } else { + /* + * start|-----------|end + * ts|----------|trimmed_len + */ + *start = trimmed_end + 1; + *len = end - *start + 1; + return false; + } + } + } + return false; +} /* * It used to be that old block groups would be left around forever. * Iterating over them would be enough to trim unused space. Since we @@ -11197,7 +11245,14 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, start = max(range->start, start); len = min(range->len, len); - ret = btrfs_issue_discard(device->bdev, start, len, &bytes); + if (!should_skip_trim(device, &start, &len)) { + ret = btrfs_issue_discard(device->bdev, start, len, + &bytes); + if (!ret) + set_extent_bits(&device->alloc_state, start, + start + bytes - 1, + CHUNK_TRIMMED); + } mutex_unlock(&fs_info->chunk_mutex); if (ret) diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index d4227e40c8ee..d238efd628cf 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -30,8 +30,14 @@ #define EXTENT_CTLBITS (EXTENT_DO_ACCOUNTING) -/* Redefined bits above which are used only in the device allocation tree */ +/* + * Redefined bits above which are used only in the device allocation tree, + * shouldn't be using EXTENT_IOBITS(EXTENT_LOCKED/EXTENT_WRITEBACK) / + * EXTENT_BOUNDARY / EXTENT_CLEAR_META_RESV / EXTENT_CLEAR_DATA_RESV because + * they have special meaning to the bit manipulation functions + */ #define CHUNK_ALLOCATED EXTENT_DIRTY +#define CHUNK_TRIMMED EXTENT_DEFRAG /* * flags for bio submission. The high bits indicate the compression diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c index 0820f6fcf3a6..9e8c0904f623 100644 --- a/fs/btrfs/extent_map.c +++ b/fs/btrfs/extent_map.c @@ -389,8 +389,10 @@ int add_extent_mapping(struct extent_map_tree *tree, goto out; setup_extent_mapping(tree, em, modified); - if (test_bit(EXTENT_FLAG_FS_MAPPING, &em->flags)) + if (test_bit(EXTENT_FLAG_FS_MAPPING, &em->flags)) { extent_map_device_set_bits(em, CHUNK_ALLOCATED); + extent_map_device_clear_bits(em, CHUNK_TRIMMED); + } out: return ret; } From patchwork Wed Jan 30 14:51:00 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikolay Borisov X-Patchwork-Id: 10788745 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 39FDB746 for ; Wed, 30 Jan 2019 14:51:18 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 27D892F437 for ; Wed, 30 Jan 2019 14:51:18 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 258682F420; Wed, 30 Jan 2019 14:51:18 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id BEE512F434 for ; Wed, 30 Jan 2019 14:51:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731464AbfA3OvQ (ORCPT ); Wed, 30 Jan 2019 09:51:16 -0500 Received: from mx2.suse.de ([195.135.220.15]:38164 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1731400AbfA3OvJ (ORCPT ); Wed, 30 Jan 2019 09:51:09 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 2B476B0B0 for ; Wed, 30 Jan 2019 14:51:08 +0000 (UTC) From: Nikolay Borisov To: linux-btrfs@vger.kernel.org Cc: jeffm@suse.com, Nikolay Borisov Subject: [PATCH 13/15] btrfs: Fix gross misnaming Date: Wed, 30 Jan 2019 16:51:00 +0200 Message-Id: <20190130145102.4708-14-nborisov@suse.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190130145102.4708-1-nborisov@suse.com> References: <20190130145102.4708-1-nborisov@suse.com> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The variables and function parameters of __etree_search which pertain to prev/next are grossly misnamed. Namely, prev_ret holds the next state and not the previous. Similarly, next_ret actually holds the previous extent state relating to the offset we are interested in. Fix this by renaming the variables as well as switching the arguments order. No functional changes. Signed-off-by: Nikolay Borisov --- fs/btrfs/extent_io.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index e788d4aa53d1..d5525d18803d 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -309,8 +309,8 @@ static struct rb_node *tree_insert(struct rb_root *root, } static struct rb_node *__etree_search(struct extent_io_tree *tree, u64 offset, - struct rb_node **prev_ret, struct rb_node **next_ret, + struct rb_node **prev_ret, struct rb_node ***p_ret, struct rb_node **parent_ret) { @@ -339,23 +339,23 @@ static struct rb_node *__etree_search(struct extent_io_tree *tree, u64 offset, if (parent_ret) *parent_ret = prev; - if (prev_ret) { + if (next_ret) { orig_prev = prev; while (prev && offset > prev_entry->end) { prev = rb_next(prev); prev_entry = rb_entry(prev, struct tree_entry, rb_node); } - *prev_ret = prev; + *next_ret = prev; prev = orig_prev; } - if (next_ret) { + if (prev_ret) { prev_entry = rb_entry(prev, struct tree_entry, rb_node); while (prev && offset < prev_entry->start) { prev = rb_prev(prev); prev_entry = rb_entry(prev, struct tree_entry, rb_node); } - *next_ret = prev; + *prev_ret = prev; } return NULL; } @@ -366,12 +366,12 @@ tree_search_for_insert(struct extent_io_tree *tree, struct rb_node ***p_ret, struct rb_node **parent_ret) { - struct rb_node *prev = NULL; + struct rb_node *next= NULL; struct rb_node *ret; - ret = __etree_search(tree, offset, &prev, NULL, p_ret, parent_ret); + ret = __etree_search(tree, offset, &next, NULL, p_ret, parent_ret); if (!ret) - return prev; + return next; return ret; } From patchwork Wed Jan 30 14:51:01 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikolay Borisov X-Patchwork-Id: 10788743 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 85481746 for ; Wed, 30 Jan 2019 14:51:17 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7458D2F419 for ; Wed, 30 Jan 2019 14:51:17 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 721F02F43B; Wed, 30 Jan 2019 14:51:17 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 629F32F419 for ; Wed, 30 Jan 2019 14:51:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731462AbfA3OvP (ORCPT ); Wed, 30 Jan 2019 09:51:15 -0500 Received: from mx2.suse.de ([195.135.220.15]:38136 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1731423AbfA3OvK (ORCPT ); Wed, 30 Jan 2019 09:51:10 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 7A973B079 for ; Wed, 30 Jan 2019 14:51:08 +0000 (UTC) From: Nikolay Borisov To: linux-btrfs@vger.kernel.org Cc: jeffm@suse.com, Nikolay Borisov Subject: [PATCH 14/15] btrfs: Implement find_first_clear_extent_bit Date: Wed, 30 Jan 2019 16:51:01 +0200 Message-Id: <20190130145102.4708-15-nborisov@suse.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190130145102.4708-1-nborisov@suse.com> References: <20190130145102.4708-1-nborisov@suse.com> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This function is very similar to find_first_extent_bit except that it locates the first contiguous span of space which does not have bits set. It's intended use is in the freespace trimming code. Signed-off-by: Nikolay Borisov --- fs/btrfs/extent_io.c | 78 ++++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/extent_io.h | 2 ++ 2 files changed, 80 insertions(+) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index d5525d18803d..542eaab9c0c1 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1474,6 +1474,84 @@ int find_first_extent_bit(struct extent_io_tree *tree, u64 start, return ret; } +/* find_first_clear_extent_bit - finds the first range that has @bits not set + * and that starts after @start + * + * @tree - the tree to search + * @start - the offset at/after which the found extent should start + * @start_ret - records the beginning of the range + * @end_ret - records the end of the range (inclusive) + * @bits - the set of bits which must be unset + * + * Returns 0 when a range is found and @start_ret/@end_ret are initialised + * accordingly, 1 otherwise. Since unallocated range is also considered one + * which doesn't have the bits set it's possible that @end_ret contains -1, this + * happens in case the range spans (last_range_end, end of device]. In this case + * it's up to the caller to trim @end_ret to the appropriate size. + */ +int find_first_clear_extent_bit(struct extent_io_tree *tree, u64 start, + u64 *start_ret, u64 *end_ret, unsigned bits) +{ + struct extent_state *state; + struct rb_node *node, *prev = NULL, *next; + int ret = 1; + + spin_lock(&tree->lock); + + /* Find first extent with bits cleared */ + while (1) { + node = __etree_search(tree, start, &next, &prev, NULL, NULL); + if (!node) { + node = next; + if (!node) { + /* + * We are past the last allocated chunk, + * set start at the end of the last extent. The + * device alloc tree should never be empty so + * prev is always set. + */ + ASSERT(prev); + state = rb_entry(prev, struct extent_state, rb_node); + *start_ret = state->end + 1; + *end_ret = -1; + ret = 0; + goto out; + } + } + state = rb_entry(node, struct extent_state, rb_node); + if (in_range(start, state->start, state->end - state->start + 1) && + (state->state & bits)) { + start = state->end + 1; + } else { + *start_ret = start; + break; + } + } + + /* + * Find the longest stretch from start until an entry which has the + * bits set + */ + while (1) { + state = rb_entry(node, struct extent_state, rb_node); + if (state->end >= start && !(state->state & bits)) { + *end_ret = state->end; + ret = 0; + } else { + *end_ret = state->start - 1; + ret = 0; + break; + } + + node = rb_next(node); + if (!node) + break; + } +out: + spin_unlock(&tree->lock); + return ret; +} + /* * find a contiguous range of bytes in the file marked as delalloc, not * more than 'max_bytes'. start and end are used to return the range, diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index d238efd628cf..7ddb3ec70023 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -391,6 +391,8 @@ static inline int set_extent_uptodate(struct extent_io_tree *tree, u64 start, int find_first_extent_bit(struct extent_io_tree *tree, u64 start, u64 *start_ret, u64 *end_ret, unsigned bits, struct extent_state **cached_state); +int find_first_clear_extent_bit(struct extent_io_tree *tree, u64 start, + u64 *start_ret, u64 *end_ret, unsigned bits); int extent_invalidatepage(struct extent_io_tree *tree, struct page *page, unsigned long offset); int extent_write_full_page(struct page *page, struct writeback_control *wbc); From patchwork Wed Jan 30 14:51:02 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikolay Borisov X-Patchwork-Id: 10788737 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 08F32746 for ; Wed, 30 Jan 2019 14:51:14 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id EC2E92F430 for ; Wed, 30 Jan 2019 14:51:13 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id EA4012F436; Wed, 30 Jan 2019 14:51:13 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7E1C52F435 for ; Wed, 30 Jan 2019 14:51:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726915AbfA3OvL (ORCPT ); Wed, 30 Jan 2019 09:51:11 -0500 Received: from mx2.suse.de ([195.135.220.15]:38122 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1731441AbfA3OvK (ORCPT ); Wed, 30 Jan 2019 09:51:10 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id BDC32B0BA for ; Wed, 30 Jan 2019 14:51:08 +0000 (UTC) From: Nikolay Borisov To: linux-btrfs@vger.kernel.org Cc: jeffm@suse.com, Nikolay Borisov Subject: [PATCH 15/15] btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit Date: Wed, 30 Jan 2019 16:51:02 +0200 Message-Id: <20190130145102.4708-16-nborisov@suse.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190130145102.4708-1-nborisov@suse.com> References: <20190130145102.4708-1-nborisov@suse.com> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Instead of always calling the allocator to search for a free extent, that satisfies the input criteria, switch btrfs_trim_free_extents to using find_first_clear_extent_bit. With this change it's no longer necessary to read the device tree in order to figure out holes in the devices. Now the code always searches in-memory data structure to figure out the space range which contains the requested which should result in speed oups. Signed-off-by: Nikolay Borisov --- fs/btrfs/extent-tree.c | 87 ++++++++++++------------------------------ 1 file changed, 24 insertions(+), 63 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index f5005ef39f98..7a2b144bdb76 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -11127,54 +11127,6 @@ int btrfs_error_unpin_extent_range(struct btrfs_fs_info *fs_info, return unpin_extent_range(fs_info, start, end, false); } -static bool should_skip_trim(struct btrfs_device *device, u64 *start, u64 *len) -{ - u64 trimmed_start = 0, trimmed_end = 0; - u64 end = *start + *len - 1; - - if (!find_first_extent_bit(&device->alloc_state, *start, &trimmed_start, - &trimmed_end, CHUNK_TRIMMED, NULL)) { - u64 trimmed_len = trimmed_end - trimmed_start + 1; - - if (*start < trimmed_start) { - if (in_range(end, trimmed_start, trimmed_len) || - end > trimmed_end) { - /* - * start|------|end - * ts|--|trimmed_len - * OR - * start|-----|end - * ts|-----|trimmed_len - */ - *len = trimmed_start - *start; - return false; - } else if (end < trimmed_start) { - /* - * start|------|end - * ts|--|trimmed_len - */ - return false; - } - } else if (in_range(*start, trimmed_start, trimmed_len)) { - if (in_range(end, trimmed_start, trimmed_len)) { - /* - * start|------|end - * ts|----------|trimmed_len - */ - return true; - } else { - /* - * start|-----------|end - * ts|----------|trimmed_len - */ - *start = trimmed_end + 1; - *len = end - *start + 1; - return false; - } - } - } - return false; -} /* * It used to be that old block groups would be left around forever. * Iterating over them would be enough to trim unused space. Since we @@ -11198,7 +11150,7 @@ static bool should_skip_trim(struct btrfs_device *device, u64 *start, u64 *len) static int btrfs_trim_free_extents(struct btrfs_device *device, struct fstrim_range *range, u64 *trimmed) { - u64 start = range->start, len = 0; + u64 start = range->start, len = 0, end = 0; int ret; *trimmed = 0; @@ -11225,34 +11177,43 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, if (ret) break; - ret = find_free_dev_extent_start(device, range->minlen, start, - &start, &len); - + ret = find_first_clear_extent_bit(&device->alloc_state, start, + &start, &end, + CHUNK_TRIMMED | CHUNK_ALLOCATED); if (ret) { mutex_unlock(&fs_info->chunk_mutex); - if (ret == -ENOSPC) - ret = 0; + ret = 0; break; } + /* If find_first_clear_extent_bit find a range that spans the + * end of the device it will set end to -1, in this case it's up + * to the caller to trim the value to the size of the device. + */ + end = min(end, device->total_bytes); + len = end - start + 1; + + /* Keep going until we satisfy minlen or reach end of space */ + if (len < range->minlen) { + mutex_unlock(&fs_info->chunk_mutex); + start += len; + continue; + } /* If we are out of the passed range break */ if (start > range->start + range->len - 1) { mutex_unlock(&fs_info->chunk_mutex); - ret = 0; break; } start = max(range->start, start); len = min(range->len, len); - if (!should_skip_trim(device, &start, &len)) { - ret = btrfs_issue_discard(device->bdev, start, len, - &bytes); - if (!ret) - set_extent_bits(&device->alloc_state, start, - start + bytes - 1, - CHUNK_TRIMMED); - } + ret = btrfs_issue_discard(device->bdev, start, len, + &bytes); + if (!ret) + set_extent_bits(&device->alloc_state, start, + start + bytes - 1, + CHUNK_TRIMMED); mutex_unlock(&fs_info->chunk_mutex); if (ret)