From patchwork Mon Feb 11 08:34:59 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikolay Borisov X-Patchwork-Id: 10805247 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4D6826C2 for ; Mon, 11 Feb 2019 08:35:17 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3995429A21 for ; Mon, 11 Feb 2019 08:35:17 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 28C6C29BED; Mon, 11 Feb 2019 08:35:17 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B5AC929A21 for ; Mon, 11 Feb 2019 08:35:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726989AbfBKIfP (ORCPT ); Mon, 11 Feb 2019 03:35:15 -0500 Received: from mx2.suse.de ([195.135.220.15]:41488 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726852AbfBKIfP (ORCPT ); Mon, 11 Feb 2019 03:35:15 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 2F420ACB8 for ; Mon, 11 Feb 2019 08:35:14 +0000 (UTC) From: Nikolay Borisov To: linux-btrfs@vger.kernel.org Cc: Nikolay Borisov Subject: [PATCH v2 01/12] btrfs: Honour FITRIM range constraints during free space trim Date: Mon, 11 Feb 2019 10:34:59 +0200 Message-Id: <20190211083510.27591-2-nborisov@suse.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190211083510.27591-1-nborisov@suse.com> References: <20190211083510.27591-1-nborisov@suse.com> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Up until know trimming the freespace was done irrespective of what the arguments of the FITRIM ioctl were. For example fstrim's -o/-l arguments will be entirely ignored. Fix it by correctly handling those paramter. This requires breaking if the found freespace extent is after the end of the passed range as well as completing trim after trimming fstrim_range::len bytes. Fixes: 499f377f49f0 ("btrfs: iterate over unused chunk space in FITRIM") Signed-off-by: Nikolay Borisov --- fs/btrfs/extent-tree.c | 25 +++++++++++++++++++------ 1 file changed, 19 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 9f012c2facbe..33ecd4128898 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -11258,9 +11258,9 @@ int btrfs_error_unpin_extent_range(struct btrfs_fs_info *fs_info, * held back allocations. */ static int btrfs_trim_free_extents(struct btrfs_device *device, - u64 minlen, u64 *trimmed) + struct fstrim_range *range, u64 *trimmed) { - u64 start = 0, len = 0; + u64 start = range->start, len = 0; int ret; *trimmed = 0; @@ -11303,8 +11303,8 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, if (!trans) up_read(&fs_info->commit_root_sem); - ret = find_free_dev_extent_start(trans, device, minlen, start, - &start, &len); + ret = find_free_dev_extent_start(trans, device, range->minlen, + start, &start, &len); if (trans) { up_read(&fs_info->commit_root_sem); btrfs_put_transaction(trans); @@ -11317,6 +11317,16 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, break; } + /* If we are out of the passed range break */ + if (start > range->start + range->len - 1) { + mutex_unlock(&fs_info->chunk_mutex); + ret = 0; + break; + } + + start = max(range->start, start); + len = min(range->len, len); + ret = btrfs_issue_discard(device->bdev, start, len, &bytes); mutex_unlock(&fs_info->chunk_mutex); @@ -11326,6 +11336,10 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, start += len; *trimmed += bytes; + /* We've trimmed enough */ + if (*trimmed >= range->len) + break; + if (fatal_signal_pending(current)) { ret = -ERESTARTSYS; break; @@ -11409,8 +11423,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range) mutex_lock(&fs_info->fs_devices->device_list_mutex); devices = &fs_info->fs_devices->devices; list_for_each_entry(device, devices, dev_list) { - ret = btrfs_trim_free_extents(device, range->minlen, - &group_trimmed); + ret = btrfs_trim_free_extents(device, range, &group_trimmed); if (ret) { dev_failed++; dev_ret = ret; From patchwork Mon Feb 11 08:35:00 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikolay Borisov X-Patchwork-Id: 10805251 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9B8A86C2 for ; Mon, 11 Feb 2019 08:35:19 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8A52D29A21 for ; Mon, 11 Feb 2019 08:35:19 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7E88429BE9; Mon, 11 Feb 2019 08:35:19 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A7FE029A21 for ; Mon, 11 Feb 2019 08:35:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727132AbfBKIfS (ORCPT ); Mon, 11 Feb 2019 03:35:18 -0500 Received: from mx2.suse.de ([195.135.220.15]:41494 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726860AbfBKIfQ (ORCPT ); Mon, 11 Feb 2019 03:35:16 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 8240AACEA for ; Mon, 11 Feb 2019 08:35:14 +0000 (UTC) From: Nikolay Borisov To: linux-btrfs@vger.kernel.org Cc: Jeff Mahoney , Nikolay Borisov Subject: [PATCH v2 02/12] btrfs: combine device update operations during transaction commit Date: Mon, 11 Feb 2019 10:35:00 +0200 Message-Id: <20190211083510.27591-3-nborisov@suse.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190211083510.27591-1-nborisov@suse.com> References: <20190211083510.27591-1-nborisov@suse.com> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Jeff Mahoney We currently overload the pending_chunks list to handle updating btrfs_device->commit_bytes used. We don't actually care about the extent mapping or even the device mapping for the chunk - we just need the device, and we can end up processing it multiple times. The fs_devices->resized_list does more or less the same thing, but with the disk size. They are called consecutively during commit and have more or less the same purpose. We can combine the two lists into a single list that attaches to the transaction and contains a list of devices that need updating. Since we always add the device to a list when we change bytes_used or disk_total_size, there's no harm in copying both values at once. Signed-off-by: Jeff Mahoney Signed-off-by: Nikolay Borisov --- fs/btrfs/dev-replace.c | 2 +- fs/btrfs/disk-io.c | 7 ++++ fs/btrfs/transaction.c | 7 ++-- fs/btrfs/transaction.h | 1 + fs/btrfs/volumes.c | 84 ++++++++++++++++++------------------------ fs/btrfs/volumes.h | 13 ++----- 6 files changed, 52 insertions(+), 62 deletions(-) diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index 13863354ff9d..f335cbf3bf1f 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -662,7 +662,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info, btrfs_device_set_disk_total_bytes(tgt_device, src_device->disk_total_bytes); btrfs_device_set_bytes_used(tgt_device, src_device->bytes_used); - ASSERT(list_empty(&src_device->resized_list)); + ASSERT(list_empty(&src_device->post_commit_list)); tgt_device->commit_total_bytes = src_device->commit_total_bytes; tgt_device->commit_bytes_used = src_device->bytes_used; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 8c0038de73ee..f1c42d242d48 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -4483,10 +4483,17 @@ void btrfs_cleanup_dirty_bgs(struct btrfs_transaction *cur_trans, void btrfs_cleanup_one_transaction(struct btrfs_transaction *cur_trans, struct btrfs_fs_info *fs_info) { + struct btrfs_device *dev, *tmp; + btrfs_cleanup_dirty_bgs(cur_trans, fs_info); ASSERT(list_empty(&cur_trans->dirty_bgs)); ASSERT(list_empty(&cur_trans->io_bgs)); + list_for_each_entry_safe(dev, tmp, &cur_trans->dev_update_list, + post_commit_list) { + list_del_init(&dev->post_commit_list); + } + btrfs_destroy_delayed_refs(cur_trans, fs_info); cur_trans->state = TRANS_STATE_COMMIT_START; diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index acdad6d658f5..d12c9c17d9e9 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -75,6 +75,7 @@ void btrfs_put_transaction(struct btrfs_transaction *transaction) btrfs_put_block_group_trimming(cache); btrfs_put_block_group(cache); } + BUG_ON(!list_empty(&transaction->dev_update_list)); kfree(transaction); } } @@ -264,6 +265,7 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info, INIT_LIST_HEAD(&cur_trans->pending_snapshots); INIT_LIST_HEAD(&cur_trans->pending_chunks); + INIT_LIST_HEAD(&cur_trans->dev_update_list); INIT_LIST_HEAD(&cur_trans->switch_commits); INIT_LIST_HEAD(&cur_trans->dirty_bgs); INIT_LIST_HEAD(&cur_trans->io_bgs); @@ -550,7 +552,7 @@ start_transaction(struct btrfs_root *root, unsigned int num_items, * and then we deadlock with somebody doing a freeze. * * If we are ATTACH, it means we just want to catch the current - * transaction and commit it, so we needn't do sb_start_intwrite(). + * transaction and commit it, so we needn't do sb_start_intwrite(). */ if (type & __TRANS_FREEZABLE) sb_start_intwrite(fs_info->sb); @@ -2204,8 +2206,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans) memcpy(fs_info->super_for_commit, fs_info->super_copy, sizeof(*fs_info->super_copy)); - btrfs_update_commit_device_size(fs_info); - btrfs_update_commit_device_bytes_used(cur_trans); + btrfs_commit_device_sizes(cur_trans); clear_bit(BTRFS_FS_LOG1_ERR, &fs_info->flags); clear_bit(BTRFS_FS_LOG2_ERR, &fs_info->flags); diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h index f1ba78949d1b..e0a04fa4de66 100644 --- a/fs/btrfs/transaction.h +++ b/fs/btrfs/transaction.h @@ -52,6 +52,7 @@ struct btrfs_transaction { wait_queue_head_t commit_wait; struct list_head pending_snapshots; struct list_head pending_chunks; + struct list_head dev_update_list; struct list_head switch_commits; struct list_head dirty_bgs; diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 03f223aa7194..2d763f944298 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -318,7 +318,6 @@ static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid, mutex_init(&fs_devs->device_list_mutex); INIT_LIST_HEAD(&fs_devs->devices); - INIT_LIST_HEAD(&fs_devs->resized_devices); INIT_LIST_HEAD(&fs_devs->alloc_list); INIT_LIST_HEAD(&fs_devs->fs_list); if (fsid) @@ -334,6 +333,7 @@ static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid, void btrfs_free_device(struct btrfs_device *device) { + BUG_ON(!list_empty(&device->post_commit_list)); rcu_string_free(device->name); bio_put(device->flush_bio); kfree(device); @@ -402,7 +402,7 @@ static struct btrfs_device *__alloc_device(void) INIT_LIST_HEAD(&dev->dev_list); INIT_LIST_HEAD(&dev->dev_alloc_list); - INIT_LIST_HEAD(&dev->resized_list); + INIT_LIST_HEAD(&dev->post_commit_list); spin_lock_init(&dev->io_lock); @@ -2880,9 +2880,9 @@ int btrfs_grow_device(struct btrfs_trans_handle *trans, btrfs_device_set_total_bytes(device, new_size); btrfs_device_set_disk_total_bytes(device, new_size); btrfs_clear_space_info_full(device->fs_info); - if (list_empty(&device->resized_list)) - list_add_tail(&device->resized_list, - &fs_devices->resized_devices); + if (list_empty(&device->post_commit_list)) + list_add_tail(&device->post_commit_list, + &trans->transaction->dev_update_list); mutex_unlock(&fs_info->chunk_mutex); return btrfs_update_device(trans, device); @@ -4871,9 +4871,9 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size) } btrfs_device_set_disk_total_bytes(device, new_size); - if (list_empty(&device->resized_list)) - list_add_tail(&device->resized_list, - &fs_info->fs_devices->resized_devices); + if (list_empty(&device->post_commit_list)) + list_add_tail(&device->post_commit_list, + &trans->transaction->dev_update_list); WARN_ON(diff > old_total); btrfs_set_super_total_bytes(super_copy, @@ -5222,9 +5222,14 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, if (ret) goto error_del_extent; - for (i = 0; i < map->num_stripes; i++) - btrfs_device_set_bytes_used(map->stripes[i].dev, - map->stripes[i].dev->bytes_used + stripe_size); + for (i = 0; i < map->num_stripes; i++) { + struct btrfs_device *dev = map->stripes[i].dev; + + btrfs_device_set_bytes_used(dev, dev->bytes_used + stripe_size); + if (list_empty(&dev->post_commit_list)) + list_add_tail(&dev->post_commit_list, + &trans->transaction->dev_update_list); + } atomic64_sub(stripe_size * map->num_stripes, &info->free_chunk_space); @@ -7674,51 +7679,34 @@ void btrfs_scratch_superblocks(struct block_device *bdev, const char *device_pat } /* - * Update the size of all devices, which is used for writing out the - * super blocks. + * Update the size and bytes used for each device where it changed. + * This is delayed since we would otherwise get errors while writing + * out the superblocks. + * + * Must be invoked during transaction commit. */ -void btrfs_update_commit_device_size(struct btrfs_fs_info *fs_info) +void btrfs_commit_device_sizes(struct btrfs_transaction *trans) { - struct btrfs_fs_devices *fs_devices = fs_info->fs_devices; struct btrfs_device *curr, *next; - if (list_empty(&fs_devices->resized_devices)) - return; - - mutex_lock(&fs_devices->device_list_mutex); - mutex_lock(&fs_info->chunk_mutex); - list_for_each_entry_safe(curr, next, &fs_devices->resized_devices, - resized_list) { - list_del_init(&curr->resized_list); - curr->commit_total_bytes = curr->disk_total_bytes; - } - mutex_unlock(&fs_info->chunk_mutex); - mutex_unlock(&fs_devices->device_list_mutex); -} - -/* Must be invoked during the transaction commit */ -void btrfs_update_commit_device_bytes_used(struct btrfs_transaction *trans) -{ - struct btrfs_fs_info *fs_info = trans->fs_info; - struct extent_map *em; - struct map_lookup *map; - struct btrfs_device *dev; - int i; + BUG_ON(trans->state != TRANS_STATE_COMMIT_DOING); - if (list_empty(&trans->pending_chunks)) + if (list_empty(&trans->dev_update_list)) return; - /* In order to kick the device replace finish process */ - mutex_lock(&fs_info->chunk_mutex); - list_for_each_entry(em, &trans->pending_chunks, list) { - map = em->map_lookup; - - for (i = 0; i < map->num_stripes; i++) { - dev = map->stripes[i].dev; - dev->commit_bytes_used = dev->bytes_used; - } + /* + * We don't need the device_list_mutex here. This list is owned + * by the transaction and the transaction must complete before + * the device is released. + */ + mutex_lock(&trans->fs_info->chunk_mutex); + list_for_each_entry_safe(curr, next, &trans->dev_update_list, + post_commit_list) { + list_del_init(&curr->post_commit_list); + curr->commit_total_bytes = curr->disk_total_bytes; + curr->commit_bytes_used = curr->bytes_used; } - mutex_unlock(&fs_info->chunk_mutex); + mutex_unlock(&trans->fs_info->chunk_mutex); } void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info) diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 3ad9d58d1b66..a0f09aad3770 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -45,6 +45,7 @@ struct btrfs_pending_bios { struct btrfs_device { struct list_head dev_list; struct list_head dev_alloc_list; + struct list_head post_commit_list; /* chunk mutex */ struct btrfs_fs_devices *fs_devices; struct btrfs_fs_info *fs_info; @@ -102,18 +103,12 @@ struct btrfs_device { * size of the device on the current transaction * * This variant is update when committing the transaction, - * and protected by device_list_mutex + * and protected by chunk mutex */ u64 commit_total_bytes; /* bytes used on the current transaction */ u64 commit_bytes_used; - /* - * used to manage the device which is resized - * - * It is protected by chunk_lock. - */ - struct list_head resized_list; /* for sending down flush barriers */ struct bio *flush_bio; @@ -235,7 +230,6 @@ struct btrfs_fs_devices { struct mutex device_list_mutex; struct list_head devices; - struct list_head resized_devices; /* devices not currently being allocated */ struct list_head alloc_list; @@ -558,8 +552,7 @@ static inline enum btrfs_raid_types btrfs_bg_flags_to_raid_index(u64 flags) const char *get_raid_name(enum btrfs_raid_types type); -void btrfs_update_commit_device_size(struct btrfs_fs_info *fs_info); -void btrfs_update_commit_device_bytes_used(struct btrfs_transaction *trans); +void btrfs_commit_device_sizes(struct btrfs_transaction *trans); struct list_head *btrfs_get_fs_uuids(void); void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info); From patchwork Mon Feb 11 08:35:01 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikolay Borisov X-Patchwork-Id: 10805271 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3D4286C2 for ; Mon, 11 Feb 2019 08:35:31 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 29C8929A21 for ; Mon, 11 Feb 2019 08:35:31 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1DEC029BE9; Mon, 11 Feb 2019 08:35:31 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A825D29A21 for ; Mon, 11 Feb 2019 08:35:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727124AbfBKIfR (ORCPT ); Mon, 11 Feb 2019 03:35:17 -0500 Received: from mx2.suse.de ([195.135.220.15]:41500 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726955AbfBKIfQ (ORCPT ); Mon, 11 Feb 2019 03:35:16 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id C3DB6AD60 for ; Mon, 11 Feb 2019 08:35:14 +0000 (UTC) From: Nikolay Borisov To: linux-btrfs@vger.kernel.org Cc: Nikolay Borisov Subject: [PATCH v2 03/12] btrfs: Handle pending/pinned chunks before blockgroup relocation during device shrink Date: Mon, 11 Feb 2019 10:35:01 +0200 Message-Id: <20190211083510.27591-4-nborisov@suse.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190211083510.27591-1-nborisov@suse.com> References: <20190211083510.27591-1-nborisov@suse.com> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP During device shrink pinned/pending chunks (i.e those which have been deleted/created respectively, in the current transaction and haven't touched disk) need to be accounted when doing device shrink. Presently this happens after the main relocation loop in btrfs_shrink_device, which could lead to making another go in the body of the function. Since there is no hard requirement to perform pinned/pending chunks handling after the relocation loop, move the code before it. This leads to simplifying the code flow around - i.e no need to use 'goto again'. Signed-off-by: Nikolay Borisov --- fs/btrfs/volumes.c | 54 ++++++++++++++++++---------------------------- 1 file changed, 21 insertions(+), 33 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 2d763f944298..f0d91b1fda1c 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -4722,15 +4722,15 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size) int slot; int failed = 0; bool retried = false; - bool checked_pending_chunks = false; struct extent_buffer *l; struct btrfs_key key; struct btrfs_super_block *super_copy = fs_info->super_copy; u64 old_total = btrfs_super_total_bytes(super_copy); u64 old_size = btrfs_device_get_total_bytes(device); u64 diff; + u64 start; - new_size = round_down(new_size, fs_info->sectorsize); + start = new_size = round_down(new_size, fs_info->sectorsize); diff = round_down(old_size - new_size, fs_info->sectorsize); if (test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state)) @@ -4742,6 +4742,10 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size) path->reada = READA_BACK; + trans = btrfs_start_transaction(root, 0); + if (IS_ERR(trans)) + return PTR_ERR(trans); + mutex_lock(&fs_info->chunk_mutex); btrfs_device_set_total_bytes(device, new_size); @@ -4749,7 +4753,21 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size) device->fs_devices->total_rw_bytes -= diff; atomic64_sub(diff, &fs_info->free_chunk_space); } - mutex_unlock(&fs_info->chunk_mutex); + + /* + * Once the device's size has been set to the new size, ensure all + * in-memory chunks are synced to disk so that the loop below sees them + * and relocates them accordingly. + */ + if (contains_pending_extent(trans->transaction, device, &start, diff)) { + mutex_unlock(&fs_info->chunk_mutex); + ret = btrfs_commit_transaction(trans); + if (ret) + goto done; + } else { + mutex_unlock(&fs_info->chunk_mutex); + btrfs_end_transaction(trans); + } again: key.objectid = device->devid; @@ -4840,36 +4858,6 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size) } mutex_lock(&fs_info->chunk_mutex); - - /* - * We checked in the above loop all device extents that were already in - * the device tree. However before we have updated the device's - * total_bytes to the new size, we might have had chunk allocations that - * have not complete yet (new block groups attached to transaction - * handles), and therefore their device extents were not yet in the - * device tree and we missed them in the loop above. So if we have any - * pending chunk using a device extent that overlaps the device range - * that we can not use anymore, commit the current transaction and - * repeat the search on the device tree - this way we guarantee we will - * not have chunks using device extents that end beyond 'new_size'. - */ - if (!checked_pending_chunks) { - u64 start = new_size; - u64 len = old_size - new_size; - - if (contains_pending_extent(trans->transaction, device, - &start, len)) { - mutex_unlock(&fs_info->chunk_mutex); - checked_pending_chunks = true; - failed = 0; - retried = false; - ret = btrfs_commit_transaction(trans); - if (ret) - goto done; - goto again; - } - } - btrfs_device_set_disk_total_bytes(device, new_size); if (list_empty(&device->post_commit_list)) list_add_tail(&device->post_commit_list, From patchwork Mon Feb 11 08:35:02 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikolay Borisov X-Patchwork-Id: 10805257 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2AB946C2 for ; Mon, 11 Feb 2019 08:35:23 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 183CD29A21 for ; Mon, 11 Feb 2019 08:35:23 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 0130429BE9; Mon, 11 Feb 2019 08:35:22 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7EDDD29A21 for ; Mon, 11 Feb 2019 08:35:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727135AbfBKIfS (ORCPT ); Mon, 11 Feb 2019 03:35:18 -0500 Received: from mx2.suse.de ([195.135.220.15]:41506 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726852AbfBKIfQ (ORCPT ); Mon, 11 Feb 2019 03:35:16 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 12239AD7B for ; Mon, 11 Feb 2019 08:35:15 +0000 (UTC) From: Nikolay Borisov To: linux-btrfs@vger.kernel.org Cc: Nikolay Borisov Subject: [PATCH v2 04/12] btrfs: Rename and export clear_btree_io_tree Date: Mon, 11 Feb 2019 10:35:02 +0200 Message-Id: <20190211083510.27591-5-nborisov@suse.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190211083510.27591-1-nborisov@suse.com> References: <20190211083510.27591-1-nborisov@suse.com> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This function is going to be used to clear out the device extent allocation information. Give it a more generic name and export it. This is in preparation to replacing the pending/pinned chunk lists with an extent tree. No functional changes. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba --- fs/btrfs/extent_io.c | 28 ++++++++++++++++++++++++++++ fs/btrfs/extent_io.h | 1 + fs/btrfs/transaction.c | 37 ++++--------------------------------- 3 files changed, 33 insertions(+), 33 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index a9e74bd6c434..ae1049824739 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -241,6 +241,34 @@ void extent_io_tree_init(struct extent_io_tree *tree, tree->private_data = private_data; } +void extent_io_tree_release(struct extent_io_tree *tree) +{ + spin_lock(&tree->lock); + /* + * Do a single barrier for the waitqueue_active check here, the state + * of the waitqueue should not change once clear_btree_io_tree is + * called. + */ + smp_mb(); + while (!RB_EMPTY_ROOT(&tree->state)) { + struct rb_node *node; + struct extent_state *state; + + node = rb_first(&tree->state); + state = rb_entry(node, struct extent_state, rb_node); + rb_erase(&state->rb_node, &tree->state); + RB_CLEAR_NODE(&state->rb_node); + /* + * btree io trees aren't supposed to have tasks waiting for + * changes in the flags of extent states ever. + */ + ASSERT(!waitqueue_active(&state->wq)); + free_extent_state(state); + + cond_resched_lock(&tree->lock); + } + spin_unlock(&tree->lock); +} static struct extent_state *alloc_extent_state(gfp_t mask) { struct extent_state *state; diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index 08749e0b9c32..d7beb2b3bc7d 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -240,6 +240,7 @@ typedef struct extent_map *(get_extent_t)(struct btrfs_inode *inode, int create); void extent_io_tree_init(struct extent_io_tree *tree, void *private_data); +void extent_io_tree_release(struct extent_io_tree *tree); int try_release_extent_mapping(struct page *page, gfp_t mask); int try_release_extent_buffer(struct page *page); int lock_extent_bits(struct extent_io_tree *tree, u64 start, u64 end, diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index d12c9c17d9e9..07f5477abd0a 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -80,35 +80,6 @@ void btrfs_put_transaction(struct btrfs_transaction *transaction) } } -static void clear_btree_io_tree(struct extent_io_tree *tree) -{ - spin_lock(&tree->lock); - /* - * Do a single barrier for the waitqueue_active check here, the state - * of the waitqueue should not change once clear_btree_io_tree is - * called. - */ - smp_mb(); - while (!RB_EMPTY_ROOT(&tree->state)) { - struct rb_node *node; - struct extent_state *state; - - node = rb_first(&tree->state); - state = rb_entry(node, struct extent_state, rb_node); - rb_erase(&state->rb_node, &tree->state); - RB_CLEAR_NODE(&state->rb_node); - /* - * btree io trees aren't supposed to have tasks waiting for - * changes in the flags of extent states ever. - */ - ASSERT(!waitqueue_active(&state->wq)); - free_extent_state(state); - - cond_resched_lock(&tree->lock); - } - spin_unlock(&tree->lock); -} - static noinline void switch_commit_roots(struct btrfs_transaction *trans) { struct btrfs_fs_info *fs_info = trans->fs_info; @@ -122,7 +93,7 @@ static noinline void switch_commit_roots(struct btrfs_transaction *trans) root->commit_root = btrfs_root_node(root); if (is_fstree(root->root_key.objectid)) btrfs_unpin_free_ino(root); - clear_btree_io_tree(&root->dirty_log_pages); + extent_io_tree_release(&root->dirty_log_pages); btrfs_qgroup_clean_swapped_blocks(root); } @@ -930,7 +901,7 @@ int btrfs_write_marked_extents(struct btrfs_fs_info *fs_info, * superblock that points to btree nodes/leafs for which * writeback hasn't finished yet (and without errors). * We cleanup any entries left in the io tree when committing - * the transaction (through clear_btree_io_tree()). + * the transaction (through extent_io_tree_release()). */ if (err == -ENOMEM) { err = 0; @@ -975,7 +946,7 @@ static int __btrfs_wait_marked_extents(struct btrfs_fs_info *fs_info, * left in the io tree. For a log commit, we don't remove them * after committing the log because the tree can be accessed * concurrently - we do it only at transaction commit time when - * it's safe to do it (through clear_btree_io_tree()). + * it's safe to do it (through extent_io_tree_release()). */ err = clear_extent_bit(dirty_pages, start, end, EXTENT_NEED_WAIT, 0, 0, &cached_state); @@ -1053,7 +1024,7 @@ static int btrfs_write_and_wait_transaction(struct btrfs_trans_handle *trans) blk_finish_plug(&plug); ret2 = btrfs_wait_extents(fs_info, dirty_pages); - clear_btree_io_tree(&trans->transaction->dirty_pages); + extent_io_tree_release(&trans->transaction->dirty_pages); if (ret) return ret; From patchwork Mon Feb 11 08:35:03 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikolay Borisov X-Patchwork-Id: 10805253 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 498D16C2 for ; Mon, 11 Feb 2019 08:35:21 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 39E0729BDF for ; Mon, 11 Feb 2019 08:35:21 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2D42029BE9; Mon, 11 Feb 2019 08:35:21 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C79D929A21 for ; Mon, 11 Feb 2019 08:35:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727138AbfBKIfT (ORCPT ); Mon, 11 Feb 2019 03:35:19 -0500 Received: from mx2.suse.de ([195.135.220.15]:41514 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727024AbfBKIfQ (ORCPT ); Mon, 11 Feb 2019 03:35:16 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 54777ADF3 for ; Mon, 11 Feb 2019 08:35:15 +0000 (UTC) From: Nikolay Borisov To: linux-btrfs@vger.kernel.org Cc: Nikolay Borisov Subject: [PATCH v2 05/12] btrfs: Populate ->orig_block_len during read_one_chunk Date: Mon, 11 Feb 2019 10:35:03 +0200 Message-Id: <20190211083510.27591-6-nborisov@suse.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190211083510.27591-1-nborisov@suse.com> References: <20190211083510.27591-1-nborisov@suse.com> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Chunks read from disk currently don't get their ->orig_block_len member set, in contrast when a new chunk is allocated, the respective extent_map's ->orig_block_len is assigned the size of the stripe of this chunk. Let's apply the same strategy for chunks which are read from disk, not only does this codify the invariant that ->orig_block_len always contains the size of the stripe for a chunk (when the em belongs to the mapping tree). But it's also a preparatory patch for further work around tracking chunk allocation in an extent tree rather than pinned/pending lists. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba --- fs/btrfs/volumes.c | 41 ++++++++++++++++++++++------------------- 1 file changed, 22 insertions(+), 19 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index f0d91b1fda1c..4c654e0bd618 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -6811,6 +6811,26 @@ static void btrfs_report_missing_device(struct btrfs_fs_info *fs_info, devid, uuid); } +static u64 calc_stripe_length(u64 type, u64 chunk_len, int num_stripes) +{ + int index = btrfs_bg_flags_to_raid_index(type); + int ncopies = btrfs_raid_array[index].ncopies; + int data_stripes; + + switch (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) { + case BTRFS_BLOCK_GROUP_RAID5: + data_stripes = num_stripes - 1; + break; + case BTRFS_BLOCK_GROUP_RAID6: + data_stripes = num_stripes - 2; + break; + default: + data_stripes = num_stripes / ncopies; + break; + } + return div_u64(chunk_len, data_stripes); +} + static int read_one_chunk(struct btrfs_fs_info *fs_info, struct btrfs_key *key, struct extent_buffer *leaf, struct btrfs_chunk *chunk) @@ -6870,6 +6890,8 @@ static int read_one_chunk(struct btrfs_fs_info *fs_info, struct btrfs_key *key, map->type = btrfs_chunk_type(leaf, chunk); map->sub_stripes = btrfs_chunk_sub_stripes(leaf, chunk); map->verified_stripes = 0; + em->orig_block_len = calc_stripe_length(map->type, em->len, + map->num_stripes); for (i = 0; i < num_stripes; i++) { map->stripes[i].physical = btrfs_stripe_offset_nr(leaf, chunk, i); @@ -7727,25 +7749,6 @@ int btrfs_bg_type_to_factor(u64 flags) } -static u64 calc_stripe_length(u64 type, u64 chunk_len, int num_stripes) -{ - int index = btrfs_bg_flags_to_raid_index(type); - int ncopies = btrfs_raid_array[index].ncopies; - int data_stripes; - - switch (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) { - case BTRFS_BLOCK_GROUP_RAID5: - data_stripes = num_stripes - 1; - break; - case BTRFS_BLOCK_GROUP_RAID6: - data_stripes = num_stripes - 2; - break; - default: - data_stripes = num_stripes / ncopies; - break; - } - return div_u64(chunk_len, data_stripes); -} static int verify_one_dev_extent(struct btrfs_fs_info *fs_info, u64 chunk_offset, u64 devid, From patchwork Mon Feb 11 08:35:04 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikolay Borisov X-Patchwork-Id: 10805267 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6138B13B4 for ; Mon, 11 Feb 2019 08:35:28 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5247329A21 for ; Mon, 11 Feb 2019 08:35:28 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 46E7E29BE9; Mon, 11 Feb 2019 08:35:28 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id EAB9729A21 for ; Mon, 11 Feb 2019 08:35:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727129AbfBKIfR (ORCPT ); Mon, 11 Feb 2019 03:35:17 -0500 Received: from mx2.suse.de ([195.135.220.15]:41524 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726363AbfBKIfQ (ORCPT ); Mon, 11 Feb 2019 03:35:16 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 9ECB9ACB8 for ; Mon, 11 Feb 2019 08:35:15 +0000 (UTC) From: Nikolay Borisov To: linux-btrfs@vger.kernel.org Cc: Nikolay Borisov Subject: [PATCH v2 06/12] btrfs: Introduce new bits for device allocation tree Date: Mon, 11 Feb 2019 10:35:04 +0200 Message-Id: <20190211083510.27591-7-nborisov@suse.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190211083510.27591-1-nborisov@suse.com> References: <20190211083510.27591-1-nborisov@suse.com> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Rather than hijacking the existing defines let's just define new bits, with more descriptive names. Instead of using yet more (currently at 18) bits for the new flags, use the fact those flags will be specific to the device allocation tree so define them using existing EXTENT_* flags. Signed-off-by: Nikolay Borisov --- fs/btrfs/extent_io.h | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index d7beb2b3bc7d..af7e00a3678c 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -29,6 +29,10 @@ EXTENT_CLEAR_DATA_RESV) #define EXTENT_CTLBITS (EXTENT_DO_ACCOUNTING) + +/* Redefined bits above which are used only in the device allocation tree */ +#define CHUNK_ALLOCATED EXTENT_DIRTY + /* * flags for bio submission. The high bits indicate the compression * type for this bio From patchwork Mon Feb 11 08:35:05 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikolay Borisov X-Patchwork-Id: 10805265 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9AFC06C2 for ; Mon, 11 Feb 2019 08:35:27 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8938829A21 for ; Mon, 11 Feb 2019 08:35:27 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7D49629BE9; Mon, 11 Feb 2019 08:35:27 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5ED9429A21 for ; Mon, 11 Feb 2019 08:35:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727165AbfBKIfY (ORCPT ); Mon, 11 Feb 2019 03:35:24 -0500 Received: from mx2.suse.de ([195.135.220.15]:41530 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727086AbfBKIfS (ORCPT ); Mon, 11 Feb 2019 03:35:18 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id F25F4AC89 for ; Mon, 11 Feb 2019 08:35:15 +0000 (UTC) From: Nikolay Borisov To: linux-btrfs@vger.kernel.org Cc: Jeff Mahoney , Nikolay Borisov Subject: [PATCH v2 07/12] btrfs: replace pending/pinned chunks lists with io tree Date: Mon, 11 Feb 2019 10:35:05 +0200 Message-Id: <20190211083510.27591-8-nborisov@suse.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190211083510.27591-1-nborisov@suse.com> References: <20190211083510.27591-1-nborisov@suse.com> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Jeff Mahoney The pending chunks list contains chunks that are allocated in the current transaction but haven't been created yet. The pinned chunks list contains chunks that are being released in the current transaction. Both describe chunks that are not reflected on disk as in use but are unavailable just the same. The pending chunks list is anchored by the transaction handle, which means that we need to hold a reference to a transaction when working with the list. We use these lists to ensure that we don't end up discarding chunks that are allocated or released in the current transaction. What we r The way we use them is by iterating over both lists to perform comparisons on the stripes they describe for each device. This is backwards and requires that we keep a transaction handle open while we're trimming. This patchset adds an extent_io_tree to btrfs_device that maintains the allocation state of the device. Extents are set dirty when chunks are first allocated -- when the extent maps are added to the mapping tree. They're cleared when last removed -- when the extent maps are removed from the mapping tree. This matches the lifespan of the pending and pinned chunks list and allows us to do trims on unallocated space safely without pinning the transaction for what may be a lengthy operation. We can also use this io tree to mark which chunks have already been trimmed so we don't repeat the operation. Signed-off-by: Nikolay Borisov --- fs/btrfs/ctree.h | 6 --- fs/btrfs/disk-io.c | 11 ----- fs/btrfs/extent-tree.c | 28 ----------- fs/btrfs/extent_io.c | 2 +- fs/btrfs/extent_io.h | 6 ++- fs/btrfs/extent_map.c | 36 ++++++++++++++ fs/btrfs/extent_map.h | 1 - fs/btrfs/free-space-cache.c | 4 -- fs/btrfs/transaction.c | 9 ---- fs/btrfs/transaction.h | 1 - fs/btrfs/volumes.c | 96 +++++++++++++------------------------ fs/btrfs/volumes.h | 2 + 12 files changed, 76 insertions(+), 126 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 9306925b6790..86dbf2160ae2 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1149,12 +1149,6 @@ struct btrfs_fs_info { struct mutex unused_bg_unpin_mutex; struct mutex delete_unused_bgs_mutex; - /* - * Chunks that can't be freed yet (under a trim/discard operation) - * and will be latter freed. Protected by fs_info->chunk_mutex. - */ - struct list_head pinned_chunks; - /* Cached block sizes */ u32 nodesize; u32 sectorsize; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index f1c42d242d48..acf312203cd1 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2775,8 +2775,6 @@ int open_ctree(struct super_block *sb, init_waitqueue_head(&fs_info->async_submit_wait); init_waitqueue_head(&fs_info->delayed_iputs_wait); - INIT_LIST_HEAD(&fs_info->pinned_chunks); - /* Usable values until the real ones are cached from the superblock */ fs_info->nodesize = 4096; fs_info->sectorsize = 4096; @@ -4051,15 +4049,6 @@ void close_ctree(struct btrfs_fs_info *fs_info) btrfs_free_stripe_hash_table(fs_info); btrfs_free_ref_cache(fs_info); - - while (!list_empty(&fs_info->pinned_chunks)) { - struct extent_map *em; - - em = list_first_entry(&fs_info->pinned_chunks, - struct extent_map, list); - list_del_init(&em->list); - free_extent_map(em); - } } int btrfs_buffer_uptodate(struct extent_buffer *buf, u64 parent_transid, diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 33ecd4128898..48bf3df0b194 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -10895,10 +10895,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, memcpy(&key, &block_group->key, sizeof(key)); mutex_lock(&fs_info->chunk_mutex); - if (!list_empty(&em->list)) { - /* We're in the transaction->pending_chunks list. */ - free_extent_map(em); - } spin_lock(&block_group->lock); block_group->removed = 1; /* @@ -10925,25 +10921,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, * the transaction commit has completed. */ remove_em = (atomic_read(&block_group->trimming) == 0); - /* - * Make sure a trimmer task always sees the em in the pinned_chunks list - * if it sees block_group->removed == 1 (needs to lock block_group->lock - * before checking block_group->removed). - */ - if (!remove_em) { - /* - * Our em might be in trans->transaction->pending_chunks which - * is protected by fs_info->chunk_mutex ([lock|unlock]_chunks), - * and so is the fs_info->pinned_chunks list. - * - * So at this point we must be holding the chunk_mutex to avoid - * any races with chunk allocation (more specifically at - * volumes.c:contains_pending_extent()), to ensure it always - * sees the em, either in the pending_chunks list or in the - * pinned_chunks list. - */ - list_move_tail(&em->list, &fs_info->pinned_chunks); - } spin_unlock(&block_group->lock); if (remove_em) { @@ -10951,11 +10928,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, em_tree = &fs_info->mapping_tree.map_tree; write_lock(&em_tree->lock); - /* - * The em might be in the pending_chunks list, so make sure the - * chunk mutex is locked, since remove_extent_mapping() will - * delete us from that list. - */ remove_extent_mapping(em_tree, em); write_unlock(&em_tree->lock); /* once for the tree */ diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index ae1049824739..9733790881ae 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -891,7 +891,7 @@ static void cache_state(struct extent_state *state, * [start, end] is inclusive This takes the tree lock. */ -static int __must_check +int __must_check __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, unsigned bits, unsigned exclusive_bits, u64 *failed_start, struct extent_state **cached_state, diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index af7e00a3678c..d4227e40c8ee 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -314,7 +314,11 @@ int set_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end, int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, unsigned bits, u64 *failed_start, struct extent_state **cached_state, gfp_t mask); - +int +__set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, + unsigned bits, unsigned exclusive_bits, + u64 *failed_start, struct extent_state **cached_state, + gfp_t mask, struct extent_changeset *changeset); static inline int set_extent_bits(struct extent_io_tree *tree, u64 start, u64 end, unsigned bits) { diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c index 928f729c55ba..0820f6fcf3a6 100644 --- a/fs/btrfs/extent_map.c +++ b/fs/btrfs/extent_map.c @@ -4,6 +4,7 @@ #include #include #include "ctree.h" +#include "volumes.h" #include "extent_map.h" #include "compression.h" @@ -336,6 +337,37 @@ static inline void setup_extent_mapping(struct extent_map_tree *tree, else try_merge_map(tree, em); } +static void extent_map_device_set_bits(struct extent_map *em, unsigned bits) +{ + struct map_lookup *map = em->map_lookup; + u64 stripe_size = em->orig_block_len; + int i; + + for (i = 0; i < map->num_stripes; i++) { + struct btrfs_bio_stripe *stripe = &map->stripes[i]; + struct btrfs_device *device = stripe->dev; + + __set_extent_bit(&device->alloc_state, stripe->physical, + stripe->physical + stripe_size - 1, bits, + 0, NULL, NULL, GFP_NOWAIT, NULL); + } +} + +static void extent_map_device_clear_bits(struct extent_map *em, unsigned bits) +{ + struct map_lookup *map = em->map_lookup; + u64 stripe_size = em->orig_block_len; + int i; + + for (i = 0; i < map->num_stripes; i++) { + struct btrfs_bio_stripe *stripe = &map->stripes[i]; + struct btrfs_device *device = stripe->dev; + + __clear_extent_bit(&device->alloc_state, stripe->physical, + stripe->physical + stripe_size - 1, bits, + 0, 0, NULL, GFP_NOWAIT, NULL); + } +} /** * add_extent_mapping - add new extent map to the extent tree @@ -357,6 +389,8 @@ int add_extent_mapping(struct extent_map_tree *tree, goto out; setup_extent_mapping(tree, em, modified); + if (test_bit(EXTENT_FLAG_FS_MAPPING, &em->flags)) + extent_map_device_set_bits(em, CHUNK_ALLOCATED); out: return ret; } @@ -438,6 +472,8 @@ void remove_extent_mapping(struct extent_map_tree *tree, struct extent_map *em) rb_erase_cached(&em->rb_node, &tree->map); if (!test_bit(EXTENT_FLAG_LOGGING, &em->flags)) list_del_init(&em->list); + if (test_bit(EXTENT_FLAG_FS_MAPPING, &em->flags)) + extent_map_device_clear_bits(em, CHUNK_ALLOCATED); RB_CLEAR_NODE(&em->rb_node); } diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h index 473f039fcd7c..72b46833f236 100644 --- a/fs/btrfs/extent_map.h +++ b/fs/btrfs/extent_map.h @@ -91,7 +91,6 @@ void replace_extent_mapping(struct extent_map_tree *tree, struct extent_map *cur, struct extent_map *new, int modified); - struct extent_map *alloc_extent_map(void); void free_extent_map(struct extent_map *em); int __init extent_map_init(void); diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index 74aa552f4793..207fb50dcc7a 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -3366,10 +3366,6 @@ void btrfs_put_block_group_trimming(struct btrfs_block_group_cache *block_group) em = lookup_extent_mapping(em_tree, block_group->key.objectid, 1); BUG_ON(!em); /* logic error, can't happen */ - /* - * remove_extent_mapping() will delete us from the pinned_chunks - * list, which is protected by the chunk mutex. - */ remove_extent_mapping(em_tree, em); write_unlock(&em_tree->lock); mutex_unlock(&fs_info->chunk_mutex); diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 07f5477abd0a..e91f1a98d0dd 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -50,14 +50,6 @@ void btrfs_put_transaction(struct btrfs_transaction *transaction) btrfs_err(transaction->fs_info, "pending csums is %llu", transaction->delayed_refs.pending_csums); - while (!list_empty(&transaction->pending_chunks)) { - struct extent_map *em; - - em = list_first_entry(&transaction->pending_chunks, - struct extent_map, list); - list_del_init(&em->list); - free_extent_map(em); - } /* * If any block groups are found in ->deleted_bgs then it's * because the transaction was aborted and a commit did not @@ -235,7 +227,6 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info, spin_lock_init(&cur_trans->delayed_refs.lock); INIT_LIST_HEAD(&cur_trans->pending_snapshots); - INIT_LIST_HEAD(&cur_trans->pending_chunks); INIT_LIST_HEAD(&cur_trans->dev_update_list); INIT_LIST_HEAD(&cur_trans->switch_commits); INIT_LIST_HEAD(&cur_trans->dirty_bgs); diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h index e0a04fa4de66..134d1a0bd92f 100644 --- a/fs/btrfs/transaction.h +++ b/fs/btrfs/transaction.h @@ -51,7 +51,6 @@ struct btrfs_transaction { wait_queue_head_t writer_wait; wait_queue_head_t commit_wait; struct list_head pending_snapshots; - struct list_head pending_chunks; struct list_head dev_update_list; struct list_head switch_commits; struct list_head dirty_bgs; diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 4c654e0bd618..f8baa9d4c796 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -335,6 +335,8 @@ void btrfs_free_device(struct btrfs_device *device) { BUG_ON(!list_empty(&device->post_commit_list)); rcu_string_free(device->name); + if (!in_softirq()) + extent_io_tree_release(&device->alloc_state); bio_put(device->flush_bio); kfree(device); } @@ -411,6 +413,7 @@ static struct btrfs_device *__alloc_device(void) btrfs_device_data_ordered_init(dev); INIT_RADIX_TREE(&dev->reada_zones, GFP_NOFS & ~__GFP_DIRECT_RECLAIM); INIT_RADIX_TREE(&dev->reada_extents, GFP_NOFS & ~__GFP_DIRECT_RECLAIM); + extent_io_tree_init(&dev->alloc_state, NULL); return dev; } @@ -1269,6 +1272,9 @@ static void btrfs_close_one_device(struct btrfs_device *device) if (test_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state)) fs_devices->missing_devices--; + /* Remove alloc state now since it cannot be done in RCU context */ + extent_io_tree_release(&device->alloc_state); + btrfs_close_bdev(device); new_device = btrfs_alloc_device(NULL, &device->devid, @@ -1505,58 +1511,29 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, fmode_t flags, return device; } -static int contains_pending_extent(struct btrfs_transaction *transaction, - struct btrfs_device *device, - u64 *start, u64 len) -{ - struct btrfs_fs_info *fs_info = device->fs_info; - struct extent_map *em; - struct list_head *search_list = &fs_info->pinned_chunks; - int ret = 0; - u64 physical_start = *start; - - if (transaction) - search_list = &transaction->pending_chunks; -again: - list_for_each_entry(em, search_list, list) { - struct map_lookup *map; - int i; - - map = em->map_lookup; - for (i = 0; i < map->num_stripes; i++) { - u64 end; - - if (map->stripes[i].dev != device) - continue; - if (map->stripes[i].physical >= physical_start + len || - map->stripes[i].physical + em->orig_block_len <= - physical_start) - continue; - /* - * Make sure that while processing the pinned list we do - * not override our *start with a lower value, because - * we can have pinned chunks that fall within this - * device hole and that have lower physical addresses - * than the pending chunks we processed before. If we - * do not take this special care we can end up getting - * 2 pending chunks that start at the same physical - * device offsets because the end offset of a pinned - * chunk can be equal to the start offset of some - * pending chunk. - */ - end = map->stripes[i].physical + em->orig_block_len; - if (end > *start) { - *start = end; - ret = 1; - } +/* + * Tries to find a chunk that intersects [start, start +len] range and when one + * such is found, records the end of it in *start + */ +#define in_range(b, first, len) ((b) >= (first) && (b) < (first) + (len)) +static bool contains_pending_extent(struct btrfs_device *device, u64 *start, + u64 len) +{ + u64 physical_start, physical_end; + lockdep_assert_held(&device->fs_info->chunk_mutex); + + if (!find_first_extent_bit(&device->alloc_state, *start, + &physical_start, &physical_end, + CHUNK_ALLOCATED, NULL)) { + + if (in_range(physical_start, *start, len) || + in_range(*start, physical_start, + physical_end - physical_start)) { + *start = physical_end + 1; + return true; } } - if (search_list != &fs_info->pinned_chunks) { - search_list = &fs_info->pinned_chunks; - goto again; - } - - return ret; + return false; } @@ -1667,15 +1644,12 @@ int find_free_dev_extent_start(struct btrfs_transaction *transaction, * Have to check before we set max_hole_start, otherwise * we could end up sending back this offset anyway. */ - if (contains_pending_extent(transaction, device, - &search_start, + if (contains_pending_extent(device, &search_start, hole_size)) { - if (key.offset >= search_start) { + if (key.offset >= search_start) hole_size = key.offset - search_start; - } else { - WARN_ON_ONCE(1); + else hole_size = 0; - } } if (hole_size > max_hole_size) { @@ -1716,8 +1690,7 @@ int find_free_dev_extent_start(struct btrfs_transaction *transaction, if (search_end > search_start) { hole_size = search_end - search_start; - if (contains_pending_extent(transaction, device, &search_start, - hole_size)) { + if (contains_pending_extent(device, &search_start, hole_size)) { btrfs_release_path(path); goto again; } @@ -4759,7 +4732,7 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size) * in-memory chunks are synced to disk so that the loop below sees them * and relocates them accordingly. */ - if (contains_pending_extent(trans->transaction, device, &start, diff)) { + if (contains_pending_extent(device, &start, diff)) { mutex_unlock(&fs_info->chunk_mutex); ret = btrfs_commit_transaction(trans); if (ret) @@ -5201,9 +5174,6 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, free_extent_map(em); goto error; } - - list_add_tail(&em->list, &trans->transaction->pending_chunks); - refcount_inc(&em->refs); write_unlock(&em_tree->lock); ret = btrfs_make_block_group(trans, 0, type, start, chunk_size); @@ -5236,8 +5206,6 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, free_extent_map(em); /* One for the tree reference */ free_extent_map(em); - /* One for the pending_chunks list reference */ - free_extent_map(em); error: kfree(devices_info); return ret; diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index a0f09aad3770..49dc737d8a54 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -134,6 +134,8 @@ struct btrfs_device { /* Counter to record the change of device stats */ atomic_t dev_stats_ccnt; atomic_t dev_stat_values[BTRFS_DEV_STAT_VALUES_MAX]; + + struct extent_io_tree alloc_state; }; /* From patchwork Mon Feb 11 08:35:06 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikolay Borisov X-Patchwork-Id: 10805269 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 02BC61575 for ; Mon, 11 Feb 2019 08:35:29 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E73A729A21 for ; Mon, 11 Feb 2019 08:35:28 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id DB90529BDF; Mon, 11 Feb 2019 08:35:28 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5F33829BED for ; Mon, 11 Feb 2019 08:35:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727171AbfBKIf1 (ORCPT ); Mon, 11 Feb 2019 03:35:27 -0500 Received: from mx2.suse.de ([195.135.220.15]:41536 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727116AbfBKIfS (ORCPT ); Mon, 11 Feb 2019 03:35:18 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 47E35AE27 for ; Mon, 11 Feb 2019 08:35:16 +0000 (UTC) From: Nikolay Borisov To: linux-btrfs@vger.kernel.org Cc: Nikolay Borisov Subject: [PATCH v2 08/12] btrfs: Remove 'trans' argument from find_free_dev_extent(_start) Date: Mon, 11 Feb 2019 10:35:06 +0200 Message-Id: <20190211083510.27591-9-nborisov@suse.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190211083510.27591-1-nborisov@suse.com> References: <20190211083510.27591-1-nborisov@suse.com> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Now that those function no longer require a handle to transaction to inspect pending/pinned chunks the argument can be removed. At the same time also remove any surrounding code which acquired the handle. Signed-off-by: Nikolay Borisov --- fs/btrfs/extent-tree.c | 36 +++--------------------------------- fs/btrfs/volumes.c | 11 ++++------- fs/btrfs/volumes.h | 8 +++----- 3 files changed, 10 insertions(+), 45 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 48bf3df0b194..39647ddb2195 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -9864,12 +9864,10 @@ void btrfs_dec_block_group_ro(struct btrfs_block_group_cache *cache) */ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr) { - struct btrfs_root *root = fs_info->extent_root; struct btrfs_block_group_cache *block_group; struct btrfs_space_info *space_info; struct btrfs_fs_devices *fs_devices = fs_info->fs_devices; struct btrfs_device *device; - struct btrfs_trans_handle *trans; u64 min_free; u64 dev_min = 1; u64 dev_nr = 0; @@ -9968,13 +9966,6 @@ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr) min_free = div64_u64(min_free, dev_min); } - /* We need to do this so that we can look at pending chunks */ - trans = btrfs_join_transaction(root); - if (IS_ERR(trans)) { - ret = PTR_ERR(trans); - goto out; - } - mutex_lock(&fs_info->chunk_mutex); list_for_each_entry(device, &fs_devices->alloc_list, dev_alloc_list) { u64 dev_offset; @@ -9985,7 +9976,7 @@ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr) */ if (device->total_bytes > device->bytes_used + min_free && !test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state)) { - ret = find_free_dev_extent(trans, device, min_free, + ret = find_free_dev_extent(device, min_free, &dev_offset, NULL); if (!ret) dev_nr++; @@ -10001,7 +9992,6 @@ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr) "no space to allocate a new chunk for block group %llu", block_group->key.objectid); mutex_unlock(&fs_info->chunk_mutex); - btrfs_end_transaction(trans); out: btrfs_put_block_group(block_group); return ret; @@ -11253,34 +11243,14 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, while (1) { struct btrfs_fs_info *fs_info = device->fs_info; - struct btrfs_transaction *trans; u64 bytes; ret = mutex_lock_interruptible(&fs_info->chunk_mutex); if (ret) break; - ret = down_read_killable(&fs_info->commit_root_sem); - if (ret) { - mutex_unlock(&fs_info->chunk_mutex); - break; - } - - spin_lock(&fs_info->trans_lock); - trans = fs_info->running_transaction; - if (trans) - refcount_inc(&trans->use_count); - spin_unlock(&fs_info->trans_lock); - - if (!trans) - up_read(&fs_info->commit_root_sem); - - ret = find_free_dev_extent_start(trans, device, range->minlen, - start, &start, &len); - if (trans) { - up_read(&fs_info->commit_root_sem); - btrfs_put_transaction(trans); - } + ret = find_free_dev_extent_start(device, range->minlen, start, + &start, &len); if (ret) { mutex_unlock(&fs_info->chunk_mutex); diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index f8baa9d4c796..ece4e5fad9c6 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1558,8 +1558,7 @@ static bool contains_pending_extent(struct btrfs_device *device, u64 *start, * But if we don't find suitable free space, it is used to store the size of * the max free space. */ -int find_free_dev_extent_start(struct btrfs_transaction *transaction, - struct btrfs_device *device, u64 num_bytes, +int find_free_dev_extent_start(struct btrfs_device *device, u64 num_bytes, u64 search_start, u64 *start, u64 *len) { struct btrfs_fs_info *fs_info = device->fs_info; @@ -1715,13 +1714,11 @@ int find_free_dev_extent_start(struct btrfs_transaction *transaction, return ret; } -int find_free_dev_extent(struct btrfs_trans_handle *trans, - struct btrfs_device *device, u64 num_bytes, +int find_free_dev_extent(struct btrfs_device *device, u64 num_bytes, u64 *start, u64 *len) { /* FIXME use last free of some kind */ - return find_free_dev_extent_start(trans->transaction, device, - num_bytes, 0, start, len); + return find_free_dev_extent_start(device, num_bytes, 0, start, len); } static int btrfs_free_dev_extent(struct btrfs_trans_handle *trans, @@ -5040,7 +5037,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, if (total_avail == 0) continue; - ret = find_free_dev_extent(trans, device, + ret = find_free_dev_extent(device, max_stripe_size * dev_stripes, &dev_offset, &max_avail); if (ret && ret != -ENOSPC) diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 49dc737d8a54..30c1f3002a81 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -445,11 +445,9 @@ int btrfs_cancel_balance(struct btrfs_fs_info *fs_info); int btrfs_create_uuid_tree(struct btrfs_fs_info *fs_info); int btrfs_check_uuid_tree(struct btrfs_fs_info *fs_info); int btrfs_chunk_readonly(struct btrfs_fs_info *fs_info, u64 chunk_offset); -int find_free_dev_extent_start(struct btrfs_transaction *transaction, - struct btrfs_device *device, u64 num_bytes, - u64 search_start, u64 *start, u64 *max_avail); -int find_free_dev_extent(struct btrfs_trans_handle *trans, - struct btrfs_device *device, u64 num_bytes, +int find_free_dev_extent_start(struct btrfs_device *device, u64 num_bytes, + u64 search_start, u64 *start, u64 *max_avail); +int find_free_dev_extent(struct btrfs_device *device, u64 num_bytes, u64 *start, u64 *max_avail); void btrfs_dev_stat_inc_and_print(struct btrfs_device *dev, int index); int btrfs_get_dev_stats(struct btrfs_fs_info *fs_info, From patchwork Mon Feb 11 08:35:07 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikolay Borisov X-Patchwork-Id: 10805255 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A03AC1575 for ; Mon, 11 Feb 2019 08:35:21 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9051729A21 for ; Mon, 11 Feb 2019 08:35:21 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 848E529BE9; Mon, 11 Feb 2019 08:35:21 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 35A3129A21 for ; Mon, 11 Feb 2019 08:35:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727139AbfBKIfU (ORCPT ); Mon, 11 Feb 2019 03:35:20 -0500 Received: from mx2.suse.de ([195.135.220.15]:41514 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727103AbfBKIfR (ORCPT ); Mon, 11 Feb 2019 03:35:17 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 8C9FFAEC3 for ; Mon, 11 Feb 2019 08:35:16 +0000 (UTC) From: Nikolay Borisov To: linux-btrfs@vger.kernel.org Cc: Nikolay Borisov Subject: [PATCH v2 09/12] btrfs: Factor out in_range macro Date: Mon, 11 Feb 2019 10:35:07 +0200 Message-Id: <20190211083510.27591-10-nborisov@suse.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190211083510.27591-1-nborisov@suse.com> References: <20190211083510.27591-1-nborisov@suse.com> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This is used in more than one places so let's factor it out in ctree.h. No functional changes. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba --- fs/btrfs/ctree.h | 2 ++ fs/btrfs/extent-tree.c | 1 - fs/btrfs/volumes.c | 1 - 3 files changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 86dbf2160ae2..c61fff4c294d 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3808,6 +3808,8 @@ static inline int btrfs_defrag_cancelled(struct btrfs_fs_info *fs_info) return signal_pending(current); } +#define in_range(b, first, len) ((b) >= (first) && (b) < (first) + (len)) + /* Sanity test specific functions */ #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS void btrfs_test_inode_set_ops(struct inode *inode); diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 39647ddb2195..188774ed7795 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -1905,7 +1905,6 @@ static int remove_extent_backref(struct btrfs_trans_handle *trans, return ret; } -#define in_range(b, first, len) ((b) >= (first) && (b) < (first) + (len)) static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len, u64 *discarded_bytes) { diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index ece4e5fad9c6..6fd8df6e3964 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1515,7 +1515,6 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, fmode_t flags, * Tries to find a chunk that intersects [start, start +len] range and when one * such is found, records the end of it in *start */ -#define in_range(b, first, len) ((b) >= (first) && (b) < (first) + (len)) static bool contains_pending_extent(struct btrfs_device *device, u64 *start, u64 len) { From patchwork Mon Feb 11 08:35:08 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikolay Borisov X-Patchwork-Id: 10805261 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5E8E86C2 for ; Mon, 11 Feb 2019 08:35:24 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4E16429A21 for ; Mon, 11 Feb 2019 08:35:24 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 42BA229BE9; Mon, 11 Feb 2019 08:35:24 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C8BE329A21 for ; Mon, 11 Feb 2019 08:35:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727152AbfBKIfX (ORCPT ); Mon, 11 Feb 2019 03:35:23 -0500 Received: from mx2.suse.de ([195.135.220.15]:41546 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727119AbfBKIfS (ORCPT ); Mon, 11 Feb 2019 03:35:18 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id CEDA9ADF3 for ; Mon, 11 Feb 2019 08:35:16 +0000 (UTC) From: Nikolay Borisov To: linux-btrfs@vger.kernel.org Cc: Nikolay Borisov Subject: [PATCH v2 10/12] btrfs: Optimize unallocated chunks discard Date: Mon, 11 Feb 2019 10:35:08 +0200 Message-Id: <20190211083510.27591-11-nborisov@suse.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190211083510.27591-1-nborisov@suse.com> References: <20190211083510.27591-1-nborisov@suse.com> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Currently unallocated chunks are always trimmed. For example 2 consecutive trims on large storage would trim freespace twice irrespective of whether the space was actually allocated or not between those trims. Optimise this behavior by exploiting the newly introduced alloc_state tree of btrfs_device. A new CHUNK_TRIMMED bit is used to mark those unallocated chunks which have been trimmed and have not been allocated afterwards. On chunk allocation the respective underlying devices' physical space will have its CHUNK_TRIMMED flag cleared. This avoids submitting discards for space which hasn't been changed since the last time discard was issued. Signed-off-by: Nikolay Borisov --- fs/btrfs/extent-tree.c | 57 +++++++++++++++++++++++++++++++++++++++++- fs/btrfs/extent_io.h | 8 +++++- fs/btrfs/extent_map.c | 4 ++- 3 files changed, 66 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 188774ed7795..2d4d597c8ca4 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -11198,6 +11198,54 @@ int btrfs_error_unpin_extent_range(struct btrfs_fs_info *fs_info, return unpin_extent_range(fs_info, start, end, false); } +static bool should_skip_trim(struct btrfs_device *device, u64 *start, u64 *len) +{ + u64 trimmed_start = 0, trimmed_end = 0; + u64 end = *start + *len - 1; + + if (!find_first_extent_bit(&device->alloc_state, *start, &trimmed_start, + &trimmed_end, CHUNK_TRIMMED, NULL)) { + u64 trimmed_len = trimmed_end - trimmed_start + 1; + + if (*start < trimmed_start) { + if (in_range(end, trimmed_start, trimmed_len) || + end > trimmed_end) { + /* + * start|------|end + * ts|--|trimmed_len + * OR + * start|-----|end + * ts|-----|trimmed_len + */ + *len = trimmed_start - *start; + return false; + } else if (end < trimmed_start) { + /* + * start|------|end + * ts|--|trimmed_len + */ + return false; + } + } else if (in_range(*start, trimmed_start, trimmed_len)) { + if (in_range(end, trimmed_start, trimmed_len)) { + /* + * start|------|end + * ts|----------|trimmed_len + */ + return true; + } else { + /* + * start|-----------|end + * ts|----------|trimmed_len + */ + *start = trimmed_end + 1; + *len = end - *start + 1; + return false; + } + } + } + return false; +} /* * It used to be that old block groups would be left around forever. * Iterating over them would be enough to trim unused space. Since we @@ -11268,7 +11316,14 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, start = max(range->start, start); len = min(range->len, len); - ret = btrfs_issue_discard(device->bdev, start, len, &bytes); + if (!should_skip_trim(device, &start, &len)) { + ret = btrfs_issue_discard(device->bdev, start, len, + &bytes); + if (!ret) + set_extent_bits(&device->alloc_state, start, + start + bytes - 1, + CHUNK_TRIMMED); + } mutex_unlock(&fs_info->chunk_mutex); if (ret) diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index d4227e40c8ee..d238efd628cf 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -30,8 +30,14 @@ #define EXTENT_CTLBITS (EXTENT_DO_ACCOUNTING) -/* Redefined bits above which are used only in the device allocation tree */ +/* + * Redefined bits above which are used only in the device allocation tree, + * shouldn't be using EXTENT_IOBITS(EXTENT_LOCKED/EXTENT_WRITEBACK) / + * EXTENT_BOUNDARY / EXTENT_CLEAR_META_RESV / EXTENT_CLEAR_DATA_RESV because + * they have special meaning to the bit manipulation functions + */ #define CHUNK_ALLOCATED EXTENT_DIRTY +#define CHUNK_TRIMMED EXTENT_DEFRAG /* * flags for bio submission. The high bits indicate the compression diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c index 0820f6fcf3a6..9e8c0904f623 100644 --- a/fs/btrfs/extent_map.c +++ b/fs/btrfs/extent_map.c @@ -389,8 +389,10 @@ int add_extent_mapping(struct extent_map_tree *tree, goto out; setup_extent_mapping(tree, em, modified); - if (test_bit(EXTENT_FLAG_FS_MAPPING, &em->flags)) + if (test_bit(EXTENT_FLAG_FS_MAPPING, &em->flags)) { extent_map_device_set_bits(em, CHUNK_ALLOCATED); + extent_map_device_clear_bits(em, CHUNK_TRIMMED); + } out: return ret; } From patchwork Mon Feb 11 08:35:09 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikolay Borisov X-Patchwork-Id: 10805263 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 71B566C2 for ; Mon, 11 Feb 2019 08:35:26 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 612BC29BDF for ; Mon, 11 Feb 2019 08:35:26 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 55BF929BE9; Mon, 11 Feb 2019 08:35:26 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DFEE329A21 for ; Mon, 11 Feb 2019 08:35:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727168AbfBKIfY (ORCPT ); Mon, 11 Feb 2019 03:35:24 -0500 Received: from mx2.suse.de ([195.135.220.15]:41524 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727118AbfBKIfS (ORCPT ); Mon, 11 Feb 2019 03:35:18 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 1F571ACB8 for ; Mon, 11 Feb 2019 08:35:17 +0000 (UTC) From: Nikolay Borisov To: linux-btrfs@vger.kernel.org Cc: Nikolay Borisov Subject: [PATCH v2 11/12] btrfs: Implement find_first_clear_extent_bit Date: Mon, 11 Feb 2019 10:35:09 +0200 Message-Id: <20190211083510.27591-12-nborisov@suse.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190211083510.27591-1-nborisov@suse.com> References: <20190211083510.27591-1-nborisov@suse.com> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This function is very similar to find_first_extent_bit except that it locates the first contiguous span of space which does not have bits set. It's intended use is in the freespace trimming code. Signed-off-by: Nikolay Borisov --- fs/btrfs/extent_io.c | 73 ++++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/extent_io.h | 2 ++ 2 files changed, 75 insertions(+) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 9733790881ae..95b9af7376c7 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1505,6 +1505,79 @@ int find_first_extent_bit(struct extent_io_tree *tree, u64 start, return ret; } +/** + * find_first_clear_extent_bit - finds the first range that has @bits not set + * and that starts after @start + * + * @tree - the tree to search + * @start - the offset at/after which the found extent should start + * @start_ret - records the beginning of the range + * @end_ret - records the end of the range (inclusive) + * @bits - the set of bits which must be unset + * + * Since unallocated range is also considered one which doesn't have the bits + * set it's possible that @end_ret contains -1, this happens in case the range + * spans (last_range_end, end of device]. In this case it's up to the caller to + * trim @end_ret to the appropriate size. + */ +void find_first_clear_extent_bit(struct extent_io_tree *tree, u64 start, + u64 *start_ret, u64 *end_ret, unsigned bits) +{ + struct extent_state *state; + struct rb_node *node, *prev = NULL, *next; + + spin_lock(&tree->lock); + + /* Find first extent with bits cleared */ + while (1) { + node = __etree_search(tree, start, &next, &prev, NULL, NULL); + if (!node) { + node = next; + if (!node) { + /* + * We are past the last allocated chunk, + * set start at the end of the last extent. The + * device alloc tree should never be empty so + * prev is always set. + */ + ASSERT(prev); + state = rb_entry(prev, struct extent_state, rb_node); + *start_ret = state->end + 1; + *end_ret = -1; + goto out; + } + } + state = rb_entry(node, struct extent_state, rb_node); + if (in_range(start, state->start, state->end - state->start + 1) && + (state->state & bits)) { + start = state->end + 1; + } else { + *start_ret = start; + break; + } + } + + /* + * Find the longest stretch from start until an entry which has the + * bits set + */ + while (1) { + state = rb_entry(node, struct extent_state, rb_node); + if (state->end >= start && !(state->state & bits)) { + *end_ret = state->end; + } else { + *end_ret = state->start - 1; + break; + } + + node = rb_next(node); + if (!node) + break; + } +out: + spin_unlock(&tree->lock); +} + /* * find a contiguous range of bytes in the file marked as delalloc, not * more than 'max_bytes'. start and end are used to return the range, diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index d238efd628cf..7012ff27da82 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -391,6 +391,8 @@ static inline int set_extent_uptodate(struct extent_io_tree *tree, u64 start, int find_first_extent_bit(struct extent_io_tree *tree, u64 start, u64 *start_ret, u64 *end_ret, unsigned bits, struct extent_state **cached_state); +void find_first_clear_extent_bit(struct extent_io_tree *tree, u64 start, + u64 *start_ret, u64 *end_ret, unsigned bits); int extent_invalidatepage(struct extent_io_tree *tree, struct page *page, unsigned long offset); int extent_write_full_page(struct page *page, struct writeback_control *wbc); From patchwork Mon Feb 11 08:35:10 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikolay Borisov X-Patchwork-Id: 10805259 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C48E81575 for ; Mon, 11 Feb 2019 08:35:23 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B2AD429A21 for ; Mon, 11 Feb 2019 08:35:23 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A41E229BE9; Mon, 11 Feb 2019 08:35:23 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 39A6829A21 for ; Mon, 11 Feb 2019 08:35:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727150AbfBKIfW (ORCPT ); Mon, 11 Feb 2019 03:35:22 -0500 Received: from mx2.suse.de ([195.135.220.15]:41514 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727127AbfBKIfS (ORCPT ); Mon, 11 Feb 2019 03:35:18 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 62239ACEA for ; Mon, 11 Feb 2019 08:35:17 +0000 (UTC) From: Nikolay Borisov To: linux-btrfs@vger.kernel.org Cc: Nikolay Borisov Subject: [PATCH v2 12/12] btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit Date: Mon, 11 Feb 2019 10:35:10 +0200 Message-Id: <20190211083510.27591-13-nborisov@suse.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190211083510.27591-1-nborisov@suse.com> References: <20190211083510.27591-1-nborisov@suse.com> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Instead of always calling the allocator to search for a free extent, that satisfies the input criteria, switch btrfs_trim_free_extents to using find_first_clear_extent_bit. With this change it's no longer necessary to read the device tree in order to figure out holes in the devices. Now the code always searches in-memory data structure to figure out the space range which contains the requested which should result in speed oups. Signed-off-by: Nikolay Borisov --- fs/btrfs/extent-tree.c | 89 ++++++++++++------------------------------ 1 file changed, 26 insertions(+), 63 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 2d4d597c8ca4..cb56fbd84e6a 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -11198,54 +11198,6 @@ int btrfs_error_unpin_extent_range(struct btrfs_fs_info *fs_info, return unpin_extent_range(fs_info, start, end, false); } -static bool should_skip_trim(struct btrfs_device *device, u64 *start, u64 *len) -{ - u64 trimmed_start = 0, trimmed_end = 0; - u64 end = *start + *len - 1; - - if (!find_first_extent_bit(&device->alloc_state, *start, &trimmed_start, - &trimmed_end, CHUNK_TRIMMED, NULL)) { - u64 trimmed_len = trimmed_end - trimmed_start + 1; - - if (*start < trimmed_start) { - if (in_range(end, trimmed_start, trimmed_len) || - end > trimmed_end) { - /* - * start|------|end - * ts|--|trimmed_len - * OR - * start|-----|end - * ts|-----|trimmed_len - */ - *len = trimmed_start - *start; - return false; - } else if (end < trimmed_start) { - /* - * start|------|end - * ts|--|trimmed_len - */ - return false; - } - } else if (in_range(*start, trimmed_start, trimmed_len)) { - if (in_range(end, trimmed_start, trimmed_len)) { - /* - * start|------|end - * ts|----------|trimmed_len - */ - return true; - } else { - /* - * start|-----------|end - * ts|----------|trimmed_len - */ - *start = trimmed_end + 1; - *len = end - *start + 1; - return false; - } - } - } - return false; -} /* * It used to be that old block groups would be left around forever. * Iterating over them would be enough to trim unused space. Since we @@ -11269,7 +11221,7 @@ static bool should_skip_trim(struct btrfs_device *device, u64 *start, u64 *len) static int btrfs_trim_free_extents(struct btrfs_device *device, struct fstrim_range *range, u64 *trimmed) { - u64 start = range->start, len = 0; + u64 start = max_t(u64, range->start, SZ_1M), len = 0, end = 0; int ret; *trimmed = 0; @@ -11296,34 +11248,45 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, if (ret) break; - ret = find_free_dev_extent_start(device, range->minlen, start, - &start, &len); + find_first_clear_extent_bit(&device->alloc_state, start, + &start, &end, + CHUNK_TRIMMED | CHUNK_ALLOCATED); + /* If find_first_clear_extent_bit find a range that spans the + * end of the device it will set end to -1, in this case it's up + * to the caller to trim the value to the size of the device. + */ + end = min(end, device->total_bytes); + len = end - start + 1; - if (ret) { + /* We didn't find any extents */ + if (!len) { mutex_unlock(&fs_info->chunk_mutex); - if (ret == -ENOSPC) - ret = 0; + ret = 0; break; } + /* Keep going until we satisfy minlen or reach end of space */ + if (len < range->minlen) { + mutex_unlock(&fs_info->chunk_mutex); + start += len; + continue; + } + /* If we are out of the passed range break */ if (start > range->start + range->len - 1) { mutex_unlock(&fs_info->chunk_mutex); - ret = 0; break; } start = max(range->start, start); len = min(range->len, len); - if (!should_skip_trim(device, &start, &len)) { - ret = btrfs_issue_discard(device->bdev, start, len, - &bytes); - if (!ret) - set_extent_bits(&device->alloc_state, start, - start + bytes - 1, - CHUNK_TRIMMED); - } + ret = btrfs_issue_discard(device->bdev, start, len, + &bytes); + if (!ret) + set_extent_bits(&device->alloc_state, start, + start + bytes - 1, + CHUNK_TRIMMED); mutex_unlock(&fs_info->chunk_mutex); if (ret)