From patchwork Wed Oct 23 22:52:55 2019
From: Dennis Zhou
To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou
Subject: [PATCH 01/22] bitmap: genericize percpu bitmap region iterators
Date: Wed, 23 Oct 2019 18:52:55 -0400
Message-Id: <734a9d3fbc683853fe61eeddded54bf9d9c4efa8.1571865774.git.dennis@kernel.org>
X-Mailing-List: linux-btrfs@vger.kernel.org

Bitmaps are fairly popular for their space efficiency, but we don't have
generic iterators available. Make percpu's bitmap region iterators
available to everyone.
Signed-off-by: Dennis Zhou
Reviewed-by: Josef Bacik
---
 include/linux/bitmap.h | 35 ++++++++++++++++++++++++
 mm/percpu.c            | 61 +++++++++++-------------------------
 2 files changed, 51 insertions(+), 45 deletions(-)

diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index 29fc933df3bf..9c31b5268f7a 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -438,6 +438,41 @@ static inline int bitmap_parse(const char *buf, unsigned int buflen,
 	return __bitmap_parse(buf, buflen, 0, maskp, nmaskbits);
 }
 
+static inline void bitmap_next_clear_region(unsigned long *bitmap,
+					    unsigned int *rs, unsigned int *re,
+					    unsigned int end)
+{
+	*rs = find_next_zero_bit(bitmap, end, *rs);
+	*re = find_next_bit(bitmap, end, *rs + 1);
+}
+
+static inline void bitmap_next_set_region(unsigned long *bitmap,
+					  unsigned int *rs, unsigned int *re,
+					  unsigned int end)
+{
+	*rs = find_next_bit(bitmap, end, *rs);
+	*re = find_next_zero_bit(bitmap, end, *rs + 1);
+}
+
+/*
+ * Bitmap region iterators.  Iterates over the bitmap between [@start, @end).
+ * @rs and @re should be integer variables and will be set to start and end
+ * index of the current clear or set region.
+ */
+#define bitmap_for_each_clear_region(bitmap, rs, re, start, end)	     \
+	for ((rs) = (start),						     \
+	     bitmap_next_clear_region((bitmap), &(rs), &(re), (end));	     \
+	     (rs) < (re);						     \
+	     (rs) = (re) + 1,						     \
+	     bitmap_next_clear_region((bitmap), &(rs), &(re), (end)))
+
+#define bitmap_for_each_set_region(bitmap, rs, re, start, end)		     \
+	for ((rs) = (start),						     \
+	     bitmap_next_set_region((bitmap), &(rs), &(re), (end));	     \
+	     (rs) < (re);						     \
+	     (rs) = (re) + 1,						     \
+	     bitmap_next_set_region((bitmap), &(rs), &(re), (end)))
+
 /**
  * BITMAP_FROM_U64() - Represent u64 value in the format suitable for bitmap.
  * @n: u64 value
diff --git a/mm/percpu.c b/mm/percpu.c
index 7e06a1e58720..e9844086b236 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -270,33 +270,6 @@ static unsigned long pcpu_chunk_addr(struct pcpu_chunk *chunk,
 		pcpu_unit_page_offset(cpu, page_idx);
 }
 
-static void pcpu_next_unpop(unsigned long *bitmap, int *rs, int *re, int end)
-{
-	*rs = find_next_zero_bit(bitmap, end, *rs);
-	*re = find_next_bit(bitmap, end, *rs + 1);
-}
-
-static void pcpu_next_pop(unsigned long *bitmap, int *rs, int *re, int end)
-{
-	*rs = find_next_bit(bitmap, end, *rs);
-	*re = find_next_zero_bit(bitmap, end, *rs + 1);
-}
-
-/*
- * Bitmap region iterators.  Iterates over the bitmap between
- * [@start, @end) in @chunk.  @rs and @re should be integer variables
- * and will be set to start and end index of the current free region.
- */
-#define pcpu_for_each_unpop_region(bitmap, rs, re, start, end)		     \
-	for ((rs) = (start), pcpu_next_unpop((bitmap), &(rs), &(re), (end)); \
-	     (rs) < (re);						     \
-	     (rs) = (re) + 1, pcpu_next_unpop((bitmap), &(rs), &(re), (end)))
-
-#define pcpu_for_each_pop_region(bitmap, rs, re, start, end)		     \
-	for ((rs) = (start), pcpu_next_pop((bitmap), &(rs), &(re), (end));   \
-	     (rs) < (re);						     \
-	     (rs) = (re) + 1, pcpu_next_pop((bitmap), &(rs), &(re), (end)))
-
 /*
  * The following are helper functions to help access bitmaps and convert
  * between bitmap offsets to address offsets.
@@ -732,9 +705,8 @@ static void pcpu_chunk_refresh_hint(struct pcpu_chunk *chunk, bool full_scan)
 	}
 
 	bits = 0;
-	pcpu_for_each_md_free_region(chunk, bit_off, bits) {
+	pcpu_for_each_md_free_region(chunk, bit_off, bits)
 		pcpu_block_update(chunk_md, bit_off, bit_off + bits);
-	}
 }
 
 /**
@@ -749,7 +721,7 @@ static void pcpu_block_refresh_hint(struct pcpu_chunk *chunk, int index)
 {
 	struct pcpu_block_md *block = chunk->md_blocks + index;
 	unsigned long *alloc_map = pcpu_index_alloc_map(chunk, index);
-	int rs, re, start;	/* region start, region end */
+	unsigned int rs, re, start;	/* region start, region end */
 
 	/* promote scan_hint to contig_hint */
 	if (block->scan_hint) {
@@ -765,10 +737,9 @@ static void pcpu_block_refresh_hint(struct pcpu_chunk *chunk, int index)
 	block->right_free = 0;
 
 	/* iterate over free areas and update the contig hints */
-	pcpu_for_each_unpop_region(alloc_map, rs, re, start,
-				   PCPU_BITMAP_BLOCK_BITS) {
+	bitmap_for_each_clear_region(alloc_map, rs, re, start,
+				     PCPU_BITMAP_BLOCK_BITS)
 		pcpu_block_update(block, rs, re);
-	}
 }
 
 /**
@@ -1041,13 +1012,13 @@ static void pcpu_block_update_hint_free(struct pcpu_chunk *chunk, int bit_off,
 static bool pcpu_is_populated(struct pcpu_chunk *chunk, int bit_off, int bits,
 			      int *next_off)
 {
-	int page_start, page_end, rs, re;
+	unsigned int page_start, page_end, rs, re;
 
 	page_start = PFN_DOWN(bit_off * PCPU_MIN_ALLOC_SIZE);
 	page_end = PFN_UP((bit_off + bits) * PCPU_MIN_ALLOC_SIZE);
 
 	rs = page_start;
-	pcpu_next_unpop(chunk->populated, &rs, &re, page_end);
+	bitmap_next_clear_region(chunk->populated, &rs, &re, page_end);
 	if (rs >= page_end)
 		return true;
 
@@ -1702,13 +1673,13 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
 
 	/* populate if not all pages are already there */
 	if (!is_atomic) {
-		int page_start, page_end, rs, re;
+		unsigned int page_start, page_end, rs, re;
 
 		page_start = PFN_DOWN(off);
 		page_end = PFN_UP(off + size);
-		pcpu_for_each_unpop_region(chunk->populated, rs, re,
-					   page_start, page_end) {
+		bitmap_for_each_clear_region(chunk->populated, rs, re,
+					     page_start, page_end) {
 			WARN_ON(chunk->immutable);
 
 			ret = pcpu_populate_chunk(chunk, rs, re, pcpu_gfp);
 
@@ -1858,10 +1829,10 @@ static void pcpu_balance_workfn(struct work_struct *work)
 	spin_unlock_irq(&pcpu_lock);
 
 	list_for_each_entry_safe(chunk, next, &to_free, list) {
-		int rs, re;
+		unsigned int rs, re;
 
-		pcpu_for_each_pop_region(chunk->populated, rs, re, 0,
-					 chunk->nr_pages) {
+		bitmap_for_each_set_region(chunk->populated, rs, re, 0,
+					   chunk->nr_pages) {
 			pcpu_depopulate_chunk(chunk, rs, re);
 			spin_lock_irq(&pcpu_lock);
 			pcpu_chunk_depopulated(chunk, rs, re);
 
@@ -1893,7 +1864,7 @@ static void pcpu_balance_workfn(struct work_struct *work)
 	}
 
 	for (slot = pcpu_size_to_slot(PAGE_SIZE); slot < pcpu_nr_slots; slot++) {
-		int nr_unpop = 0, rs, re;
+		unsigned int nr_unpop = 0, rs, re;
 
 		if (!nr_to_pop)
 			break;
 
@@ -1910,9 +1881,9 @@ static void pcpu_balance_workfn(struct work_struct *work)
 			continue;
 
 		/* @chunk can't go away while pcpu_alloc_mutex is held */
-		pcpu_for_each_unpop_region(chunk->populated, rs, re, 0,
-					   chunk->nr_pages) {
-			int nr = min(re - rs, nr_to_pop);
+		bitmap_for_each_clear_region(chunk->populated, rs, re, 0,
+					     chunk->nr_pages) {
+			int nr = min_t(int, re - rs, nr_to_pop);
 
 			ret = pcpu_populate_chunk(chunk, rs, rs + nr, gfp);
 			if (!ret) {

From patchwork Wed Oct 23 22:52:56 2019
From: Dennis Zhou
To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou
Subject: [PATCH 02/22] btrfs: rename DISCARD opt to DISCARD_SYNC
Date: Wed, 23 Oct 2019 18:52:56 -0400
Message-Id: <6c7c82ffdc8bcf0fbc1aa940542d8c0abc782629.1571865774.git.dennis@kernel.org>
X-Mailing-List: linux-btrfs@vger.kernel.org

This series introduces async discard, which will use the flag
DISCARD_ASYNC, so rename the original flag to DISCARD_SYNC as it is done
synchronously in transaction commit.

Signed-off-by: Dennis Zhou
Reviewed-by: Josef Bacik
Reviewed-by: Johannes Thumshirn
---
 fs/btrfs/block-group.c | 2 +-
 fs/btrfs/ctree.h       | 2 +-
 fs/btrfs/extent-tree.c | 4 ++--
 fs/btrfs/super.c       | 8 ++++----
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index bf7e3f23bba7..afe86028246a 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1365,7 +1365,7 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
 		spin_unlock(&space_info->lock);
 
 		/* DISCARD can flip during remount */
-		trimming = btrfs_test_opt(fs_info, DISCARD);
+		trimming = btrfs_test_opt(fs_info, DISCARD_SYNC);
 
 		/* Implicit trim during transaction commit.
 		 */
 		if (trimming)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 19d669d12ca1..1877586576aa 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1171,7 +1171,7 @@ static inline u32 BTRFS_MAX_XATTR_SIZE(const struct btrfs_fs_info *info)
 #define BTRFS_MOUNT_FLUSHONCOMMIT	(1 << 7)
 #define BTRFS_MOUNT_SSD_SPREAD		(1 << 8)
 #define BTRFS_MOUNT_NOSSD		(1 << 9)
-#define BTRFS_MOUNT_DISCARD		(1 << 10)
+#define BTRFS_MOUNT_DISCARD_SYNC	(1 << 10)
 #define BTRFS_MOUNT_FORCE_COMPRESS	(1 << 11)
 #define BTRFS_MOUNT_SPACE_CACHE		(1 << 12)
 #define BTRFS_MOUNT_CLEAR_CACHE		(1 << 13)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 49cb26fa7c63..77a5904756c5 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2903,7 +2903,7 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans)
 			break;
 		}
 
-		if (btrfs_test_opt(fs_info, DISCARD))
+		if (btrfs_test_opt(fs_info, DISCARD_SYNC))
 			ret = btrfs_discard_extent(fs_info, start,
 						   end + 1 - start, NULL);
 
@@ -4146,7 +4146,7 @@ static int __btrfs_free_reserved_extent(struct btrfs_fs_info *fs_info,
 	if (pin)
 		pin_down_extent(cache, start, len, 1);
 	else {
-		if (btrfs_test_opt(fs_info, DISCARD))
+		if (btrfs_test_opt(fs_info, DISCARD_SYNC))
 			ret = btrfs_discard_extent(fs_info, start, len, NULL);
 		btrfs_add_free_space(cache, start, len);
 		btrfs_free_reserved_bytes(cache, len, delalloc);
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 1b151af25772..a02fece949cb 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -695,11 +695,11 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
 					     info->metadata_ratio);
 			break;
 		case Opt_discard:
-			btrfs_set_and_info(info, DISCARD,
-					   "turning on discard");
+			btrfs_set_and_info(info, DISCARD_SYNC,
+					   "turning on sync discard");
 			break;
 		case Opt_nodiscard:
-			btrfs_clear_and_info(info, DISCARD,
+			btrfs_clear_and_info(info, DISCARD_SYNC,
 					     "turning off discard");
 			break;
 		case Opt_space_cache:
@@ -1322,7 +1322,7 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
 		seq_puts(seq, ",nologreplay");
 	if (btrfs_test_opt(info, FLUSHONCOMMIT))
 		seq_puts(seq, ",flushoncommit");
-	if (btrfs_test_opt(info, DISCARD))
+	if (btrfs_test_opt(info, DISCARD_SYNC))
 		seq_puts(seq, ",discard");
 	if (!(info->sb->s_flags & SB_POSIXACL))
 		seq_puts(seq, ",noacl");

From patchwork Wed Oct 23 22:52:57 2019
From: Dennis Zhou
To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou
Subject: [PATCH 03/22] btrfs: keep track of which extents have been discarded
Date: Wed, 23 Oct 2019 18:52:57 -0400
Message-Id: <63a78fbbd4d742ab13484d0cdad2264173ca7411.1571865774.git.dennis@kernel.org>
X-Mailing-List: linux-btrfs@vger.kernel.org

Async discard will use the free space cache as backing knowledge for
which extents to discard. This patch plumbs knowledge about which
extents need to be discarded into the free space cache from
unpin_extent_range().

An untrimmed extent can merge with everything as this is a new region.
Absorbing trimmed extents is a tradeoff for greater coalescing, which
makes life better for find_free_extent(). Additionally, it seems the
size of a trim isn't as problematic as the trim io itself.
When reading in the free space cache from disk, if sync is set, mark all
extents as trimmed. The current code ensures at transaction commit that
all free space is trimmed when sync is set, so this reflects that.

Signed-off-by: Dennis Zhou
---
 fs/btrfs/extent-tree.c      | 15 +++++++---
 fs/btrfs/free-space-cache.c | 60 ++++++++++++++++++++++++++++++++-----
 fs/btrfs/free-space-cache.h | 17 ++++++++++-
 fs/btrfs/inode-map.c        | 13 ++++----
 4 files changed, 87 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 77a5904756c5..6a40bba3cb19 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2783,6 +2783,7 @@ fetch_cluster_info(struct btrfs_fs_info *fs_info,
 
 static int unpin_extent_range(struct btrfs_fs_info *fs_info,
 			      u64 start, u64 end,
+			      enum btrfs_trim_state trim_state,
 			      const bool return_free_space)
 {
 	struct btrfs_block_group_cache *cache = NULL;
@@ -2816,7 +2817,9 @@ static int unpin_extent_range(struct btrfs_fs_info *fs_info,
 		if (start < cache->last_byte_to_unpin) {
 			len = min(len, cache->last_byte_to_unpin - start);
 			if (return_free_space)
-				btrfs_add_free_space(cache, start, len);
+				__btrfs_add_free_space(fs_info,
+						       cache->free_space_ctl,
+						       start, len, trim_state);
 		}
 
 		start += len;
@@ -2894,6 +2897,7 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans)
 
 	while (!trans->aborted) {
 		struct extent_state *cached_state = NULL;
+		enum btrfs_trim_state trim_state = BTRFS_TRIM_STATE_UNTRIMMED;
 
 		mutex_lock(&fs_info->unused_bg_unpin_mutex);
 		ret = find_first_extent_bit(unpin, 0, &start, &end,
@@ -2903,12 +2907,14 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans)
 			break;
 		}
 
-		if (btrfs_test_opt(fs_info, DISCARD_SYNC))
+		if (btrfs_test_opt(fs_info, DISCARD_SYNC)) {
 			ret = btrfs_discard_extent(fs_info, start,
 						   end + 1 - start, NULL);
+			trim_state = BTRFS_TRIM_STATE_TRIMMED;
+		}
 
 		clear_extent_dirty(unpin, start, end, &cached_state);
-		unpin_extent_range(fs_info, start, end, true);
+		unpin_extent_range(fs_info, start, end, trim_state, true);
 		mutex_unlock(&fs_info->unused_bg_unpin_mutex);
 		free_extent_state(cached_state);
 		cond_resched();
@@ -5512,7 +5518,8 @@ u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo)
 int btrfs_error_unpin_extent_range(struct btrfs_fs_info *fs_info,
 				   u64 start, u64 end)
 {
-	return unpin_extent_range(fs_info, start, end, false);
+	return unpin_extent_range(fs_info, start, end,
+				  BTRFS_TRIM_STATE_UNTRIMMED, false);
 }
 
 /*
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index d54dcd0ab230..d7f0cb961496 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -747,6 +747,14 @@ static int __load_free_space_cache(struct btrfs_root *root, struct inode *inode,
 			goto free_cache;
 		}
 
+		/*
+		 * Sync discard ensures that the free space cache is always
+		 * trimmed. So when reading this in, the state should reflect
+		 * that.
+		 */
+		if (btrfs_test_opt(fs_info, DISCARD_SYNC))
+			e->trim_state = BTRFS_TRIM_STATE_TRIMMED;
+
 		if (!e->bytes) {
 			kmem_cache_free(btrfs_free_space_cachep, e);
 			goto free_cache;
@@ -2157,6 +2165,23 @@ static int insert_into_bitmap(struct btrfs_free_space_ctl *ctl,
 	return ret;
 }
 
+/*
+ * Free space merging rules:
+ *  1) Merge trimmed areas together
+ *  2) Let untrimmed areas coalesce with trimmed areas
+ *  3) Always pull neighboring regions from bitmaps
+ *
+ * The above rules are for when we merge free space based on btrfs_trim_state.
+ * Rules 2 and 3 are subtle because they are suboptimal, but are done for the
+ * same reason: to promote larger extent regions which makes life easier for
+ * find_free_extent().  Rule 2 enables coalescing based on the common path
+ * being returning free space from btrfs_finish_extent_commit().  So when free
+ * space is trimmed, it will prevent aggregating trimmed new region and
+ * untrimmed regions in the rb_tree.  Rule 3 is purely to obtain larger extents
+ * and provide find_free_extent() with the largest extents possible hoping for
+ * the reuse path.
+ */
 static bool try_merge_free_space(struct btrfs_free_space_ctl *ctl,
 				 struct btrfs_free_space *info, bool update_stat)
 {
@@ -2165,6 +2190,7 @@ static bool try_merge_free_space(struct btrfs_free_space_ctl *ctl,
 	bool merged = false;
 	u64 offset = info->offset;
 	u64 bytes = info->bytes;
+	const bool is_trimmed = btrfs_free_space_trimmed(info);
 
 	/*
 	 * first we want to see if there is free space adjacent to the range we
@@ -2178,7 +2204,9 @@ static bool try_merge_free_space(struct btrfs_free_space_ctl *ctl,
 	else
 		left_info = tree_search_offset(ctl, offset - 1, 0, 0);
 
-	if (right_info && !right_info->bitmap) {
+	/* See try_merge_free_space() comment. */
+	if (right_info && !right_info->bitmap &&
+	    (!is_trimmed || btrfs_free_space_trimmed(right_info))) {
 		if (update_stat)
 			unlink_free_space(ctl, right_info);
 		else
@@ -2188,8 +2216,10 @@ static bool try_merge_free_space(struct btrfs_free_space_ctl *ctl,
 		merged = true;
 	}
 
+	/* See try_merge_free_space() comment. */
 	if (left_info && !left_info->bitmap &&
-	    left_info->offset + left_info->bytes == offset) {
+	    left_info->offset + left_info->bytes == offset &&
+	    (!is_trimmed || btrfs_free_space_trimmed(left_info))) {
 		if (update_stat)
 			unlink_free_space(ctl, left_info);
 		else
@@ -2225,6 +2255,10 @@ static bool steal_from_bitmap_to_end(struct btrfs_free_space_ctl *ctl,
 	bytes = (j - i) * ctl->unit;
 	info->bytes += bytes;
 
+	/* See try_merge_free_space() comment. */
+	if (!btrfs_free_space_trimmed(bitmap))
+		info->trim_state = BTRFS_TRIM_STATE_UNTRIMMED;
+
 	if (update_stat)
 		bitmap_clear_bits(ctl, bitmap, end, bytes);
 	else
@@ -2278,6 +2312,10 @@ static bool steal_from_bitmap_to_front(struct btrfs_free_space_ctl *ctl,
 	info->offset -= bytes;
 	info->bytes += bytes;
 
+	/* See try_merge_free_space() comment. */
+	if (!btrfs_free_space_trimmed(bitmap))
+		info->trim_state = BTRFS_TRIM_STATE_UNTRIMMED;
+
 	if (update_stat)
 		bitmap_clear_bits(ctl, bitmap, info->offset, bytes);
 	else
@@ -2327,7 +2365,8 @@ static void steal_from_bitmap(struct btrfs_free_space_ctl *ctl,
 
 int __btrfs_add_free_space(struct btrfs_fs_info *fs_info,
 			   struct btrfs_free_space_ctl *ctl,
-			   u64 offset, u64 bytes)
+			   u64 offset, u64 bytes,
+			   enum btrfs_trim_state trim_state)
 {
 	struct btrfs_free_space *info;
 	int ret = 0;
@@ -2338,6 +2377,7 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info,
 
 	info->offset = offset;
 	info->bytes = bytes;
+	info->trim_state = trim_state;
 	RB_CLEAR_NODE(&info->offset_index);
 
 	spin_lock(&ctl->tree_lock);
@@ -2385,7 +2425,7 @@ int btrfs_add_free_space(struct btrfs_block_group_cache *block_group,
 {
 	return __btrfs_add_free_space(block_group->fs_info,
 				      block_group->free_space_ctl,
-				      bytenr, size);
+				      bytenr, size, 0);
 }
 
 int btrfs_remove_free_space(struct btrfs_block_group_cache *block_group,
@@ -2460,8 +2500,11 @@ int btrfs_remove_free_space(struct btrfs_block_group_cache *block_group,
 		}
 		spin_unlock(&ctl->tree_lock);
 
-		ret = btrfs_add_free_space(block_group, offset + bytes,
-					   old_end - (offset + bytes));
+		ret = __btrfs_add_free_space(block_group->fs_info,
+					     ctl,
+					     offset + bytes,
+					     old_end - (offset + bytes),
+					     info->trim_state);
 		WARN_ON(ret);
 		goto out;
 	}
@@ -2630,6 +2673,7 @@ u64 btrfs_find_space_for_alloc(struct btrfs_block_group_cache *block_group,
 	u64 ret = 0;
 	u64 align_gap = 0;
 	u64 align_gap_len = 0;
+	enum btrfs_trim_state align_gap_trim_state = BTRFS_TRIM_STATE_UNTRIMMED;
 
 	spin_lock(&ctl->tree_lock);
 	entry = find_free_space(ctl, &offset, &bytes_search,
@@ -2646,6 +2690,7 @@ u64 btrfs_find_space_for_alloc(struct btrfs_block_group_cache *block_group,
 		unlink_free_space(ctl, entry);
 		align_gap_len = offset - entry->offset;
 		align_gap = entry->offset;
+		align_gap_trim_state = entry->trim_state;
 
 		entry->offset = offset + bytes;
 		WARN_ON(entry->bytes < bytes + align_gap_len);
@@ -2661,7 +2706,8 @@ u64 btrfs_find_space_for_alloc(struct btrfs_block_group_cache *block_group,
 
 	if (align_gap_len)
 		__btrfs_add_free_space(block_group->fs_info, ctl,
-				       align_gap, align_gap_len);
+				       align_gap, align_gap_len,
+				       align_gap_trim_state);
 
 	return ret;
 }
diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h
index 39c32c8fc24f..98a568dbd5e7 100644
--- a/fs/btrfs/free-space-cache.h
+++ b/fs/btrfs/free-space-cache.h
@@ -6,6 +6,14 @@
 #ifndef BTRFS_FREE_SPACE_CACHE_H
 #define BTRFS_FREE_SPACE_CACHE_H
 
+/*
+ * This is the trim state of an extent or bitmap.
+ */
+enum btrfs_trim_state {
+	BTRFS_TRIM_STATE_TRIMMED,
+	BTRFS_TRIM_STATE_UNTRIMMED,
+};
+
 struct btrfs_free_space {
 	struct rb_node offset_index;
 	u64 offset;
@@ -13,8 +21,14 @@ struct btrfs_free_space {
 	u64 max_extent_size;
 	unsigned long *bitmap;
 	struct list_head list;
+	enum btrfs_trim_state trim_state;
 };
 
+static inline bool btrfs_free_space_trimmed(struct btrfs_free_space *info)
+{
+	return (info->trim_state == BTRFS_TRIM_STATE_TRIMMED);
+}
+
 struct btrfs_free_space_ctl {
 	spinlock_t tree_lock;
 	struct rb_root free_space_offset;
@@ -84,7 +98,8 @@ int btrfs_write_out_ino_cache(struct btrfs_root *root,
 void btrfs_init_free_space_ctl(struct btrfs_block_group_cache *block_group);
 int __btrfs_add_free_space(struct btrfs_fs_info *fs_info,
 			   struct btrfs_free_space_ctl *ctl,
-			   u64 bytenr, u64 size);
+			   u64 bytenr, u64 size,
+			   enum btrfs_trim_state trim_state);
 int btrfs_add_free_space(struct btrfs_block_group_cache *block_group,
 			 u64 bytenr, u64 size);
 int btrfs_remove_free_space(struct btrfs_block_group_cache *block_group,
diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
index 63cad7865d75..00e225de4fe6 100644
--- a/fs/btrfs/inode-map.c
+++ b/fs/btrfs/inode-map.c
@@ -107,7 +107,7 @@ static int caching_kthread(void *data)
 
 		if (last != (u64)-1 && last + 1 != key.objectid) {
 			__btrfs_add_free_space(fs_info, ctl, last + 1,
-					       key.objectid - last - 1);
+					       key.objectid - last - 1, 0);
 			wake_up(&root->ino_cache_wait);
 		}
 
@@ -118,7 +118,7 @@ static int caching_kthread(void *data)
 
 	if (last < root->highest_objectid - 1) {
 		__btrfs_add_free_space(fs_info, ctl, last + 1,
-				       root->highest_objectid - last - 1);
+				       root->highest_objectid - last - 1, 0);
 	}
 
 	spin_lock(&root->ino_cache_lock);
@@ -175,7 +175,8 @@ static void start_caching(struct btrfs_root *root)
 	ret = btrfs_find_free_objectid(root, &objectid);
 	if (!ret && objectid <= BTRFS_LAST_FREE_OBJECTID) {
 		__btrfs_add_free_space(fs_info, ctl, objectid,
-				       BTRFS_LAST_FREE_OBJECTID - objectid + 1);
+				       BTRFS_LAST_FREE_OBJECTID - objectid + 1,
+				       0);
 		wake_up(&root->ino_cache_wait);
 	}
 
@@ -221,7 +222,7 @@ void btrfs_return_ino(struct btrfs_root *root, u64 objectid)
 		return;
again:
 	if (root->ino_cache_state == BTRFS_CACHE_FINISHED) {
-		__btrfs_add_free_space(fs_info, pinned, objectid, 1);
+		__btrfs_add_free_space(fs_info, pinned, objectid, 1, 0);
 	} else {
 		down_write(&fs_info->commit_root_sem);
 		spin_lock(&root->ino_cache_lock);
@@ -234,7 +235,7 @@ void btrfs_return_ino(struct btrfs_root *root, u64 objectid)
 
 	start_caching(root);
 
-	__btrfs_add_free_space(fs_info, pinned, objectid, 1);
+	__btrfs_add_free_space(fs_info, pinned, objectid, 1, 0);
 
 	up_write(&fs_info->commit_root_sem);
 }
@@ -281,7 +282,7 @@ void btrfs_unpin_free_ino(struct btrfs_root *root)
 		spin_unlock(rbroot_lock);
 		if (count)
 			__btrfs_add_free_space(root->fs_info, ctl,
-					       info->offset, count);
+					       info->offset, count, 0);
 		kmem_cache_free(btrfs_free_space_cachep, info);
 	}
 }

From patchwork Wed Oct 23 22:52:58 2019
From: Dennis Zhou
To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou
Subject: [PATCH 04/22] btrfs: keep track of cleanliness of the bitmap
Date: Wed, 23 Oct 2019 18:52:58 -0400
Message-Id: <6563f9fc4cd47bade586d86c658c848f2ffeaa49.1571865774.git.dennis@kernel.org>
X-Mailing-List: linux-btrfs@vger.kernel.org

There is a cap in btrfs on the number of free extents that a block group
can have. When it surpasses that threshold, future extents are placed
into bitmaps. Instead of keeping track of whether a certain bit is
trimmed or not in a second bitmap, keep track of the relative state of
the bitmap.

With async discard, trimming bitmaps becomes a more frequent operation.
As a tradeoff with simplicity, we keep track of whether discarding a
bitmap is in progress. If we fully scan a bitmap and trim as necessary,
the bitmap is marked clean. This has some caveats as the min block size
may skip over regions deemed too small. But this should be a reasonable
tradeoff rather than keeping a second bitmap and making allocation paths
more complex. The downside is we may overtrim, but ideally the min block
size should prevent us from doing that too often and getting stuck
trimming pathological cases.

BTRFS_TRIM_STATE_TRIMMING is added to indicate a bitmap is in the
process of being trimmed. If additional free space is added to that
bitmap, the bit is cleared. A bitmap will be marked
BTRFS_TRIM_STATE_TRIMMED if the trimming code was able to reach the end
of it and the former is still set.
Signed-off-by: Dennis Zhou Reviewed-by: Josef Bacik --- fs/btrfs/free-space-cache.c | 89 +++++++++++++++++++++++++++++++++---- fs/btrfs/free-space-cache.h | 12 +++++ 2 files changed, 92 insertions(+), 9 deletions(-) diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index d7f0cb961496..900b935e5997 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -1975,11 +1975,18 @@ static noinline int remove_from_bitmap(struct btrfs_free_space_ctl *ctl, static u64 add_bytes_to_bitmap(struct btrfs_free_space_ctl *ctl, struct btrfs_free_space *info, u64 offset, - u64 bytes) + u64 bytes, enum btrfs_trim_state trim_state) { u64 bytes_to_set = 0; u64 end; + /* + * This is a tradeoff to make bitmap trim state minimal. We mark the + * whole bitmap untrimmed if at any point we add untrimmed regions. + */ + if (trim_state == BTRFS_TRIM_STATE_UNTRIMMED) + info->trim_state = BTRFS_TRIM_STATE_UNTRIMMED; + end = info->offset + (u64)(BITS_PER_BITMAP * ctl->unit); bytes_to_set = min(end - offset, bytes); @@ -2054,10 +2061,12 @@ static int insert_into_bitmap(struct btrfs_free_space_ctl *ctl, struct btrfs_block_group_cache *block_group = NULL; int added = 0; u64 bytes, offset, bytes_added; + enum btrfs_trim_state trim_state; int ret; bytes = info->bytes; offset = info->offset; + trim_state = info->trim_state; if (!ctl->op->use_bitmap(ctl, info)) return 0; @@ -2092,8 +2101,8 @@ static int insert_into_bitmap(struct btrfs_free_space_ctl *ctl, } if (entry->offset == offset_to_bitmap(ctl, offset)) { - bytes_added = add_bytes_to_bitmap(ctl, entry, - offset, bytes); + bytes_added = add_bytes_to_bitmap(ctl, entry, offset, + bytes, trim_state); bytes -= bytes_added; offset += bytes_added; } @@ -2112,7 +2121,8 @@ static int insert_into_bitmap(struct btrfs_free_space_ctl *ctl, goto new_bitmap; } - bytes_added = add_bytes_to_bitmap(ctl, bitmap_info, offset, bytes); + bytes_added = add_bytes_to_bitmap(ctl, bitmap_info, offset, bytes, + trim_state); bytes -= 
bytes_added; offset += bytes_added; added = 0; @@ -2146,6 +2156,7 @@ static int insert_into_bitmap(struct btrfs_free_space_ctl *ctl, /* allocate the bitmap */ info->bitmap = kmem_cache_zalloc(btrfs_free_space_bitmap_cachep, GFP_NOFS); + info->trim_state = BTRFS_TRIM_STATE_TRIMMED; spin_lock(&ctl->tree_lock); if (!info->bitmap) { ret = -ENOMEM; @@ -3317,6 +3328,39 @@ static int trim_no_bitmap(struct btrfs_block_group_cache *block_group, return ret; } +/* + * If we break out of trimming a bitmap prematurely, we should reset the + * trimming bit. In a rather contrived case, it's possible to race here so + * reset the state to BTRFS_TRIM_STATE_UNTRIMMED. + * + * start = start of bitmap + * end = near end of bitmap + * + * Thread 1: Thread 2: + * trim_bitmaps(start) + * trim_bitmaps(end) + * end_trimming_bitmap() + * reset_trimming_bitmap() + */ +static void reset_trimming_bitmap(struct btrfs_free_space_ctl *ctl, u64 offset) +{ + struct btrfs_free_space *entry; + + spin_lock(&ctl->tree_lock); + + entry = tree_search_offset(ctl, offset, 1, 0); + if (entry) + entry->trim_state = BTRFS_TRIM_STATE_UNTRIMMED; + + spin_unlock(&ctl->tree_lock); +} + +static void end_trimming_bitmap(struct btrfs_free_space *entry) +{ + if (btrfs_free_space_trimming_bitmap(entry)) + entry->trim_state = BTRFS_TRIM_STATE_TRIMMED; +} + static int trim_bitmaps(struct btrfs_block_group_cache *block_group, u64 *total_trimmed, u64 start, u64 end, u64 minlen) { @@ -3341,16 +3385,33 @@ static int trim_bitmaps(struct btrfs_block_group_cache *block_group, } entry = tree_search_offset(ctl, offset, 1, 0); - if (!entry) { + if (!entry || btrfs_free_space_trimmed(entry)) { spin_unlock(&ctl->tree_lock); mutex_unlock(&ctl->cache_writeout_mutex); next_bitmap = true; goto next; } + /* + * Async discard bitmap trimming begins by setting the start + * to be key.objectid and offset_to_bitmap() aligns to the + * start of the bitmap.
This lets us know we are fully + * scanning the bitmap rather than only some portion of it. + */ + if (start == offset) + entry->trim_state = BTRFS_TRIM_STATE_TRIMMING; + bytes = minlen; ret2 = search_bitmap(ctl, entry, &start, &bytes, false); if (ret2 || start >= end) { + /* + * This keeps the invariant that all bytes are trimmed + * if BTRFS_TRIM_STATE_TRIMMED is set on a bitmap. + */ + if (ret2 && !minlen) + end_trimming_bitmap(entry); + else + entry->trim_state = BTRFS_TRIM_STATE_UNTRIMMED; spin_unlock(&ctl->tree_lock); mutex_unlock(&ctl->cache_writeout_mutex); next_bitmap = true; @@ -3359,6 +3420,7 @@ static int trim_bitmaps(struct btrfs_block_group_cache *block_group, bytes = min(bytes, end - start); if (bytes < minlen) { + entry->trim_state = BTRFS_TRIM_STATE_UNTRIMMED; spin_unlock(&ctl->tree_lock); mutex_unlock(&ctl->cache_writeout_mutex); goto next; @@ -3376,18 +3438,21 @@ static int trim_bitmaps(struct btrfs_block_group_cache *block_group, ret = do_trimming(block_group, total_trimmed, start, bytes, start, bytes, &trim_entry); - if (ret) + if (ret) { + reset_trimming_bitmap(ctl, offset); break; + } next: if (next_bitmap) { offset += BITS_PER_BITMAP * ctl->unit; + start = offset; } else { start += bytes; - if (start >= offset + BITS_PER_BITMAP * ctl->unit) - offset += BITS_PER_BITMAP * ctl->unit; } if (fatal_signal_pending(current)) { + if (start != offset) + reset_trimming_bitmap(ctl, offset); ret = -ERESTARTSYS; break; } @@ -3441,6 +3506,7 @@ void btrfs_put_block_group_trimming(struct btrfs_block_group_cache *block_group) int btrfs_trim_block_group(struct btrfs_block_group_cache *block_group, u64 *trimmed, u64 start, u64 end, u64 minlen) { + struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl; int ret; *trimmed = 0; @@ -3458,6 +3524,9 @@ int btrfs_trim_block_group(struct btrfs_block_group_cache *block_group, goto out; ret = trim_bitmaps(block_group, trimmed, start, end, minlen); + /* If we ended in the middle of a bitmap, reset the trimming 
flag. */ + if (end % (BITS_PER_BITMAP * ctl->unit)) + reset_trimming_bitmap(ctl, offset_to_bitmap(ctl, end)); out: btrfs_put_block_group_trimming(block_group); return ret; @@ -3642,6 +3711,7 @@ int test_add_free_space_entry(struct btrfs_block_group_cache *cache, struct btrfs_free_space_ctl *ctl = cache->free_space_ctl; struct btrfs_free_space *info = NULL, *bitmap_info; void *map = NULL; + enum btrfs_trim_state trim_state = BTRFS_TRIM_STATE_TRIMMED; u64 bytes_added; int ret; @@ -3683,7 +3753,8 @@ int test_add_free_space_entry(struct btrfs_block_group_cache *cache, info = NULL; } - bytes_added = add_bytes_to_bitmap(ctl, bitmap_info, offset, bytes); + bytes_added = add_bytes_to_bitmap(ctl, bitmap_info, offset, bytes, + trim_state); bytes -= bytes_added; offset += bytes_added; diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h index 98a568dbd5e7..b9d1aad2f7e5 100644 --- a/fs/btrfs/free-space-cache.h +++ b/fs/btrfs/free-space-cache.h @@ -8,10 +8,16 @@ /* * This is the trim state of an extent or bitmap. + * + * BTRFS_TRIM_STATE_TRIMMING is special and used to maintain the state of a + * bitmap as we may need several trims to fully trim a single bitmap entry. + * This is reset should any free space other than trimmed space be added to the + * bitmap.
*/ enum btrfs_trim_state { BTRFS_TRIM_STATE_TRIMMED, BTRFS_TRIM_STATE_UNTRIMMED, + BTRFS_TRIM_STATE_TRIMMING, }; struct btrfs_free_space { @@ -29,6 +35,12 @@ static inline bool btrfs_free_space_trimmed(struct btrfs_free_space *info) return (info->trim_state == BTRFS_TRIM_STATE_TRIMMED); } +static inline bool btrfs_free_space_trimming_bitmap( + struct btrfs_free_space *info) +{ + return (info->trim_state == BTRFS_TRIM_STATE_TRIMMING); +} + struct btrfs_free_space_ctl { spinlock_t tree_lock; struct rb_root free_space_offset;
From patchwork Wed Oct 23 22:52:59 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dennis Zhou X-Patchwork-Id: 11207965
From: Dennis Zhou To: David Sterba , Chris Mason , Josef Bacik , Omar Sandoval Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou Subject: [PATCH 05/22] btrfs: add the beginning of async discard, discard workqueue Date: Wed, 23 Oct 2019 18:52:59 -0400 Message-Id: X-Mailer: git-send-email 2.13.5 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org
When discard is enabled, every time a pinned extent is released back to the block_group's free space cache, a discard is issued for the extent. This is an overeager approach when it comes to discarding and helping the SSD maintain enough free space to prevent severe garbage collection situations. This adds the beginning of async discard.
Instead of issuing a discard prior to returning it to the free space, it is just marked as untrimmed. The block_group is then added to an LRU which then feeds into a workqueue to issue discards at a much slower rate. Full discarding of unused block groups is still done and will be addressed in a future patch in this series. For now, we don't persist the discard state of extents and bitmaps. Therefore, our failure recovery mode will be to consider extents untrimmed. This lets us handle failure and unmounting as one and the same. Signed-off-by: Dennis Zhou Reviewed-by: Josef Bacik --- fs/btrfs/Makefile | 2 +- fs/btrfs/block-group.c | 4 + fs/btrfs/block-group.h | 9 ++ fs/btrfs/ctree.h | 21 +++ fs/btrfs/discard.c | 274 ++++++++++++++++++++++++++++++++++++ fs/btrfs/discard.h | 28 ++++ fs/btrfs/disk-io.c | 15 +- fs/btrfs/extent-tree.c | 4 + fs/btrfs/free-space-cache.c | 35 ++++- fs/btrfs/super.c | 35 ++++- 10 files changed, 417 insertions(+), 10 deletions(-) create mode 100644 fs/btrfs/discard.c create mode 100644 fs/btrfs/discard.h diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile index 82200dbca5ac..9a0ff3384381 100644 --- a/fs/btrfs/Makefile +++ b/fs/btrfs/Makefile @@ -11,7 +11,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \ compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \ reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \ uuid-tree.o props.o free-space-tree.o tree-checker.o space-info.o \ - block-rsv.o delalloc-space.o block-group.o + block-rsv.o delalloc-space.o block-group.o discard.o btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index afe86028246a..8bbbe7488328 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -14,6 +14,7 @@ #include "sysfs.h" #include "tree-log.h" #include "delalloc-space.h" +#include "discard.h" /* * Return target flags in extended format or 0
if restripe for this chunk_type @@ -1273,6 +1274,8 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info) } spin_unlock(&fs_info->unused_bgs_lock); + btrfs_discard_cancel_work(&fs_info->discard_ctl, block_group); + mutex_lock(&fs_info->delete_unused_bgs_mutex); /* Don't want to race with allocators so take the groups_sem */ @@ -1622,6 +1625,7 @@ static struct btrfs_block_group_cache *btrfs_create_block_group_cache( INIT_LIST_HEAD(&cache->cluster_list); INIT_LIST_HEAD(&cache->bg_list); INIT_LIST_HEAD(&cache->ro_list); + INIT_LIST_HEAD(&cache->discard_list); INIT_LIST_HEAD(&cache->dirty_list); INIT_LIST_HEAD(&cache->io_list); btrfs_init_free_space_ctl(cache); diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h index c391800388dd..633dce5b9d57 100644 --- a/fs/btrfs/block-group.h +++ b/fs/btrfs/block-group.h @@ -115,7 +115,11 @@ struct btrfs_block_group_cache { /* For read-only block groups */ struct list_head ro_list; + /* For discard operations */ atomic_t trimming; + struct list_head discard_list; + int discard_index; + u64 discard_eligible_time; /* For dirty block groups */ struct list_head dirty_list; @@ -157,6 +161,11 @@ struct btrfs_block_group_cache { struct btrfs_full_stripe_locks_tree full_stripe_locks_root; }; +static inline u64 btrfs_block_group_end(struct btrfs_block_group_cache *cache) +{ + return (cache->key.objectid + cache->key.offset); +} + #ifdef CONFIG_BTRFS_DEBUG static inline int btrfs_should_fragment_free_space( struct btrfs_block_group_cache *block_group) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 1877586576aa..efa8390e8419 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -438,6 +438,21 @@ struct btrfs_full_stripe_locks_tree { struct mutex lock; }; +/* Discard control. */ +/* + * Async discard uses multiple lists to differentiate the discard filter + * parameters. 
+ */ +#define BTRFS_NR_DISCARD_LISTS 1 + +struct btrfs_discard_ctl { + struct workqueue_struct *discard_workers; + struct delayed_work work; + spinlock_t lock; + struct btrfs_block_group_cache *cache; + struct list_head discard_list[BTRFS_NR_DISCARD_LISTS]; +}; + /* delayed seq elem */ struct seq_list { struct list_head list; @@ -524,6 +539,9 @@ enum { * so we don't need to offload checksums to workqueues. */ BTRFS_FS_CSUM_IMPL_FAST, + + /* Indicate that the discard workqueue can service discards. */ + BTRFS_FS_DISCARD_RUNNING, }; struct btrfs_fs_info { @@ -817,6 +835,8 @@ struct btrfs_fs_info { struct btrfs_workqueue *scrub_wr_completion_workers; struct btrfs_workqueue *scrub_parity_workers; + struct btrfs_discard_ctl discard_ctl; + #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY u32 check_integrity_print_mask; #endif @@ -1190,6 +1210,7 @@ static inline u32 BTRFS_MAX_XATTR_SIZE(const struct btrfs_fs_info *info) #define BTRFS_MOUNT_FREE_SPACE_TREE (1 << 26) #define BTRFS_MOUNT_NOLOGREPLAY (1 << 27) #define BTRFS_MOUNT_REF_VERIFY (1 << 28) +#define BTRFS_MOUNT_DISCARD_ASYNC (1 << 29) #define BTRFS_DEFAULT_COMMIT_INTERVAL (30) #define BTRFS_DEFAULT_MAX_INLINE (2048) diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c new file mode 100644 index 000000000000..0a72a1902ca6 --- /dev/null +++ b/fs/btrfs/discard.c @@ -0,0 +1,274 @@ +/* + * Copyright (C) 2019 Facebook. All rights reserved. + */ + +#include +#include +#include +#include +#include +#include +#include "ctree.h" +#include "block-group.h" +#include "discard.h" +#include "free-space-cache.h" + +/* This is an initial delay to give some chance for lba reuse. 
+ */ +#define BTRFS_DISCARD_DELAY (120ULL * NSEC_PER_SEC) + +static struct list_head *btrfs_get_discard_list( + struct btrfs_discard_ctl *discard_ctl, + struct btrfs_block_group_cache *cache) +{ + return &discard_ctl->discard_list[cache->discard_index]; +} + +void btrfs_add_to_discard_list(struct btrfs_discard_ctl *discard_ctl, + struct btrfs_block_group_cache *cache) +{ + spin_lock(&discard_ctl->lock); + + if (list_empty(&cache->discard_list)) + cache->discard_eligible_time = (ktime_get_ns() + + BTRFS_DISCARD_DELAY); + + list_move_tail(&cache->discard_list, + btrfs_get_discard_list(discard_ctl, cache)); + + spin_unlock(&discard_ctl->lock); +} + +static bool remove_from_discard_list(struct btrfs_discard_ctl *discard_ctl, + struct btrfs_block_group_cache *cache) +{ + bool running = false; + + spin_lock(&discard_ctl->lock); + + if (cache == discard_ctl->cache) { + running = true; + discard_ctl->cache = NULL; + } + + cache->discard_eligible_time = 0; + list_del_init(&cache->discard_list); + + spin_unlock(&discard_ctl->lock); + + return running; +} + +/** + * find_next_cache - find cache that's up next for discarding + * @discard_ctl: discard control + * @now: current time + * + * Iterate over the discard lists to find the next block_group up for + * discarding, checking the discard_eligible_time of each block_group.
+ */ +static struct btrfs_block_group_cache *find_next_cache( + struct btrfs_discard_ctl *discard_ctl, + u64 now) +{ + struct btrfs_block_group_cache *ret_cache = NULL, *cache; + int i; + + for (i = 0; i < BTRFS_NR_DISCARD_LISTS; i++) { + struct list_head *discard_list = &discard_ctl->discard_list[i]; + + if (!list_empty(discard_list)) { + cache = list_first_entry(discard_list, + struct btrfs_block_group_cache, + discard_list); + + if (!ret_cache) + ret_cache = cache; + + if (ret_cache->discard_eligible_time < now) + break; + + if (ret_cache->discard_eligible_time > + cache->discard_eligible_time) + ret_cache = cache; + } + } + + return ret_cache; +} + +/** + * peek_discard_list - wrap find_next_cache() + * @discard_ctl: discard control + * + * This wraps find_next_cache() and sets the cache to be in use. + */ +static struct btrfs_block_group_cache *peek_discard_list( + struct btrfs_discard_ctl *discard_ctl) +{ + struct btrfs_block_group_cache *cache; + u64 now = ktime_get_ns(); + + spin_lock(&discard_ctl->lock); + + cache = find_next_cache(discard_ctl, now); + + if (cache && now < cache->discard_eligible_time) + cache = NULL; + + discard_ctl->cache = cache; + + spin_unlock(&discard_ctl->lock); + + return cache; +} + +/** + * btrfs_discard_cancel_work - remove a block_group from the discard lists + * @discard_ctl: discard control + * @cache: block_group of interest + * + * This removes @cache from the discard lists. If necessary, it waits on the + * current work and then reschedules the delayed work. 
+ */ +void btrfs_discard_cancel_work(struct btrfs_discard_ctl *discard_ctl, + struct btrfs_block_group_cache *cache) +{ + if (remove_from_discard_list(discard_ctl, cache)) { + cancel_delayed_work_sync(&discard_ctl->work); + btrfs_discard_schedule_work(discard_ctl, true); + } +} + +/** + * btrfs_discard_queue_work - handles queuing the block_groups + * @discard_ctl: discard control + * @cache: block_group of interest + * + * This maintains the LRU order of the discard lists. + */ +void btrfs_discard_queue_work(struct btrfs_discard_ctl *discard_ctl, + struct btrfs_block_group_cache *cache) +{ + if (!cache || !btrfs_test_opt(cache->fs_info, DISCARD_ASYNC)) + return; + + btrfs_add_to_discard_list(discard_ctl, cache); + if (!delayed_work_pending(&discard_ctl->work)) + btrfs_discard_schedule_work(discard_ctl, false); +} + +/** + * btrfs_discard_schedule_work - responsible for scheduling the discard work + * @discard_ctl: discard control + * @override: override the current timer + * + * Discards are issued by a delayed workqueue item. @override is used to + * update the current delay as the baseline delay interval is reevaluated + * on transaction commit. This is also maxed with any other rate limit.
+ */ +void btrfs_discard_schedule_work(struct btrfs_discard_ctl *discard_ctl, + bool override) +{ + struct btrfs_block_group_cache *cache; + u64 now = ktime_get_ns(); + + spin_lock(&discard_ctl->lock); + + if (!btrfs_run_discard_work(discard_ctl)) + goto out; + + if (!override && delayed_work_pending(&discard_ctl->work)) + goto out; + + cache = find_next_cache(discard_ctl, now); + if (cache) { + u64 delay = 0; + + if (now < cache->discard_eligible_time) + delay = nsecs_to_jiffies(cache->discard_eligible_time - + now); + + mod_delayed_work(discard_ctl->discard_workers, + &discard_ctl->work, + delay); + } + +out: + spin_unlock(&discard_ctl->lock); +} + +/** + * btrfs_discard_workfn - discard work function + * @work: work + * + * This finds the next cache to start discarding and then discards it. + */ +static void btrfs_discard_workfn(struct work_struct *work) +{ + struct btrfs_discard_ctl *discard_ctl; + struct btrfs_block_group_cache *cache; + u64 trimmed = 0; + + discard_ctl = container_of(work, struct btrfs_discard_ctl, work.work); + + cache = peek_discard_list(discard_ctl); + if (!cache || !btrfs_run_discard_work(discard_ctl)) + return; + + btrfs_trim_block_group(cache, &trimmed, cache->key.objectid, + btrfs_block_group_end(cache), 0); + + remove_from_discard_list(discard_ctl, cache); + + btrfs_discard_schedule_work(discard_ctl, false); +} + +/** + * btrfs_run_discard_work - determines if async discard should be running + * @discard_ctl: discard control + * + * Checks if the file system is writeable and BTRFS_FS_DISCARD_RUNNING is set. 
+ */ +bool btrfs_run_discard_work(struct btrfs_discard_ctl *discard_ctl) +{ + struct btrfs_fs_info *fs_info = container_of(discard_ctl, + struct btrfs_fs_info, + discard_ctl); + + return (!(fs_info->sb->s_flags & SB_RDONLY) && + test_bit(BTRFS_FS_DISCARD_RUNNING, &fs_info->flags)); +} + +void btrfs_discard_resume(struct btrfs_fs_info *fs_info) +{ + if (!btrfs_test_opt(fs_info, DISCARD_ASYNC)) { + btrfs_discard_cleanup(fs_info); + return; + } + + set_bit(BTRFS_FS_DISCARD_RUNNING, &fs_info->flags); +} + +void btrfs_discard_stop(struct btrfs_fs_info *fs_info) +{ + clear_bit(BTRFS_FS_DISCARD_RUNNING, &fs_info->flags); +} + +void btrfs_discard_init(struct btrfs_fs_info *fs_info) +{ + struct btrfs_discard_ctl *discard_ctl = &fs_info->discard_ctl; + int i; + + spin_lock_init(&discard_ctl->lock); + + INIT_DELAYED_WORK(&discard_ctl->work, btrfs_discard_workfn); + + for (i = 0; i < BTRFS_NR_DISCARD_LISTS; i++) + INIT_LIST_HEAD(&discard_ctl->discard_list[i]); +} + +void btrfs_discard_cleanup(struct btrfs_fs_info *fs_info) +{ + btrfs_discard_stop(fs_info); + cancel_delayed_work_sync(&fs_info->discard_ctl.work); +} diff --git a/fs/btrfs/discard.h b/fs/btrfs/discard.h new file mode 100644 index 000000000000..48b4710a80d0 --- /dev/null +++ b/fs/btrfs/discard.h @@ -0,0 +1,28 @@ +/* + * Copyright (C) 2019 Facebook. All rights reserved. 
+ */ + +#ifndef BTRFS_DISCARD_H +#define BTRFS_DISCARD_H + +struct btrfs_fs_info; +struct btrfs_discard_ctl; +struct btrfs_block_group_cache; + +void btrfs_add_to_discard_list(struct btrfs_discard_ctl *discard_ctl, + struct btrfs_block_group_cache *cache); + +void btrfs_discard_cancel_work(struct btrfs_discard_ctl *discard_ctl, + struct btrfs_block_group_cache *cache); +void btrfs_discard_queue_work(struct btrfs_discard_ctl *discard_ctl, + struct btrfs_block_group_cache *cache); +void btrfs_discard_schedule_work(struct btrfs_discard_ctl *discard_ctl, + bool override); +bool btrfs_run_discard_work(struct btrfs_discard_ctl *discard_ctl); + +void btrfs_discard_resume(struct btrfs_fs_info *fs_info); +void btrfs_discard_stop(struct btrfs_fs_info *fs_info); +void btrfs_discard_init(struct btrfs_fs_info *fs_info); +void btrfs_discard_cleanup(struct btrfs_fs_info *fs_info); + +#endif diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 044981cf6df9..a304ec972f67 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -41,6 +41,7 @@ #include "tree-checker.h" #include "ref-verify.h" #include "block-group.h" +#include "discard.h" #define BTRFS_SUPER_FLAG_SUPP (BTRFS_HEADER_FLAG_WRITTEN |\ BTRFS_HEADER_FLAG_RELOC |\ @@ -2009,6 +2010,8 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info) btrfs_destroy_workqueue(fs_info->flush_workers); btrfs_destroy_workqueue(fs_info->qgroup_rescan_workers); btrfs_destroy_workqueue(fs_info->extent_workers); + if (fs_info->discard_ctl.discard_workers) + destroy_workqueue(fs_info->discard_ctl.discard_workers); /* * Now that all other work queues are destroyed, we can safely destroy * the queues used for metadata I/O, since tasks from those other work @@ -2218,6 +2221,8 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info, btrfs_alloc_workqueue(fs_info, "extent-refs", flags, min_t(u64, fs_devices->num_devices, max_active), 8); + fs_info->discard_ctl.discard_workers = + alloc_workqueue("btrfs_discard", 
WQ_UNBOUND | WQ_FREEZABLE, 1); if (!(fs_info->workers && fs_info->delalloc_workers && fs_info->submit_workers && fs_info->flush_workers && @@ -2229,7 +2234,8 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info, fs_info->caching_workers && fs_info->readahead_workers && fs_info->fixup_workers && fs_info->delayed_workers && fs_info->extent_workers && - fs_info->qgroup_rescan_workers)) { + fs_info->qgroup_rescan_workers && + fs_info->discard_ctl.discard_workers)) { return -ENOMEM; } @@ -2772,6 +2778,8 @@ int open_ctree(struct super_block *sb, btrfs_init_dev_replace_locks(fs_info); btrfs_init_qgroup(fs_info); + btrfs_discard_init(fs_info); + btrfs_init_free_cluster(&fs_info->meta_alloc_cluster); btrfs_init_free_cluster(&fs_info->data_alloc_cluster); @@ -3284,6 +3292,8 @@ int open_ctree(struct super_block *sb, btrfs_qgroup_rescan_resume(fs_info); + btrfs_discard_resume(fs_info); + if (!fs_info->uuid_root) { btrfs_info(fs_info, "creating UUID tree"); ret = btrfs_create_uuid_tree(fs_info); @@ -3993,6 +4003,9 @@ void close_ctree(struct btrfs_fs_info *fs_info) */ kthread_park(fs_info->cleaner_kthread); + /* cancel or finish ongoing work */ + btrfs_discard_cleanup(fs_info); + /* wait for the qgroup rescan worker to stop */ btrfs_qgroup_wait_for_completion(fs_info, false); diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 6a40bba3cb19..de00fd6e338b 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -32,6 +32,7 @@ #include "block-rsv.h" #include "delalloc-space.h" #include "block-group.h" +#include "discard.h" #undef SCRAMBLE_DELAYED_REFS @@ -2920,6 +2921,9 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans) cond_resched(); } + if (btrfs_test_opt(fs_info, DISCARD_ASYNC)) + btrfs_discard_schedule_work(&fs_info->discard_ctl, true); + /* * Transaction is finished. We don't need the lock anymore. 
We * do need to clean up the block groups in case of a transaction diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index 900b935e5997..8120630e4439 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -21,6 +21,7 @@ #include "space-info.h" #include "delalloc-space.h" #include "block-group.h" +#include "discard.h" #define BITS_PER_BITMAP (PAGE_SIZE * 8UL) #define MAX_CACHE_BYTES_PER_GIG SZ_32K @@ -750,9 +751,11 @@ static int __load_free_space_cache(struct btrfs_root *root, struct inode *inode, /* * Sync discard ensures that the free space cache is always * trimmed. So when reading this in, the state should reflect - * that. + * that. We also do this for async as a stop gap for lack of + * persistence. */ - if (btrfs_test_opt(fs_info, DISCARD_SYNC)) + if (btrfs_test_opt(fs_info, DISCARD_SYNC) || + btrfs_test_opt(fs_info, DISCARD_ASYNC)) e->trim_state = BTRFS_TRIM_STATE_TRIMMED; if (!e->bytes) { @@ -2379,6 +2382,7 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info, u64 offset, u64 bytes, enum btrfs_trim_state trim_state) { + struct btrfs_block_group_cache *cache = ctl->private; struct btrfs_free_space *info; int ret = 0; @@ -2428,6 +2432,9 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info, ASSERT(ret != -EEXIST); } + if (trim_state != BTRFS_TRIM_STATE_TRIMMED) + btrfs_discard_queue_work(&fs_info->discard_ctl, cache); + return ret; } @@ -3201,6 +3208,7 @@ void btrfs_init_free_cluster(struct btrfs_free_cluster *cluster) static int do_trimming(struct btrfs_block_group_cache *block_group, u64 *total_trimmed, u64 start, u64 bytes, u64 reserved_start, u64 reserved_bytes, + enum btrfs_trim_state reserved_trim_state, struct btrfs_trim_range *trim_entry) { struct btrfs_space_info *space_info = block_group->space_info; @@ -3208,6 +3216,9 @@ static int do_trimming(struct btrfs_block_group_cache *block_group, struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl; int ret; int update = 0; + u64 end = start 
+ bytes; + u64 reserved_end = reserved_start + reserved_bytes; + enum btrfs_trim_state trim_state; u64 trimmed = 0; spin_lock(&space_info->lock); @@ -3221,11 +3232,20 @@ static int do_trimming(struct btrfs_block_group_cache *block_group, spin_unlock(&space_info->lock); ret = btrfs_discard_extent(fs_info, start, bytes, &trimmed); - if (!ret) + if (!ret) { *total_trimmed += trimmed; + trim_state = BTRFS_TRIM_STATE_TRIMMED; + } mutex_lock(&ctl->cache_writeout_mutex); - btrfs_add_free_space(block_group, reserved_start, reserved_bytes); + if (reserved_start < start) + __btrfs_add_free_space(fs_info, ctl, reserved_start, + start - reserved_start, + reserved_trim_state); + if (start + bytes < reserved_start + reserved_bytes) + __btrfs_add_free_space(fs_info, ctl, end, reserved_end - end, + reserved_trim_state); + __btrfs_add_free_space(fs_info, ctl, start, bytes, trim_state); list_del(&trim_entry->list); mutex_unlock(&ctl->cache_writeout_mutex); @@ -3252,6 +3272,7 @@ static int trim_no_bitmap(struct btrfs_block_group_cache *block_group, int ret = 0; u64 extent_start; u64 extent_bytes; + enum btrfs_trim_state extent_trim_state; u64 bytes; while (start < end) { @@ -3293,6 +3314,7 @@ static int trim_no_bitmap(struct btrfs_block_group_cache *block_group, extent_start = entry->offset; extent_bytes = entry->bytes; + extent_trim_state = entry->trim_state; start = max(start, extent_start); bytes = min(extent_start + extent_bytes, end) - start; if (bytes < minlen) { @@ -3311,7 +3333,8 @@ static int trim_no_bitmap(struct btrfs_block_group_cache *block_group, mutex_unlock(&ctl->cache_writeout_mutex); ret = do_trimming(block_group, total_trimmed, start, bytes, - extent_start, extent_bytes, &trim_entry); + extent_start, extent_bytes, extent_trim_state, + &trim_entry); if (ret) break; next: @@ -3437,7 +3460,7 @@ static int trim_bitmaps(struct btrfs_block_group_cache *block_group, mutex_unlock(&ctl->cache_writeout_mutex); ret = do_trimming(block_group, total_trimmed, start, bytes, - 
start, bytes, &trim_entry); + start, bytes, 0, &trim_entry); if (ret) { reset_trimming_bitmap(ctl, offset); break; diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index a02fece949cb..7a1bd85e1981 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -46,6 +46,7 @@ #include "sysfs.h" #include "tests/btrfs-tests.h" #include "block-group.h" +#include "discard.h" #include "qgroup.h" #define CREATE_TRACE_POINTS @@ -146,6 +147,8 @@ void __btrfs_handle_fs_error(struct btrfs_fs_info *fs_info, const char *function if (sb_rdonly(sb)) return; + btrfs_discard_stop(fs_info); + /* btrfs handle error by forcing the filesystem readonly */ sb->s_flags |= SB_RDONLY; btrfs_info(fs_info, "forced readonly"); @@ -313,6 +316,7 @@ enum { Opt_datasum, Opt_nodatasum, Opt_defrag, Opt_nodefrag, Opt_discard, Opt_nodiscard, + Opt_discard_mode, Opt_nologreplay, Opt_norecovery, Opt_ratio, @@ -376,6 +380,7 @@ static const match_table_t tokens = { {Opt_nodefrag, "noautodefrag"}, {Opt_discard, "discard"}, {Opt_nodiscard, "nodiscard"}, + {Opt_discard_mode, "discard=%s"}, {Opt_nologreplay, "nologreplay"}, {Opt_norecovery, "norecovery"}, {Opt_ratio, "metadata_ratio=%u"}, @@ -695,12 +700,26 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options, info->metadata_ratio); break; case Opt_discard: - btrfs_set_and_info(info, DISCARD_SYNC, - "turning on sync discard"); + case Opt_discard_mode: + if (token == Opt_discard || + strcmp(args[0].from, "sync") == 0) { + btrfs_clear_opt(info->mount_opt, DISCARD_ASYNC); + btrfs_set_and_info(info, DISCARD_SYNC, + "turning on sync discard"); + } else if (strcmp(args[0].from, "async") == 0) { + btrfs_clear_opt(info->mount_opt, DISCARD_SYNC); + btrfs_set_and_info(info, DISCARD_ASYNC, + "turning on async discard"); + } else { + ret = -EINVAL; + goto out; + } break; case Opt_nodiscard: btrfs_clear_and_info(info, DISCARD_SYNC, "turning off discard"); + btrfs_clear_and_info(info, DISCARD_ASYNC, + "turning off async discard"); break; case Opt_space_cache: case 
Opt_space_cache_version: @@ -1324,6 +1343,8 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry) seq_puts(seq, ",flushoncommit"); if (btrfs_test_opt(info, DISCARD_SYNC)) seq_puts(seq, ",discard"); + if (btrfs_test_opt(info, DISCARD_ASYNC)) + seq_puts(seq, ",discard=async"); if (!(info->sb->s_flags & SB_POSIXACL)) seq_puts(seq, ",noacl"); if (btrfs_test_opt(info, SPACE_CACHE)) @@ -1714,6 +1735,14 @@ static inline void btrfs_remount_cleanup(struct btrfs_fs_info *fs_info, btrfs_cleanup_defrag_inodes(fs_info); } + /* If we toggled discard async. */ + if (!btrfs_raw_test_opt(old_opts, DISCARD_ASYNC) && + btrfs_test_opt(fs_info, DISCARD_ASYNC)) + btrfs_discard_resume(fs_info); + else if (btrfs_raw_test_opt(old_opts, DISCARD_ASYNC) && + !btrfs_test_opt(fs_info, DISCARD_ASYNC)) + btrfs_discard_cleanup(fs_info); + clear_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state); } @@ -1761,6 +1790,8 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data) */ cancel_work_sync(&fs_info->async_reclaim_work); + btrfs_discard_cleanup(fs_info); + /* wait for the uuid_scan task to finish */ down(&fs_info->uuid_tree_rescan_sem); /* avoid complains from lockdep et al. 
*/

From patchwork Wed Oct 23 22:53:00 2019
X-Patchwork-Submitter: Dennis Zhou
X-Patchwork-Id: 11207963
From: Dennis Zhou
To: David Sterba , Chris Mason , Josef Bacik , Omar Sandoval
Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou
Subject: [PATCH 06/22] btrfs: handle empty block_group removal
Date: Wed, 23 Oct 2019 18:53:00 -0400
Message-Id: <2232e97f78d01b39e48454844c7a462b1b7b7cb8.1571865774.git.dennis@kernel.org>

block_group removal is a little tricky. It can race with the extent allocator, the cleaner thread, and balancing. The current path is for a block_group to be added to the unused_bgs list. Then, when the cleaner thread comes around, it starts a transaction and proceeds with removing the block_group. Pinned extents are subsequently removed from the pinned trees, and eventually a discard is issued for the entire block_group.

Async discard introduces another player into the game: the discard workqueue. While it has none of the racing issues, the new problem is ensuring we don't leave free space untrimmed prior to forgetting the block_group. This is handled by placing fully free block_groups on a separate discard queue.
This is necessary to maintain discarding order, as in the future we will slowly trim even fully free block_groups. The ordering helps us make progress on the same block_group rather than, say, on the most recently freed block_group, or having to search through the fully freed block groups at the beginning of a list and insert after them.

The new order of events is that a fully freed block group gets placed on the unused discard queue first. Once it's processed, it is placed on the unused_bgs list, and then the original sequence of events happens, just without the final whole-block_group discard.

The mount flags can change when processing unused_bgs, so when flipping from DISCARD to DISCARD_ASYNC, the unused_bgs must be punted to the discard_list to be trimmed. If we flip off DISCARD_ASYNC, we punt free block groups on the discard_list to the unused_bgs queue, which will do the final discard for us.

Signed-off-by: Dennis Zhou
Reviewed-by: Josef Bacik
--- fs/btrfs/block-group.c | 50 ++++++++++++++-- fs/btrfs/ctree.h | 9 ++- fs/btrfs/discard.c | 112 +++++++++++++++++++++++++++++++++++- fs/btrfs/discard.h | 6 ++ fs/btrfs/free-space-cache.c | 36 ++++++++++++ fs/btrfs/free-space-cache.h | 1 + fs/btrfs/scrub.c | 7 ++- 7 files changed, 211 insertions(+), 10 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 8bbbe7488328..b447a7c5ac34 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -1251,6 +1251,7 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info) struct btrfs_block_group_cache *block_group; struct btrfs_space_info *space_info; struct btrfs_trans_handle *trans; + bool async_trim_enabled = btrfs_test_opt(fs_info, DISCARD_ASYNC); int ret = 0; if (!test_bit(BTRFS_FS_OPEN, &fs_info->flags)) @@ -1260,6 +1261,7 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info) while (!list_empty(&fs_info->unused_bgs)) { u64 start, end; int trimming; + bool async_trimmed; block_group = list_first_entry(&fs_info->unused_bgs, struct
btrfs_block_group_cache, @@ -1281,10 +1283,24 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info) /* Don't want to race with allocators so take the groups_sem */ down_write(&space_info->groups_sem); spin_lock(&block_group->lock); + + /* + * Async discard moves the final block group discard to be prior + * to the unused_bgs code path. Therefore, if it's not fully + * trimmed, punt it back to the async discard lists. + */ + async_trimmed = (!btrfs_test_opt(fs_info, DISCARD_ASYNC) || + btrfs_is_free_space_trimmed(block_group)); + if (block_group->reserved || block_group->pinned || btrfs_block_group_used(&block_group->item) || block_group->ro || - list_is_singular(&block_group->list)) { + list_is_singular(&block_group->list) || + !async_trimmed) { + /* Requeue if we failed because of async discard. */ + if (!async_trimmed) + btrfs_discard_queue_work(&fs_info->discard_ctl, + block_group); /* * We want to bail if we made new allocations or have * outstanding allocations in this block group. We do @@ -1367,6 +1383,17 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info) spin_unlock(&block_group->lock); spin_unlock(&space_info->lock); + /* + * The normal path here is an unused block group is passed here, + * then trimming is handled in the transaction commit path. + * Async discard interposes before this to do the trimming + * before coming down the unused block group path as trimming + * will no longer be done later in the transaction commit path. 
+ */ + if (!async_trim_enabled && + btrfs_test_opt(fs_info, DISCARD_ASYNC)) + goto flip_async; + /* DISCARD can flip during remount */ trimming = btrfs_test_opt(fs_info, DISCARD_SYNC); @@ -1411,6 +1438,13 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info) spin_lock(&fs_info->unused_bgs_lock); } spin_unlock(&fs_info->unused_bgs_lock); + return; + +flip_async: + btrfs_end_transaction(trans); + mutex_unlock(&fs_info->delete_unused_bgs_mutex); + btrfs_put_block_group(block_group); + btrfs_discard_punt_unused_bgs_list(fs_info); } void btrfs_mark_bg_unused(struct btrfs_block_group_cache *bg) @@ -1618,6 +1652,8 @@ static struct btrfs_block_group_cache *btrfs_create_block_group_cache( cache->full_stripe_len = btrfs_full_stripe_len(fs_info, start); set_free_space_tree_thresholds(cache); + cache->discard_index = BTRFS_DISCARD_INDEX_UNUSED; + atomic_set(&cache->count, 1); spin_lock_init(&cache->lock); init_rwsem(&cache->data_rwsem); @@ -1829,7 +1865,11 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info) inc_block_group_ro(cache, 1); } else if (btrfs_block_group_used(&cache->item) == 0) { ASSERT(list_empty(&cache->bg_list)); - btrfs_mark_bg_unused(cache); + if (btrfs_test_opt(info, DISCARD_ASYNC)) + btrfs_add_to_discard_unused_list( + &info->discard_ctl, cache); + else + btrfs_mark_bg_unused(cache); } } @@ -2724,8 +2764,10 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans, * dirty list to avoid races between cleaner kthread and space * cache writeout. */ - if (!alloc && old_val == 0) - btrfs_mark_bg_unused(cache); + if (!alloc && old_val == 0) { + if (!btrfs_test_opt(info, DISCARD_ASYNC)) + btrfs_mark_bg_unused(cache); + } btrfs_put_block_group(cache); total -= num_bytes; diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index efa8390e8419..e21aeb3a2266 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -441,9 +441,14 @@ struct btrfs_full_stripe_locks_tree { /* Discard control. 
*/ /* * Async discard uses multiple lists to differentiate the discard filter - * parameters. + * parameters. Index 0 is for completely free block groups where we need to + * ensure the entire block group is trimmed without being lossy. Indices + * afterwards represent monotonically decreasing discard filter sizes to + * prioritize what should be discarded next. */ -#define BTRFS_NR_DISCARD_LISTS 1 +#define BTRFS_NR_DISCARD_LISTS 2 +#define BTRFS_DISCARD_INDEX_UNUSED 0 +#define BTRFS_DISCARD_INDEX_START 1 struct btrfs_discard_ctl { struct workqueue_struct *discard_workers; diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c index 0a72a1902ca6..5b5be658c397 100644 --- a/fs/btrfs/discard.c +++ b/fs/btrfs/discard.c @@ -28,9 +28,13 @@ void btrfs_add_to_discard_list(struct btrfs_discard_ctl *discard_ctl, { spin_lock(&discard_ctl->lock); - if (list_empty(&cache->discard_list)) + if (list_empty(&cache->discard_list) || + cache->discard_index == BTRFS_DISCARD_INDEX_UNUSED) { + if (cache->discard_index == BTRFS_DISCARD_INDEX_UNUSED) + cache->discard_index = BTRFS_DISCARD_INDEX_START; cache->discard_eligible_time = (ktime_get_ns() + BTRFS_DISCARD_DELAY); + } list_move_tail(&cache->discard_list, btrfs_get_discard_list(discard_ctl, cache)); @@ -38,6 +42,22 @@ void btrfs_add_to_discard_list(struct btrfs_discard_ctl *discard_ctl, spin_unlock(&discard_ctl->lock); } +void btrfs_add_to_discard_unused_list(struct btrfs_discard_ctl *discard_ctl, + struct btrfs_block_group_cache *cache) +{ + spin_lock(&discard_ctl->lock); + + if (!list_empty(&cache->discard_list)) + list_del_init(&cache->discard_list); + + cache->discard_index = BTRFS_DISCARD_INDEX_UNUSED; + cache->discard_eligible_time = ktime_get_ns(); + list_add_tail(&cache->discard_list, + &discard_ctl->discard_list[BTRFS_DISCARD_INDEX_UNUSED]); + + spin_unlock(&discard_ctl->lock); +} + static bool remove_from_discard_list(struct btrfs_discard_ctl *discard_ctl, struct btrfs_block_group_cache *cache) { @@ -152,7 +172,11 @@ void 
btrfs_discard_queue_work(struct btrfs_discard_ctl *discard_ctl, if (!cache || !btrfs_test_opt(cache->fs_info, DISCARD_ASYNC)) return; - btrfs_add_to_discard_list(discard_ctl, cache); + if (btrfs_block_group_used(&cache->item) == 0) + btrfs_add_to_discard_unused_list(discard_ctl, cache); + else + btrfs_add_to_discard_list(discard_ctl, cache); + if (!delayed_work_pending(&discard_ctl->work)) btrfs_discard_schedule_work(discard_ctl, false); } @@ -197,6 +221,27 @@ void btrfs_discard_schedule_work(struct btrfs_discard_ctl *discard_ctl, spin_unlock(&discard_ctl->lock); } +/** + * btrfs_finish_discard_pass - determine next step of a block_group + * + * This determines the next step for a block group after it's finished going + * through a pass on a discard list. If it is unused and fully trimmed, we can + * mark it unused and send it to the unused_bgs path. Otherwise, pass it onto + * the appropriate filter list or let it fall off. + */ +static void btrfs_finish_discard_pass(struct btrfs_discard_ctl *discard_ctl, + struct btrfs_block_group_cache *cache) +{ + remove_from_discard_list(discard_ctl, cache); + + if (btrfs_block_group_used(&cache->item) == 0) { + if (btrfs_is_free_space_trimmed(cache)) + btrfs_mark_bg_unused(cache); + else + btrfs_add_to_discard_unused_list(discard_ctl, cache); + } +} + /** * btrfs_discard_workfn - discard work function * @work: work @@ -218,7 +263,7 @@ static void btrfs_discard_workfn(struct work_struct *work) btrfs_trim_block_group(cache, &trimmed, cache->key.objectid, btrfs_block_group_end(cache), 0); - remove_from_discard_list(discard_ctl, cache); + btrfs_finish_discard_pass(discard_ctl, cache); btrfs_discard_schedule_work(discard_ctl, false); } @@ -239,6 +284,63 @@ bool btrfs_run_discard_work(struct btrfs_discard_ctl *discard_ctl) test_bit(BTRFS_FS_DISCARD_RUNNING, &fs_info->flags)); } +/** + * btrfs_discard_punt_unused_bgs_list - punt unused_bgs list to discard lists + * @fs_info: fs_info of interest + * + * The unused_bgs list needs to 
be punted to the discard lists because the + order of operations is changed. In the normal synchronous discard path, the + block groups are trimmed via a single large trim in transaction commit. This + is ultimately what we are trying to avoid with asynchronous discard. Thus, + it must be done before going down the unused_bgs path. + */ +void btrfs_discard_punt_unused_bgs_list(struct btrfs_fs_info *fs_info) +{ + struct btrfs_block_group_cache *cache, *next; + + spin_lock(&fs_info->unused_bgs_lock); + + /* We enabled async discard, so punt all to the queue. */ + list_for_each_entry_safe(cache, next, &fs_info->unused_bgs, bg_list) { + list_del_init(&cache->bg_list); + btrfs_add_to_discard_unused_list(&fs_info->discard_ctl, cache); + } + + spin_unlock(&fs_info->unused_bgs_lock); +} + +/** + * btrfs_discard_purge_list - purge discard lists + * @discard_ctl: discard control + * + * If we are disabling async discard, we may have intercepted block groups that + * are completely free and ready for the unused_bgs path. As discarding will + * now happen in transaction commit or not at all, we can safely mark the + * corresponding block groups as unused and they will be sent on their merry + * way to the unused_bgs list.
+ */ +static void btrfs_discard_purge_list(struct btrfs_discard_ctl *discard_ctl) +{ + struct btrfs_block_group_cache *cache, *next; + int i; + + spin_lock(&discard_ctl->lock); + + for (i = 0; i < BTRFS_NR_DISCARD_LISTS; i++) { + list_for_each_entry_safe(cache, next, + &discard_ctl->discard_list[i], + discard_list) { + list_del_init(&cache->discard_list); + spin_unlock(&discard_ctl->lock); + if (btrfs_block_group_used(&cache->item) == 0) + btrfs_mark_bg_unused(cache); + spin_lock(&discard_ctl->lock); + } + } + + spin_unlock(&discard_ctl->lock); +} + void btrfs_discard_resume(struct btrfs_fs_info *fs_info) { if (!btrfs_test_opt(fs_info, DISCARD_ASYNC)) { @@ -246,6 +348,8 @@ void btrfs_discard_resume(struct btrfs_fs_info *fs_info) return; } + btrfs_discard_punt_unused_bgs_list(fs_info); + set_bit(BTRFS_FS_DISCARD_RUNNING, &fs_info->flags); } @@ -271,4 +375,6 @@ void btrfs_discard_cleanup(struct btrfs_fs_info *fs_info) { btrfs_discard_stop(fs_info); cancel_delayed_work_sync(&fs_info->discard_ctl.work); + + btrfs_discard_purge_list(&fs_info->discard_ctl); } diff --git a/fs/btrfs/discard.h b/fs/btrfs/discard.h index 48b4710a80d0..db003a244eb7 100644 --- a/fs/btrfs/discard.h +++ b/fs/btrfs/discard.h @@ -9,9 +9,13 @@ struct btrfs_fs_info; struct btrfs_discard_ctl; struct btrfs_block_group_cache; +/* List operations. */ void btrfs_add_to_discard_list(struct btrfs_discard_ctl *discard_ctl, struct btrfs_block_group_cache *cache); +void btrfs_add_to_discard_unused_list(struct btrfs_discard_ctl *discard_ctl, + struct btrfs_block_group_cache *cache); +/* Work operations. */ void btrfs_discard_cancel_work(struct btrfs_discard_ctl *discard_ctl, struct btrfs_block_group_cache *cache); void btrfs_discard_queue_work(struct btrfs_discard_ctl *discard_ctl, @@ -20,6 +24,8 @@ void btrfs_discard_schedule_work(struct btrfs_discard_ctl *discard_ctl, bool override); bool btrfs_run_discard_work(struct btrfs_discard_ctl *discard_ctl); +/* Setup/Cleanup operations. 
*/ +void btrfs_discard_punt_unused_bgs_list(struct btrfs_fs_info *fs_info); void btrfs_discard_resume(struct btrfs_fs_info *fs_info); void btrfs_discard_stop(struct btrfs_fs_info *fs_info); void btrfs_discard_init(struct btrfs_fs_info *fs_info); diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index 8120630e4439..80a205449547 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -2681,6 +2681,37 @@ void btrfs_remove_free_space_cache(struct btrfs_block_group_cache *block_group) } +/** + * btrfs_is_free_space_trimmed - see if everything is trimmed + * @cache: block_group of interest + * + * Walk the @cache's free space rb_tree to determine if everything is trimmed. + */ +bool btrfs_is_free_space_trimmed(struct btrfs_block_group_cache *cache) +{ + struct btrfs_free_space_ctl *ctl = cache->free_space_ctl; + struct btrfs_free_space *info; + struct rb_node *node; + bool ret = true; + + spin_lock(&ctl->tree_lock); + node = rb_first(&ctl->free_space_offset); + + while (node) { + info = rb_entry(node, struct btrfs_free_space, offset_index); + + if (!btrfs_free_space_trimmed(info)) { + ret = false; + break; + } + + node = rb_next(node); + } + + spin_unlock(&ctl->tree_lock); + return ret; +} + u64 btrfs_find_space_for_alloc(struct btrfs_block_group_cache *block_group, u64 offset, u64 bytes, u64 empty_size, u64 *max_extent_size) @@ -2767,6 +2798,9 @@ int btrfs_return_cluster_to_free_space( ret = __btrfs_return_cluster_to_free_space(block_group, cluster); spin_unlock(&ctl->tree_lock); + btrfs_discard_queue_work(&block_group->fs_info->discard_ctl, + block_group); + /* finally drop our ref */ btrfs_put_block_group(block_group); return ret; @@ -3125,6 +3159,7 @@ int btrfs_find_space_cluster(struct btrfs_block_group_cache *block_group, u64 min_bytes; u64 cont1_bytes; int ret; + bool found_cluster = false; /* * Choose the minimum extent size we'll require for this @@ -3177,6 +3212,7 @@ int btrfs_find_space_cluster(struct 
btrfs_block_group_cache *block_group, list_del_init(&entry->list); if (!ret) { + found_cluster = true; atomic_inc(&block_group->count); list_add_tail(&cluster->block_group_list, &block_group->cluster_list); diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h index b9d1aad2f7e5..e703f9e09461 100644 --- a/fs/btrfs/free-space-cache.h +++ b/fs/btrfs/free-space-cache.h @@ -119,6 +119,7 @@ int btrfs_remove_free_space(struct btrfs_block_group_cache *block_group, void __btrfs_remove_free_space_cache(struct btrfs_free_space_ctl *ctl); void btrfs_remove_free_space_cache(struct btrfs_block_group_cache *block_group); +bool btrfs_is_free_space_trimmed(struct btrfs_block_group_cache *cache); u64 btrfs_find_space_for_alloc(struct btrfs_block_group_cache *block_group, u64 offset, u64 bytes, u64 empty_size, u64 *max_extent_size); diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index f7d4e03f4c5d..5abc736f965c 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -8,6 +8,7 @@ #include #include #include "ctree.h" +#include "discard.h" #include "volumes.h" #include "disk-io.h" #include "ordered-data.h" @@ -3683,7 +3684,11 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx, if (!cache->removed && !cache->ro && cache->reserved == 0 && btrfs_block_group_used(&cache->item) == 0) { spin_unlock(&cache->lock); - btrfs_mark_bg_unused(cache); + if (btrfs_test_opt(fs_info, DISCARD_ASYNC)) + btrfs_add_to_discard_unused_list( + &fs_info->discard_ctl, cache); + else + btrfs_mark_bg_unused(cache); } else { spin_unlock(&cache->lock); }

From patchwork Wed Oct 23 22:53:01 2019
X-Patchwork-Submitter: Dennis Zhou
X-Patchwork-Id: 11207969
From: Dennis Zhou
To: David Sterba , Chris Mason , Josef Bacik , Omar Sandoval
Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou
Subject: [PATCH 07/22] btrfs: discard one region at a time in async discard
Date: Wed, 23 Oct 2019 18:53:01 -0400
Message-Id: <6ab77d726e93f19b26858ca8c2248ae249701e71.1571865774.git.dennis@kernel.org>

The previous two patches added discarding via a background workqueue. They simply piggybacked off of the fstrim code to trim the whole block group at once. Inevitably this performs worse and aggressively overtrims, but it was a convenient way to plumb the rest of the infrastructure while keeping the patches easier to review.

This patch adds the real goal of the series: discarding slowly (i.e., a slow, long-running fstrim). The discarding is split into two phases, extents and then bitmaps. The reason for this is twofold. First, the bitmap regions overlap the extent regions. Second, discarding the extents first gives the newly trimmed bitmaps the highest chance of coalescing when they are re-added to the free space cache.

Signed-off-by: Dennis Zhou
Reviewed-by: Josef Bacik
--- fs/btrfs/block-group.h | 15 +++++ fs/btrfs/discard.c | 79 ++++++++++++++++++---- fs/btrfs/free-space-cache.c | 130 ++++++++++++++++++++++++++++-------- fs/btrfs/free-space-cache.h | 6 ++ 4 files changed, 188 insertions(+), 42 deletions(-) diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h index 633dce5b9d57..88266cc16c07 100644 --- a/fs/btrfs/block-group.h +++ b/fs/btrfs/block-group.h @@ -12,6 +12,19 @@ enum btrfs_disk_cache_state { BTRFS_DC_SETUP, }; +/* + * This describes the state of the block_group for async discard.
This is due + * to the two pass nature of it where extent discarding is prioritized over + * bitmap discarding. BTRFS_DISCARD_RESET_CURSOR is set when we are resetting + * between lists to prevent contention for discard state variables + * (eg discard_cursor). + */ +enum btrfs_discard_state { + BTRFS_DISCARD_EXTENTS, + BTRFS_DISCARD_BITMAPS, + BTRFS_DISCARD_RESET_CURSOR, +}; + /* * Control flags for do_chunk_alloc's force field CHUNK_ALLOC_NO_FORCE means to * only allocate a chunk if we really need one. @@ -120,6 +133,8 @@ struct btrfs_block_group_cache { struct list_head discard_list; int discard_index; u64 discard_eligible_time; + u64 discard_cursor; + enum btrfs_discard_state discard_state; /* For dirty block groups */ struct list_head dirty_list; diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c index 5b5be658c397..e50728061658 100644 --- a/fs/btrfs/discard.c +++ b/fs/btrfs/discard.c @@ -23,21 +23,28 @@ static struct list_head *btrfs_get_discard_list( return &discard_ctl->discard_list[cache->discard_index]; } -void btrfs_add_to_discard_list(struct btrfs_discard_ctl *discard_ctl, - struct btrfs_block_group_cache *cache) +static void __btrfs_add_to_discard_list(struct btrfs_discard_ctl *discard_ctl, + struct btrfs_block_group_cache *cache) { - spin_lock(&discard_ctl->lock); - if (list_empty(&cache->discard_list) || cache->discard_index == BTRFS_DISCARD_INDEX_UNUSED) { if (cache->discard_index == BTRFS_DISCARD_INDEX_UNUSED) cache->discard_index = BTRFS_DISCARD_INDEX_START; cache->discard_eligible_time = (ktime_get_ns() + BTRFS_DISCARD_DELAY); + cache->discard_state = BTRFS_DISCARD_RESET_CURSOR; } list_move_tail(&cache->discard_list, btrfs_get_discard_list(discard_ctl, cache)); +} + +void btrfs_add_to_discard_list(struct btrfs_discard_ctl *discard_ctl, + struct btrfs_block_group_cache *cache) +{ + spin_lock(&discard_ctl->lock); + + __btrfs_add_to_discard_list(discard_ctl, cache); spin_unlock(&discard_ctl->lock); } @@ -52,6 +59,7 @@ void 
btrfs_add_to_discard_unused_list(struct btrfs_discard_ctl *discard_ctl, cache->discard_index = BTRFS_DISCARD_INDEX_UNUSED; cache->discard_eligible_time = ktime_get_ns(); + cache->discard_state = BTRFS_DISCARD_RESET_CURSOR; list_add_tail(&cache->discard_list, &discard_ctl->discard_list[BTRFS_DISCARD_INDEX_UNUSED]); @@ -119,23 +127,41 @@ static struct btrfs_block_group_cache *find_next_cache( /** * peek_discard_list - wrap find_next_cache() * @discard_ctl: discard control + * @discard_state: the discard_state of the block_group after state management * * This wraps find_next_cache() and sets the cache to be in use. + * discard_state's control flow is managed here. Variables related to + * discard_state are reset here as needed (eg discard_cursor). @discard_state + * is remembered as it may change while we're discarding, but we want the + * discard to execute in the context determined here. */ static struct btrfs_block_group_cache *peek_discard_list( - struct btrfs_discard_ctl *discard_ctl) + struct btrfs_discard_ctl *discard_ctl, + enum btrfs_discard_state *discard_state) { struct btrfs_block_group_cache *cache; u64 now = ktime_get_ns(); spin_lock(&discard_ctl->lock); +again: cache = find_next_cache(discard_ctl, now); - if (cache && now < cache->discard_eligible_time) + if (cache && now > cache->discard_eligible_time) { + if (cache->discard_index == BTRFS_DISCARD_INDEX_UNUSED && + btrfs_block_group_used(&cache->item) != 0) { + __btrfs_add_to_discard_list(discard_ctl, cache); + goto again; + } + if (cache->discard_state == BTRFS_DISCARD_RESET_CURSOR) { + cache->discard_cursor = cache->key.objectid; + cache->discard_state = BTRFS_DISCARD_EXTENTS; + } + discard_ctl->cache = cache; + *discard_state = cache->discard_state; + } else { cache = NULL; - - discard_ctl->cache = cache; + } spin_unlock(&discard_ctl->lock); @@ -246,24 +272,51 @@ static void btrfs_finish_discard_pass(struct btrfs_discard_ctl *discard_ctl, * btrfs_discard_workfn - discard work function * @work: work 
* - * This finds the next cache to start discarding and then discards it. + * This finds the next cache to start discarding and then discards a single + * region. It does this in a two-pass fashion: first extents and second + * bitmaps. Completely discarded block groups are sent to the unused_bgs path. */ static void btrfs_discard_workfn(struct work_struct *work) { struct btrfs_discard_ctl *discard_ctl; struct btrfs_block_group_cache *cache; + enum btrfs_discard_state discard_state; u64 trimmed = 0; discard_ctl = container_of(work, struct btrfs_discard_ctl, work.work); - cache = peek_discard_list(discard_ctl); + cache = peek_discard_list(discard_ctl, &discard_state); if (!cache || !btrfs_run_discard_work(discard_ctl)) return; - btrfs_trim_block_group(cache, &trimmed, cache->key.objectid, - btrfs_block_group_end(cache), 0); + /* Perform discarding. */ + if (discard_state == BTRFS_DISCARD_BITMAPS) + btrfs_trim_block_group_bitmaps(cache, &trimmed, + cache->discard_cursor, + btrfs_block_group_end(cache), + 0, true); + else + btrfs_trim_block_group_extents(cache, &trimmed, + cache->discard_cursor, + btrfs_block_group_end(cache), + 0, true); + + /* Determine next steps for a block_group. 
*/ + if (cache->discard_cursor >= btrfs_block_group_end(cache)) { + if (discard_state == BTRFS_DISCARD_BITMAPS) { + btrfs_finish_discard_pass(discard_ctl, cache); + } else { + cache->discard_cursor = cache->key.objectid; + spin_lock(&discard_ctl->lock); + if (cache->discard_state != BTRFS_DISCARD_RESET_CURSOR) + cache->discard_state = BTRFS_DISCARD_BITMAPS; + spin_unlock(&discard_ctl->lock); + } + } - btrfs_finish_discard_pass(discard_ctl, cache); + spin_lock(&discard_ctl->lock); + discard_ctl->cache = NULL; + spin_unlock(&discard_ctl->lock); btrfs_discard_schedule_work(discard_ctl, false); } diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index 80a205449547..f840bc126cac 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -3299,8 +3299,12 @@ static int do_trimming(struct btrfs_block_group_cache *block_group, return ret; } +/* + * If @async is set, then we will trim 1 region and return. + */ static int trim_no_bitmap(struct btrfs_block_group_cache *block_group, - u64 *total_trimmed, u64 start, u64 end, u64 minlen) + u64 *total_trimmed, u64 start, u64 end, u64 minlen, + bool async) { struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl; struct btrfs_free_space *entry; @@ -3317,36 +3321,24 @@ static int trim_no_bitmap(struct btrfs_block_group_cache *block_group, mutex_lock(&ctl->cache_writeout_mutex); spin_lock(&ctl->tree_lock); - if (ctl->free_space < minlen) { - spin_unlock(&ctl->tree_lock); - mutex_unlock(&ctl->cache_writeout_mutex); - break; - } + if (ctl->free_space < minlen) + goto out_unlock; entry = tree_search_offset(ctl, start, 0, 1); - if (!entry) { - spin_unlock(&ctl->tree_lock); - mutex_unlock(&ctl->cache_writeout_mutex); - break; - } + if (!entry) + goto out_unlock; - /* skip bitmaps */ - while (entry->bitmap) { + /* skip bitmaps and already trimmed entries */ + while (entry->bitmap || btrfs_free_space_trimmed(entry)) { node = rb_next(&entry->offset_index); - if (!node) { - 
spin_unlock(&ctl->tree_lock); - mutex_unlock(&ctl->cache_writeout_mutex); - goto out; - } + if (!node) + goto out_unlock; entry = rb_entry(node, struct btrfs_free_space, offset_index); } - if (entry->offset >= end) { - spin_unlock(&ctl->tree_lock); - mutex_unlock(&ctl->cache_writeout_mutex); - break; - } + if (entry->offset >= end) + goto out_unlock; extent_start = entry->offset; extent_bytes = entry->bytes; @@ -3371,10 +3363,15 @@ static int trim_no_bitmap(struct btrfs_block_group_cache *block_group, ret = do_trimming(block_group, total_trimmed, start, bytes, extent_start, extent_bytes, extent_trim_state, &trim_entry); - if (ret) + if (ret) { + block_group->discard_cursor = start + bytes; break; + } next: start += bytes; + block_group->discard_cursor = start; + if (async && *total_trimmed) + break; if (fatal_signal_pending(current)) { ret = -ERESTARTSYS; @@ -3383,7 +3380,14 @@ static int trim_no_bitmap(struct btrfs_block_group_cache *block_group, cond_resched(); } -out: + + return ret; + +out_unlock: + block_group->discard_cursor = btrfs_block_group_end(block_group); + spin_unlock(&ctl->tree_lock); + mutex_unlock(&ctl->cache_writeout_mutex); + return ret; } @@ -3420,8 +3424,12 @@ static void end_trimming_bitmap(struct btrfs_free_space *entry) entry->trim_state = BTRFS_TRIM_STATE_TRIMMED; } +/* + * If @async is set, then we will trim 1 region and return. 
+ */ static int trim_bitmaps(struct btrfs_block_group_cache *block_group, - u64 *total_trimmed, u64 start, u64 end, u64 minlen) + u64 *total_trimmed, u64 start, u64 end, u64 minlen, + bool async) { struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl; struct btrfs_free_space *entry; @@ -3438,13 +3446,16 @@ static int trim_bitmaps(struct btrfs_block_group_cache *block_group, spin_lock(&ctl->tree_lock); if (ctl->free_space < minlen) { + block_group->discard_cursor = + btrfs_block_group_end(block_group); spin_unlock(&ctl->tree_lock); mutex_unlock(&ctl->cache_writeout_mutex); break; } entry = tree_search_offset(ctl, offset, 1, 0); - if (!entry || btrfs_free_space_trimmed(entry)) { + if (!entry || (async && start == offset && + btrfs_free_space_trimmed(entry))) { spin_unlock(&ctl->tree_lock); mutex_unlock(&ctl->cache_writeout_mutex); next_bitmap = true; @@ -3477,6 +3488,16 @@ static int trim_bitmaps(struct btrfs_block_group_cache *block_group, goto next; } + /* + * We already trimmed a region, but are using the locking above + * to reset the trim_state. 
+ */ + if (async && *total_trimmed) { + spin_unlock(&ctl->tree_lock); + mutex_unlock(&ctl->cache_writeout_mutex); + return ret; + } + bytes = min(bytes, end - start); if (bytes < minlen) { entry->trim_state = BTRFS_TRIM_STATE_UNTRIMMED; @@ -3499,6 +3520,8 @@ static int trim_bitmaps(struct btrfs_block_group_cache *block_group, start, bytes, 0, &trim_entry); if (ret) { reset_trimming_bitmap(ctl, offset); + block_group->discard_cursor = + btrfs_block_group_end(block_group); break; } next: @@ -3508,6 +3531,7 @@ static int trim_bitmaps(struct btrfs_block_group_cache *block_group, } else { start += bytes; } + block_group->discard_cursor = start; if (fatal_signal_pending(current)) { if (start != offset) @@ -3519,6 +3543,9 @@ static int trim_bitmaps(struct btrfs_block_group_cache *block_group, cond_resched(); } + if (offset >= end) + block_group->discard_cursor = end; + return ret; } @@ -3578,11 +3605,11 @@ int btrfs_trim_block_group(struct btrfs_block_group_cache *block_group, btrfs_get_block_group_trimming(block_group); spin_unlock(&block_group->lock); - ret = trim_no_bitmap(block_group, trimmed, start, end, minlen); + ret = trim_no_bitmap(block_group, trimmed, start, end, minlen, false); if (ret) goto out; - ret = trim_bitmaps(block_group, trimmed, start, end, minlen); + ret = trim_bitmaps(block_group, trimmed, start, end, minlen, false); /* If we ended in the middle of a bitmap, reset the trimming flag. 
*/ if (end % (BITS_PER_BITMAP * ctl->unit)) reset_trimming_bitmap(ctl, offset_to_bitmap(ctl, end)); @@ -3591,6 +3618,51 @@ int btrfs_trim_block_group(struct btrfs_block_group_cache *block_group, return ret; } +int btrfs_trim_block_group_extents(struct btrfs_block_group_cache *block_group, + u64 *trimmed, u64 start, u64 end, u64 minlen, + bool async) +{ + int ret; + + *trimmed = 0; + + spin_lock(&block_group->lock); + if (block_group->removed) { + spin_unlock(&block_group->lock); + return 0; + } + btrfs_get_block_group_trimming(block_group); + spin_unlock(&block_group->lock); + + ret = trim_no_bitmap(block_group, trimmed, start, end, minlen, async); + + btrfs_put_block_group_trimming(block_group); + return ret; +} + +int btrfs_trim_block_group_bitmaps(struct btrfs_block_group_cache *block_group, + u64 *trimmed, u64 start, u64 end, u64 minlen, + bool async) +{ + int ret; + + *trimmed = 0; + + spin_lock(&block_group->lock); + if (block_group->removed) { + spin_unlock(&block_group->lock); + return 0; + } + btrfs_get_block_group_trimming(block_group); + spin_unlock(&block_group->lock); + + ret = trim_bitmaps(block_group, trimmed, start, end, minlen, async); + + btrfs_put_block_group_trimming(block_group); + return ret; + +} + /* * Find the left-most item in the cache tree, and then return the * smallest inode number in the item. 
diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h index e703f9e09461..316d349ec263 100644 --- a/fs/btrfs/free-space-cache.h +++ b/fs/btrfs/free-space-cache.h @@ -138,6 +138,12 @@ int btrfs_return_cluster_to_free_space( struct btrfs_free_cluster *cluster); int btrfs_trim_block_group(struct btrfs_block_group_cache *block_group, u64 *trimmed, u64 start, u64 end, u64 minlen); +int btrfs_trim_block_group_extents(struct btrfs_block_group_cache *block_group, + u64 *trimmed, u64 start, u64 end, u64 minlen, + bool async); +int btrfs_trim_block_group_bitmaps(struct btrfs_block_group_cache *block_group, + u64 *trimmed, u64 start, u64 end, u64 minlen, + bool async); /* Support functions for running our sanity tests */ #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS

From patchwork Wed Oct 23 22:53:02 2019
From: Dennis Zhou
To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou
Subject: [PATCH 08/22] btrfs: add removal calls for sysfs debug/
Date: Wed, 23 Oct 2019 18:53:02 -0400

We probably should call sysfs_remove_group() on debug/.
Signed-off-by: Dennis Zhou Reviewed-by: Josef Bacik --- fs/btrfs/sysfs.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index f6d3c80f2e28..16f2865fbbd4 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -732,6 +732,10 @@ void btrfs_sysfs_remove_mounted(struct btrfs_fs_info *fs_info) kobject_del(fs_info->space_info_kobj); kobject_put(fs_info->space_info_kobj); } +#ifdef CONFIG_BTRFS_DEBUG + sysfs_remove_group(&fs_info->fs_devices->fsid_kobj, + &btrfs_debug_feature_attr_group); +#endif addrm_unknown_feature_attrs(fs_info, false); sysfs_remove_group(&fs_info->fs_devices->fsid_kobj, &btrfs_feature_attr_group); sysfs_remove_files(&fs_info->fs_devices->fsid_kobj, btrfs_attrs); @@ -1170,6 +1174,9 @@ void __cold btrfs_exit_sysfs(void) sysfs_unmerge_group(&btrfs_kset->kobj, &btrfs_static_feature_attr_group); sysfs_remove_group(&btrfs_kset->kobj, &btrfs_feature_attr_group); +#ifdef CONFIG_BTRFS_DEBUG + sysfs_remove_group(&btrfs_kset->kobj, &btrfs_debug_feature_attr_group); +#endif kset_unregister(btrfs_kset); }

From patchwork Wed Oct 23 22:53:03 2019
From: Dennis Zhou
To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou
Subject: [PATCH 09/22] btrfs: make UUID/debug have its own kobject
Date: Wed, 23 Oct 2019 18:53:03 -0400

Btrfs only allowed attributes, not attribute groups, to be exposed in debug/. Allow other groups to be created there by making debug/ its own kobject. This also separates the per-fs debug options from the global feature mount attributes. The split is needed because sysfs_create_files() requires const struct attribute * while sysfs_create_group() takes struct attribute *. It is also nicer per filesystem, where to_fs_info() will likely be used. Signed-off-by: Dennis Zhou Reviewed-by: Josef Bacik --- fs/btrfs/ctree.h | 4 ++++ fs/btrfs/sysfs.c | 20 ++++++++++++++++---- 2 files changed, 20 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index e21aeb3a2266..8a34a90ce77f 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -928,6 +928,10 @@ struct btrfs_fs_info { spinlock_t ref_verify_lock; struct rb_root block_tree; #endif + +#ifdef CONFIG_BTRFS_DEBUG + struct kobject *debug_kobj; +#endif }; static inline struct btrfs_fs_info *btrfs_sb(struct super_block *sb) diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index 16f2865fbbd4..03694792d621 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -319,6 +319,10 @@ static const struct attribute_group btrfs_static_feature_attr_group = { * /sys/fs/btrfs/debug - applies to module or all filesystems * /sys/fs/btrfs/UUID - applies only to the given filesystem */ +static const struct attribute *btrfs_debug_mount_attrs[] = { + NULL, +}; + static struct attribute *btrfs_debug_feature_attrs[] = { NULL }; @@ -733,8 +737,12 @@ void btrfs_sysfs_remove_mounted(struct btrfs_fs_info *fs_info) kobject_put(fs_info->space_info_kobj); } #ifdef CONFIG_BTRFS_DEBUG - sysfs_remove_group(&fs_info->fs_devices->fsid_kobj, - &btrfs_debug_feature_attr_group); + if (fs_info->debug_kobj) { + sysfs_remove_files(fs_info->debug_kobj, + btrfs_debug_mount_attrs); +
kobject_del(fs_info->debug_kobj); + kobject_put(fs_info->debug_kobj); + } #endif addrm_unknown_feature_attrs(fs_info, false); sysfs_remove_group(&fs_info->fs_devices->fsid_kobj, &btrfs_feature_attr_group); @@ -1076,8 +1084,12 @@ int btrfs_sysfs_add_mounted(struct btrfs_fs_info *fs_info) goto failure; #ifdef CONFIG_BTRFS_DEBUG - error = sysfs_create_group(fsid_kobj, - &btrfs_debug_feature_attr_group); + fs_info->debug_kobj = kobject_create_and_add("debug", fsid_kobj); + if (!fs_info->debug_kobj) + goto failure; + + error = sysfs_create_files(fs_info->debug_kobj, + btrfs_debug_mount_attrs); if (error) goto failure; #endif

From patchwork Wed Oct 23 22:53:04 2019
From: Dennis Zhou
To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou
Subject: [PATCH 10/22] btrfs: add discard sysfs directory
Date: Wed, 23 Oct 2019 18:53:04 -0400

Setup sysfs directory for discard stats + tunables.
Signed-off-by: Dennis Zhou Reviewed-by: Josef Bacik --- fs/btrfs/ctree.h | 1 + fs/btrfs/sysfs.c | 24 ++++++++++++++++++++++++ 2 files changed, 25 insertions(+) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 8a34a90ce77f..835144930833 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -931,6 +931,7 @@ struct btrfs_fs_info { #ifdef CONFIG_BTRFS_DEBUG struct kobject *debug_kobj; + struct kobject *discard_kobj; #endif }; diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index 03694792d621..4b26de87d0ac 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -313,6 +313,13 @@ static const struct attribute_group btrfs_static_feature_attr_group = { #ifdef CONFIG_BTRFS_DEBUG +/* + * Discard statistics and tunables. + */ +static const struct attribute *discard_attrs[] = { + NULL, +}; + /* * Runtime debugging exported via sysfs * @@ -737,6 +744,11 @@ void btrfs_sysfs_remove_mounted(struct btrfs_fs_info *fs_info) kobject_put(fs_info->space_info_kobj); } #ifdef CONFIG_BTRFS_DEBUG + if (fs_info->discard_kobj) { + sysfs_remove_files(fs_info->discard_kobj, discard_attrs); + kobject_del(fs_info->discard_kobj); + kobject_put(fs_info->discard_kobj); + } if (fs_info->debug_kobj) { sysfs_remove_files(fs_info->debug_kobj, btrfs_debug_mount_attrs); @@ -1092,6 +1104,18 @@ int btrfs_sysfs_add_mounted(struct btrfs_fs_info *fs_info) btrfs_debug_mount_attrs); if (error) goto failure; + + /* Discard directory. 
*/ + fs_info->discard_kobj = kobject_create_and_add("discard", + fs_info->debug_kobj); + if (!fs_info->discard_kobj) { + error = -ENOMEM; + goto failure; + } + + error = sysfs_create_files(fs_info->discard_kobj, discard_attrs); + if (error) + goto failure; #endif error = addrm_unknown_feature_attrs(fs_info, true);

From patchwork Wed Oct 23 22:53:05 2019
From: Dennis Zhou
To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou
Subject: [PATCH 11/22] btrfs: track discardable extents for async discard
Date: Wed, 23 Oct 2019 18:53:05 -0400
Message-Id: <0900386d8ae0654829d4113a2c5b865725e6cdd5.1571865774.git.dennis@kernel.org>

The number of discardable extents will serve as the rate limiting metric for how often we should discard. This keeps track of discardable extents in the free space caches by maintaining deltas and propagating them to the global count.
Signed-off-by: Dennis Zhou --- fs/btrfs/ctree.h | 10 ++++ fs/btrfs/discard.c | 31 +++++++++++ fs/btrfs/discard.h | 4 ++ fs/btrfs/free-space-cache.c | 106 +++++++++++++++++++++++++++++++++--- fs/btrfs/free-space-cache.h | 2 + fs/btrfs/sysfs.c | 15 +++++ 6 files changed, 159 insertions(+), 9 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 835144930833..2cf1dae512aa 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -100,6 +100,15 @@ struct btrfs_ref; #define BTRFS_MAX_EXTENT_SIZE SZ_128M +/* + * Deltas are an effective way to populate global statistics. Give macro names + * to make it clear what we're doing. An example is discard_extents in + * btrfs_free_space_ctl. + */ +#define BTRFS_STAT_NR_ENTRIES 2 +#define BTRFS_STAT_CURR 0 +#define BTRFS_STAT_PREV 1 + /* * Count how many BTRFS_MAX_EXTENT_SIZE cover the @size @@ -456,6 +465,7 @@ struct btrfs_discard_ctl { spinlock_t lock; struct btrfs_block_group_cache *cache; struct list_head discard_list[BTRFS_NR_DISCARD_LISTS]; + atomic_t discardable_extents; }; /* delayed seq elem */ diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c index e50728061658..128ba18f2e5e 100644 --- a/fs/btrfs/discard.c +++ b/fs/btrfs/discard.c @@ -337,6 +337,35 @@ bool btrfs_run_discard_work(struct btrfs_discard_ctl *discard_ctl) test_bit(BTRFS_FS_DISCARD_RUNNING, &fs_info->flags)); } +/** + * btrfs_discard_update_discardable - propagate discard counters + * @cache: block_group of interest + * @ctl: free_space_ctl of @cache + * + * This propagates deltas of counters up to the discard_ctl. It maintains a + * current counter and a previous counter passing the delta up to the global + * stat. Then the current counter value becomes the previous counter value. 
+ */ +void btrfs_discard_update_discardable(struct btrfs_block_group_cache *cache, + struct btrfs_free_space_ctl *ctl) +{ + struct btrfs_discard_ctl *discard_ctl; + s32 extents_delta; + + if (!cache || !btrfs_test_opt(cache->fs_info, DISCARD_ASYNC)) + return; + + discard_ctl = &cache->fs_info->discard_ctl; + + extents_delta = (ctl->discardable_extents[BTRFS_STAT_CURR] - + ctl->discardable_extents[BTRFS_STAT_PREV]); + if (extents_delta) { + atomic_add(extents_delta, &discard_ctl->discardable_extents); + ctl->discardable_extents[BTRFS_STAT_PREV] = + ctl->discardable_extents[BTRFS_STAT_CURR]; + } +} + /** * btrfs_discard_punt_unused_bgs_list - punt unused_bgs list to discard lists * @fs_info: fs_info of interest @@ -422,6 +451,8 @@ void btrfs_discard_init(struct btrfs_fs_info *fs_info) for (i = 0; i < BTRFS_NR_DISCARD_LISTS; i++) INIT_LIST_HEAD(&discard_ctl->discard_list[i]); + + atomic_set(&discard_ctl->discardable_extents, 0); } void btrfs_discard_cleanup(struct btrfs_fs_info *fs_info) diff --git a/fs/btrfs/discard.h b/fs/btrfs/discard.h index db003a244eb7..0d453491eac1 100644 --- a/fs/btrfs/discard.h +++ b/fs/btrfs/discard.h @@ -24,6 +24,10 @@ void btrfs_discard_schedule_work(struct btrfs_discard_ctl *discard_ctl, bool override); bool btrfs_run_discard_work(struct btrfs_discard_ctl *discard_ctl); +/* Update operations. */ +void btrfs_discard_update_discardable(struct btrfs_block_group_cache *cache, + struct btrfs_free_space_ctl *ctl); + /* Setup/Cleanup operations. 
*/ void btrfs_discard_punt_unused_bgs_list(struct btrfs_fs_info *fs_info); void btrfs_discard_resume(struct btrfs_fs_info *fs_info); diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index f840bc126cac..3a965876aae3 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -32,6 +32,9 @@ struct btrfs_trim_range { struct list_head list; }; +static int count_bitmap_extents(struct btrfs_free_space_ctl *ctl, + struct btrfs_free_space *bitmap_info); + static int link_free_space(struct btrfs_free_space_ctl *ctl, struct btrfs_free_space *info); static void unlink_free_space(struct btrfs_free_space_ctl *ctl, @@ -811,12 +814,17 @@ static int __load_free_space_cache(struct btrfs_root *root, struct inode *inode, ret = io_ctl_read_bitmap(&io_ctl, e); if (ret) goto free_cache; + e->bitmap_extents = count_bitmap_extents(ctl, e); + if (!btrfs_free_space_trimmed(e)) + ctl->discardable_extents[BTRFS_STAT_CURR] += + e->bitmap_extents; } io_ctl_drop_pages(&io_ctl); merge_space_tree(ctl); ret = 1; out: + btrfs_discard_update_discardable(ctl->private, ctl); io_ctl_free(&io_ctl); return ret; free_cache: @@ -1631,6 +1639,9 @@ __unlink_free_space(struct btrfs_free_space_ctl *ctl, { rb_erase(&info->offset_index, &ctl->free_space_offset); ctl->free_extents--; + + if (!info->bitmap && !btrfs_free_space_trimmed(info)) + ctl->discardable_extents[BTRFS_STAT_CURR]--; } static void unlink_free_space(struct btrfs_free_space_ctl *ctl, @@ -1651,6 +1662,9 @@ static int link_free_space(struct btrfs_free_space_ctl *ctl, if (ret) return ret; + if (!info->bitmap && !btrfs_free_space_trimmed(info)) + ctl->discardable_extents[BTRFS_STAT_CURR]++; + ctl->free_space += info->bytes; ctl->free_extents++; return ret; @@ -1707,17 +1721,29 @@ static inline void __bitmap_clear_bits(struct btrfs_free_space_ctl *ctl, struct btrfs_free_space *info, u64 offset, u64 bytes) { - unsigned long start, count; + unsigned long start, count, end; + int extent_delta = -1; start = 
offset_to_bit(info->offset, ctl->unit, offset); count = bytes_to_bits(bytes, ctl->unit); - ASSERT(start + count <= BITS_PER_BITMAP); + end = start + count; + ASSERT(end <= BITS_PER_BITMAP); bitmap_clear(info->bitmap, start, count); info->bytes -= bytes; if (info->max_extent_size > ctl->unit) info->max_extent_size = 0; + + if (start && test_bit(start - 1, info->bitmap)) + extent_delta++; + + if (end < BITS_PER_BITMAP && test_bit(end, info->bitmap)) + extent_delta++; + + info->bitmap_extents += extent_delta; + if (!btrfs_free_space_trimmed(info)) + ctl->discardable_extents[BTRFS_STAT_CURR] += extent_delta; } static void bitmap_clear_bits(struct btrfs_free_space_ctl *ctl, @@ -1732,16 +1758,28 @@ static void bitmap_set_bits(struct btrfs_free_space_ctl *ctl, struct btrfs_free_space *info, u64 offset, u64 bytes) { - unsigned long start, count; + unsigned long start, count, end; + int extent_delta = 1; start = offset_to_bit(info->offset, ctl->unit, offset); count = bytes_to_bits(bytes, ctl->unit); - ASSERT(start + count <= BITS_PER_BITMAP); + end = start + count; + ASSERT(end <= BITS_PER_BITMAP); bitmap_set(info->bitmap, start, count); info->bytes += bytes; ctl->free_space += bytes; + + if (start && test_bit(start - 1, info->bitmap)) + extent_delta--; + + if (end < BITS_PER_BITMAP && test_bit(end, info->bitmap)) + extent_delta--; + + info->bitmap_extents += extent_delta; + if (!btrfs_free_space_trimmed(info)) + ctl->discardable_extents[BTRFS_STAT_CURR] += extent_delta; } /* @@ -1877,11 +1915,35 @@ find_free_space(struct btrfs_free_space_ctl *ctl, u64 *offset, u64 *bytes, return NULL; } +static int count_bitmap_extents(struct btrfs_free_space_ctl *ctl, + struct btrfs_free_space *bitmap_info) +{ + struct btrfs_block_group_cache *cache = ctl->private; + u64 bytes = bitmap_info->bytes; + unsigned int rs, re; + int count = 0; + + if (!cache || !bytes) + return count; + + bitmap_for_each_set_region(bitmap_info->bitmap, rs, re, 0, + BITS_PER_BITMAP) { + bytes -= (rs - re) * 
ctl->unit; + count++; + + if (!bytes) + break; + } + + return count; +} + static void add_new_bitmap(struct btrfs_free_space_ctl *ctl, struct btrfs_free_space *info, u64 offset) { info->offset = offset_to_bitmap(ctl, offset); info->bytes = 0; + info->bitmap_extents = 0; INIT_LIST_HEAD(&info->list); link_free_space(ctl, info); ctl->total_bitmaps++; @@ -1987,8 +2049,12 @@ static u64 add_bytes_to_bitmap(struct btrfs_free_space_ctl *ctl, * This is a tradeoff to make bitmap trim state minimal. We mark the * whole bitmap untrimmed if at any point we add untrimmed regions. */ - if (trim_state == BTRFS_TRIM_STATE_UNTRIMMED) + if (trim_state == BTRFS_TRIM_STATE_UNTRIMMED) { + if (btrfs_free_space_trimmed(info)) + ctl->discardable_extents[BTRFS_STAT_CURR] += + info->bitmap_extents; info->trim_state = BTRFS_TRIM_STATE_UNTRIMMED; + } end = info->offset + (u64)(BITS_PER_BITMAP * ctl->unit); @@ -2425,6 +2491,7 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info, if (ret) kmem_cache_free(btrfs_free_space_cachep, info); out: + btrfs_discard_update_discardable(cache, ctl); spin_unlock(&ctl->tree_lock); if (ret) { @@ -2534,6 +2601,7 @@ int btrfs_remove_free_space(struct btrfs_block_group_cache *block_group, goto again; } out_lock: + btrfs_discard_update_discardable(block_group, ctl); spin_unlock(&ctl->tree_lock); out: return ret; @@ -2619,8 +2687,16 @@ __btrfs_return_cluster_to_free_space( bitmap = (entry->bitmap != NULL); if (!bitmap) { + /* merging treats extents as if they were new */ + if (!btrfs_free_space_trimmed(entry)) + ctl->discardable_extents[BTRFS_STAT_CURR]--; + try_merge_free_space(ctl, entry, false); steal_from_bitmap(ctl, entry, false); + + /* as we insert directly, update these statistics */ + if (!btrfs_free_space_trimmed(entry)) + ctl->discardable_extents[BTRFS_STAT_CURR]++; } tree_insert_offset(&ctl->free_space_offset, entry->offset, &entry->offset_index, bitmap); @@ -2677,6 +2753,7 @@ void btrfs_remove_free_space_cache(struct btrfs_block_group_cache 
*block_group) cond_resched_lock(&ctl->tree_lock); } __btrfs_remove_free_space_cache_locked(ctl); + btrfs_discard_update_discardable(block_group, ctl); spin_unlock(&ctl->tree_lock); } @@ -2751,6 +2828,7 @@ u64 btrfs_find_space_for_alloc(struct btrfs_block_group_cache *block_group, link_free_space(ctl, entry); } out: + btrfs_discard_update_discardable(block_group, ctl); spin_unlock(&ctl->tree_lock); if (align_gap_len) @@ -2916,6 +2994,8 @@ u64 btrfs_alloc_from_cluster(struct btrfs_block_group_cache *block_group, entry->bitmap); ctl->total_bitmaps--; ctl->op->recalc_thresholds(ctl); + } else if (!btrfs_free_space_trimmed(entry)) { + ctl->discardable_extents[BTRFS_STAT_CURR]--; } kmem_cache_free(btrfs_free_space_cachep, entry); } @@ -3412,16 +3492,24 @@ static void reset_trimming_bitmap(struct btrfs_free_space_ctl *ctl, u64 offset) spin_lock(&ctl->tree_lock); entry = tree_search_offset(ctl, offset, 1, 0); - if (entry) + if (entry) { + if (btrfs_free_space_trimmed(entry)) + ctl->discardable_extents[BTRFS_STAT_CURR] += + entry->bitmap_extents; entry->trim_state = BTRFS_TRIM_STATE_UNTRIMMED; + } spin_unlock(&ctl->tree_lock); } -static void end_trimming_bitmap(struct btrfs_free_space *entry) +static void end_trimming_bitmap(struct btrfs_free_space_ctl *ctl, + struct btrfs_free_space *entry) { - if (btrfs_free_space_trimming_bitmap(entry)) + if (btrfs_free_space_trimming_bitmap(entry)) { entry->trim_state = BTRFS_TRIM_STATE_TRIMMED; + ctl->discardable_extents[BTRFS_STAT_CURR] -= + entry->bitmap_extents; + } } /* @@ -3479,7 +3567,7 @@ static int trim_bitmaps(struct btrfs_block_group_cache *block_group, * if BTRFS_TRIM_STATE_TRIMMED is set on a bitmap. 
*/ if (ret2 && !minlen) - end_trimming_bitmap(entry); + end_trimming_bitmap(ctl, entry); else entry->trim_state = BTRFS_TRIM_STATE_UNTRIMMED; spin_unlock(&ctl->tree_lock); diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h index 316d349ec263..5e65899a2afa 100644 --- a/fs/btrfs/free-space-cache.h +++ b/fs/btrfs/free-space-cache.h @@ -28,6 +28,7 @@ struct btrfs_free_space { unsigned long *bitmap; struct list_head list; enum btrfs_trim_state trim_state; + s32 bitmap_extents; }; static inline bool btrfs_free_space_trimmed(struct btrfs_free_space *info) @@ -50,6 +51,7 @@ struct btrfs_free_space_ctl { int total_bitmaps; int unit; u64 start; + s32 discardable_extents[BTRFS_STAT_NR_ENTRIES]; const struct btrfs_free_space_op *op; void *private; struct mutex cache_writeout_mutex; diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index 4b26de87d0ac..25f3d9062b78 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -11,6 +11,7 @@ #include #include "ctree.h" +#include "discard.h" #include "disk-io.h" #include "transaction.h" #include "sysfs.h" @@ -316,7 +317,21 @@ static const struct attribute_group btrfs_static_feature_attr_group = { /* * Discard statistics and tunables. 
*/ +#define discard_to_fs_info(_kobj) to_fs_info((_kobj)->parent->parent) + +static ssize_t btrfs_discardable_extents_show(struct kobject *kobj, + struct kobj_attribute *a, + char *buf) +{ + struct btrfs_fs_info *fs_info = discard_to_fs_info(kobj); + + return snprintf(buf, PAGE_SIZE, "%d\n", + atomic_read(&fs_info->discard_ctl.discardable_extents)); +} +BTRFS_ATTR(discard, discardable_extents, btrfs_discardable_extents_show); + static const struct attribute *discard_attrs[] = { + BTRFS_ATTR_PTR(discard, discardable_extents), NULL, }; From patchwork Wed Oct 23 22:53:06 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dennis Zhou X-Patchwork-Id: 11207977
From: Dennis Zhou To: David Sterba , Chris Mason , Josef Bacik , Omar Sandoval Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou Subject: [PATCH 12/22] btrfs: keep track of discardable_bytes Date: Wed, 23 Oct 2019 18:53:06 -0400 Message-Id: <3801afee5fe1a6a81e04e27692e2ded4c5df64ec.1571865774.git.dennis@kernel.org> X-Mailer: git-send-email 2.13.5 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Keep track of this metric so that we can understand how far ahead of or behind the discard rate we are.
Signed-off-by: Dennis Zhou --- fs/btrfs/ctree.h | 1 + fs/btrfs/discard.c | 10 +++++++++ fs/btrfs/free-space-cache.c | 41 +++++++++++++++++++++++++++++-------- fs/btrfs/free-space-cache.h | 1 + fs/btrfs/sysfs.c | 12 +++++++++++ 5 files changed, 56 insertions(+), 9 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 2cf1dae512aa..43aa355f5c37 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -466,6 +466,7 @@ struct btrfs_discard_ctl { struct btrfs_block_group_cache *cache; struct list_head discard_list[BTRFS_NR_DISCARD_LISTS]; atomic_t discardable_extents; + atomic64_t discardable_bytes; }; /* delayed seq elem */ diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c index 128ba18f2e5e..9c561a561578 100644 --- a/fs/btrfs/discard.c +++ b/fs/btrfs/discard.c @@ -351,6 +351,7 @@ void btrfs_discard_update_discardable(struct btrfs_block_group_cache *cache, { struct btrfs_discard_ctl *discard_ctl; s32 extents_delta; + s64 bytes_delta; if (!cache || !btrfs_test_opt(cache->fs_info, DISCARD_ASYNC)) return; @@ -364,6 +365,14 @@ void btrfs_discard_update_discardable(struct btrfs_block_group_cache *cache, ctl->discardable_extents[BTRFS_STAT_PREV] = ctl->discardable_extents[BTRFS_STAT_CURR]; } + + bytes_delta = (ctl->discardable_bytes[BTRFS_STAT_CURR] - + ctl->discardable_bytes[BTRFS_STAT_PREV]); + if (bytes_delta) { + atomic64_add(bytes_delta, &discard_ctl->discardable_bytes); + ctl->discardable_bytes[BTRFS_STAT_PREV] = + ctl->discardable_bytes[BTRFS_STAT_CURR]; + } } /** @@ -453,6 +462,7 @@ void btrfs_discard_init(struct btrfs_fs_info *fs_info) INIT_LIST_HEAD(&discard_ctl->discard_list[i]); atomic_set(&discard_ctl->discardable_extents, 0); + atomic64_set(&discard_ctl->discardable_bytes, 0); } void btrfs_discard_cleanup(struct btrfs_fs_info *fs_info) diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index 3a965876aae3..8d4ffd50aee6 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -815,9 +815,11 @@ static int 
__load_free_space_cache(struct btrfs_root *root, struct inode *inode, if (ret) goto free_cache; e->bitmap_extents = count_bitmap_extents(ctl, e); - if (!btrfs_free_space_trimmed(e)) + if (!btrfs_free_space_trimmed(e)) { ctl->discardable_extents[BTRFS_STAT_CURR] += e->bitmap_extents; + ctl->discardable_bytes[BTRFS_STAT_CURR] += e->bytes; + } } io_ctl_drop_pages(&io_ctl); @@ -1640,8 +1642,10 @@ __unlink_free_space(struct btrfs_free_space_ctl *ctl, rb_erase(&info->offset_index, &ctl->free_space_offset); ctl->free_extents--; - if (!info->bitmap && !btrfs_free_space_trimmed(info)) + if (!info->bitmap && !btrfs_free_space_trimmed(info)) { ctl->discardable_extents[BTRFS_STAT_CURR]--; + ctl->discardable_bytes[BTRFS_STAT_CURR] -= info->bytes; + } } static void unlink_free_space(struct btrfs_free_space_ctl *ctl, @@ -1662,8 +1666,10 @@ static int link_free_space(struct btrfs_free_space_ctl *ctl, if (ret) return ret; - if (!info->bitmap && !btrfs_free_space_trimmed(info)) + if (!info->bitmap && !btrfs_free_space_trimmed(info)) { ctl->discardable_extents[BTRFS_STAT_CURR]++; + ctl->discardable_bytes[BTRFS_STAT_CURR] += info->bytes; + } ctl->free_space += info->bytes; ctl->free_extents++; @@ -1742,8 +1748,10 @@ static inline void __bitmap_clear_bits(struct btrfs_free_space_ctl *ctl, extent_delta++; info->bitmap_extents += extent_delta; - if (!btrfs_free_space_trimmed(info)) + if (!btrfs_free_space_trimmed(info)) { ctl->discardable_extents[BTRFS_STAT_CURR] += extent_delta; + ctl->discardable_bytes[BTRFS_STAT_CURR] -= bytes; + } } static void bitmap_clear_bits(struct btrfs_free_space_ctl *ctl, @@ -1778,8 +1786,10 @@ static void bitmap_set_bits(struct btrfs_free_space_ctl *ctl, extent_delta--; info->bitmap_extents += extent_delta; - if (!btrfs_free_space_trimmed(info)) + if (!btrfs_free_space_trimmed(info)) { ctl->discardable_extents[BTRFS_STAT_CURR] += extent_delta; + ctl->discardable_bytes[BTRFS_STAT_CURR] += bytes; + } } /* @@ -2050,9 +2060,11 @@ static u64 
add_bytes_to_bitmap(struct btrfs_free_space_ctl *ctl, * whole bitmap untrimmed if at any point we add untrimmed regions. */ if (trim_state == BTRFS_TRIM_STATE_UNTRIMMED) { - if (btrfs_free_space_trimmed(info)) + if (btrfs_free_space_trimmed(info)) { ctl->discardable_extents[BTRFS_STAT_CURR] += info->bitmap_extents; + ctl->discardable_bytes[BTRFS_STAT_CURR] += info->bytes; + } info->trim_state = BTRFS_TRIM_STATE_UNTRIMMED; } @@ -2688,15 +2700,21 @@ __btrfs_return_cluster_to_free_space( bitmap = (entry->bitmap != NULL); if (!bitmap) { /* merging treats extents as if they were new */ - if (!btrfs_free_space_trimmed(entry)) + if (!btrfs_free_space_trimmed(entry)) { ctl->discardable_extents[BTRFS_STAT_CURR]--; + ctl->discardable_bytes[BTRFS_STAT_CURR] -= + entry->bytes; + } try_merge_free_space(ctl, entry, false); steal_from_bitmap(ctl, entry, false); /* as we insert directly, update these statistics */ - if (!btrfs_free_space_trimmed(entry)) + if (!btrfs_free_space_trimmed(entry)) { ctl->discardable_extents[BTRFS_STAT_CURR]++; + ctl->discardable_bytes[BTRFS_STAT_CURR] += + entry->bytes; + } } tree_insert_offset(&ctl->free_space_offset, entry->offset, &entry->offset_index, bitmap); @@ -2987,6 +3005,8 @@ u64 btrfs_alloc_from_cluster(struct btrfs_block_group_cache *block_group, spin_lock(&ctl->tree_lock); ctl->free_space -= bytes; + if (!entry->bitmap && !btrfs_free_space_trimmed(entry)) + ctl->discardable_bytes[BTRFS_STAT_CURR] -= bytes; if (entry->bytes == 0) { ctl->free_extents--; if (entry->bitmap) { @@ -3493,9 +3513,11 @@ static void reset_trimming_bitmap(struct btrfs_free_space_ctl *ctl, u64 offset) entry = tree_search_offset(ctl, offset, 1, 0); if (entry) { - if (btrfs_free_space_trimmed(entry)) + if (btrfs_free_space_trimmed(entry)) { ctl->discardable_extents[BTRFS_STAT_CURR] += entry->bitmap_extents; + ctl->discardable_bytes[BTRFS_STAT_CURR] += entry->bytes; + } entry->trim_state = BTRFS_TRIM_STATE_UNTRIMMED; } @@ -3509,6 +3531,7 @@ static void 
end_trimming_bitmap(struct btrfs_free_space_ctl *ctl, entry->trim_state = BTRFS_TRIM_STATE_TRIMMED; ctl->discardable_extents[BTRFS_STAT_CURR] -= entry->bitmap_extents; + ctl->discardable_bytes[BTRFS_STAT_CURR] -= entry->bytes; } } diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h index 5e65899a2afa..1c0ec98da529 100644 --- a/fs/btrfs/free-space-cache.h +++ b/fs/btrfs/free-space-cache.h @@ -52,6 +52,7 @@ struct btrfs_free_space_ctl { int unit; u64 start; s32 discardable_extents[BTRFS_STAT_NR_ENTRIES]; + s64 discardable_bytes[BTRFS_STAT_NR_ENTRIES]; const struct btrfs_free_space_op *op; void *private; struct mutex cache_writeout_mutex; diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index 25f3d9062b78..9ebb1f1b1de6 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -330,8 +330,20 @@ static ssize_t btrfs_discardable_extents_show(struct kobject *kobj, } BTRFS_ATTR(discard, discardable_extents, btrfs_discardable_extents_show); +static ssize_t btrfs_discardable_bytes_show(struct kobject *kobj, + struct kobj_attribute *a, + char *buf) +{ + struct btrfs_fs_info *fs_info = discard_to_fs_info(kobj); + + return snprintf(buf, PAGE_SIZE, "%lld\n", + atomic64_read(&fs_info->discard_ctl.discardable_bytes)); +} +BTRFS_ATTR(discard, discardable_bytes, btrfs_discardable_bytes_show); + static const struct attribute *discard_attrs[] = { BTRFS_ATTR_PTR(discard, discardable_extents), + BTRFS_ATTR_PTR(discard, discardable_bytes), NULL, }; From patchwork Wed Oct 23 22:53:07 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dennis Zhou X-Patchwork-Id: 11207979 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E295713B1 for ; Wed, 23 Oct 2019 22:53:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id C14D82084C 
for ; Wed, 23 Oct 2019 22:53:37 +0000 (UTC) Received: from dennisz-mbp.thefacebook.com ([163.114.130.128]) by smtp.gmail.com with ESMTPSA id j4sm11767542qkf.116.2019.10.23.15.53.33 (version=TLS1_2
cipher=ECDHE-RSA-AES128-SHA bits=128/128); Wed, 23 Oct 2019 15:53:33 -0700 (PDT) From: Dennis Zhou To: David Sterba , Chris Mason , Josef Bacik , Omar Sandoval Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou Subject: [PATCH 13/22] btrfs: calculate discard delay based on number of extents Date: Wed, 23 Oct 2019 18:53:07 -0400 Message-Id: <684dc3c1016cf8ff4215899a5c45958204d1d6d8.1571865774.git.dennis@kernel.org> X-Mailer: git-send-email 2.13.5 In-Reply-To: References: In-Reply-To: References: Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Use the number of discardable extents to help guide our discard delay interval. This value is reevaluated every transaction commit. Signed-off-by: Dennis Zhou Reviewed-by: Josef Bacik --- fs/btrfs/ctree.h | 2 ++ fs/btrfs/discard.c | 52 ++++++++++++++++++++++++++++++++++++++---- fs/btrfs/discard.h | 1 + fs/btrfs/extent-tree.c | 4 +++- fs/btrfs/sysfs.c | 31 +++++++++++++++++++++++++ 5 files changed, 85 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 43aa355f5c37..246141e2f825 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -467,6 +467,8 @@ struct btrfs_discard_ctl { struct list_head discard_list[BTRFS_NR_DISCARD_LISTS]; atomic_t discardable_extents; atomic64_t discardable_bytes; + u32 delay; + u32 iops_limit; }; /* delayed seq elem */ diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c index 9c561a561578..c3da4a537b5a 100644 --- a/fs/btrfs/discard.c +++ b/fs/btrfs/discard.c @@ -16,6 +16,11 @@ /* This is an initial delay to give some chance for lba reuse. */ #define BTRFS_DISCARD_DELAY (120ULL * NSEC_PER_SEC) +/* Target completion latency of discarding all discardable extents. 
*/ +#define BTRFS_DISCARD_TARGET_MSEC (6 * 60 * 60ULL * MSEC_PER_SEC) +#define BTRFS_DISCARD_MAX_DELAY (10000UL) +#define BTRFS_DISCARD_MAX_IOPS (10UL) + static struct list_head *btrfs_get_discard_list( struct btrfs_discard_ctl *discard_ctl, struct btrfs_block_group_cache *cache) @@ -232,11 +237,17 @@ void btrfs_discard_schedule_work(struct btrfs_discard_ctl *discard_ctl, cache = find_next_cache(discard_ctl, now); if (cache) { - u64 delay = 0; + u32 delay = discard_ctl->delay; + + /* + * This timeout is to hopefully prevent immediate discarding + * in a recently allocated block group. + */ + if (now < cache->discard_eligible_time) { + u64 bg_timeout = cache->discard_eligible_time - now; - if (now < cache->discard_eligible_time) - delay = nsecs_to_jiffies(cache->discard_eligible_time - - now); + delay = max_t(u64, delay, nsecs_to_jiffies(bg_timeout)); + } mod_delayed_work(discard_ctl->discard_workers, &discard_ctl->work, @@ -337,6 +348,37 @@ bool btrfs_run_discard_work(struct btrfs_discard_ctl *discard_ctl) test_bit(BTRFS_FS_DISCARD_RUNNING, &fs_info->flags)); } +/** + * btrfs_discard_calc_delay - recalculate the base delay + * @discard_ctl: discard control + * + * Recalculate the base delay which is based off the total number of + * discardable_extents. Clamp this with the iops_limit and + * BTRFS_DISCARD_MAX_DELAY. 
+ */ +void btrfs_discard_calc_delay(struct btrfs_discard_ctl *discard_ctl) +{ + s32 discardable_extents = + atomic_read(&discard_ctl->discardable_extents); + s32 iops_limit; + unsigned long delay; + + if (!discardable_extents) + return; + + spin_lock(&discard_ctl->lock); + + iops_limit = READ_ONCE(discard_ctl->iops_limit); + if (iops_limit) + iops_limit = MSEC_PER_SEC / iops_limit; + + delay = BTRFS_DISCARD_TARGET_MSEC / discardable_extents; + delay = clamp_t(s32, delay, iops_limit, BTRFS_DISCARD_MAX_DELAY); + discard_ctl->delay = msecs_to_jiffies(delay); + + spin_unlock(&discard_ctl->lock); +} + /** * btrfs_discard_update_discardable - propagate discard counters * @cache: block_group of interest @@ -463,6 +505,8 @@ void btrfs_discard_init(struct btrfs_fs_info *fs_info) atomic_set(&discard_ctl->discardable_extents, 0); atomic64_set(&discard_ctl->discardable_bytes, 0); + discard_ctl->delay = BTRFS_DISCARD_MAX_DELAY; + discard_ctl->iops_limit = BTRFS_DISCARD_MAX_IOPS; } void btrfs_discard_cleanup(struct btrfs_fs_info *fs_info) diff --git a/fs/btrfs/discard.h b/fs/btrfs/discard.h index 0d453491eac1..2d933b44abd9 100644 --- a/fs/btrfs/discard.h +++ b/fs/btrfs/discard.h @@ -25,6 +25,7 @@ void btrfs_discard_schedule_work(struct btrfs_discard_ctl *discard_ctl, bool btrfs_run_discard_work(struct btrfs_discard_ctl *discard_ctl); /* Update operations. 
*/ +void btrfs_discard_calc_delay(struct btrfs_discard_ctl *discard_ctl); void btrfs_discard_update_discardable(struct btrfs_block_group_cache *cache, struct btrfs_free_space_ctl *ctl); diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index de00fd6e338b..81c2503b53c1 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -2921,8 +2921,10 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans) cond_resched(); } - if (btrfs_test_opt(fs_info, DISCARD_ASYNC)) + if (btrfs_test_opt(fs_info, DISCARD_ASYNC)) { + btrfs_discard_calc_delay(&fs_info->discard_ctl); btrfs_discard_schedule_work(&fs_info->discard_ctl, true); + } /* * Transaction is finished. We don't need the lock anymore. We diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index 9ebb1f1b1de6..4955afc225c7 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -341,9 +341,40 @@ static ssize_t btrfs_discardable_bytes_show(struct kobject *kobj, } BTRFS_ATTR(discard, discardable_bytes, btrfs_discardable_bytes_show); +static ssize_t btrfs_discard_iops_limit_show(struct kobject *kobj, + struct kobj_attribute *a, + char *buf) +{ + struct btrfs_fs_info *fs_info = discard_to_fs_info(kobj); + + return snprintf(buf, PAGE_SIZE, "%u\n", + READ_ONCE(fs_info->discard_ctl.iops_limit)); +} + +static ssize_t btrfs_discard_iops_limit_store(struct kobject *kobj, + struct kobj_attribute *a, + const char *buf, size_t len) +{ + struct btrfs_fs_info *fs_info = discard_to_fs_info(kobj); + struct btrfs_discard_ctl *discard_ctl = &fs_info->discard_ctl; + u32 iops_limit; + int ret; + + ret = kstrtou32(buf, 10, &iops_limit); + if (ret) + return -EINVAL; + + WRITE_ONCE(discard_ctl->iops_limit, iops_limit); + + return len; +} +BTRFS_ATTR_RW(discard, iops_limit, btrfs_discard_iops_limit_show, + btrfs_discard_iops_limit_store); + static const struct attribute *discard_attrs[] = { BTRFS_ATTR_PTR(discard, discardable_extents), BTRFS_ATTR_PTR(discard, discardable_bytes), + BTRFS_ATTR_PTR(discard, iops_limit), 
NULL, }; From patchwork Wed Oct 23 22:53:08 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dennis Zhou X-Patchwork-Id: 11207981
From: Dennis Zhou To: David Sterba , Chris Mason , Josef Bacik , Omar Sandoval Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou Subject: [PATCH 14/22] btrfs: add bps discard rate limit Date: Wed, 23 Oct 2019 18:53:08 -0400 Message-Id: <8efa082438eea760533f1cddffa74cebdea6f028.1571865774.git.dennis@kernel.org> X-Mailer: git-send-email 2.13.5 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Provide the ability to rate limit in bytes per second in addition to the iops delay calculated from the number of discardable extents.
Signed-off-by: Dennis Zhou --- fs/btrfs/ctree.h | 2 ++ fs/btrfs/discard.c | 17 +++++++++++++++++ fs/btrfs/sysfs.c | 31 +++++++++++++++++++++++++++++++ 3 files changed, 50 insertions(+) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 246141e2f825..eccfc27e9b83 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -465,10 +465,12 @@ struct btrfs_discard_ctl { spinlock_t lock; struct btrfs_block_group_cache *cache; struct list_head discard_list[BTRFS_NR_DISCARD_LISTS]; + u64 prev_discard; atomic_t discardable_extents; atomic64_t discardable_bytes; u32 delay; u32 iops_limit; + u64 bps_limit; }; /* delayed seq elem */ diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c index c3da4a537b5a..70873cd884bf 100644 --- a/fs/btrfs/discard.c +++ b/fs/btrfs/discard.c @@ -238,6 +238,19 @@ void btrfs_discard_schedule_work(struct btrfs_discard_ctl *discard_ctl, cache = find_next_cache(discard_ctl, now); if (cache) { u32 delay = discard_ctl->delay; + u64 bps_limit = READ_ONCE(discard_ctl->bps_limit); + + /* + * A single delayed workqueue item is responsible for + * discarding, so we can manage the bytes rate limit by keeping + * track of the previous discard. + */ + if (bps_limit && discard_ctl->prev_discard) { + u64 bps_delay = (MSEC_PER_SEC * + discard_ctl->prev_discard / bps_limit); + + delay = max_t(u64, delay, msecs_to_jiffies(bps_delay)); + } /* * This timeout is to hopefully prevent immediate discarding @@ -312,6 +325,8 @@ static void btrfs_discard_workfn(struct work_struct *work) btrfs_block_group_end(cache), 0, true); + discard_ctl->prev_discard = trimmed; + /* Determine next steps for a block_group. 
*/ if (cache->discard_cursor >= btrfs_block_group_end(cache)) { if (discard_state == BTRFS_DISCARD_BITMAPS) { @@ -503,10 +518,12 @@ void btrfs_discard_init(struct btrfs_fs_info *fs_info) for (i = 0; i < BTRFS_NR_DISCARD_LISTS; i++) INIT_LIST_HEAD(&discard_ctl->discard_list[i]); + discard_ctl->prev_discard = 0; atomic_set(&discard_ctl->discardable_extents, 0); atomic64_set(&discard_ctl->discardable_bytes, 0); discard_ctl->delay = BTRFS_DISCARD_MAX_DELAY; discard_ctl->iops_limit = BTRFS_DISCARD_MAX_IOPS; + discard_ctl->bps_limit = 0; } void btrfs_discard_cleanup(struct btrfs_fs_info *fs_info) diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index 4955afc225c7..070fa6223911 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -371,10 +371,41 @@ static ssize_t btrfs_discard_iops_limit_store(struct kobject *kobj, BTRFS_ATTR_RW(discard, iops_limit, btrfs_discard_iops_limit_show, btrfs_discard_iops_limit_store); +static ssize_t btrfs_discard_bps_limit_show(struct kobject *kobj, + struct kobj_attribute *a, + char *buf) +{ + struct btrfs_fs_info *fs_info = discard_to_fs_info(kobj); + + return snprintf(buf, PAGE_SIZE, "%llu\n", + READ_ONCE(fs_info->discard_ctl.bps_limit)); +} + +static ssize_t btrfs_discard_bps_limit_store(struct kobject *kobj, + struct kobj_attribute *a, + const char *buf, size_t len) +{ + struct btrfs_fs_info *fs_info = discard_to_fs_info(kobj); + struct btrfs_discard_ctl *discard_ctl = &fs_info->discard_ctl; + u64 bps_limit; + int ret; + + ret = kstrtou64(buf, 10, &bps_limit); + if (ret) + return -EINVAL; + + WRITE_ONCE(discard_ctl->bps_limit, bps_limit); + + return len; +} +BTRFS_ATTR_RW(discard, bps_limit, btrfs_discard_bps_limit_show, + btrfs_discard_bps_limit_store); + static const struct attribute *discard_attrs[] = { BTRFS_ATTR_PTR(discard, discardable_extents), BTRFS_ATTR_PTR(discard, discardable_bytes), BTRFS_ATTR_PTR(discard, iops_limit), + BTRFS_ATTR_PTR(discard, bps_limit), NULL, }; From patchwork Wed Oct 23 22:53:09 2019 Content-Type: 
text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dennis Zhou X-Patchwork-Id: 11207985 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 31B3D14ED for ; Wed, 23 Oct 2019 22:53:43 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 11F972084C for ; Wed, 23 Oct 2019 22:53:43 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1571871223; bh=FdH+xCBBuP+V9qHFj5VDM2KkAytFDtMjpsaAgWvlDA8=; h=From:To:Cc:Subject:Date:In-Reply-To:References:In-Reply-To: References:List-ID:From; b=KwF04sa/rGV1l2vW6K2P3Zj1kxXJG/nDuop2vfyLj+MZuWJ0GGOMrxGGEFq+UtnMh tD9rBnH/EzagglKBAcc7HpPFymWyVP91+LRlj5c6waTWLaK0Qo3wh6gIc9+yFAI9EC VlJhQfEmOYHjBt5LRHsLl1FFe54RPIwlCuQcPCWc= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2436944AbfJWWxl (ORCPT ); Wed, 23 Oct 2019 18:53:41 -0400 Received: from mail-qk1-f195.google.com ([209.85.222.195]:33596 "EHLO mail-qk1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2436935AbfJWWxj (ORCPT ); Wed, 23 Oct 2019 18:53:39 -0400 Received: by mail-qk1-f195.google.com with SMTP id 71so17639801qkl.0 for ; Wed, 23 Oct 2019 15:53:37 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:in-reply-to:references; bh=PYOKZ/1QajvnP3fueqt/XcahZsRMffUtiJV5/Gx2ziM=; b=mZ05iX3zYsc4XggjNeA9pvriaZt7fZe7UXV1fwyBKGce7qmhAOsrKI2TcpsmRnH5LV 1wQZt7O1r3KRjQgBup6GBpw3XAFZlTjXxFiGuve9s5hD5oQXP3TrYvMUTzD0CtuCElZ6 8hR8aCbA2t5fs7Dn1IhZPcFE8eG07hz9ebVVNaBaVngMUseaPG79cRsS7bX1J1zEd2yi 4W1Om1yxUN5VhwrS+uDuA/Ya0ulSSwwsbV+RrA8chTjvJ0SeKTi9/lAceEsEG1Khhzai SQVM604NNmILp+IxJuYf6WDhatIpW852bhYdsV6F9OJeWlFR1pHe1Z3FzYL0mEifuJsB F6pg== X-Gm-Message-State: 
From patchwork Wed Oct 23 22:53:09 2019
From: Dennis Zhou
To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou
Subject: [PATCH 15/22] btrfs: limit max discard size for async discard
Date: Wed, 23 Oct 2019 18:53:09 -0400
Message-Id: <633cde192d549559b675e1182bcf61c4f2dc58ca.1571865774.git.dennis@kernel.org>

Throttle the maximum size of a discard so that we can provide an upper
bound on the rate of async discard. While the block layer is able to
split discards into appropriately sized requests, we want to account
more accurately for the rate at which we are consuming NCQ slots, as
well as to limit the upper bound of work done by a single discard.

Signed-off-by: Dennis Zhou
Reviewed-by: Josef Bacik
---
 fs/btrfs/discard.h          |  5 ++++
 fs/btrfs/free-space-cache.c | 48 +++++++++++++++++++++++++++----------
 2 files changed, 41 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/discard.h b/fs/btrfs/discard.h
index 2d933b44abd9..c1e745196b6c 100644
--- a/fs/btrfs/discard.h
+++ b/fs/btrfs/discard.h
@@ -5,10 +5,15 @@
 #ifndef BTRFS_DISCARD_H
 #define BTRFS_DISCARD_H
 
+#include <linux/sizes.h>
+
 struct btrfs_fs_info;
 struct btrfs_discard_ctl;
 struct btrfs_block_group_cache;
 
+/* Discard size limits. */
+#define BTRFS_ASYNC_DISCARD_MAX_SIZE	(SZ_64M)
+
 /* List operations. */
 void btrfs_add_to_discard_list(struct btrfs_discard_ctl *discard_ctl,
			       struct btrfs_block_group_cache *cache);
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 8d4ffd50aee6..186a4243fb7f 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -3440,19 +3440,40 @@ static int trim_no_bitmap(struct btrfs_block_group_cache *block_group,
 		if (entry->offset >= end)
 			goto out_unlock;
 
-		extent_start = entry->offset;
-		extent_bytes = entry->bytes;
-		extent_trim_state = entry->trim_state;
-		start = max(start, extent_start);
-		bytes = min(extent_start + extent_bytes, end) - start;
-		if (bytes < minlen) {
-			spin_unlock(&ctl->tree_lock);
-			mutex_unlock(&ctl->cache_writeout_mutex);
-			goto next;
-		}
+		if (async) {
+			start = extent_start = entry->offset;
+			bytes = extent_bytes = entry->bytes;
+			extent_trim_state = entry->trim_state;
+			if (bytes < minlen) {
+				spin_unlock(&ctl->tree_lock);
+				mutex_unlock(&ctl->cache_writeout_mutex);
+				goto next;
+			}
+			unlink_free_space(ctl, entry);
+			if (bytes > BTRFS_ASYNC_DISCARD_MAX_SIZE) {
+				bytes = extent_bytes =
+					BTRFS_ASYNC_DISCARD_MAX_SIZE;
+				entry->offset += BTRFS_ASYNC_DISCARD_MAX_SIZE;
+				entry->bytes -= BTRFS_ASYNC_DISCARD_MAX_SIZE;
+				link_free_space(ctl, entry);
+			} else {
+				kmem_cache_free(btrfs_free_space_cachep, entry);
+			}
+		} else {
+			extent_start = entry->offset;
+			extent_bytes = entry->bytes;
+			extent_trim_state = entry->trim_state;
+			start = max(start, extent_start);
+			bytes = min(extent_start + extent_bytes, end) - start;
+			if (bytes < minlen) {
+				spin_unlock(&ctl->tree_lock);
+				mutex_unlock(&ctl->cache_writeout_mutex);
+				goto next;
+			}
 
-		unlink_free_space(ctl, entry);
-		kmem_cache_free(btrfs_free_space_cachep, entry);
+			unlink_free_space(ctl, entry);
+			kmem_cache_free(btrfs_free_space_cachep, entry);
+		}
 
 		spin_unlock(&ctl->tree_lock);
 		trim_entry.start = extent_start;
@@ -3617,6 +3638,9 @@ static int trim_bitmaps(struct btrfs_block_group_cache *block_group,
 			goto next;
 		}
 
+		if (async && bytes > BTRFS_ASYNC_DISCARD_MAX_SIZE)
+			bytes = BTRFS_ASYNC_DISCARD_MAX_SIZE;
+
 		bitmap_clear_bits(ctl, entry, start, bytes);
 		if (entry->bytes == 0)
 			free_bitmap(ctl, entry);
From patchwork Wed Oct 23 22:53:10 2019
From: Dennis Zhou
To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou
Subject: [PATCH 16/22] btrfs: make max async discard size tunable
Date: Wed, 23 Oct 2019 18:53:10 -0400
Message-Id: <6b33c5e684690d81dfc7161e0742eb24aa0e3bf1.1571865774.git.dennis@kernel.org>

Expose max_discard_size as a tunable via sysfs.

Signed-off-by: Dennis Zhou
---
 fs/btrfs/ctree.h            |  1 +
 fs/btrfs/discard.c          |  1 +
 fs/btrfs/free-space-cache.c | 19 ++++++++++++-------
 fs/btrfs/sysfs.c            | 31 +++++++++++++++++++++++++++++++
 4 files changed, 45 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index eccfc27e9b83..0a87970c7117 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -468,6 +468,7 @@ struct btrfs_discard_ctl {
 	u64 prev_discard;
 	atomic_t discardable_extents;
 	atomic64_t discardable_bytes;
+	u64 max_discard_size;
 	u32 delay;
 	u32 iops_limit;
 	u64 bps_limit;
diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
index 70873cd884bf..f7799b659d39 100644
--- a/fs/btrfs/discard.c
+++ b/fs/btrfs/discard.c
@@ -521,6 +521,7 @@ void btrfs_discard_init(struct btrfs_fs_info *fs_info)
 	discard_ctl->prev_discard = 0;
 	atomic_set(&discard_ctl->discardable_extents, 0);
 	atomic64_set(&discard_ctl->discardable_bytes, 0);
+	discard_ctl->max_discard_size = BTRFS_ASYNC_DISCARD_MAX_SIZE;
 	discard_ctl->delay = BTRFS_DISCARD_MAX_DELAY;
 	discard_ctl->iops_limit = BTRFS_DISCARD_MAX_IOPS;
 	discard_ctl->bps_limit = 0;
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 186a4243fb7f..c62346b23ca0 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -3406,6 +3406,8 @@ static int trim_no_bitmap(struct btrfs_block_group_cache *block_group,
 			  u64 *total_trimmed, u64 start, u64 end, u64 minlen,
 			  bool async)
 {
+	struct btrfs_discard_ctl *discard_ctl =
+					&block_group->fs_info->discard_ctl;
 	struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
 	struct btrfs_free_space *entry;
 	struct rb_node *node;
@@ -3414,6 +3416,7 @@ static int trim_no_bitmap(struct btrfs_block_group_cache *block_group,
 	u64 extent_bytes;
 	enum btrfs_trim_state extent_trim_state;
 	u64 bytes;
+	u64 max_discard_size = READ_ONCE(discard_ctl->max_discard_size);
 
 	while (start < end) {
 		struct btrfs_trim_range trim_entry;
@@ -3450,11 +3453,10 @@ static int trim_no_bitmap(struct btrfs_block_group_cache *block_group,
 				goto next;
 			}
 			unlink_free_space(ctl, entry);
-			if (bytes > BTRFS_ASYNC_DISCARD_MAX_SIZE) {
-				bytes = extent_bytes =
-					BTRFS_ASYNC_DISCARD_MAX_SIZE;
-				entry->offset += BTRFS_ASYNC_DISCARD_MAX_SIZE;
-				entry->bytes -= BTRFS_ASYNC_DISCARD_MAX_SIZE;
+			if (max_discard_size && bytes > max_discard_size) {
+				bytes = extent_bytes = max_discard_size;
+				entry->offset += max_discard_size;
+				entry->bytes -= max_discard_size;
 				link_free_space(ctl, entry);
 			} else {
 				kmem_cache_free(btrfs_free_space_cachep, entry);
@@ -3563,12 +3565,15 @@ static int trim_bitmaps(struct btrfs_block_group_cache *block_group,
 			u64 *total_trimmed, u64 start, u64 end, u64 minlen,
 			bool async)
 {
+	struct btrfs_discard_ctl *discard_ctl =
+					&block_group->fs_info->discard_ctl;
 	struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
 	struct btrfs_free_space *entry;
 	int ret = 0;
 	int ret2;
 	u64 bytes;
 	u64 offset = offset_to_bitmap(ctl, start);
+	u64 max_discard_size = READ_ONCE(discard_ctl->max_discard_size);
 
 	while (offset < end) {
 		bool next_bitmap = false;
@@ -3638,8 +3643,8 @@ static int trim_bitmaps(struct btrfs_block_group_cache *block_group,
 			goto next;
 		}
 
-		if (async && bytes > BTRFS_ASYNC_DISCARD_MAX_SIZE)
-			bytes = BTRFS_ASYNC_DISCARD_MAX_SIZE;
+		if (async && max_discard_size && bytes > max_discard_size)
+			bytes = max_discard_size;
 
 		bitmap_clear_bits(ctl, entry, start, bytes);
 		if (entry->bytes == 0)
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 070fa6223911..c441603b7da1 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -401,11 +401,42 @@ static ssize_t btrfs_discard_bps_limit_store(struct kobject *kobj,
 BTRFS_ATTR_RW(discard, bps_limit, btrfs_discard_bps_limit_show,
	      btrfs_discard_bps_limit_store);
 
+static ssize_t btrfs_discard_max_discard_size_show(struct kobject *kobj,
+						   struct kobj_attribute *a,
+						   char *buf)
+{
+	struct btrfs_fs_info *fs_info = discard_to_fs_info(kobj);
+
+	return snprintf(buf, PAGE_SIZE, "%llu\n",
+			READ_ONCE(fs_info->discard_ctl.max_discard_size));
+}
+
+static ssize_t btrfs_discard_max_discard_size_store(struct kobject *kobj,
+						    struct kobj_attribute *a,
+						    const char *buf, size_t len)
+{
+	struct btrfs_fs_info *fs_info = discard_to_fs_info(kobj);
+	struct btrfs_discard_ctl *discard_ctl = &fs_info->discard_ctl;
+	u64 max_discard_size;
+	int ret;
+
+	ret = kstrtou64(buf, 10, &max_discard_size);
+	if (ret)
+		return -EINVAL;
+
+	WRITE_ONCE(discard_ctl->max_discard_size, max_discard_size);
+
+	return len;
+}
+BTRFS_ATTR_RW(discard, max_discard_size, btrfs_discard_max_discard_size_show,
+	      btrfs_discard_max_discard_size_store);
+
 static const struct attribute *discard_attrs[] = {
 	BTRFS_ATTR_PTR(discard, discardable_extents),
 	BTRFS_ATTR_PTR(discard, discardable_bytes),
 	BTRFS_ATTR_PTR(discard, iops_limit),
 	BTRFS_ATTR_PTR(discard, bps_limit),
+	BTRFS_ATTR_PTR(discard, max_discard_size),
 	NULL,
 };
From patchwork Wed Oct 23 22:53:11 2019
From: Dennis Zhou
To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou
Subject: [PATCH 17/22] btrfs: have multiple discard lists
Date: Wed, 23 Oct 2019 18:53:11 -0400
Message-Id: <9affe695998972620340031ccc95caac909bfe4b.1571865774.git.dennis@kernel.org>

Non-block-group-destruction discarding currently has only a single list
with no minimum discard length. This can lead to more meaningful
discards caravaning behind a heavily fragmented block group. Add
support for multiple lists with minimum discard lengths to prevent that
caravan effect. Block groups are promoted back up when a freed region
exceeds the BTRFS_ASYNC_DISCARD_MAX_FILTER size. Currently, only two
filtered lists are supported, with filters of 1MB and 32KB
respectively.

Signed-off-by: Dennis Zhou
Reviewed-by: Josef Bacik
---
 fs/btrfs/ctree.h            |  2 +-
 fs/btrfs/discard.c          | 98 ++++++++++++++++++++++++++++++++++---
 fs/btrfs/discard.h          |  4 ++
 fs/btrfs/free-space-cache.c | 53 +++++++++++++++-----
 fs/btrfs/free-space-cache.h |  2 +-
 5 files changed, 136 insertions(+), 23 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 0a87970c7117..57cfc0e11c53 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -455,7 +455,7 @@ struct btrfs_full_stripe_locks_tree {
  * afterwards represent monotonically decreasing discard filter sizes to
  * prioritize what should be discarded next.
  */
-#define BTRFS_NR_DISCARD_LISTS		2
+#define BTRFS_NR_DISCARD_LISTS		3
 #define BTRFS_DISCARD_INDEX_UNUSED	0
 #define BTRFS_DISCARD_INDEX_START	1
diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
index f7799b659d39..592a5c7b9dc1 100644
--- a/fs/btrfs/discard.c
+++ b/fs/btrfs/discard.c
@@ -21,6 +21,10 @@
 #define BTRFS_DISCARD_MAX_DELAY		(10000UL)
 #define BTRFS_DISCARD_MAX_IOPS		(10UL)
 
+/* Monotonically decreasing minimum length filters after index 0. */
+static int discard_minlen[BTRFS_NR_DISCARD_LISTS] = {0,
+	BTRFS_ASYNC_DISCARD_MAX_FILTER, BTRFS_ASYNC_DISCARD_MIN_FILTER};
+
 static struct list_head *btrfs_get_discard_list(
					struct btrfs_discard_ctl *discard_ctl,
					struct btrfs_block_group_cache *cache)
@@ -133,16 +137,18 @@ static struct btrfs_block_group_cache *find_next_cache(
  * peek_discard_list - wrap find_next_cache()
  * @discard_ctl: discard control
  * @discard_state: the discard_state of the block_group after state management
+ * @discard_index: the discard_index of the block_group after state management
  *
  * This wraps find_next_cache() and sets the cache to be in use.
  * discard_state's control flow is managed here. Variables related to
  * discard_state are reset here as needed (eg discard_cursor). @discard_state
- * is remembered as it may change while we're discarding, but we want the
- * discard to execute in the context determined here.
+ * and @discard_index are remembered as they may change while we're
+ * discarding, but we want the discard to execute in the context determined
+ * here.
  */
 static struct btrfs_block_group_cache *peek_discard_list(
					struct btrfs_discard_ctl *discard_ctl,
-					enum btrfs_discard_state *discard_state)
+					enum btrfs_discard_state *discard_state,
+					int *discard_index)
 {
 	struct btrfs_block_group_cache *cache;
 	u64 now = ktime_get_ns();
@@ -164,6 +170,7 @@ static struct btrfs_block_group_cache *peek_discard_list(
 		}
 		discard_ctl->cache = cache;
 		*discard_state = cache->discard_state;
+		*discard_index = cache->discard_index;
 	} else {
 		cache = NULL;
 	}
@@ -173,6 +180,63 @@ static struct btrfs_block_group_cache *peek_discard_list(
 	return cache;
 }
 
+/**
+ * btrfs_discard_check_filter - updates a block group's filters
+ * @cache: block group of interest
+ * @bytes: recently freed region size after coalescing
+ *
+ * Async discard maintains multiple lists with progressively smaller filters
+ * to prioritize discarding based on size. Should a free space that matches
+ * a larger filter be returned to the free_space_cache, prioritize that
+ * discard by moving @cache to the proper filter.
+ */
+void btrfs_discard_check_filter(struct btrfs_block_group_cache *cache,
+				u64 bytes)
+{
+	struct btrfs_discard_ctl *discard_ctl;
+
+	if (!cache || !btrfs_test_opt(cache->fs_info, DISCARD_ASYNC))
+		return;
+
+	discard_ctl = &cache->fs_info->discard_ctl;
+
+	if (cache->discard_index > BTRFS_DISCARD_INDEX_START &&
+	    bytes >= discard_minlen[cache->discard_index - 1]) {
+		int i;
+
+		remove_from_discard_list(discard_ctl, cache);
+
+		for (i = BTRFS_DISCARD_INDEX_START; i < BTRFS_NR_DISCARD_LISTS;
+		     i++) {
+			if (bytes >= discard_minlen[i]) {
+				cache->discard_index = i;
+				btrfs_add_to_discard_list(discard_ctl, cache);
+				break;
+			}
+		}
+	}
+}
+
+/**
+ * btrfs_update_discard_index - moves a block_group along the discard lists
+ * @discard_ctl: discard control
+ * @cache: block_group of interest
+ *
+ * Increment @cache's discard_index. If it falls off the list, let it be.
+ * Otherwise add it back to the appropriate list.
+ */
+static void btrfs_update_discard_index(struct btrfs_discard_ctl *discard_ctl,
+				       struct btrfs_block_group_cache *cache)
+{
+	cache->discard_index++;
+	if (cache->discard_index == BTRFS_NR_DISCARD_LISTS) {
+		cache->discard_index = 1;
+		return;
+	}
+
+	btrfs_add_to_discard_list(discard_ctl, cache);
+}
+
 /**
  * btrfs_discard_cancel_work - remove a block_group from the discard lists
  * @discard_ctl: discard control
@@ -289,6 +353,8 @@ static void btrfs_finish_discard_pass(struct btrfs_discard_ctl *discard_ctl,
 			btrfs_mark_bg_unused(cache);
 		else
 			btrfs_add_to_discard_unused_list(discard_ctl, cache);
+	} else {
+		btrfs_update_discard_index(discard_ctl, cache);
 	}
 }
 
@@ -305,25 +371,41 @@ static void btrfs_discard_workfn(struct work_struct *work)
 	struct btrfs_discard_ctl *discard_ctl;
 	struct btrfs_block_group_cache *cache;
 	enum btrfs_discard_state discard_state;
+	int discard_index = 0;
 	u64 trimmed = 0;
+	u64 minlen = 0;
 
 	discard_ctl = container_of(work, struct btrfs_discard_ctl, work.work);
 
-	cache = peek_discard_list(discard_ctl, &discard_state);
+	cache = peek_discard_list(discard_ctl, &discard_state, &discard_index);
 	if (!cache || !btrfs_run_discard_work(discard_ctl))
 		return;
 
 	/* Perform discarding. */
-	if (discard_state == BTRFS_DISCARD_BITMAPS)
+	minlen = discard_minlen[discard_index];
+
+	if (discard_state == BTRFS_DISCARD_BITMAPS) {
+		u64 maxlen = 0;
+
+		/*
+		 * Use the previous level's minimum discard length as the max
+		 * length filter. In the case something is added to make a
+		 * region go beyond the max filter, the entire bitmap is set
+		 * back to BTRFS_TRIM_STATE_UNTRIMMED.
+		 */
+		if (discard_index != BTRFS_DISCARD_INDEX_UNUSED)
+			maxlen = discard_minlen[discard_index - 1];
+
 		btrfs_trim_block_group_bitmaps(cache, &trimmed,
					       cache->discard_cursor,
					       btrfs_block_group_end(cache),
-					       0, true);
-	else
+					       minlen, maxlen, true);
+	} else {
 		btrfs_trim_block_group_extents(cache, &trimmed,
					       cache->discard_cursor,
					       btrfs_block_group_end(cache),
-					       0, true);
+					       minlen, true);
+	}
 
 	discard_ctl->prev_discard = trimmed;
diff --git a/fs/btrfs/discard.h b/fs/btrfs/discard.h
index c1e745196b6c..d24a6fd389f8 100644
--- a/fs/btrfs/discard.h
+++ b/fs/btrfs/discard.h
@@ -13,12 +13,16 @@ struct btrfs_block_group_cache;
 
 /* Discard size limits. */
 #define BTRFS_ASYNC_DISCARD_MAX_SIZE	(SZ_64M)
+#define BTRFS_ASYNC_DISCARD_MAX_FILTER	(SZ_1M)
+#define BTRFS_ASYNC_DISCARD_MIN_FILTER	(SZ_32K)
 
 /* List operations. */
 void btrfs_add_to_discard_list(struct btrfs_discard_ctl *discard_ctl,
			       struct btrfs_block_group_cache *cache);
 void btrfs_add_to_discard_unused_list(struct btrfs_discard_ctl *discard_ctl,
				      struct btrfs_block_group_cache *cache);
+void btrfs_discard_check_filter(struct btrfs_block_group_cache *cache,
+				u64 bytes);
 
 /* Work operations. */
 void btrfs_discard_cancel_work(struct btrfs_discard_ctl *discard_ctl,
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index c62346b23ca0..dc632afe5f61 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2463,6 +2463,7 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info,
 	struct btrfs_block_group_cache *cache = ctl->private;
 	struct btrfs_free_space *info;
 	int ret = 0;
+	u64 filter_bytes = bytes;
 
 	info = kmem_cache_zalloc(btrfs_free_space_cachep, GFP_NOFS);
 	if (!info)
@@ -2499,6 +2500,8 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info,
 	 */
 	steal_from_bitmap(ctl, info, true);
 
+	filter_bytes = max(filter_bytes, info->bytes);
+
 	ret = link_free_space(ctl, info);
 	if (ret)
 		kmem_cache_free(btrfs_free_space_cachep, info);
@@ -2511,8 +2514,10 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info,
 		ASSERT(ret != -EEXIST);
 	}
 
-	if (trim_state != BTRFS_TRIM_STATE_TRIMMED)
+	if (trim_state != BTRFS_TRIM_STATE_TRIMMED) {
+		btrfs_discard_check_filter(cache, filter_bytes);
 		btrfs_discard_queue_work(&fs_info->discard_ctl, cache);
+	}
 
 	return ret;
 }
@@ -3453,7 +3458,14 @@ static int trim_no_bitmap(struct btrfs_block_group_cache *block_group,
 				goto next;
 			}
 			unlink_free_space(ctl, entry);
-			if (max_discard_size && bytes > max_discard_size) {
+			/*
+			 * Let bytes = BTRFS_MAX_DISCARD_SIZE + X.
+			 * If X < BTRFS_ASYNC_DISCARD_MIN_FILTER, we won't trim
+			 * X when we come back around. So trim it now.
+			 */
+			if (max_discard_size &&
+			    bytes >= (max_discard_size +
+				      BTRFS_ASYNC_DISCARD_MIN_FILTER)) {
 				bytes = extent_bytes = max_discard_size;
 				entry->offset += max_discard_size;
 				entry->bytes -= max_discard_size;
@@ -3563,7 +3575,7 @@ static void end_trimming_bitmap(struct btrfs_free_space_ctl *ctl,
 */
 static int trim_bitmaps(struct btrfs_block_group_cache *block_group,
			u64 *total_trimmed, u64 start, u64 end, u64 minlen,
-			bool async)
+			u64 maxlen, bool async)
 {
 	struct btrfs_discard_ctl *discard_ctl =
					&block_group->fs_info->discard_ctl;
@@ -3591,7 +3603,15 @@ static int trim_bitmaps(struct btrfs_block_group_cache *block_group,
 		}
 
 		entry = tree_search_offset(ctl, offset, 1, 0);
-		if (!entry || (async && start == offset &&
+		/*
+		 * Bitmaps are marked trimmed lossily now to prevent constant
+		 * discarding of the same bitmap (the reason why we are bound
+		 * by the filters). So, retrim the block group bitmaps when we
+		 * are preparing to punt to the unused_bgs list. This uses
+		 * @minlen to determine if we are in BTRFS_DISCARD_INDEX_UNUSED
+		 * which is the only discard index which sets minlen to 0.
+		 */
+		if (!entry || (async && minlen && start == offset &&
			       btrfs_free_space_trimmed(entry))) {
 			spin_unlock(&ctl->tree_lock);
 			mutex_unlock(&ctl->cache_writeout_mutex);
@@ -3612,10 +3632,10 @@ static int trim_bitmaps(struct btrfs_block_group_cache *block_group,
 		ret2 = search_bitmap(ctl, entry, &start, &bytes, false);
 		if (ret2 || start >= end) {
 			/*
-			 * This keeps the invariant that all bytes are trimmed
-			 * if BTRFS_TRIM_STATE_TRIMMED is set on a bitmap.
+			 * We lossily consider a bitmap trimmed if we only skip
+			 * over regions <= BTRFS_ASYNC_DISCARD_MIN_FILTER.
			 */
-			if (ret2 && !minlen)
+			if (ret2 && minlen <= BTRFS_ASYNC_DISCARD_MIN_FILTER)
				end_trimming_bitmap(ctl, entry);
			else
				entry->trim_state = BTRFS_TRIM_STATE_UNTRIMMED;
@@ -3636,14 +3656,20 @@ static int trim_bitmaps(struct btrfs_block_group_cache *block_group,
 		}
 
 		bytes = min(bytes, end - start);
-		if (bytes < minlen) {
-			entry->trim_state = BTRFS_TRIM_STATE_UNTRIMMED;
+		if (bytes < minlen || (async && maxlen && bytes > maxlen)) {
 			spin_unlock(&ctl->tree_lock);
 			mutex_unlock(&ctl->cache_writeout_mutex);
 			goto next;
 		}
 
-		if (async && max_discard_size && bytes > max_discard_size)
+		/*
+		 * Let bytes = BTRFS_MAX_DISCARD_SIZE + X.
+		 * If X < @minlen, we won't trim X when we come back around.
+		 * So trim it now. We differ here from trimming extents as we
+		 * don't keep individual state per bit.
+		 */
+		if (async && max_discard_size &&
+		    bytes > (max_discard_size + minlen))
			bytes = max_discard_size;
 
 		bitmap_clear_bits(ctl, entry, start, bytes);
@@ -3749,7 +3775,7 @@ int btrfs_trim_block_group(struct btrfs_block_group_cache *block_group,
	if (ret)
		goto out;
 
-	ret = trim_bitmaps(block_group, trimmed, start, end, minlen, false);
+	ret = trim_bitmaps(block_group, trimmed, start, end, minlen, 0, false);
 
	/* If we ended in the middle of a bitmap, reset the trimming flag. */
	if (end % (BITS_PER_BITMAP * ctl->unit))
		reset_trimming_bitmap(ctl, offset_to_bitmap(ctl, end));
@@ -3782,7 +3808,7 @@ int btrfs_trim_block_group_extents(struct btrfs_block_group_cache *block_group,
 
 int btrfs_trim_block_group_bitmaps(struct btrfs_block_group_cache *block_group,
				   u64 *trimmed, u64 start, u64 end, u64 minlen,
-				   bool async)
+				   u64 maxlen, bool async)
 {
	int ret;
 
@@ -3796,7 +3822,8 @@ int btrfs_trim_block_group_bitmaps(struct btrfs_block_group_cache *block_group,
	btrfs_get_block_group_trimming(block_group);
	spin_unlock(&block_group->lock);
 
-	ret = trim_bitmaps(block_group, trimmed, start, end, minlen, async);
+	ret = trim_bitmaps(block_group, trimmed, start, end, minlen, maxlen,
+			   async);
 
	btrfs_put_block_group_trimming(block_group);
 
	return ret;
diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h
index 1c0ec98da529..d3a7c9228409 100644
--- a/fs/btrfs/free-space-cache.h
+++ b/fs/btrfs/free-space-cache.h
@@ -146,7 +146,7 @@ int btrfs_trim_block_group_extents(struct btrfs_block_group_cache *block_group,
				   bool async);
 int btrfs_trim_block_group_bitmaps(struct btrfs_block_group_cache *block_group,
				   u64 *trimmed, u64 start, u64 end, u64 minlen,
-				   bool async);
+				   u64 maxlen, bool async);
 
 /* Support functions for running our sanity tests */
 #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
bh=etGvJa8p7VX1dmouPEG/Zl5QmM7Pp+d7pb90UKzyXqI=; h=From:To:Cc:Subject:Date:In-Reply-To:References:In-Reply-To: References:List-ID:From; b=0Qi86guzYi9S9itqIeNxRFcbDtsz/vIkvXm10AZ2OKFiGKw1huhhKDnsXLzY3XRob TfTgzTivMPWYpZWVciTaT4+xdJ6CbucGl5m4XOqwRBzGAtNRuOq6fMPHiMk/HJCtwQ uPHknGJBipo0jd/IC0PBzIM9J77qdX6ydsifeWy8= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2436950AbfJWWxn (ORCPT ); Wed, 23 Oct 2019 18:53:43 -0400 Received: from mail-qt1-f196.google.com ([209.85.160.196]:41524 "EHLO mail-qt1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2436941AbfJWWxl (ORCPT ); Wed, 23 Oct 2019 18:53:41 -0400 Received: by mail-qt1-f196.google.com with SMTP id c17so31728196qtn.8 for ; Wed, 23 Oct 2019 15:53:40 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:in-reply-to:references; bh=eJZqH8JTIxnzaPVoCiD3/WZteI0vc7zvcPUHOAgd1tM=; b=A9uxdSgYVxaDu7xSVvv+2/1ukCY/8dpdwD+N3v6EwEnO041relk/ssXU9YysM3VOTt 5YOs+wLKacBl897dYj99jucoEUxFqiNesaqbiskrt83SoBzgicMyoxwIk5nj8xyiq8IN A1owNk9CTnfqQBVBNylT/vQRawAS79lXmGSRRCzpkcbgCQuhRxoPE13ZrzcEVgDXMhnr JVxsrtKwo+3XLvTop3j4ZNUo3OU+HjTcra0ul1DTVnYZcrsNUHZsrWkXej+NUExD+A8k 5vg3HkLV+wCBaVvs5qnS8J/aLREub2Wm0Mh5cpritsvbxj1EVH3UiKlqTN7SCCXe+w9F 1aJA== X-Gm-Message-State: APjAAAUvIwfBXiJtKnYsA6MNEbIWecsQBewkxRL4nO+6fuYKYXTl6haf nY4gnCxMSac/jnCgpebHIiw= X-Google-Smtp-Source: APXvYqyMQy+UjgDRN+mU7UN133ZhZgj9nB1TWp9xI2ZvorjAF1ZTqHfaU8mYDNB6R1AngfgWamHMxQ== X-Received: by 2002:ac8:b42:: with SMTP id m2mr1115373qti.174.1571871220235; Wed, 23 Oct 2019 15:53:40 -0700 (PDT) Received: from dennisz-mbp.thefacebook.com ([163.114.130.128]) by smtp.gmail.com with ESMTPSA id j4sm11767542qkf.116.2019.10.23.15.53.39 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Wed, 23 Oct 2019 15:53:39 -0700 (PDT) From: Dennis Zhou To: David Sterba , Chris Mason , Josef Bacik , Omar Sandoval 
Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou Subject: [PATCH 18/22] btrfs: only keep track of data extents for async discard Date: Wed, 23 Oct 2019 18:53:12 -0400 Message-Id: <28b5064229e24388600f6f776621c6443c3e92b7.1571865775.git.dennis@kernel.org> X-Mailer: git-send-email 2.13.5 In-Reply-To: References: In-Reply-To: References: Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org As mentioned earlier, discarding data can be done either by issuing an explicit discard or implicitly by reusing the LBA. Metadata chunks see much more frequent reuse due to well it being metadata. So instead of explicitly discarding metadata blocks, just leave them be and let the latter implicit discarding be done for them. Signed-off-by: Dennis Zhou Reviewed-by: Josef Bacik --- fs/btrfs/block-group.h | 6 ++++++ fs/btrfs/discard.c | 11 +++++++++-- 2 files changed, 15 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h index 88266cc16c07..6a586b2968ac 100644 --- a/fs/btrfs/block-group.h +++ b/fs/btrfs/block-group.h @@ -181,6 +181,12 @@ static inline u64 btrfs_block_group_end(struct btrfs_block_group_cache *cache) return (cache->key.objectid + cache->key.offset); } +static inline bool btrfs_is_block_group_data( + struct btrfs_block_group_cache *cache) +{ + return (cache->flags & BTRFS_BLOCK_GROUP_DATA); +} + #ifdef CONFIG_BTRFS_DEBUG static inline int btrfs_should_fragment_free_space( struct btrfs_block_group_cache *block_group) diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c index 592a5c7b9dc1..be5a4439ceb0 100644 --- a/fs/btrfs/discard.c +++ b/fs/btrfs/discard.c @@ -51,6 +51,9 @@ static void __btrfs_add_to_discard_list(struct btrfs_discard_ctl *discard_ctl, void btrfs_add_to_discard_list(struct btrfs_discard_ctl *discard_ctl, struct btrfs_block_group_cache *cache) { + if (!btrfs_is_block_group_data(cache)) + return; + spin_lock(&discard_ctl->lock); 
	__btrfs_add_to_discard_list(discard_ctl, cache);
@@ -161,7 +164,10 @@ static struct btrfs_block_group_cache *peek_discard_list(
 	if (cache && now > cache->discard_eligible_time) {
 		if (cache->discard_index == BTRFS_DISCARD_INDEX_UNUSED &&
 		    btrfs_block_group_used(&cache->item) != 0) {
-			__btrfs_add_to_discard_list(discard_ctl, cache);
+			if (btrfs_is_block_group_data(cache))
+				__btrfs_add_to_discard_list(discard_ctl, cache);
+			else
+				list_del_init(&cache->discard_list);
 			goto again;
 		}
 		if (cache->discard_state == BTRFS_DISCARD_RESET_CURSOR) {
@@ -492,7 +498,8 @@ void btrfs_discard_update_discardable(struct btrfs_block_group_cache *cache,
 	s32 extents_delta;
 	s64 bytes_delta;
 
-	if (!cache || !btrfs_test_opt(cache->fs_info, DISCARD_ASYNC))
+	if (!cache || !btrfs_test_opt(cache->fs_info, DISCARD_ASYNC) ||
+	    !btrfs_is_block_group_data(cache))
 		return;
 
 	discard_ctl = &cache->fs_info->discard_ctl;

From patchwork Wed Oct 23 22:53:13 2019
From: Dennis Zhou
To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou
Subject: [PATCH 19/22] btrfs: keep track of discard reuse stats
Date: Wed, 23 Oct 2019 18:53:13 -0400

Keep track of how much we are discarding and how
often we are reusing with async discard.

Signed-off-by: Dennis Zhou
Reviewed-by: Josef Bacik
---
 fs/btrfs/ctree.h            |  3 +++
 fs/btrfs/discard.c          |  7 +++++++
 fs/btrfs/free-space-cache.c | 14 ++++++++++++++
 fs/btrfs/sysfs.c            | 36 ++++++++++++++++++++++++++++++++++++
 4 files changed, 60 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 57cfc0e11c53..1bf016f8a3d8 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -472,6 +472,9 @@ struct btrfs_discard_ctl {
 	u32 delay;
 	u32 iops_limit;
 	u64 bps_limit;
+	u64 discard_extent_bytes;
+	u64 discard_bitmap_bytes;
+	atomic64_t discard_bytes_saved;
 };
 
 /* delayed seq elem */
diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
index be5a4439ceb0..f95e437d7629 100644
--- a/fs/btrfs/discard.c
+++ b/fs/btrfs/discard.c
@@ -406,11 +406,15 @@ static void btrfs_discard_workfn(struct work_struct *work)
 					       cache->discard_cursor,
 					       btrfs_block_group_end(cache),
 					       minlen, maxlen, true);
+		WRITE_ONCE(discard_ctl->discard_bitmap_bytes,
+			   discard_ctl->discard_bitmap_bytes + trimmed);
 	} else {
 		btrfs_trim_block_group_extents(cache, &trimmed,
 					       cache->discard_cursor,
 					       btrfs_block_group_end(cache),
 					       minlen, true);
+		WRITE_ONCE(discard_ctl->discard_extent_bytes,
+			   discard_ctl->discard_extent_bytes + trimmed);
 	}
 
 	discard_ctl->prev_discard = trimmed;
@@ -614,6 +618,9 @@ void btrfs_discard_init(struct btrfs_fs_info *fs_info)
 	discard_ctl->delay = BTRFS_DISCARD_MAX_DELAY;
 	discard_ctl->iops_limit = BTRFS_DISCARD_MAX_IOPS;
 	discard_ctl->bps_limit = 0;
+	discard_ctl->discard_extent_bytes = 0;
+	discard_ctl->discard_bitmap_bytes = 0;
+	atomic64_set(&discard_ctl->discard_bytes_saved, 0);
 }
 
 void btrfs_discard_cleanup(struct btrfs_fs_info *fs_info)
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index dc632afe5f61..29d3e21ba7fd 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2817,6 +2817,8 @@ u64 btrfs_find_space_for_alloc(struct btrfs_block_group_cache *block_group,
 			       u64 *max_extent_size)
 {
 	struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
+	struct btrfs_discard_ctl *discard_ctl =
+					&block_group->fs_info->discard_ctl;
 	struct btrfs_free_space *entry = NULL;
 	u64 bytes_search = bytes + empty_size;
 	u64 ret = 0;
@@ -2833,6 +2835,10 @@ u64 btrfs_find_space_for_alloc(struct btrfs_block_group_cache *block_group,
 	ret = offset;
 	if (entry->bitmap) {
 		bitmap_clear_bits(ctl, entry, offset, bytes);
+
+		if (!btrfs_free_space_trimmed(entry))
+			atomic64_add(bytes, &discard_ctl->discard_bytes_saved);
+
 		if (!entry->bytes)
 			free_bitmap(ctl, entry);
 	} else {
@@ -2841,6 +2847,9 @@ u64 btrfs_find_space_for_alloc(struct btrfs_block_group_cache *block_group,
 		align_gap = entry->offset;
 		align_gap_trim_state = entry->trim_state;
 
+		if (!btrfs_free_space_trimmed(entry))
+			atomic64_add(bytes, &discard_ctl->discard_bytes_saved);
+
 		entry->offset = offset + bytes;
 		WARN_ON(entry->bytes < bytes + align_gap_len);
@@ -2945,6 +2954,8 @@ u64 btrfs_alloc_from_cluster(struct btrfs_block_group_cache *block_group,
 			     u64 min_start, u64 *max_extent_size)
 {
 	struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
+	struct btrfs_discard_ctl *discard_ctl =
+					&block_group->fs_info->discard_ctl;
 	struct btrfs_free_space *entry = NULL;
 	struct rb_node *node;
 	u64 ret = 0;
@@ -3009,6 +3020,9 @@ u64 btrfs_alloc_from_cluster(struct btrfs_block_group_cache *block_group,
 
 	spin_lock(&ctl->tree_lock);
 
+	if (!btrfs_free_space_trimmed(entry))
+		atomic64_add(bytes, &discard_ctl->discard_bytes_saved);
+
 	ctl->free_space -= bytes;
 	if (!entry->bitmap && !btrfs_free_space_trimmed(entry))
 		ctl->discardable_bytes[BTRFS_STAT_CURR] -= bytes;
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index c441603b7da1..a00e260ead53 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -431,12 +431,48 @@ static ssize_t btrfs_discard_max_discard_size_store(struct kobject *kobj,
 BTRFS_ATTR_RW(discard, max_discard_size, btrfs_discard_max_discard_size_show,
 	      btrfs_discard_max_discard_size_store);
 
+static ssize_t btrfs_discard_extent_bytes_show(struct kobject *kobj,
+					       struct kobj_attribute *a,
+					       char *buf)
+{
+	struct btrfs_fs_info *fs_info = discard_to_fs_info(kobj);
+
+	return snprintf(buf, PAGE_SIZE, "%lld\n",
+			READ_ONCE(fs_info->discard_ctl.discard_extent_bytes));
+}
+BTRFS_ATTR(discard, discard_extent_bytes, btrfs_discard_extent_bytes_show);
+
+static ssize_t btrfs_discard_bitmap_bytes_show(struct kobject *kobj,
+					       struct kobj_attribute *a,
+					       char *buf)
+{
+	struct btrfs_fs_info *fs_info = discard_to_fs_info(kobj);
+
+	return snprintf(buf, PAGE_SIZE, "%lld\n",
+			READ_ONCE(fs_info->discard_ctl.discard_bitmap_bytes));
+}
+BTRFS_ATTR(discard, discard_bitmap_bytes, btrfs_discard_bitmap_bytes_show);
+
+static ssize_t btrfs_discard_bytes_saved_show(struct kobject *kobj,
+					      struct kobj_attribute *a,
+					      char *buf)
+{
+	struct btrfs_fs_info *fs_info = discard_to_fs_info(kobj);
+
+	return snprintf(buf, PAGE_SIZE, "%lld\n",
+			atomic64_read(&fs_info->discard_ctl.discard_bytes_saved));
+}
+BTRFS_ATTR(discard, discard_bytes_saved, btrfs_discard_bytes_saved_show);
+
 static const struct attribute *discard_attrs[] = {
 	BTRFS_ATTR_PTR(discard, discardable_extents),
 	BTRFS_ATTR_PTR(discard, discardable_bytes),
 	BTRFS_ATTR_PTR(discard, iops_limit),
 	BTRFS_ATTR_PTR(discard, bps_limit),
 	BTRFS_ATTR_PTR(discard, max_discard_size),
+	BTRFS_ATTR_PTR(discard, discard_extent_bytes),
+	BTRFS_ATTR_PTR(discard, discard_bitmap_bytes),
+	BTRFS_ATTR_PTR(discard, discard_bytes_saved),
 	NULL,
 };

From patchwork Wed Oct 23 22:53:14 2019
From: Dennis Zhou
To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou
Subject: [PATCH 20/22] btrfs: add async discard header
Date: Wed, 23 Oct 2019 18:53:14 -0400

Give a brief overview for how async discard is implemented.

Signed-off-by: Dennis Zhou
Reviewed-by: Josef Bacik
---
 fs/btrfs/discard.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
index f95e437d7629..2ff284a8a760 100644
--- a/fs/btrfs/discard.c
+++ b/fs/btrfs/discard.c
@@ -1,5 +1,39 @@
 /*
  * Copyright (C) 2019 Facebook. All rights reserved.
+ *
+ * This contains the logic to handle async discard.
+ *
+ * Async discard manages trimming of free space outside of transaction commit.
+ * Discarding is done by managing the block_groups on a LRU list based on free
+ * space recency. Two passes are used to first prioritize discarding extents
+ * and then allow for trimming in the bitmap the best opportunity to coalesce.
+ * The block_groups are maintained on multiple lists to allow for multiple
+ * passes with different discard filter requirements. A delayed work item is
+ * used to manage discarding with timeout determined by a max of the delay
+ * incurred by the iops rate limit, byte rate limit, and the timeout of max
+ * delay of BTRFS_DISCARD_MAX_DELAY.
+ *
+ * The first list is special to manage discarding of fully free block groups.
+ * This is necessary because we issue a final trim for a full free block group
+ * after forgetting it. When a block group becomes unused, instead of directly
+ * being added to the unused_bgs list, we add it to this first list. Then
+ * from there, if it becomes fully discarded, we place it onto the unused_bgs
+ * list.
+ *
+ * The in-memory free space cache serves as the backing state for discard.
+ * Consequently this means there is no persistence. We opt to load all the
+ * block groups in as not discarded, so the mount case degenerates to the
+ * crashing case.
+ *
+ * As the free space cache uses bitmaps, there exists a tradeoff between
+ * ease/efficiency for find_free_extent() and the accuracy of discard state.
+ * Here we opt to let untrimmed regions merge with everything while only letting
+ * trimmed regions merge with other trimmed regions. This can cause
+ * overtrimming, but the coalescing benefit seems to be worth it. Additionally,
+ * bitmap state is tracked as a whole. If we're able to fully trim a bitmap,
+ * the trimmed flag is set on the bitmap. Otherwise, if an allocation comes in,
+ * this resets the state and we will retry trimming the whole bitmap. This is a
+ * tradeoff between discard state accuracy and the cost of accounting.
  */
 
 #include

From patchwork Wed Oct 23 22:53:15 2019
From: Dennis Zhou
To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou
Subject: [PATCH 21/22] btrfs: increase the metadata allowance for the free_space_cache
Date: Wed, 23 Oct 2019 18:53:15 -0400
Message-Id: <0ee7de7dbc9dc043903e8da7c8d09df74ce03e09.1571865775.git.dennis@kernel.org>

Currently, there is no way for the free space cache to recover from
being serviced by purely bitmaps because the extent threshold is set to
0 in recalculate_thresholds() when we surpass the metadata allowance.

This adds a recovery mechanism by keeping large extents out of the
bitmaps and increases the metadata upper bound to 64KB. The recovery
mechanism bypasses this upper bound, thus making it a soft upper bound.
But, with the bypass being 1MB or greater, it shouldn't add unbounded
overhead.
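The allowance arithmetic above can be sketched as follows (illustrative Python, not btrfs code; the names mirror the constants in the patch, and `div_u64` is plain integer division):

```python
SZ_1G = 1 << 30
MAX_CACHE_BYTES_PER_GIG = 64 * 1024  # raised from SZ_32K to SZ_64K by this patch

def max_cache_bytes(block_group_size):
    # Soft upper bound from recalculate_thresholds(): one 64KB allowance
    # per full GiB of block group size; sub-GiB groups get one allowance.
    if block_group_size < SZ_1G:
        return MAX_CACHE_BYTES_PER_GIG
    return MAX_CACHE_BYTES_PER_GIG * (block_group_size // SZ_1G)
```

A 1GiB block group thus gets a 64KB allowance; extents of FORCE_EXTENT_THRESHOLD (1MB) or more bypass the bitmaps entirely, which is what makes this an upper bound that is soft rather than hard.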
Signed-off-by: Dennis Zhou
Reviewed-by: Josef Bacik
---
 fs/btrfs/free-space-cache.c | 26 +++++++++++---------------
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 29d3e21ba7fd..4a769003414c 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -24,7 +24,8 @@
 #include "discard.h"
 
 #define BITS_PER_BITMAP		(PAGE_SIZE * 8UL)
-#define MAX_CACHE_BYTES_PER_GIG	SZ_32K
+#define MAX_CACHE_BYTES_PER_GIG	SZ_64K
+#define FORCE_EXTENT_THRESHOLD	SZ_1M
 
 struct btrfs_trim_range {
 	u64 start;
@@ -1691,26 +1692,17 @@ static void recalculate_thresholds(struct btrfs_free_space_ctl *ctl)
 	ASSERT(ctl->total_bitmaps <= max_bitmaps);
 
 	/*
-	 * The goal is to keep the total amount of memory used per 1gb of space
-	 * at or below 32k, so we need to adjust how much memory we allow to be
-	 * used by extent based free space tracking
+	 * We are trying to keep the total amount of memory used per 1gb of
+	 * space to be MAX_CACHE_BYTES_PER_GIG. However, with a reclamation
+	 * mechanism of pulling extents >= FORCE_EXTENT_THRESHOLD out of
+	 * bitmaps, we may end up using more memory than this.
	 */
 	if (size < SZ_1G)
 		max_bytes = MAX_CACHE_BYTES_PER_GIG;
 	else
 		max_bytes = MAX_CACHE_BYTES_PER_GIG * div_u64(size, SZ_1G);
 
-	/*
-	 * we want to account for 1 more bitmap than what we have so we can make
-	 * sure we don't go over our overall goal of MAX_CACHE_BYTES_PER_GIG as
-	 * we add more bitmaps.
-	 */
-	bitmap_bytes = (ctl->total_bitmaps + 1) * ctl->unit;
-
-	if (bitmap_bytes >= max_bytes) {
-		ctl->extents_thresh = 0;
-		return;
-	}
+	bitmap_bytes = ctl->total_bitmaps * ctl->unit;
 
 	/*
 	 * we want the extent entry threshold to always be at most 1/2 the max
@@ -2096,6 +2088,10 @@ static bool use_bitmap(struct btrfs_free_space_ctl *ctl,
 		forced = true;
 #endif
 
+	/* This is a way to reclaim large regions from the bitmaps. */
+	if (!forced && info->bytes >= FORCE_EXTENT_THRESHOLD)
+		return false;
+
 	/*
 	 * If we are below the extents threshold then we can add this as an
 	 * extent, and don't have to deal with the bitmap

From patchwork Wed Oct 23 22:53:16 2019
From: Dennis Zhou
To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou
Subject: [PATCH 22/22] btrfs: make smaller extents more likely to go into bitmaps
Date: Wed, 23 Oct 2019 18:53:16 -0400
Message-Id: <8f693ad39ca51e6fa80b78647c2df6f769593da8.1571865775.git.dennis@kernel.org>

It's less than ideal for small extents to eat into our extent budget, so
force extents <= 32KB into the bitmaps save for the first handful.
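The new cutoffs can be sketched like so (illustrative Python, not btrfs code; a 4KB sectorsize is assumed, and the control flow around use_bitmap() is simplified):

```python
SECTORSIZE = 4096  # assumed here; the real sectorsize is per-filesystem

def goes_into_bitmap(nbytes, free_extents, extents_thresh):
    # Simplified sketch of the rule after this patch: while still under
    # the extent budget, extents of up to 8 sectors (32KB at 4KB sectors)
    # are pushed into bitmaps once more than a third of the budget is in
    # use; larger extents keep their own extent entry. Previously the
    # cutoffs were 4 sectors (16KB) and half the budget.
    if free_extents >= extents_thresh:
        return True  # budget exhausted: everything goes into bitmaps
    if nbytes <= SECTORSIZE * 8:
        return free_extents * 3 > extents_thresh
    return False
```

So the "first handful" of small extents still get cheap extent entries, and only once the budget starts filling up do small extents pay the bitmap overhead.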
Signed-off-by: Dennis Zhou
Reviewed-by: Josef Bacik
---
 fs/btrfs/free-space-cache.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 4a769003414c..940e40c1712d 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2104,8 +2104,8 @@ static bool use_bitmap(struct btrfs_free_space_ctl *ctl,
 	 * of cache left then go ahead an dadd them, no sense in adding
 	 * the overhead of a bitmap if we don't have to.
 	 */
-	if (info->bytes <= fs_info->sectorsize * 4) {
-		if (ctl->free_extents * 2 <= ctl->extents_thresh)
+	if (info->bytes <= fs_info->sectorsize * 8) {
+		if (ctl->free_extents * 3 <= ctl->extents_thresh)
 			return false;
 	} else {
 		return false;