From patchwork Mon Oct 7 20:17:32 2019
X-Patchwork-Submitter: Dennis Zhou
X-Patchwork-Id: 11178445
From: Dennis Zhou
To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou
Subject: [PATCH 01/19] bitmap: genericize percpu bitmap region iterators
Date: Mon, 7 Oct 2019 16:17:32 -0400
X-Mailer: git-send-email 2.13.5
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-btrfs@vger.kernel.org

Bitmaps are fairly popular for their space efficiency, but we don't have
generic iterators available. Make percpu's bitmap region iterators
available to everyone.
Signed-off-by: Dennis Zhou
Reviewed-by: Josef Bacik
---
 include/linux/bitmap.h | 35 ++++++++++++++++++++++++
 mm/percpu.c            | 61 +++++++++++-------------------------------
 2 files changed, 51 insertions(+), 45 deletions(-)

diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index 90528f12bdfa..9b0664f36808 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -437,6 +437,41 @@ static inline int bitmap_parse(const char *buf, unsigned int buflen,
 	return __bitmap_parse(buf, buflen, 0, maskp, nmaskbits);
 }
 
+static inline void bitmap_next_clear_region(unsigned long *bitmap,
+					    unsigned int *rs, unsigned int *re,
+					    unsigned int end)
+{
+	*rs = find_next_zero_bit(bitmap, end, *rs);
+	*re = find_next_bit(bitmap, end, *rs + 1);
+}
+
+static inline void bitmap_next_set_region(unsigned long *bitmap,
+					  unsigned int *rs, unsigned int *re,
+					  unsigned int end)
+{
+	*rs = find_next_bit(bitmap, end, *rs);
+	*re = find_next_zero_bit(bitmap, end, *rs + 1);
+}
+
+/*
+ * Bitmap region iterators.  Iterates over the bitmap between [@start, @end).
+ * @rs and @re should be integer variables and will be set to start and end
+ * index of the current clear or set region.
+ */
+#define bitmap_for_each_clear_region(bitmap, rs, re, start, end)	     \
+	for ((rs) = (start),						     \
+	     bitmap_next_clear_region((bitmap), &(rs), &(re), (end));	     \
+	     (rs) < (re);						     \
+	     (rs) = (re) + 1,						     \
+	     bitmap_next_clear_region((bitmap), &(rs), &(re), (end)))
+
+#define bitmap_for_each_set_region(bitmap, rs, re, start, end)		     \
+	for ((rs) = (start),						     \
+	     bitmap_next_set_region((bitmap), &(rs), &(re), (end));	     \
+	     (rs) < (re);						     \
+	     (rs) = (re) + 1,						     \
+	     bitmap_next_set_region((bitmap), &(rs), &(re), (end)))
+
 /**
  * BITMAP_FROM_U64() - Represent u64 value in the format suitable for bitmap.
 * @n: u64 value

diff --git a/mm/percpu.c b/mm/percpu.c
index 7e06a1e58720..e9844086b236 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -270,33 +270,6 @@ static unsigned long pcpu_chunk_addr(struct pcpu_chunk *chunk,
 		pcpu_unit_page_offset(cpu, page_idx);
 }
 
-static void pcpu_next_unpop(unsigned long *bitmap, int *rs, int *re, int end)
-{
-	*rs = find_next_zero_bit(bitmap, end, *rs);
-	*re = find_next_bit(bitmap, end, *rs + 1);
-}
-
-static void pcpu_next_pop(unsigned long *bitmap, int *rs, int *re, int end)
-{
-	*rs = find_next_bit(bitmap, end, *rs);
-	*re = find_next_zero_bit(bitmap, end, *rs + 1);
-}
-
-/*
- * Bitmap region iterators.  Iterates over the bitmap between
- * [@start, @end) in @chunk.  @rs and @re should be integer variables
- * and will be set to start and end index of the current free region.
- */
-#define pcpu_for_each_unpop_region(bitmap, rs, re, start, end)		     \
-	for ((rs) = (start), pcpu_next_unpop((bitmap), &(rs), &(re), (end)); \
-	     (rs) < (re);						     \
-	     (rs) = (re) + 1, pcpu_next_unpop((bitmap), &(rs), &(re), (end)))
-
-#define pcpu_for_each_pop_region(bitmap, rs, re, start, end)		     \
-	for ((rs) = (start), pcpu_next_pop((bitmap), &(rs), &(re), (end));   \
-	     (rs) < (re);						     \
-	     (rs) = (re) + 1, pcpu_next_pop((bitmap), &(rs), &(re), (end)))
-
 /*
  * The following are helper functions to help access bitmaps and convert
  * between bitmap offsets to address offsets.
@@ -732,9 +705,8 @@ static void pcpu_chunk_refresh_hint(struct pcpu_chunk *chunk, bool full_scan)
 	}
 
 	bits = 0;
-	pcpu_for_each_md_free_region(chunk, bit_off, bits) {
+	pcpu_for_each_md_free_region(chunk, bit_off, bits)
 		pcpu_block_update(chunk_md, bit_off, bit_off + bits);
-	}
 }
 
 /**
@@ -749,7 +721,7 @@ static void pcpu_block_refresh_hint(struct pcpu_chunk *chunk, int index)
 {
 	struct pcpu_block_md *block = chunk->md_blocks + index;
 	unsigned long *alloc_map = pcpu_index_alloc_map(chunk, index);
-	int rs, re, start;	/* region start, region end */
+	unsigned int rs, re, start;	/* region start, region end */
 
 	/* promote scan_hint to contig_hint */
 	if (block->scan_hint) {
@@ -765,10 +737,9 @@ static void pcpu_block_refresh_hint(struct pcpu_chunk *chunk, int index)
 	block->right_free = 0;
 
 	/* iterate over free areas and update the contig hints */
-	pcpu_for_each_unpop_region(alloc_map, rs, re, start,
-				   PCPU_BITMAP_BLOCK_BITS) {
+	bitmap_for_each_clear_region(alloc_map, rs, re, start,
+				     PCPU_BITMAP_BLOCK_BITS)
 		pcpu_block_update(block, rs, re);
-	}
 }
 
 /**
@@ -1041,13 +1012,13 @@ static void pcpu_block_update_hint_free(struct pcpu_chunk *chunk, int bit_off,
 static bool pcpu_is_populated(struct pcpu_chunk *chunk, int bit_off, int bits,
 			      int *next_off)
 {
-	int page_start, page_end, rs, re;
+	unsigned int page_start, page_end, rs, re;
 
 	page_start = PFN_DOWN(bit_off * PCPU_MIN_ALLOC_SIZE);
 	page_end = PFN_UP((bit_off + bits) * PCPU_MIN_ALLOC_SIZE);
 
 	rs = page_start;
-	pcpu_next_unpop(chunk->populated, &rs, &re, page_end);
+	bitmap_next_clear_region(chunk->populated, &rs, &re, page_end);
 	if (rs >= page_end)
 		return true;
 
@@ -1702,13 +1673,13 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
 
 	/* populate if not all pages are already there */
 	if (!is_atomic) {
-		int page_start, page_end, rs, re;
+		unsigned int page_start, page_end, rs, re;
 
 		page_start = PFN_DOWN(off);
 		page_end = PFN_UP(off + size);
 
-		pcpu_for_each_unpop_region(chunk->populated, rs, re,
-					   page_start, page_end) {
+		bitmap_for_each_clear_region(chunk->populated, rs, re,
+					     page_start, page_end) {
 			WARN_ON(chunk->immutable);
 
 			ret = pcpu_populate_chunk(chunk, rs, re, pcpu_gfp);
 
@@ -1858,10 +1829,10 @@ static void pcpu_balance_workfn(struct work_struct *work)
 	spin_unlock_irq(&pcpu_lock);
 
 	list_for_each_entry_safe(chunk, next, &to_free, list) {
-		int rs, re;
+		unsigned int rs, re;
 
-		pcpu_for_each_pop_region(chunk->populated, rs, re, 0,
-					 chunk->nr_pages) {
+		bitmap_for_each_set_region(chunk->populated, rs, re, 0,
+					   chunk->nr_pages) {
 			pcpu_depopulate_chunk(chunk, rs, re);
 			spin_lock_irq(&pcpu_lock);
 			pcpu_chunk_depopulated(chunk, rs, re);
@@ -1893,7 +1864,7 @@ static void pcpu_balance_workfn(struct work_struct *work)
 	}
 
 	for (slot = pcpu_size_to_slot(PAGE_SIZE); slot < pcpu_nr_slots; slot++) {
-		int nr_unpop = 0, rs, re;
+		unsigned int nr_unpop = 0, rs, re;
 
 		if (!nr_to_pop)
 			break;
@@ -1910,9 +1881,9 @@ static void pcpu_balance_workfn(struct work_struct *work)
 			continue;
 
 		/* @chunk can't go away while pcpu_alloc_mutex is held */
-		pcpu_for_each_unpop_region(chunk->populated, rs, re, 0,
-					   chunk->nr_pages) {
-			int nr = min(re - rs, nr_to_pop);
+		bitmap_for_each_clear_region(chunk->populated, rs, re, 0,
+					     chunk->nr_pages) {
+			int nr = min_t(int, re - rs, nr_to_pop);
 
 			ret = pcpu_populate_chunk(chunk, rs, rs + nr, gfp);
 
 			if (!ret) {

From patchwork Mon Oct 7 20:17:33 2019
X-Patchwork-Submitter: Dennis Zhou
X-Patchwork-Id: 11178443
From: Dennis Zhou
To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou
Subject: [PATCH 02/19] btrfs: rename DISCARD opt to DISCARD_SYNC
Date: Mon, 7 Oct 2019 16:17:33 -0400

This series introduces async discard which will use the flag
DISCARD_ASYNC, so rename the original flag to DISCARD_SYNC as it is
synchronously done in transaction commit.

Signed-off-by: Dennis Zhou
Reviewed-by: Josef Bacik
Reviewed-by: Johannes Thumshirn
---
 fs/btrfs/block-group.c | 2 +-
 fs/btrfs/ctree.h       | 2 +-
 fs/btrfs/extent-tree.c | 4 ++--
 fs/btrfs/super.c       | 8 ++++----
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index bf7e3f23bba7..afe86028246a 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1365,7 +1365,7 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
 		spin_unlock(&space_info->lock);
 
 		/* DISCARD can flip during remount */
-		trimming = btrfs_test_opt(fs_info, DISCARD);
+		trimming = btrfs_test_opt(fs_info, DISCARD_SYNC);
 
 		/* Implicit trim during transaction commit.
 */
 		if (trimming)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 19d669d12ca1..1877586576aa 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1171,7 +1171,7 @@ static inline u32 BTRFS_MAX_XATTR_SIZE(const struct btrfs_fs_info *info)
 #define BTRFS_MOUNT_FLUSHONCOMMIT	(1 << 7)
 #define BTRFS_MOUNT_SSD_SPREAD		(1 << 8)
 #define BTRFS_MOUNT_NOSSD		(1 << 9)
-#define BTRFS_MOUNT_DISCARD		(1 << 10)
+#define BTRFS_MOUNT_DISCARD_SYNC	(1 << 10)
 #define BTRFS_MOUNT_FORCE_COMPRESS	(1 << 11)
 #define BTRFS_MOUNT_SPACE_CACHE		(1 << 12)
 #define BTRFS_MOUNT_CLEAR_CACHE		(1 << 13)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 49cb26fa7c63..77a5904756c5 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2903,7 +2903,7 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans)
 			break;
 		}
 
-		if (btrfs_test_opt(fs_info, DISCARD))
+		if (btrfs_test_opt(fs_info, DISCARD_SYNC))
 			ret = btrfs_discard_extent(fs_info, start,
 						   end + 1 - start, NULL);
 
@@ -4146,7 +4146,7 @@ static int __btrfs_free_reserved_extent(struct btrfs_fs_info *fs_info,
 	if (pin)
 		pin_down_extent(cache, start, len, 1);
 	else {
-		if (btrfs_test_opt(fs_info, DISCARD))
+		if (btrfs_test_opt(fs_info, DISCARD_SYNC))
 			ret = btrfs_discard_extent(fs_info, start, len, NULL);
 		btrfs_add_free_space(cache, start, len);
 		btrfs_free_reserved_bytes(cache, len, delalloc);

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 1b151af25772..a02fece949cb 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -695,11 +695,11 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
 				   info->metadata_ratio);
 			break;
 		case Opt_discard:
-			btrfs_set_and_info(info, DISCARD,
-					   "turning on discard");
+			btrfs_set_and_info(info, DISCARD_SYNC,
+					   "turning on sync discard");
 			break;
 		case Opt_nodiscard:
-			btrfs_clear_and_info(info, DISCARD,
+			btrfs_clear_and_info(info, DISCARD_SYNC,
 					     "turning off discard");
 			break;
 		case Opt_space_cache:
@@ -1322,7 +1322,7 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
 		seq_puts(seq, ",nologreplay");
 	if (btrfs_test_opt(info, FLUSHONCOMMIT))
 		seq_puts(seq, ",flushoncommit");
-	if (btrfs_test_opt(info, DISCARD))
+	if (btrfs_test_opt(info, DISCARD_SYNC))
 		seq_puts(seq, ",discard");
 	if (!(info->sb->s_flags & SB_POSIXACL))
 		seq_puts(seq, ",noacl");

From patchwork Mon Oct 7 20:17:34 2019
X-Patchwork-Submitter: Dennis Zhou
X-Patchwork-Id: 11178447
From: Dennis Zhou
To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou
Subject: [PATCH 03/19] btrfs: keep track of which extents have been discarded
Date: Mon, 7 Oct 2019 16:17:34 -0400
Message-Id: <5875088b5f4ada0ef73f097b238935dd583d5b3e.1570479299.git.dennis@kernel.org>

Async discard will use the free space cache as backing knowledge for
which extents to discard. This patch plumbs knowledge about which
extents need to be discarded into the free space cache from
unpin_extent_range().

An untrimmed extent can merge with everything as this is a new region.
Absorbing trimmed extents is a tradeoff for greater coalescing, which
makes life better for find_free_extent(). Additionally, it seems the
size of a trim isn't as problematic as the trim io itself.
When reading in the free space cache from disk, if sync is set, mark all
extents as trimmed. The current code ensures at transaction commit that
all free space is trimmed when sync is set, so this reflects that.

Signed-off-by: Dennis Zhou
---
 fs/btrfs/extent-tree.c      | 15 ++++++++++-----
 fs/btrfs/free-space-cache.c | 38 ++++++++++++++++++++++++++++++-------
 fs/btrfs/free-space-cache.h | 10 +++++++++-
 fs/btrfs/inode-map.c        | 13 +++++++------
 4 files changed, 57 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 77a5904756c5..b9e3bedad878 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2782,7 +2782,7 @@ fetch_cluster_info(struct btrfs_fs_info *fs_info,
 }
 
 static int unpin_extent_range(struct btrfs_fs_info *fs_info,
-			      u64 start, u64 end,
+			      u64 start, u64 end, u32 fsc_flags,
 			      const bool return_free_space)
 {
 	struct btrfs_block_group_cache *cache = NULL;
@@ -2816,7 +2816,9 @@ static int unpin_extent_range(struct btrfs_fs_info *fs_info,
 		if (start < cache->last_byte_to_unpin) {
 			len = min(len, cache->last_byte_to_unpin - start);
 			if (return_free_space)
-				btrfs_add_free_space(cache, start, len);
+				__btrfs_add_free_space(fs_info,
+						       cache->free_space_ctl,
+						       start, len, fsc_flags);
 		}
 
 		start += len;
@@ -2894,6 +2896,7 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans)
 
 	while (!trans->aborted) {
 		struct extent_state *cached_state = NULL;
+		u32 fsc_flags = 0;
 
 		mutex_lock(&fs_info->unused_bg_unpin_mutex);
 		ret = find_first_extent_bit(unpin, 0, &start, &end,
@@ -2903,12 +2906,14 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans)
 			break;
 		}
 
-		if (btrfs_test_opt(fs_info, DISCARD_SYNC))
+		if (btrfs_test_opt(fs_info, DISCARD_SYNC)) {
 			ret = btrfs_discard_extent(fs_info, start,
 						   end + 1 - start, NULL);
+			fsc_flags |= BTRFS_FSC_TRIMMED;
+		}
 
 		clear_extent_dirty(unpin, start, end, &cached_state);
-		unpin_extent_range(fs_info, start, end, true);
+		unpin_extent_range(fs_info, start, end, fsc_flags, true);
 		mutex_unlock(&fs_info->unused_bg_unpin_mutex);
 
 		free_extent_state(cached_state);
 		cond_resched();
@@ -5512,7 +5517,7 @@ u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo)
 int btrfs_error_unpin_extent_range(struct btrfs_fs_info *fs_info,
 				   u64 start, u64 end)
 {
-	return unpin_extent_range(fs_info, start, end, false);
+	return unpin_extent_range(fs_info, start, end, 0, false);
 }
 
 /*

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index d54dcd0ab230..f119895292b8 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -747,6 +747,14 @@ static int __load_free_space_cache(struct btrfs_root *root, struct inode *inode,
 			goto free_cache;
 		}
 
+		/*
+		 * Sync discard ensures that the free space cache is always
+		 * trimmed.  So when reading this in, the state should reflect
+		 * that.
+		 */
+		if (btrfs_test_opt(fs_info, DISCARD_SYNC))
+			e->flags |= BTRFS_FSC_TRIMMED;
+
 		if (!e->bytes) {
 			kmem_cache_free(btrfs_free_space_cachep, e);
 			goto free_cache;
@@ -2165,6 +2173,7 @@ static bool try_merge_free_space(struct btrfs_free_space_ctl *ctl,
 	bool merged = false;
 	u64 offset = info->offset;
 	u64 bytes = info->bytes;
+	bool is_trimmed = btrfs_free_space_trimmed(info);
 
 	/*
 	 * first we want to see if there is free space adjacent to the range we
@@ -2178,7 +2187,8 @@ static bool try_merge_free_space(struct btrfs_free_space_ctl *ctl,
 	else
 		left_info = tree_search_offset(ctl, offset - 1, 0, 0);
 
-	if (right_info && !right_info->bitmap) {
+	if (right_info && !right_info->bitmap &&
+	    (!is_trimmed || btrfs_free_space_trimmed(right_info))) {
 		if (update_stat)
 			unlink_free_space(ctl, right_info);
 		else
@@ -2189,7 +2199,8 @@ static bool try_merge_free_space(struct btrfs_free_space_ctl *ctl,
 	}
 
 	if (left_info && !left_info->bitmap &&
-	    left_info->offset + left_info->bytes == offset) {
+	    left_info->offset + left_info->bytes == offset &&
+	    (!is_trimmed || btrfs_free_space_trimmed(left_info))) {
 		if (update_stat)
 			unlink_free_space(ctl, left_info);
 		else
@@ -2225,6 +2236,9 @@ static bool steal_from_bitmap_to_end(struct btrfs_free_space_ctl *ctl,
 	bytes = (j - i) * ctl->unit;
 	info->bytes += bytes;
 
+	if (!btrfs_free_space_trimmed(bitmap))
+		info->flags &= ~BTRFS_FSC_TRIMMED;
+
 	if (update_stat)
 		bitmap_clear_bits(ctl, bitmap, end, bytes);
 	else
@@ -2278,6 +2292,9 @@ static bool steal_from_bitmap_to_front(struct btrfs_free_space_ctl *ctl,
 	info->offset -= bytes;
 	info->bytes += bytes;
 
+	if (!btrfs_free_space_trimmed(bitmap))
+		info->flags &= ~BTRFS_FSC_TRIMMED;
+
 	if (update_stat)
 		bitmap_clear_bits(ctl, bitmap, info->offset, bytes);
 	else
@@ -2327,7 +2344,7 @@ static void steal_from_bitmap(struct btrfs_free_space_ctl *ctl,
 
 int __btrfs_add_free_space(struct btrfs_fs_info *fs_info,
 			   struct btrfs_free_space_ctl *ctl,
-			   u64 offset, u64 bytes)
+			   u64 offset, u64 bytes, u32 flags)
 {
 	struct btrfs_free_space *info;
 	int ret = 0;
@@ -2338,6 +2355,7 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info,
 
 	info->offset = offset;
 	info->bytes = bytes;
+	info->flags = flags;
 	RB_CLEAR_NODE(&info->offset_index);
 
 	spin_lock(&ctl->tree_lock);
@@ -2385,7 +2403,7 @@ int btrfs_add_free_space(struct btrfs_block_group_cache *block_group,
 {
 	return __btrfs_add_free_space(block_group->fs_info,
 				      block_group->free_space_ctl,
-				      bytenr, size);
+				      bytenr, size, 0);
 }
 
 int btrfs_remove_free_space(struct btrfs_block_group_cache *block_group,
@@ -2460,8 +2478,11 @@ int btrfs_remove_free_space(struct btrfs_block_group_cache *block_group,
 		}
 		spin_unlock(&ctl->tree_lock);
 
-		ret = btrfs_add_free_space(block_group, offset + bytes,
-					   old_end - (offset + bytes));
+		ret = __btrfs_add_free_space(block_group->fs_info,
+					     ctl,
+					     offset + bytes,
+					     old_end - (offset + bytes),
+					     info->flags);
 		WARN_ON(ret);
 		goto out;
 	}
@@ -2630,6 +2651,7 @@ u64 btrfs_find_space_for_alloc(struct btrfs_block_group_cache *block_group,
 	u64 ret = 0;
 	u64 align_gap = 0;
 	u64 align_gap_len = 0;
+	u64 align_gap_flags = 0;
 
 	spin_lock(&ctl->tree_lock);
 	entry = find_free_space(ctl, &offset, &bytes_search,
@@ -2646,6 +2668,7 @@ u64 btrfs_find_space_for_alloc(struct btrfs_block_group_cache *block_group,
 		unlink_free_space(ctl, entry);
 		align_gap_len = offset - entry->offset;
 		align_gap = entry->offset;
+		align_gap_flags = entry->flags;
 		entry->offset = offset + bytes;
 		WARN_ON(entry->bytes < bytes + align_gap_len);
 
@@ -2661,7 +2684,8 @@ u64 btrfs_find_space_for_alloc(struct btrfs_block_group_cache *block_group,
 
 	if (align_gap_len)
 		__btrfs_add_free_space(block_group->fs_info, ctl,
-				       align_gap, align_gap_len);
+				       align_gap, align_gap_len,
+				       align_gap_flags);
 	return ret;
 }

diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h
index 39c32c8fc24f..ab3dfc00abb5 100644
--- a/fs/btrfs/free-space-cache.h
+++ b/fs/btrfs/free-space-cache.h
@@ -6,6 +6,8 @@
 #ifndef BTRFS_FREE_SPACE_CACHE_H
 #define BTRFS_FREE_SPACE_CACHE_H
 
+#define BTRFS_FSC_TRIMMED		(1UL << 0)
+
 struct btrfs_free_space {
 	struct rb_node offset_index;
 	u64 offset;
@@ -13,8 +15,14 @@ struct btrfs_free_space {
 	u64 max_extent_size;
 	unsigned long *bitmap;
 	struct list_head list;
+	u32 flags;
 };
 
+static inline bool btrfs_free_space_trimmed(struct btrfs_free_space *info)
+{
+	return (info->flags & BTRFS_FSC_TRIMMED);
+}
+
 struct btrfs_free_space_ctl {
 	spinlock_t tree_lock;
 	struct rb_root free_space_offset;
@@ -84,7 +92,7 @@ int btrfs_write_out_ino_cache(struct btrfs_root *root,
 void btrfs_init_free_space_ctl(struct btrfs_block_group_cache *block_group);
 int __btrfs_add_free_space(struct btrfs_fs_info *fs_info,
 			   struct btrfs_free_space_ctl *ctl,
-			   u64 bytenr, u64 size);
+			   u64 bytenr, u64 size, u32 flags);
 int btrfs_add_free_space(struct btrfs_block_group_cache *block_group,
 			 u64 bytenr, u64 size);
 int btrfs_remove_free_space(struct btrfs_block_group_cache *block_group,

diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
index 63cad7865d75..00e225de4fe6 100644
--- a/fs/btrfs/inode-map.c
+++ b/fs/btrfs/inode-map.c
@@ -107,7 +107,7 @@ static int caching_kthread(void *data)
 		if (last != (u64)-1 && last + 1 != key.objectid) {
 			__btrfs_add_free_space(fs_info, ctl, last + 1,
-					       key.objectid - last - 1);
+					       key.objectid - last - 1, 0);
 			wake_up(&root->ino_cache_wait);
 		}
 
@@ -118,7 +118,7 @@ static int caching_kthread(void *data)
 	if (last < root->highest_objectid - 1) {
 		__btrfs_add_free_space(fs_info, ctl, last + 1,
-				       root->highest_objectid - last - 1);
+				       root->highest_objectid - last - 1, 0);
 	}
 
 	spin_lock(&root->ino_cache_lock);
@@ -175,7 +175,8 @@ static void start_caching(struct btrfs_root *root)
 	ret = btrfs_find_free_objectid(root, &objectid);
 	if (!ret && objectid <= BTRFS_LAST_FREE_OBJECTID) {
 		__btrfs_add_free_space(fs_info, ctl, objectid,
-				       BTRFS_LAST_FREE_OBJECTID - objectid + 1);
+				       BTRFS_LAST_FREE_OBJECTID - objectid + 1,
+				       0);
 		wake_up(&root->ino_cache_wait);
 	}
 
@@ -221,7 +222,7 @@ void btrfs_return_ino(struct btrfs_root *root, u64 objectid)
 		return;
 again:
 	if (root->ino_cache_state == BTRFS_CACHE_FINISHED) {
-		__btrfs_add_free_space(fs_info, pinned, objectid, 1);
+		__btrfs_add_free_space(fs_info, pinned, objectid, 1, 0);
 	} else {
 		down_write(&fs_info->commit_root_sem);
 		spin_lock(&root->ino_cache_lock);
@@ -234,7 +235,7 @@ void btrfs_return_ino(struct btrfs_root *root, u64 objectid)
 
 		start_caching(root);
 
-		__btrfs_add_free_space(fs_info, pinned, objectid, 1);
+		__btrfs_add_free_space(fs_info, pinned, objectid, 1, 0);
 
 		up_write(&fs_info->commit_root_sem);
 	}
@@ -281,7 +282,7 @@ void btrfs_unpin_free_ino(struct btrfs_root *root)
 		spin_unlock(rbroot_lock);
 		if (count)
 			__btrfs_add_free_space(root->fs_info, ctl,
-					       info->offset, count);
+					       info->offset, count, 0);
 		kmem_cache_free(btrfs_free_space_cachep, info);
 	}
 }

From patchwork Mon Oct 7 20:17:35 2019
X-Patchwork-Submitter: Dennis Zhou
X-Patchwork-Id: 11178449
From: Dennis Zhou
To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou
Subject: [PATCH 04/19] btrfs: keep track of cleanliness of the bitmap
Date: Mon, 7 Oct 2019 16:17:35 -0400
Message-Id: <4cdbe31836b701c2c134c8484bb3531f7024031d.1570479299.git.dennis@kernel.org>

There is a cap in btrfs on the number of free extents that a block group
can have. When it surpasses that threshold, future extents are placed
into bitmaps. Instead of keeping track of whether each individual bit is
trimmed or not in a second bitmap, keep track of the relative state of
the bitmap as a whole.

With async discard, trimming bitmaps becomes a more frequent operation.
As a trade off with simplicity, we keep track of if discarding a bitmap
is in progress. If we fully scan a bitmap and trim as necessary, the
bitmap is marked clean. This has some caveats as the min block size may
skip over regions deemed too small. But this should be a reasonable
trade off rather than keeping a second bitmap and making the allocation
paths more complex. The downside is we may overtrim, but ideally the min
block size should prevent us from doing that too often and getting stuck
trimming pathological cases.

BTRFS_FSC_TRIMMING_BITMAP is added to indicate a bitmap is in the
process of being trimmed. If additional free space is added to that
bitmap, the bit is cleared. A bitmap will be marked BTRFS_FSC_TRIMMED if
the trimming code was able to reach the end of it and the former is
still set.
Signed-off-by: Dennis Zhou --- fs/btrfs/free-space-cache.c | 83 +++++++++++++++++++++++++++++++++---- fs/btrfs/free-space-cache.h | 7 ++++ 2 files changed, 83 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index f119895292b8..129b9a164b35 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -1975,11 +1975,14 @@ static noinline int remove_from_bitmap(struct btrfs_free_space_ctl *ctl, static u64 add_bytes_to_bitmap(struct btrfs_free_space_ctl *ctl, struct btrfs_free_space *info, u64 offset, - u64 bytes) + u64 bytes, u32 flags) { u64 bytes_to_set = 0; u64 end; + if (!(flags & BTRFS_FSC_TRIMMED)) + info->flags &= ~(BTRFS_FSC_TRIMMED | BTRFS_FSC_TRIMMING_BITMAP); + end = info->offset + (u64)(BITS_PER_BITMAP * ctl->unit); bytes_to_set = min(end - offset, bytes); @@ -2054,10 +2057,12 @@ static int insert_into_bitmap(struct btrfs_free_space_ctl *ctl, struct btrfs_block_group_cache *block_group = NULL; int added = 0; u64 bytes, offset, bytes_added; + u32 flags; int ret; bytes = info->bytes; offset = info->offset; + flags = info->flags; if (!ctl->op->use_bitmap(ctl, info)) return 0; @@ -2093,7 +2098,7 @@ static int insert_into_bitmap(struct btrfs_free_space_ctl *ctl, if (entry->offset == offset_to_bitmap(ctl, offset)) { bytes_added = add_bytes_to_bitmap(ctl, entry, - offset, bytes); + offset, bytes, flags); bytes -= bytes_added; offset += bytes_added; } @@ -2112,7 +2117,8 @@ static int insert_into_bitmap(struct btrfs_free_space_ctl *ctl, goto new_bitmap; } - bytes_added = add_bytes_to_bitmap(ctl, bitmap_info, offset, bytes); + bytes_added = add_bytes_to_bitmap(ctl, bitmap_info, offset, bytes, + flags); bytes -= bytes_added; offset += bytes_added; added = 0; @@ -2146,6 +2152,7 @@ static int insert_into_bitmap(struct btrfs_free_space_ctl *ctl, /* allocate the bitmap */ info->bitmap = kmem_cache_zalloc(btrfs_free_space_bitmap_cachep, GFP_NOFS); + info->flags |= BTRFS_FSC_TRIMMED; 
spin_lock(&ctl->tree_lock); if (!info->bitmap) { ret = -ENOMEM; @@ -3295,6 +3302,41 @@ static int trim_no_bitmap(struct btrfs_block_group_cache *block_group, return ret; } +/* + * If we break out of trimming a bitmap prematurely, we should reset the + * trimming bit. In a rather contrived case, it's possible to race here so + * clear BTRFS_FSC_TRIMMED as well. + * + * start = start of bitmap + * end = near end of bitmap + * + * Thread 1: Thread 2: + * trim_bitmaps(start) + * trim_bitmaps(end) + * end_trimming_bitmap() + * reset_trimming_bitmap() + */ +static void reset_trimming_bitmap(struct btrfs_free_space_ctl *ctl, u64 offset) +{ + struct btrfs_free_space *info; + + spin_lock(&ctl->tree_lock); + + info = tree_search_offset(ctl, offset, 1, 0); + if (info) + info->flags &= ~(BTRFS_FSC_TRIMMED | BTRFS_FSC_TRIMMING_BITMAP); + + spin_unlock(&ctl->tree_lock); +} + +static void end_trimming_bitmap(struct btrfs_free_space *entry) +{ + if (btrfs_free_space_trimming_bitmap(entry)) { + entry->flags |= BTRFS_FSC_TRIMMED; + entry->flags &= ~BTRFS_FSC_TRIMMING_BITMAP; + } +} + static int trim_bitmaps(struct btrfs_block_group_cache *block_group, u64 *total_trimmed, u64 start, u64 end, u64 minlen) { @@ -3326,9 +3368,26 @@ static int trim_bitmaps(struct btrfs_block_group_cache *block_group, goto next; } + /* + * Async discard bitmap trimming begins by setting the start + * to key.objectid, and offset_to_bitmap() aligns to the + * start of the bitmap. This lets us know we are fully + * scanning the bitmap rather than only some portion of it. + */ + if (start == offset) + entry->flags |= BTRFS_FSC_TRIMMING_BITMAP; + bytes = minlen; ret2 = search_bitmap(ctl, entry, &start, &bytes, false); if (ret2 || start >= end) { + /* + * This keeps the invariant that all bytes are trimmed + * if BTRFS_FSC_TRIMMED is set on a bitmap.
+ */ + if (ret2 && !minlen) + end_trimming_bitmap(entry); + else + entry->flags &= ~BTRFS_FSC_TRIMMING_BITMAP; spin_unlock(&ctl->tree_lock); mutex_unlock(&ctl->cache_writeout_mutex); next_bitmap = true; @@ -3337,6 +3396,7 @@ static int trim_bitmaps(struct btrfs_block_group_cache *block_group, bytes = min(bytes, end - start); if (bytes < minlen) { + entry->flags &= ~BTRFS_FSC_TRIMMING_BITMAP; spin_unlock(&ctl->tree_lock); mutex_unlock(&ctl->cache_writeout_mutex); goto next; @@ -3354,18 +3414,21 @@ static int trim_bitmaps(struct btrfs_block_group_cache *block_group, ret = do_trimming(block_group, total_trimmed, start, bytes, start, bytes, &trim_entry); - if (ret) + if (ret) { + reset_trimming_bitmap(ctl, offset); break; + } next: if (next_bitmap) { offset += BITS_PER_BITMAP * ctl->unit; + start = offset; } else { start += bytes; - if (start >= offset + BITS_PER_BITMAP * ctl->unit) - offset += BITS_PER_BITMAP * ctl->unit; } if (fatal_signal_pending(current)) { + if (start != offset) + reset_trimming_bitmap(ctl, offset); ret = -ERESTARTSYS; break; } @@ -3419,6 +3482,7 @@ void btrfs_put_block_group_trimming(struct btrfs_block_group_cache *block_group) int btrfs_trim_block_group(struct btrfs_block_group_cache *block_group, u64 *trimmed, u64 start, u64 end, u64 minlen) { + struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl; int ret; *trimmed = 0; @@ -3436,6 +3500,9 @@ int btrfs_trim_block_group(struct btrfs_block_group_cache *block_group, goto out; ret = trim_bitmaps(block_group, trimmed, start, end, minlen); + /* if we ended in the middle of a bitmap, reset the trimming flag */ + if (end % (BITS_PER_BITMAP * ctl->unit)) + reset_trimming_bitmap(ctl, offset_to_bitmap(ctl, end)); out: btrfs_put_block_group_trimming(block_group); return ret; @@ -3620,6 +3687,7 @@ int test_add_free_space_entry(struct btrfs_block_group_cache *cache, struct btrfs_free_space_ctl *ctl = cache->free_space_ctl; struct btrfs_free_space *info = NULL, *bitmap_info; void *map = NULL; + u32 
flags = 0; u64 bytes_added; int ret; @@ -3661,7 +3729,8 @@ int test_add_free_space_entry(struct btrfs_block_group_cache *cache, info = NULL; } - bytes_added = add_bytes_to_bitmap(ctl, bitmap_info, offset, bytes); + bytes_added = add_bytes_to_bitmap(ctl, bitmap_info, offset, bytes, + flags); bytes -= bytes_added; offset += bytes_added; diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h index ab3dfc00abb5..dc73ec8d34bb 100644 --- a/fs/btrfs/free-space-cache.h +++ b/fs/btrfs/free-space-cache.h @@ -7,6 +7,7 @@ #define BTRFS_FREE_SPACE_CACHE_H #define BTRFS_FSC_TRIMMED (1UL << 0) +#define BTRFS_FSC_TRIMMING_BITMAP (1UL << 1) struct btrfs_free_space { struct rb_node offset_index; @@ -23,6 +24,12 @@ static inline bool btrfs_free_space_trimmed(struct btrfs_free_space *info) return (info->flags & BTRFS_FSC_TRIMMED); } +static inline +bool btrfs_free_space_trimming_bitmap(struct btrfs_free_space *info) +{ + return (info->flags & BTRFS_FSC_TRIMMING_BITMAP); +} + struct btrfs_free_space_ctl { spinlock_t tree_lock; struct rb_root free_space_offset; From patchwork Mon Oct 7 20:17:36 2019
From: Dennis Zhou
To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou
Subject: [PATCH 05/19] btrfs: add the beginning of async discard, discard workqueue
Date: Mon, 7 Oct 2019 16:17:36 -0400
When discard is enabled, every time a pinned extent is released back to the block_group's free space cache, a discard is issued for the extent. This approach is overeager when it comes to discarding and to helping the SSD maintain enough free space to prevent severe garbage collection situations.

This adds the beginning of async discard. Instead of issuing a discard before returning an extent to the free space cache, the extent is simply marked as untrimmed. The block_group is then added to an LRU, which feeds into a workqueue that issues discards at a much slower rate. Full discarding of unused block groups is still done and will be addressed in a future patch in this series.

For now, we don't persist the discard state of extents and bitmaps. Therefore, our failure recovery mode is to consider extents untrimmed. This lets us handle failure and unmounting as one and the same.
Signed-off-by: Dennis Zhou Reviewed-by: Josef Bacik --- fs/btrfs/Makefile | 2 +- fs/btrfs/block-group.c | 4 + fs/btrfs/block-group.h | 10 ++ fs/btrfs/ctree.h | 17 +++ fs/btrfs/discard.c | 200 ++++++++++++++++++++++++++++++++++++ fs/btrfs/discard.h | 49 +++++++++ fs/btrfs/disk-io.c | 15 ++- fs/btrfs/extent-tree.c | 4 + fs/btrfs/free-space-cache.c | 29 +++++- fs/btrfs/super.c | 35 ++++++- 10 files changed, 356 insertions(+), 9 deletions(-) create mode 100644 fs/btrfs/discard.c create mode 100644 fs/btrfs/discard.h diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile index 82200dbca5ac..9a0ff3384381 100644 --- a/fs/btrfs/Makefile +++ b/fs/btrfs/Makefile @@ -11,7 +11,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \ compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \ reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \ uuid-tree.o props.o free-space-tree.o tree-checker.o space-info.o \ - block-rsv.o delalloc-space.o block-group.o + block-rsv.o delalloc-space.o block-group.o discard.o btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index afe86028246a..8bbbe7488328 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -14,6 +14,7 @@ #include "sysfs.h" #include "tree-log.h" #include "delalloc-space.h" +#include "discard.h" /* * Return target flags in extended format or 0 if restripe for this chunk_type @@ -1273,6 +1274,8 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info) } spin_unlock(&fs_info->unused_bgs_lock); + btrfs_discard_cancel_work(&fs_info->discard_ctl, block_group); + mutex_lock(&fs_info->delete_unused_bgs_mutex); /* Don't want to race with allocators so take the groups_sem */ @@ -1622,6 +1625,7 @@ static struct btrfs_block_group_cache *btrfs_create_block_group_cache( INIT_LIST_HEAD(&cache->cluster_list); INIT_LIST_HEAD(&cache->bg_list); 
INIT_LIST_HEAD(&cache->ro_list); + INIT_LIST_HEAD(&cache->discard_list); INIT_LIST_HEAD(&cache->dirty_list); INIT_LIST_HEAD(&cache->io_list); btrfs_init_free_space_ctl(cache); diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h index c391800388dd..0f9a1c91753f 100644 --- a/fs/btrfs/block-group.h +++ b/fs/btrfs/block-group.h @@ -115,7 +115,11 @@ struct btrfs_block_group_cache { /* For read-only block groups */ struct list_head ro_list; + /* For discard operations */ atomic_t trimming; + struct list_head discard_list; + int discard_index; + u64 discard_delay; /* For dirty block groups */ struct list_head dirty_list; @@ -157,6 +161,12 @@ struct btrfs_block_group_cache { struct btrfs_full_stripe_locks_tree full_stripe_locks_root; }; +static inline +u64 btrfs_block_group_end(struct btrfs_block_group_cache *cache) +{ + return (cache->key.objectid + cache->key.offset); +} + #ifdef CONFIG_BTRFS_DEBUG static inline int btrfs_should_fragment_free_space( struct btrfs_block_group_cache *block_group) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 1877586576aa..419445868909 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -438,6 +438,17 @@ struct btrfs_full_stripe_locks_tree { struct mutex lock; }; +/* discard control */ +#define BTRFS_NR_DISCARD_LISTS 1 + +struct btrfs_discard_ctl { + struct workqueue_struct *discard_workers; + struct delayed_work work; + spinlock_t lock; + struct btrfs_block_group_cache *cache; + struct list_head discard_list[BTRFS_NR_DISCARD_LISTS]; +}; + /* delayed seq elem */ struct seq_list { struct list_head list; @@ -524,6 +535,9 @@ enum { * so we don't need to offload checksums to workqueues. */ BTRFS_FS_CSUM_IMPL_FAST, + + /* Indicate that the discard workqueue can service discards. 
*/ + BTRFS_FS_DISCARD_RUNNING, }; struct btrfs_fs_info { @@ -817,6 +831,8 @@ struct btrfs_fs_info { struct btrfs_workqueue *scrub_wr_completion_workers; struct btrfs_workqueue *scrub_parity_workers; + struct btrfs_discard_ctl discard_ctl; + #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY u32 check_integrity_print_mask; #endif @@ -1190,6 +1206,7 @@ static inline u32 BTRFS_MAX_XATTR_SIZE(const struct btrfs_fs_info *info) #define BTRFS_MOUNT_FREE_SPACE_TREE (1 << 26) #define BTRFS_MOUNT_NOLOGREPLAY (1 << 27) #define BTRFS_MOUNT_REF_VERIFY (1 << 28) +#define BTRFS_MOUNT_DISCARD_ASYNC (1 << 29) #define BTRFS_DEFAULT_COMMIT_INTERVAL (30) #define BTRFS_DEFAULT_MAX_INLINE (2048) diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c new file mode 100644 index 000000000000..6df124639e55 --- /dev/null +++ b/fs/btrfs/discard.c @@ -0,0 +1,200 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2019 Facebook. All rights reserved. + */ + +#include +#include +#include +#include +#include +#include "ctree.h" +#include "block-group.h" +#include "discard.h" +#include "free-space-cache.h" + +#define BTRFS_DISCARD_DELAY (300ULL * NSEC_PER_SEC) + +static struct list_head * +btrfs_get_discard_list(struct btrfs_discard_ctl *discard_ctl, + struct btrfs_block_group_cache *cache) +{ + return &discard_ctl->discard_list[cache->discard_index]; +} + +void btrfs_add_to_discard_list(struct btrfs_discard_ctl *discard_ctl, + struct btrfs_block_group_cache *cache) +{ + u64 now = ktime_get_ns(); + + spin_lock(&discard_ctl->lock); + + if (list_empty(&cache->discard_list)) + cache->discard_delay = now + BTRFS_DISCARD_DELAY; + + list_move_tail(&cache->discard_list, + btrfs_get_discard_list(discard_ctl, cache)); + + spin_unlock(&discard_ctl->lock); +} + +static bool remove_from_discard_list(struct btrfs_discard_ctl *discard_ctl, + struct btrfs_block_group_cache *cache) +{ + bool running = false; + + spin_lock(&discard_ctl->lock); + + if (cache == discard_ctl->cache) { + running = true; + 
discard_ctl->cache = NULL; + } + + cache->discard_delay = 0; + list_del_init(&cache->discard_list); + + spin_unlock(&discard_ctl->lock); + + return running; +} + +static struct btrfs_block_group_cache * +find_next_cache(struct btrfs_discard_ctl *discard_ctl, u64 now) +{ + struct btrfs_block_group_cache *ret_cache = NULL, *cache; + int i; + + for (i = 0; i < BTRFS_NR_DISCARD_LISTS; i++) { + struct list_head *discard_list = &discard_ctl->discard_list[i]; + + if (!list_empty(discard_list)) { + cache = list_first_entry(discard_list, + struct btrfs_block_group_cache, + discard_list); + + if (!ret_cache) + ret_cache = cache; + + if (ret_cache->discard_delay < now) + break; + + if (ret_cache->discard_delay > cache->discard_delay) + ret_cache = cache; + } + } + + return ret_cache; +} + +static struct btrfs_block_group_cache * +peek_discard_list(struct btrfs_discard_ctl *discard_ctl) +{ + struct btrfs_block_group_cache *cache; + u64 now = ktime_get_ns(); + + spin_lock(&discard_ctl->lock); + + cache = find_next_cache(discard_ctl, now); + + if (cache && now < cache->discard_delay) + cache = NULL; + + discard_ctl->cache = cache; + + spin_unlock(&discard_ctl->lock); + + return cache; +} + +void btrfs_discard_cancel_work(struct btrfs_discard_ctl *discard_ctl, + struct btrfs_block_group_cache *cache) +{ + if (remove_from_discard_list(discard_ctl, cache)) { + cancel_delayed_work_sync(&discard_ctl->work); + btrfs_discard_schedule_work(discard_ctl, true); + } +} + +void btrfs_discard_schedule_work(struct btrfs_discard_ctl *discard_ctl, + bool override) +{ + struct btrfs_block_group_cache *cache; + u64 now = ktime_get_ns(); + + spin_lock(&discard_ctl->lock); + + if (!btrfs_run_discard_work(discard_ctl)) + goto out; + + if (!override && delayed_work_pending(&discard_ctl->work)) + goto out; + + cache = find_next_cache(discard_ctl, now); + if (cache) { + u64 delay = 0; + + if (now < cache->discard_delay) + delay = nsecs_to_jiffies(cache->discard_delay - now); + + 
mod_delayed_work(discard_ctl->discard_workers, + &discard_ctl->work, + delay); + } + +out: + spin_unlock(&discard_ctl->lock); +} + +static void btrfs_discard_workfn(struct work_struct *work) +{ + struct btrfs_discard_ctl *discard_ctl; + struct btrfs_block_group_cache *cache; + u64 trimmed = 0; + + discard_ctl = container_of(work, struct btrfs_discard_ctl, work.work); + + cache = peek_discard_list(discard_ctl); + if (!cache || !btrfs_run_discard_work(discard_ctl)) + return; + + btrfs_trim_block_group(cache, &trimmed, cache->key.objectid, + btrfs_block_group_end(cache), 0); + + remove_from_discard_list(discard_ctl, cache); + + btrfs_discard_schedule_work(discard_ctl, false); +} + +void btrfs_discard_resume(struct btrfs_fs_info *fs_info) +{ + if (!btrfs_test_opt(fs_info, DISCARD_ASYNC)) { + btrfs_discard_cleanup(fs_info); + return; + } + + set_bit(BTRFS_FS_DISCARD_RUNNING, &fs_info->flags); +} + +void btrfs_discard_stop(struct btrfs_fs_info *fs_info) +{ + clear_bit(BTRFS_FS_DISCARD_RUNNING, &fs_info->flags); +} + +void btrfs_discard_init(struct btrfs_fs_info *fs_info) +{ + struct btrfs_discard_ctl *discard_ctl = &fs_info->discard_ctl; + int i; + + spin_lock_init(&discard_ctl->lock); + + INIT_DELAYED_WORK(&discard_ctl->work, btrfs_discard_workfn); + + for (i = 0; i < BTRFS_NR_DISCARD_LISTS; i++) + INIT_LIST_HEAD(&discard_ctl->discard_list[i]); +} + +void btrfs_discard_cleanup(struct btrfs_fs_info *fs_info) +{ + btrfs_discard_stop(fs_info); + cancel_delayed_work_sync(&fs_info->discard_ctl.work); +} diff --git a/fs/btrfs/discard.h b/fs/btrfs/discard.h new file mode 100644 index 000000000000..6d7805bb0eb7 --- /dev/null +++ b/fs/btrfs/discard.h @@ -0,0 +1,49 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2019 Facebook. All rights reserved. 
+ */ + +#ifndef BTRFS_DISCARD_H +#define BTRFS_DISCARD_H + +#include +#include + +#include "ctree.h" + +void btrfs_add_to_discard_list(struct btrfs_discard_ctl *discard_ctl, + struct btrfs_block_group_cache *cache); + +void btrfs_discard_cancel_work(struct btrfs_discard_ctl *discard_ctl, + struct btrfs_block_group_cache *cache); +void btrfs_discard_schedule_work(struct btrfs_discard_ctl *discard_ctl, + bool override); +void btrfs_discard_resume(struct btrfs_fs_info *fs_info); +void btrfs_discard_stop(struct btrfs_fs_info *fs_info); +void btrfs_discard_init(struct btrfs_fs_info *fs_info); +void btrfs_discard_cleanup(struct btrfs_fs_info *fs_info); + +static inline +bool btrfs_run_discard_work(struct btrfs_discard_ctl *discard_ctl) +{ + struct btrfs_fs_info *fs_info = container_of(discard_ctl, + struct btrfs_fs_info, + discard_ctl); + + return (!(fs_info->sb->s_flags & SB_RDONLY) && + test_bit(BTRFS_FS_DISCARD_RUNNING, &fs_info->flags)); +} + +static inline +void btrfs_discard_queue_work(struct btrfs_discard_ctl *discard_ctl, + struct btrfs_block_group_cache *cache) +{ + if (!cache || !btrfs_test_opt(cache->fs_info, DISCARD_ASYNC)) + return; + + btrfs_add_to_discard_list(discard_ctl, cache); + if (!delayed_work_pending(&discard_ctl->work)) + btrfs_discard_schedule_work(discard_ctl, false); +} + +#endif diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 044981cf6df9..a304ec972f67 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -41,6 +41,7 @@ #include "tree-checker.h" #include "ref-verify.h" #include "block-group.h" +#include "discard.h" #define BTRFS_SUPER_FLAG_SUPP (BTRFS_HEADER_FLAG_WRITTEN |\ BTRFS_HEADER_FLAG_RELOC |\ @@ -2009,6 +2010,8 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info) btrfs_destroy_workqueue(fs_info->flush_workers); btrfs_destroy_workqueue(fs_info->qgroup_rescan_workers); btrfs_destroy_workqueue(fs_info->extent_workers); + if (fs_info->discard_ctl.discard_workers) + 
destroy_workqueue(fs_info->discard_ctl.discard_workers); /* * Now that all other work queues are destroyed, we can safely destroy * the queues used for metadata I/O, since tasks from those other work @@ -2218,6 +2221,8 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info, btrfs_alloc_workqueue(fs_info, "extent-refs", flags, min_t(u64, fs_devices->num_devices, max_active), 8); + fs_info->discard_ctl.discard_workers = + alloc_workqueue("btrfs_discard", WQ_UNBOUND | WQ_FREEZABLE, 1); if (!(fs_info->workers && fs_info->delalloc_workers && fs_info->submit_workers && fs_info->flush_workers && @@ -2229,7 +2234,8 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info, fs_info->caching_workers && fs_info->readahead_workers && fs_info->fixup_workers && fs_info->delayed_workers && fs_info->extent_workers && - fs_info->qgroup_rescan_workers)) { + fs_info->qgroup_rescan_workers && + fs_info->discard_ctl.discard_workers)) { return -ENOMEM; } @@ -2772,6 +2778,8 @@ int open_ctree(struct super_block *sb, btrfs_init_dev_replace_locks(fs_info); btrfs_init_qgroup(fs_info); + btrfs_discard_init(fs_info); + btrfs_init_free_cluster(&fs_info->meta_alloc_cluster); btrfs_init_free_cluster(&fs_info->data_alloc_cluster); @@ -3284,6 +3292,8 @@ int open_ctree(struct super_block *sb, btrfs_qgroup_rescan_resume(fs_info); + btrfs_discard_resume(fs_info); + if (!fs_info->uuid_root) { btrfs_info(fs_info, "creating UUID tree"); ret = btrfs_create_uuid_tree(fs_info); @@ -3993,6 +4003,9 @@ void close_ctree(struct btrfs_fs_info *fs_info) */ kthread_park(fs_info->cleaner_kthread); + /* cancel or finish ongoing work */ + btrfs_discard_cleanup(fs_info); + /* wait for the qgroup rescan worker to stop */ btrfs_qgroup_wait_for_completion(fs_info, false); diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index b9e3bedad878..d69ee5f51b38 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -32,6 +32,7 @@ #include "block-rsv.h" #include "delalloc-space.h" 
#include "block-group.h" +#include "discard.h" #undef SCRAMBLE_DELAYED_REFS @@ -2919,6 +2920,9 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans) cond_resched(); } + if (btrfs_test_opt(fs_info, DISCARD_ASYNC)) + btrfs_discard_schedule_work(&fs_info->discard_ctl, true); + /* * Transaction is finished. We don't need the lock anymore. We * do need to clean up the block groups in case of a transaction diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index 129b9a164b35..54ff1bc97777 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -21,6 +21,7 @@ #include "space-info.h" #include "delalloc-space.h" #include "block-group.h" +#include "discard.h" #define BITS_PER_BITMAP (PAGE_SIZE * 8UL) #define MAX_CACHE_BYTES_PER_GIG SZ_32K @@ -2353,6 +2354,7 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info, struct btrfs_free_space_ctl *ctl, u64 offset, u64 bytes, u32 flags) { + struct btrfs_block_group_cache *cache = ctl->private; struct btrfs_free_space *info; int ret = 0; @@ -2402,6 +2404,9 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info, ASSERT(ret != -EEXIST); } + if (!(flags & BTRFS_FSC_TRIMMED)) + btrfs_discard_queue_work(&fs_info->discard_ctl, cache); + return ret; } @@ -3175,14 +3180,17 @@ void btrfs_init_free_cluster(struct btrfs_free_cluster *cluster) static int do_trimming(struct btrfs_block_group_cache *block_group, u64 *total_trimmed, u64 start, u64 bytes, u64 reserved_start, u64 reserved_bytes, - struct btrfs_trim_range *trim_entry) + u32 reserved_flags, struct btrfs_trim_range *trim_entry) { struct btrfs_space_info *space_info = block_group->space_info; struct btrfs_fs_info *fs_info = block_group->fs_info; struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl; int ret; int update = 0; + u64 end = start + bytes; + u64 reserved_end = reserved_start + reserved_bytes; u64 trimmed = 0; + u32 flags = 0; spin_lock(&space_info->lock); spin_lock(&block_group->lock); @@ -3195,11 
+3203,19 @@ static int do_trimming(struct btrfs_block_group_cache *block_group, spin_unlock(&space_info->lock); ret = btrfs_discard_extent(fs_info, start, bytes, &trimmed); - if (!ret) + if (!ret) { *total_trimmed += trimmed; + flags |= BTRFS_FSC_TRIMMED; + } mutex_lock(&ctl->cache_writeout_mutex); - btrfs_add_free_space(block_group, reserved_start, reserved_bytes); + if (reserved_start < start) + __btrfs_add_free_space(fs_info, ctl, reserved_start, + start - reserved_start, reserved_flags); + if (start + bytes < reserved_start + reserved_bytes) + __btrfs_add_free_space(fs_info, ctl, end, + reserved_end - end, reserved_flags); + __btrfs_add_free_space(fs_info, ctl, start, bytes, flags); list_del(&trim_entry->list); mutex_unlock(&ctl->cache_writeout_mutex); @@ -3226,6 +3242,7 @@ static int trim_no_bitmap(struct btrfs_block_group_cache *block_group, int ret = 0; u64 extent_start; u64 extent_bytes; + u32 extent_flags; u64 bytes; while (start < end) { @@ -3267,6 +3284,7 @@ static int trim_no_bitmap(struct btrfs_block_group_cache *block_group, extent_start = entry->offset; extent_bytes = entry->bytes; + extent_flags = entry->flags; start = max(start, extent_start); bytes = min(extent_start + extent_bytes, end) - start; if (bytes < minlen) { @@ -3285,7 +3303,8 @@ static int trim_no_bitmap(struct btrfs_block_group_cache *block_group, mutex_unlock(&ctl->cache_writeout_mutex); ret = do_trimming(block_group, total_trimmed, start, bytes, - extent_start, extent_bytes, &trim_entry); + extent_start, extent_bytes, extent_flags, + &trim_entry); if (ret) break; next: @@ -3413,7 +3432,7 @@ static int trim_bitmaps(struct btrfs_block_group_cache *block_group, mutex_unlock(&ctl->cache_writeout_mutex); ret = do_trimming(block_group, total_trimmed, start, bytes, - start, bytes, &trim_entry); + start, bytes, 0, &trim_entry); if (ret) { reset_trimming_bitmap(ctl, offset); break; diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index a02fece949cb..3da60d7be535 100644 --- a/fs/btrfs/super.c 
+++ b/fs/btrfs/super.c @@ -46,6 +46,7 @@ #include "sysfs.h" #include "tests/btrfs-tests.h" #include "block-group.h" +#include "discard.h" #include "qgroup.h" #define CREATE_TRACE_POINTS @@ -146,6 +147,8 @@ void __btrfs_handle_fs_error(struct btrfs_fs_info *fs_info, const char *function if (sb_rdonly(sb)) return; + btrfs_discard_stop(fs_info); + /* btrfs handle error by forcing the filesystem readonly */ sb->s_flags |= SB_RDONLY; btrfs_info(fs_info, "forced readonly"); @@ -313,6 +316,7 @@ enum { Opt_datasum, Opt_nodatasum, Opt_defrag, Opt_nodefrag, Opt_discard, Opt_nodiscard, + Opt_discard_version, Opt_nologreplay, Opt_norecovery, Opt_ratio, @@ -376,6 +380,7 @@ static const match_table_t tokens = { {Opt_nodefrag, "noautodefrag"}, {Opt_discard, "discard"}, {Opt_nodiscard, "nodiscard"}, + {Opt_discard_version, "discard=%s"}, {Opt_nologreplay, "nologreplay"}, {Opt_norecovery, "norecovery"}, {Opt_ratio, "metadata_ratio=%u"}, @@ -695,12 +700,26 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options, info->metadata_ratio); break; case Opt_discard: - btrfs_set_and_info(info, DISCARD_SYNC, - "turning on sync discard"); + case Opt_discard_version: + if (token == Opt_discard || + strcmp(args[0].from, "sync") == 0) { + btrfs_clear_opt(info->mount_opt, DISCARD_ASYNC); + btrfs_set_and_info(info, DISCARD_SYNC, + "turning on sync discard"); + } else if (strcmp(args[0].from, "async") == 0) { + btrfs_clear_opt(info->mount_opt, DISCARD_SYNC); + btrfs_set_and_info(info, DISCARD_ASYNC, + "turning on async discard"); + } else { + ret = -EINVAL; + goto out; + } break; case Opt_nodiscard: btrfs_clear_and_info(info, DISCARD_SYNC, "turning off discard"); + btrfs_clear_and_info(info, DISCARD_ASYNC, + "turning off async discard"); break; case Opt_space_cache: case Opt_space_cache_version: @@ -1324,6 +1343,8 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry) seq_puts(seq, ",flushoncommit"); if (btrfs_test_opt(info, DISCARD_SYNC)) seq_puts(seq, 
",discard"); + if (btrfs_test_opt(info, DISCARD_ASYNC)) + seq_puts(seq, ",discard=async"); if (!(info->sb->s_flags & SB_POSIXACL)) seq_puts(seq, ",noacl"); if (btrfs_test_opt(info, SPACE_CACHE)) @@ -1714,6 +1735,14 @@ static inline void btrfs_remount_cleanup(struct btrfs_fs_info *fs_info, btrfs_cleanup_defrag_inodes(fs_info); } + /* if we toggled discard async */ + if (!btrfs_raw_test_opt(old_opts, DISCARD_ASYNC) && + btrfs_test_opt(fs_info, DISCARD_ASYNC)) + btrfs_discard_resume(fs_info); + else if (btrfs_raw_test_opt(old_opts, DISCARD_ASYNC) && + !btrfs_test_opt(fs_info, DISCARD_ASYNC)) + btrfs_discard_cleanup(fs_info); + clear_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state); } @@ -1761,6 +1790,8 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data) */ cancel_work_sync(&fs_info->async_reclaim_work); + btrfs_discard_cleanup(fs_info); + /* wait for the uuid_scan task to finish */ down(&fs_info->uuid_tree_rescan_sem); /* avoid complains from lockdep et al. */ From patchwork Mon Oct 7 20:17:37 2019
(majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729239AbfJGUSC (ORCPT ); Mon, 7 Oct 2019 16:18:02 -0400 Received: from mail-qt1-f196.google.com ([209.85.160.196]:40868 "EHLO mail-qt1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729119AbfJGUSB (ORCPT ); Mon, 7 Oct 2019 16:18:01 -0400 Received: by mail-qt1-f196.google.com with SMTP id m61so10047476qte.7 for ; Mon, 07 Oct 2019 13:18:00 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:in-reply-to:references; bh=4zhJvEVozzXfi+Fj0SQVIfiRXkv+yyPRxFgZt5dIYbs=; b=nZ+BBDLZzmy7yCm9JMBHKhf5sNZVOulUPM3XLqQZMWrR0PjHIbpwQKDyJmcg4QTA4x nJnr8/Wp1OvF+7Og0ib/tplkaAmu+1o4YcOH4sWjPFk6vNSINnr+Nixt6oESGRpEL9+q mT0avfdAXRjRkl0Gg36RDldgSjAagVrMB7TmhcrELtYNC+B1igtD5x9R1lC2lI1Kzn9r JGjT2dunFOSF5iJaBAQL4gtsjPxOQdRJXeEiCxtLPj+EZoOFhQ9LusicgTEhGIAZtAoC D9Xz5vu6R7U5pOtNTlBOdSxM5RadNNkL7Z7nVf2As7PLXrMzR4qnYVGVz5e8QM2b8Rh7 +rgw== X-Gm-Message-State: APjAAAWmGHKoF8F1+mQ6SuNFxL9YWTibMs7F3l2JxgYqYBQ6QSxqZjUk kHXklJBZTXwjZmyvqYduEZo= X-Google-Smtp-Source: APXvYqzL5kY/IztbiB4A7iIf0PpkGa+CLocoHkjcEja0pcGp1bBVlUk/eQ8cf8UUdgu4CxdZC+BTAw== X-Received: by 2002:aed:27c1:: with SMTP id m1mr32171290qtg.197.1570479480128; Mon, 07 Oct 2019 13:18:00 -0700 (PDT) Received: from dennisz-mbp.thefacebook.com ([163.114.130.128]) by smtp.gmail.com with ESMTPSA id k2sm6904005qtm.42.2019.10.07.13.17.59 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Mon, 07 Oct 2019 13:17:59 -0700 (PDT) From: Dennis Zhou To: David Sterba , Chris Mason , Josef Bacik , Omar Sandoval Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou Subject: [PATCH 06/19] btrfs: handle empty block_group removal Date: Mon, 7 Oct 2019 16:17:37 -0400 Message-Id: X-Mailer: git-send-email 2.13.5 In-Reply-To: References: In-Reply-To: References: Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk 
block_group removal is a little tricky. It can race with the extent allocator, the cleaner thread, and balancing. The current path is for a block_group to be added to the unused_bgs list. Then, when the cleaner thread comes around, it starts a transaction and then proceeds with removing the block_group. Extents that are pinned are subsequently removed from the pinned trees and then eventually a discard is issued for the entire block_group.

Async discard introduces another player into the game, the discard workqueue. While it has none of the racing issues, the new problem is ensuring we don't leave free space untrimmed prior to forgetting the block_group. This is handled by placing fully free block_groups on a separate discard queue. This is necessary to maintain discarding order, as in the future we will slowly trim even fully free block_groups. The ordering also lets us keep making progress on the same block_group rather than jumping to, say, the most recently freed block_group, or having to search through the fully freed block_groups at the head of a list to find an insertion point.

The new order of events is: a fully freed block_group gets placed on the discard queue first. Once it's processed, it will be placed on the unused_bgs list and then the original sequence of events will happen, just without the final whole block_group discard.

The mount flags can change when processing unused_bgs, so when flipping from DISCARD to DISCARD_ASYNC, the unused_bgs must be punted to the discard_list to be trimmed. If we flip off DISCARD_ASYNC, we punt free block_groups on the discard_list to the unused_bgs queue, which will do the final discard for us.
Signed-off-by: Dennis Zhou --- fs/btrfs/block-group.c | 39 ++++++++++++++++++--- fs/btrfs/ctree.h | 2 +- fs/btrfs/discard.c | 68 ++++++++++++++++++++++++++++++++++++- fs/btrfs/discard.h | 11 +++++- fs/btrfs/free-space-cache.c | 33 ++++++++++++++++++ fs/btrfs/free-space-cache.h | 1 + fs/btrfs/scrub.c | 7 +++- 7 files changed, 153 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 8bbbe7488328..73e5a9384491 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -1251,6 +1251,7 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info) struct btrfs_block_group_cache *block_group; struct btrfs_space_info *space_info; struct btrfs_trans_handle *trans; + bool async_trim_enabled = btrfs_test_opt(fs_info, DISCARD_ASYNC); int ret = 0; if (!test_bit(BTRFS_FS_OPEN, &fs_info->flags)) @@ -1260,6 +1261,7 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info) while (!list_empty(&fs_info->unused_bgs)) { u64 start, end; int trimming; + bool async_trimmed; block_group = list_first_entry(&fs_info->unused_bgs, struct btrfs_block_group_cache, @@ -1281,10 +1283,20 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info) /* Don't want to race with allocators so take the groups_sem */ down_write(&space_info->groups_sem); spin_lock(&block_group->lock); + + /* async discard requires block groups to be fully trimmed */ + async_trimmed = (!btrfs_test_opt(fs_info, DISCARD_ASYNC) || + btrfs_is_free_space_trimmed(block_group)); + if (block_group->reserved || block_group->pinned || btrfs_block_group_used(&block_group->item) || block_group->ro || - list_is_singular(&block_group->list)) { + list_is_singular(&block_group->list) || + !async_trimmed) { + /* requeue if we failed because of async discard */ + if (!async_trimmed) + btrfs_discard_queue_work(&fs_info->discard_ctl, + block_group); /* * We want to bail if we made new allocations or have * outstanding allocations in this block group. 
We do @@ -1367,6 +1379,10 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info) spin_unlock(&block_group->lock); spin_unlock(&space_info->lock); + if (!async_trim_enabled && + btrfs_test_opt(fs_info, DISCARD_ASYNC)) + goto flip_async; + /* DISCARD can flip during remount */ trimming = btrfs_test_opt(fs_info, DISCARD_SYNC); @@ -1411,6 +1427,13 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info) spin_lock(&fs_info->unused_bgs_lock); } spin_unlock(&fs_info->unused_bgs_lock); + return; + +flip_async: + btrfs_end_transaction(trans); + mutex_unlock(&fs_info->delete_unused_bgs_mutex); + btrfs_put_block_group(block_group); + btrfs_discard_punt_unused_bgs_list(fs_info); } void btrfs_mark_bg_unused(struct btrfs_block_group_cache *bg) @@ -1618,6 +1641,8 @@ static struct btrfs_block_group_cache *btrfs_create_block_group_cache( cache->full_stripe_len = btrfs_full_stripe_len(fs_info, start); set_free_space_tree_thresholds(cache); + cache->discard_index = 1; + atomic_set(&cache->count, 1); spin_lock_init(&cache->lock); init_rwsem(&cache->data_rwsem); @@ -1829,7 +1854,11 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info) inc_block_group_ro(cache, 1); } else if (btrfs_block_group_used(&cache->item) == 0) { ASSERT(list_empty(&cache->bg_list)); - btrfs_mark_bg_unused(cache); + if (btrfs_test_opt(info, DISCARD_ASYNC)) + btrfs_add_to_discard_free_list( + &info->discard_ctl, cache); + else + btrfs_mark_bg_unused(cache); } } @@ -2724,8 +2753,10 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans, * dirty list to avoid races between cleaner kthread and space * cache writeout. 
*/ - if (!alloc && old_val == 0) - btrfs_mark_bg_unused(cache); + if (!alloc && old_val == 0) { + if (!btrfs_test_opt(info, DISCARD_ASYNC)) + btrfs_mark_bg_unused(cache); + } btrfs_put_block_group(cache); total -= num_bytes; diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 419445868909..c328d2e85e4d 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -439,7 +439,7 @@ struct btrfs_full_stripe_locks_tree { }; /* discard control */ -#define BTRFS_NR_DISCARD_LISTS 1 +#define BTRFS_NR_DISCARD_LISTS 2 struct btrfs_discard_ctl { struct workqueue_struct *discard_workers; diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c index 6df124639e55..fb92b888774d 100644 --- a/fs/btrfs/discard.c +++ b/fs/btrfs/discard.c @@ -29,8 +29,11 @@ void btrfs_add_to_discard_list(struct btrfs_discard_ctl *discard_ctl, spin_lock(&discard_ctl->lock); - if (list_empty(&cache->discard_list)) + if (list_empty(&cache->discard_list) || !cache->discard_index) { + if (!cache->discard_index) + cache->discard_index = 1; cache->discard_delay = now + BTRFS_DISCARD_DELAY; + } list_move_tail(&cache->discard_list, btrfs_get_discard_list(discard_ctl, cache)); @@ -38,6 +41,23 @@ void btrfs_add_to_discard_list(struct btrfs_discard_ctl *discard_ctl, spin_unlock(&discard_ctl->lock); } +void btrfs_add_to_discard_free_list(struct btrfs_discard_ctl *discard_ctl, + struct btrfs_block_group_cache *cache) +{ + u64 now = ktime_get_ns(); + + spin_lock(&discard_ctl->lock); + + if (!list_empty(&cache->discard_list)) + list_del_init(&cache->discard_list); + + cache->discard_index = 0; + cache->discard_delay = now; + list_add_tail(&cache->discard_list, &discard_ctl->discard_list[0]); + + spin_unlock(&discard_ctl->lock); +} + static bool remove_from_discard_list(struct btrfs_discard_ctl *discard_ctl, struct btrfs_block_group_cache *cache) { @@ -161,10 +181,52 @@ static void btrfs_discard_workfn(struct work_struct *work) btrfs_block_group_end(cache), 0); remove_from_discard_list(discard_ctl, cache); + if 
(btrfs_is_free_space_trimmed(cache)) + btrfs_mark_bg_unused(cache); + else if (cache->free_space_ctl->free_space == cache->key.offset) + btrfs_add_to_discard_free_list(discard_ctl, cache); btrfs_discard_schedule_work(discard_ctl, false); } +void btrfs_discard_punt_unused_bgs_list(struct btrfs_fs_info *fs_info) +{ + struct btrfs_block_group_cache *cache, *next; + + /* we enabled async discard, so punt all to the queue */ + spin_lock(&fs_info->unused_bgs_lock); + + list_for_each_entry_safe(cache, next, &fs_info->unused_bgs, bg_list) { + list_del_init(&cache->bg_list); + btrfs_add_to_discard_free_list(&fs_info->discard_ctl, cache); + } + + spin_unlock(&fs_info->unused_bgs_lock); +} + +static void btrfs_discard_purge_list(struct btrfs_discard_ctl *discard_ctl) +{ + struct btrfs_block_group_cache *cache, *next; + int i; + + spin_lock(&discard_ctl->lock); + + for (i = 0; i < BTRFS_NR_DISCARD_LISTS; i++) { + list_for_each_entry_safe(cache, next, + &discard_ctl->discard_list[i], + discard_list) { + list_del_init(&cache->discard_list); + spin_unlock(&discard_ctl->lock); + if (cache->free_space_ctl->free_space == + cache->key.offset) + btrfs_mark_bg_unused(cache); + spin_lock(&discard_ctl->lock); + } + } + + spin_unlock(&discard_ctl->lock); +} + void btrfs_discard_resume(struct btrfs_fs_info *fs_info) { if (!btrfs_test_opt(fs_info, DISCARD_ASYNC)) { @@ -172,6 +234,8 @@ void btrfs_discard_resume(struct btrfs_fs_info *fs_info) return; } + btrfs_discard_punt_unused_bgs_list(fs_info); + set_bit(BTRFS_FS_DISCARD_RUNNING, &fs_info->flags); } @@ -197,4 +261,6 @@ void btrfs_discard_cleanup(struct btrfs_fs_info *fs_info) { btrfs_discard_stop(fs_info); cancel_delayed_work_sync(&fs_info->discard_ctl.work); + + btrfs_discard_purge_list(&fs_info->discard_ctl); } diff --git a/fs/btrfs/discard.h b/fs/btrfs/discard.h index 6d7805bb0eb7..55f79b624943 100644 --- a/fs/btrfs/discard.h +++ b/fs/btrfs/discard.h @@ -10,9 +10,14 @@ #include #include "ctree.h" +#include "block-group.h" +#include 
"free-space-cache.h" void btrfs_add_to_discard_list(struct btrfs_discard_ctl *discard_ctl, struct btrfs_block_group_cache *cache); +void btrfs_add_to_discard_free_list(struct btrfs_discard_ctl *discard_ctl, + struct btrfs_block_group_cache *cache); +void btrfs_discard_punt_unused_bgs_list(struct btrfs_fs_info *fs_info); void btrfs_discard_cancel_work(struct btrfs_discard_ctl *discard_ctl, struct btrfs_block_group_cache *cache); @@ -41,7 +46,11 @@ void btrfs_discard_queue_work(struct btrfs_discard_ctl *discard_ctl, if (!cache || !btrfs_test_opt(cache->fs_info, DISCARD_ASYNC)) return; - btrfs_add_to_discard_list(discard_ctl, cache); + if (cache->free_space_ctl->free_space == cache->key.offset) + btrfs_add_to_discard_free_list(discard_ctl, cache); + else + btrfs_add_to_discard_list(discard_ctl, cache); + if (!delayed_work_pending(&discard_ctl->work)) btrfs_discard_schedule_work(discard_ctl, false); } diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index 54ff1bc97777..ed0e7ee4c78d 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -2653,6 +2653,31 @@ void btrfs_remove_free_space_cache(struct btrfs_block_group_cache *block_group) } +bool btrfs_is_free_space_trimmed(struct btrfs_block_group_cache *cache) +{ + struct btrfs_free_space_ctl *ctl = cache->free_space_ctl; + struct btrfs_free_space *info; + struct rb_node *node; + bool ret = true; + + spin_lock(&ctl->tree_lock); + node = rb_first(&ctl->free_space_offset); + + while (node) { + info = rb_entry(node, struct btrfs_free_space, offset_index); + + if (!btrfs_free_space_trimmed(info)) { + ret = false; + break; + } + + node = rb_next(node); + } + + spin_unlock(&ctl->tree_lock); + return ret; +} + u64 btrfs_find_space_for_alloc(struct btrfs_block_group_cache *block_group, u64 offset, u64 bytes, u64 empty_size, u64 *max_extent_size) @@ -2739,6 +2764,9 @@ int btrfs_return_cluster_to_free_space( ret = __btrfs_return_cluster_to_free_space(block_group, cluster); 
spin_unlock(&ctl->tree_lock); + btrfs_discard_queue_work(&block_group->fs_info->discard_ctl, + block_group); + /* finally drop our ref */ btrfs_put_block_group(block_group); return ret; @@ -3097,6 +3125,7 @@ int btrfs_find_space_cluster(struct btrfs_block_group_cache *block_group, u64 min_bytes; u64 cont1_bytes; int ret; + bool found_cluster = false; /* * Choose the minimum extent size we'll require for this @@ -3149,6 +3178,7 @@ int btrfs_find_space_cluster(struct btrfs_block_group_cache *block_group, list_del_init(&entry->list); if (!ret) { + found_cluster = true; atomic_inc(&block_group->count); list_add_tail(&cluster->block_group_list, &block_group->cluster_list); @@ -3160,6 +3190,9 @@ int btrfs_find_space_cluster(struct btrfs_block_group_cache *block_group, spin_unlock(&cluster->lock); spin_unlock(&ctl->tree_lock); + if (found_cluster) + btrfs_discard_cancel_work(&fs_info->discard_ctl, block_group); + return ret; } diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h index dc73ec8d34bb..b688e70a7512 100644 --- a/fs/btrfs/free-space-cache.h +++ b/fs/btrfs/free-space-cache.h @@ -107,6 +107,7 @@ int btrfs_remove_free_space(struct btrfs_block_group_cache *block_group, void __btrfs_remove_free_space_cache(struct btrfs_free_space_ctl *ctl); void btrfs_remove_free_space_cache(struct btrfs_block_group_cache *block_group); +bool btrfs_is_free_space_trimmed(struct btrfs_block_group_cache *cache); u64 btrfs_find_space_for_alloc(struct btrfs_block_group_cache *block_group, u64 offset, u64 bytes, u64 empty_size, u64 *max_extent_size); diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index f7d4e03f4c5d..49927a642b5a 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -8,6 +8,7 @@ #include #include #include "ctree.h" +#include "discard.h" #include "volumes.h" #include "disk-io.h" #include "ordered-data.h" @@ -3683,7 +3684,11 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx, if (!cache->removed && !cache->ro && cache->reserved == 0 && 
btrfs_block_group_used(&cache->item) == 0) { spin_unlock(&cache->lock); - btrfs_mark_bg_unused(cache); + if (btrfs_test_opt(fs_info, DISCARD_ASYNC)) + btrfs_add_to_discard_free_list( + &fs_info->discard_ctl, cache); + else + btrfs_mark_bg_unused(cache); } else { spin_unlock(&cache->lock); }
From patchwork Mon Oct 7 20:17:38 2019 From: Dennis Zhou To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou Subject: [PATCH 07/19] btrfs: discard one region at a time in async discard Date: Mon, 7 Oct 2019 16:17:38 -0400 Message-Id: <79b26926486575eb53825c1aad607e99028d8447.1570479299.git.dennis@kernel.org>
The prior two patches added discarding via a background workqueue. This just piggybacked off of the fstrim code to trim the whole block_group at once. Inevitably, this is worse performance-wise and will aggressively over-trim, but it was convenient to plumb the rest of the infrastructure first to keep the patches easier to review. This adds the real goal of this series: discarding slowly (i.e. a slow, long-running fstrim). The discarding is split into two phases, extents and then bitmaps. The reason for this is twofold.
First, the bitmap regions overlap the extent regions. Second, discarding the extents first will let the newly trimmed bitmaps have the highest chance of coalescing when being readded to the free space cache. Signed-off-by: Dennis Zhou --- fs/btrfs/block-group.h | 2 + fs/btrfs/discard.c | 73 ++++++++++++++++++++----- fs/btrfs/discard.h | 16 ++++++ fs/btrfs/extent-tree.c | 3 +- fs/btrfs/free-space-cache.c | 106 ++++++++++++++++++++++++++---------- fs/btrfs/free-space-cache.h | 6 +- 6 files changed, 159 insertions(+), 47 deletions(-) diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h index 0f9a1c91753f..b59e6a8ed73d 100644 --- a/fs/btrfs/block-group.h +++ b/fs/btrfs/block-group.h @@ -120,6 +120,8 @@ struct btrfs_block_group_cache { struct list_head discard_list; int discard_index; u64 discard_delay; + u64 discard_cursor; + u32 discard_flags; /* For dirty block groups */ struct list_head dirty_list; diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c index fb92b888774d..26a1e44b4bfa 100644 --- a/fs/btrfs/discard.c +++ b/fs/btrfs/discard.c @@ -22,21 +22,28 @@ btrfs_get_discard_list(struct btrfs_discard_ctl *discard_ctl, return &discard_ctl->discard_list[cache->discard_index]; } -void btrfs_add_to_discard_list(struct btrfs_discard_ctl *discard_ctl, - struct btrfs_block_group_cache *cache) +static void __btrfs_add_to_discard_list(struct btrfs_discard_ctl *discard_ctl, + struct btrfs_block_group_cache *cache) { u64 now = ktime_get_ns(); - spin_lock(&discard_ctl->lock); - if (list_empty(&cache->discard_list) || !cache->discard_index) { if (!cache->discard_index) cache->discard_index = 1; cache->discard_delay = now + BTRFS_DISCARD_DELAY; + cache->discard_flags |= BTRFS_DISCARD_RESET_CURSOR; } list_move_tail(&cache->discard_list, btrfs_get_discard_list(discard_ctl, cache)); +} + +void btrfs_add_to_discard_list(struct btrfs_discard_ctl *discard_ctl, + struct btrfs_block_group_cache *cache) +{ + spin_lock(&discard_ctl->lock); + + 
__btrfs_add_to_discard_list(discard_ctl, cache); spin_unlock(&discard_ctl->lock); } @@ -53,6 +60,7 @@ void btrfs_add_to_discard_free_list(struct btrfs_discard_ctl *discard_ctl, cache->discard_index = 0; cache->discard_delay = now; + cache->discard_flags |= BTRFS_DISCARD_RESET_CURSOR; list_add_tail(&cache->discard_list, &discard_ctl->discard_list[0]); spin_unlock(&discard_ctl->lock); @@ -114,13 +122,24 @@ peek_discard_list(struct btrfs_discard_ctl *discard_ctl) spin_lock(&discard_ctl->lock); +again: cache = find_next_cache(discard_ctl, now); - if (cache && now < cache->discard_delay) + if (cache && now > cache->discard_delay) { + discard_ctl->cache = cache; + if (cache->discard_index == 0 && + cache->free_space_ctl->free_space != cache->key.offset) { + __btrfs_add_to_discard_list(discard_ctl, cache); + goto again; + } + if (btrfs_discard_reset_cursor(cache)) { + cache->discard_cursor = cache->key.objectid; + cache->discard_flags &= ~(BTRFS_DISCARD_RESET_CURSOR | + BTRFS_DISCARD_BITMAPS); + } + } else { cache = NULL; - - discard_ctl->cache = cache; - + } spin_unlock(&discard_ctl->lock); return cache; @@ -173,18 +192,42 @@ static void btrfs_discard_workfn(struct work_struct *work) discard_ctl = container_of(work, struct btrfs_discard_ctl, work.work); +again: cache = peek_discard_list(discard_ctl); if (!cache || !btrfs_run_discard_work(discard_ctl)) return; - btrfs_trim_block_group(cache, &trimmed, cache->key.objectid, - btrfs_block_group_end(cache), 0); + if (btrfs_discard_bitmaps(cache)) + btrfs_trim_block_group_bitmaps(cache, &trimmed, + cache->discard_cursor, + btrfs_block_group_end(cache), + 0, true); + else + btrfs_trim_block_group(cache, &trimmed, cache->discard_cursor, + btrfs_block_group_end(cache), 0, true); + + if (cache->discard_cursor >= btrfs_block_group_end(cache)) { + if (btrfs_discard_bitmaps(cache)) { + remove_from_discard_list(discard_ctl, cache); + if (btrfs_is_free_space_trimmed(cache)) + btrfs_mark_bg_unused(cache); + else if 
(cache->free_space_ctl->free_space == + cache->key.offset) + btrfs_add_to_discard_free_list(discard_ctl, + cache); + } else { + cache->discard_cursor = cache->key.objectid; + cache->discard_flags |= BTRFS_DISCARD_BITMAPS; + } + } + + spin_lock(&discard_ctl->lock); + discard_ctl->cache = NULL; + spin_unlock(&discard_ctl->lock); - remove_from_discard_list(discard_ctl, cache); - if (btrfs_is_free_space_trimmed(cache)) - btrfs_mark_bg_unused(cache); - else if (cache->free_space_ctl->free_space == cache->key.offset) - btrfs_add_to_discard_free_list(discard_ctl, cache); + /* we didn't trim anything but we really ought to so try again */ + if (trimmed == 0) + goto again; btrfs_discard_schedule_work(discard_ctl, false); } diff --git a/fs/btrfs/discard.h b/fs/btrfs/discard.h index 55f79b624943..22cfa7e401bb 100644 --- a/fs/btrfs/discard.h +++ b/fs/btrfs/discard.h @@ -13,6 +13,22 @@ #include "block-group.h" #include "free-space-cache.h" +/* discard flags */ +#define BTRFS_DISCARD_RESET_CURSOR (1UL << 0) +#define BTRFS_DISCARD_BITMAPS (1UL << 1) + +static inline +bool btrfs_discard_reset_cursor(struct btrfs_block_group_cache *cache) +{ + return (cache->discard_flags & BTRFS_DISCARD_RESET_CURSOR); +} + +static inline +bool btrfs_discard_bitmaps(struct btrfs_block_group_cache *cache) +{ + return (cache->discard_flags & BTRFS_DISCARD_BITMAPS); +} + void btrfs_add_to_discard_list(struct btrfs_discard_ctl *discard_ctl, struct btrfs_block_group_cache *cache); void btrfs_add_to_discard_free_list(struct btrfs_discard_ctl *discard_ctl, diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index d69ee5f51b38..ff42e4abb01d 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -5683,7 +5683,8 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range) &group_trimmed, start, end, - range->minlen); + range->minlen, + false); trimmed += group_trimmed; if (ret) { diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index 
ed0e7ee4c78d..97b3074e83c0 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -3267,7 +3267,8 @@ static int do_trimming(struct btrfs_block_group_cache *block_group, } static int trim_no_bitmap(struct btrfs_block_group_cache *block_group, - u64 *total_trimmed, u64 start, u64 end, u64 minlen) + u64 *total_trimmed, u64 start, u64 end, u64 minlen, + bool async) { struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl; struct btrfs_free_space *entry; @@ -3284,36 +3285,25 @@ static int trim_no_bitmap(struct btrfs_block_group_cache *block_group, mutex_lock(&ctl->cache_writeout_mutex); spin_lock(&ctl->tree_lock); - if (ctl->free_space < minlen) { - spin_unlock(&ctl->tree_lock); - mutex_unlock(&ctl->cache_writeout_mutex); - break; - } + if (ctl->free_space < minlen) + goto out_unlock; entry = tree_search_offset(ctl, start, 0, 1); - if (!entry) { - spin_unlock(&ctl->tree_lock); - mutex_unlock(&ctl->cache_writeout_mutex); - break; - } + if (!entry) + goto out_unlock; /* skip bitmaps */ - while (entry->bitmap) { + while (entry->bitmap || (async && + btrfs_free_space_trimmed(entry))) { node = rb_next(&entry->offset_index); - if (!node) { - spin_unlock(&ctl->tree_lock); - mutex_unlock(&ctl->cache_writeout_mutex); - goto out; - } + if (!node) + goto out_unlock; entry = rb_entry(node, struct btrfs_free_space, offset_index); } - if (entry->offset >= end) { - spin_unlock(&ctl->tree_lock); - mutex_unlock(&ctl->cache_writeout_mutex); - break; - } + if (entry->offset >= end) + goto out_unlock; extent_start = entry->offset; extent_bytes = entry->bytes; @@ -3338,10 +3328,15 @@ static int trim_no_bitmap(struct btrfs_block_group_cache *block_group, ret = do_trimming(block_group, total_trimmed, start, bytes, extent_start, extent_bytes, extent_flags, &trim_entry); - if (ret) + if (ret) { + block_group->discard_cursor = start + bytes; break; + } next: start += bytes; + block_group->discard_cursor = start; + if (async && *total_trimmed) + break; if 
(fatal_signal_pending(current)) { ret = -ERESTARTSYS; @@ -3350,7 +3345,14 @@ static int trim_no_bitmap(struct btrfs_block_group_cache *block_group, cond_resched(); } -out: + + return ret; + +out_unlock: + block_group->discard_cursor = btrfs_block_group_end(block_group); + spin_unlock(&ctl->tree_lock); + mutex_unlock(&ctl->cache_writeout_mutex); + return ret; } @@ -3390,7 +3392,8 @@ static void end_trimming_bitmap(struct btrfs_free_space *entry) } static int trim_bitmaps(struct btrfs_block_group_cache *block_group, - u64 *total_trimmed, u64 start, u64 end, u64 minlen) + u64 *total_trimmed, u64 start, u64 end, u64 minlen, + bool async) { struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl; struct btrfs_free_space *entry; @@ -3407,13 +3410,16 @@ static int trim_bitmaps(struct btrfs_block_group_cache *block_group, spin_lock(&ctl->tree_lock); if (ctl->free_space < minlen) { + block_group->discard_cursor = + btrfs_block_group_end(block_group); spin_unlock(&ctl->tree_lock); mutex_unlock(&ctl->cache_writeout_mutex); break; } entry = tree_search_offset(ctl, offset, 1, 0); - if (!entry) { + if (!entry || (async && start == offset && + btrfs_free_space_trimmed(entry))) { spin_unlock(&ctl->tree_lock); mutex_unlock(&ctl->cache_writeout_mutex); next_bitmap = true; @@ -3446,6 +3452,16 @@ static int trim_bitmaps(struct btrfs_block_group_cache *block_group, goto next; } + /* + * We already trimmed a region, but are using the locking above + * to reset the BTRFS_FSC_TRIMMING_BITMAP flag. 
+ */ + if (async && *total_trimmed) { + spin_unlock(&ctl->tree_lock); + mutex_unlock(&ctl->cache_writeout_mutex); + return ret; + } + bytes = min(bytes, end - start); if (bytes < minlen) { entry->flags &= ~BTRFS_FSC_TRIMMING_BITMAP; @@ -3468,6 +3484,8 @@ static int trim_bitmaps(struct btrfs_block_group_cache *block_group, start, bytes, 0, &trim_entry); if (ret) { reset_trimming_bitmap(ctl, offset); + block_group->discard_cursor = + btrfs_block_group_end(block_group); break; } next: @@ -3477,6 +3495,7 @@ static int trim_bitmaps(struct btrfs_block_group_cache *block_group, } else { start += bytes; } + block_group->discard_cursor = start; if (fatal_signal_pending(current)) { if (start != offset) @@ -3488,6 +3507,9 @@ static int trim_bitmaps(struct btrfs_block_group_cache *block_group, cond_resched(); } + if (offset >= end) + block_group->discard_cursor = end; + return ret; } @@ -3532,7 +3554,8 @@ void btrfs_put_block_group_trimming(struct btrfs_block_group_cache *block_group) } int btrfs_trim_block_group(struct btrfs_block_group_cache *block_group, - u64 *trimmed, u64 start, u64 end, u64 minlen) + u64 *trimmed, u64 start, u64 end, u64 minlen, + bool async) { struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl; int ret; @@ -3547,11 +3570,11 @@ int btrfs_trim_block_group(struct btrfs_block_group_cache *block_group, btrfs_get_block_group_trimming(block_group); spin_unlock(&block_group->lock); - ret = trim_no_bitmap(block_group, trimmed, start, end, minlen); - if (ret) + ret = trim_no_bitmap(block_group, trimmed, start, end, minlen, async); + if (ret || async) goto out; - ret = trim_bitmaps(block_group, trimmed, start, end, minlen); + ret = trim_bitmaps(block_group, trimmed, start, end, minlen, false); /* if we ended in the middle of a bitmap, reset the trimming flag */ if (end % (BITS_PER_BITMAP * ctl->unit)) reset_trimming_bitmap(ctl, offset_to_bitmap(ctl, end)); @@ -3560,6 +3583,29 @@ int btrfs_trim_block_group(struct btrfs_block_group_cache *block_group, 
return ret; } +int btrfs_trim_block_group_bitmaps(struct btrfs_block_group_cache *block_group, + u64 *trimmed, u64 start, u64 end, u64 minlen, + bool async) +{ + int ret; + + *trimmed = 0; + + spin_lock(&block_group->lock); + if (block_group->removed) { + spin_unlock(&block_group->lock); + return 0; + } + btrfs_get_block_group_trimming(block_group); + spin_unlock(&block_group->lock); + + ret = trim_bitmaps(block_group, trimmed, start, end, minlen, async); + + btrfs_put_block_group_trimming(block_group); + return ret; + +} + /* * Find the left-most item in the cache tree, and then return the * smallest inode number in the item. diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h index b688e70a7512..450ea01ea0c7 100644 --- a/fs/btrfs/free-space-cache.h +++ b/fs/btrfs/free-space-cache.h @@ -125,7 +125,11 @@ int btrfs_return_cluster_to_free_space( struct btrfs_block_group_cache *block_group, struct btrfs_free_cluster *cluster); int btrfs_trim_block_group(struct btrfs_block_group_cache *block_group, - u64 *trimmed, u64 start, u64 end, u64 minlen); + u64 *trimmed, u64 start, u64 end, u64 minlen, + bool async); +int btrfs_trim_block_group_bitmaps(struct btrfs_block_group_cache *block_group, + u64 *trimmed, u64 start, u64 end, u64 minlen, + bool async); /* Support functions for running our sanity tests */ #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS From patchwork Mon Oct 7 20:17:39 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dennis Zhou X-Patchwork-Id: 11178457 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 03521139A for ; Mon, 7 Oct 2019 20:18:06 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id CCEFF21479 for ; Mon, 7 Oct 2019 20:18:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; 
From: Dennis Zhou To:
David Sterba , Chris Mason , Josef Bacik , Omar Sandoval Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou Subject: [PATCH 08/19] btrfs: track discardable extents for async discard Date: Mon, 7 Oct 2019 16:17:39 -0400 Message-Id: <31c4f29228c76df72cc92112e397db648e9b9ab9.1570479299.git.dennis@kernel.org> The number of discardable extents will serve as the rate limiting metric for how often we should discard. This keeps track of discardable extents in the free space caches by maintaining deltas and propagating them to the global count. This also sets up a discard directory in btrfs sysfs and exports the total discard_extents count. Signed-off-by: Dennis Zhou --- fs/btrfs/ctree.h | 2 + fs/btrfs/discard.c | 2 + fs/btrfs/discard.h | 19 ++++++++ fs/btrfs/free-space-cache.c | 93 ++++++++++++++++++++++++++++++++++--- fs/btrfs/free-space-cache.h | 2 + fs/btrfs/sysfs.c | 33 +++++++++++++ 6 files changed, 144 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index c328d2e85e4d..43e515939b9c 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -447,6 +447,7 @@ struct btrfs_discard_ctl { spinlock_t lock; struct btrfs_block_group_cache *cache; struct list_head discard_list[BTRFS_NR_DISCARD_LISTS]; + atomic_t discard_extents; }; /* delayed seq elem */ @@ -831,6 +832,7 @@ struct btrfs_fs_info { struct btrfs_workqueue *scrub_wr_completion_workers; struct btrfs_workqueue *scrub_parity_workers; + struct kobject *discard_kobj; struct btrfs_discard_ctl discard_ctl; #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c index 26a1e44b4bfa..0544eb6717d4 100644 --- a/fs/btrfs/discard.c +++ b/fs/btrfs/discard.c @@ -298,6 +298,8 @@ void btrfs_discard_init(struct btrfs_fs_info *fs_info) for (i = 0; i <
BTRFS_NR_DISCARD_LISTS; i++) INIT_LIST_HEAD(&discard_ctl->discard_list[i]); + + atomic_set(&discard_ctl->discard_extents, 0); } void btrfs_discard_cleanup(struct btrfs_fs_info *fs_info) diff --git a/fs/btrfs/discard.h b/fs/btrfs/discard.h index 22cfa7e401bb..85939d62521e 100644 --- a/fs/btrfs/discard.h +++ b/fs/btrfs/discard.h @@ -71,4 +71,23 @@ void btrfs_discard_queue_work(struct btrfs_discard_ctl *discard_ctl, btrfs_discard_schedule_work(discard_ctl, false); } +static inline +void btrfs_discard_update_discardable(struct btrfs_block_group_cache *cache, + struct btrfs_free_space_ctl *ctl) +{ + struct btrfs_discard_ctl *discard_ctl; + s32 extents_delta; + + if (!cache || !btrfs_test_opt(cache->fs_info, DISCARD_ASYNC)) + return; + + discard_ctl = &cache->fs_info->discard_ctl; + + extents_delta = ctl->discard_extents[0] - ctl->discard_extents[1]; + if (extents_delta) { + atomic_add(extents_delta, &discard_ctl->discard_extents); + ctl->discard_extents[1] = ctl->discard_extents[0]; + } +} + #endif diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index 97b3074e83c0..6c2bebfd206f 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -32,6 +32,9 @@ struct btrfs_trim_range { struct list_head list; }; +static int count_bitmap_extents(struct btrfs_free_space_ctl *ctl, + struct btrfs_free_space *bitmap_info); + static int link_free_space(struct btrfs_free_space_ctl *ctl, struct btrfs_free_space *info); static void unlink_free_space(struct btrfs_free_space_ctl *ctl, @@ -809,12 +812,15 @@ static int __load_free_space_cache(struct btrfs_root *root, struct inode *inode, ret = io_ctl_read_bitmap(&io_ctl, e); if (ret) goto free_cache; + e->bitmap_extents = count_bitmap_extents(ctl, e); + ctl->discard_extents[0] += e->bitmap_extents; } io_ctl_drop_pages(&io_ctl); merge_space_tree(ctl); ret = 1; out: + btrfs_discard_update_discardable(ctl->private, ctl); io_ctl_free(&io_ctl); return ret; free_cache: @@ -1629,6 +1635,9 @@ 
__unlink_free_space(struct btrfs_free_space_ctl *ctl, { rb_erase(&info->offset_index, &ctl->free_space_offset); ctl->free_extents--; + + if (!info->bitmap && !btrfs_free_space_trimmed(info)) + ctl->discard_extents[0]--; } static void unlink_free_space(struct btrfs_free_space_ctl *ctl, @@ -1649,6 +1658,9 @@ static int link_free_space(struct btrfs_free_space_ctl *ctl, if (ret) return ret; + if (!info->bitmap && !btrfs_free_space_trimmed(info)) + ctl->discard_extents[0]++; + ctl->free_space += info->bytes; ctl->free_extents++; return ret; @@ -1705,17 +1717,29 @@ static inline void __bitmap_clear_bits(struct btrfs_free_space_ctl *ctl, struct btrfs_free_space *info, u64 offset, u64 bytes) { - unsigned long start, count; + unsigned long start, count, end; + int extent_delta = -1; start = offset_to_bit(info->offset, ctl->unit, offset); count = bytes_to_bits(bytes, ctl->unit); - ASSERT(start + count <= BITS_PER_BITMAP); + end = start + count; + ASSERT(end <= BITS_PER_BITMAP); bitmap_clear(info->bitmap, start, count); info->bytes -= bytes; if (info->max_extent_size > ctl->unit) info->max_extent_size = 0; + + if (start && test_bit(start - 1, info->bitmap)) + extent_delta++; + + if (end < BITS_PER_BITMAP && test_bit(end, info->bitmap)) + extent_delta++; + + info->bitmap_extents += extent_delta; + if (!btrfs_free_space_trimmed(info)) + ctl->discard_extents[0] += extent_delta; } static void bitmap_clear_bits(struct btrfs_free_space_ctl *ctl, @@ -1730,16 +1754,28 @@ static void bitmap_set_bits(struct btrfs_free_space_ctl *ctl, struct btrfs_free_space *info, u64 offset, u64 bytes) { - unsigned long start, count; + unsigned long start, count, end; + int extent_delta = 1; start = offset_to_bit(info->offset, ctl->unit, offset); count = bytes_to_bits(bytes, ctl->unit); - ASSERT(start + count <= BITS_PER_BITMAP); + end = start + count; + ASSERT(end <= BITS_PER_BITMAP); bitmap_set(info->bitmap, start, count); info->bytes += bytes; ctl->free_space += bytes; + + if (start && 
test_bit(start - 1, info->bitmap)) + extent_delta--; + + if (end < BITS_PER_BITMAP && test_bit(end, info->bitmap)) + extent_delta--; + + info->bitmap_extents += extent_delta; + if (!btrfs_free_space_trimmed(info)) + ctl->discard_extents[0] += extent_delta; } /* @@ -1875,11 +1911,35 @@ find_free_space(struct btrfs_free_space_ctl *ctl, u64 *offset, u64 *bytes, return NULL; } +static int count_bitmap_extents(struct btrfs_free_space_ctl *ctl, + struct btrfs_free_space *bitmap_info) +{ + struct btrfs_block_group_cache *cache = ctl->private; + u64 bytes = bitmap_info->bytes; + unsigned int rs, re; + int count = 0; + + if (!cache || !bytes) + return count; + + bitmap_for_each_set_region(bitmap_info->bitmap, rs, re, 0, + BITS_PER_BITMAP) { + bytes -= (re - rs) * ctl->unit; + count++; + + if (!bytes) + break; + } + + return count; +} + static void add_new_bitmap(struct btrfs_free_space_ctl *ctl, struct btrfs_free_space *info, u64 offset) { info->offset = offset_to_bitmap(ctl, offset); info->bytes = 0; + info->bitmap_extents = 0; INIT_LIST_HEAD(&info->list); link_free_space(ctl, info); ctl->total_bitmaps++; @@ -1981,8 +2041,11 @@ static u64 add_bytes_to_bitmap(struct btrfs_free_space_ctl *ctl, u64 bytes_to_set = 0; u64 end; - if (!(flags & BTRFS_FSC_TRIMMED)) + if (!(flags & BTRFS_FSC_TRIMMED)) { + if (btrfs_free_space_trimmed(info)) + ctl->discard_extents[0] += info->bitmap_extents; info->flags &= ~(BTRFS_FSC_TRIMMED | BTRFS_FSC_TRIMMING_BITMAP); + } end = info->offset + (u64)(BITS_PER_BITMAP * ctl->unit); @@ -2397,6 +2460,7 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info, if (ret) kmem_cache_free(btrfs_free_space_cachep, info); out: + btrfs_discard_update_discardable(cache, ctl); spin_unlock(&ctl->tree_lock); if (ret) { @@ -2506,6 +2570,7 @@ int btrfs_remove_free_space(struct btrfs_block_group_cache *block_group, goto again; } out_lock: + btrfs_discard_update_discardable(block_group, ctl); spin_unlock(&ctl->tree_lock); out: return ret; @@ -2591,8 +2656,16 @@
__btrfs_return_cluster_to_free_space( bitmap = (entry->bitmap != NULL); if (!bitmap) { + /* merging treats extents as if they were new */ + if (!btrfs_free_space_trimmed(entry)) + ctl->discard_extents[0]--; + try_merge_free_space(ctl, entry, false); steal_from_bitmap(ctl, entry, false); + + /* as we insert directly, update these statistics */ + if (!btrfs_free_space_trimmed(entry)) + ctl->discard_extents[0]++; } tree_insert_offset(&ctl->free_space_offset, entry->offset, &entry->offset_index, bitmap); @@ -2649,6 +2722,7 @@ void btrfs_remove_free_space_cache(struct btrfs_block_group_cache *block_group) cond_resched_lock(&ctl->tree_lock); } __btrfs_remove_free_space_cache_locked(ctl); + btrfs_discard_update_discardable(block_group, ctl); spin_unlock(&ctl->tree_lock); } @@ -2717,6 +2791,7 @@ u64 btrfs_find_space_for_alloc(struct btrfs_block_group_cache *block_group, link_free_space(ctl, entry); } out: + btrfs_discard_update_discardable(block_group, ctl); spin_unlock(&ctl->tree_lock); if (align_gap_len) @@ -2882,6 +2957,8 @@ u64 btrfs_alloc_from_cluster(struct btrfs_block_group_cache *block_group, entry->bitmap); ctl->total_bitmaps--; ctl->op->recalc_thresholds(ctl); + } else if (!btrfs_free_space_trimmed(entry)) { + ctl->discard_extents[0]--; } kmem_cache_free(btrfs_free_space_cachep, entry); } @@ -3383,11 +3460,13 @@ static void reset_trimming_bitmap(struct btrfs_free_space_ctl *ctl, u64 offset) spin_unlock(&ctl->tree_lock); } -static void end_trimming_bitmap(struct btrfs_free_space *entry) +static void end_trimming_bitmap(struct btrfs_free_space_ctl *ctl, + struct btrfs_free_space *entry) { if (btrfs_free_space_trimming_bitmap(entry)) { entry->flags |= BTRFS_FSC_TRIMMED; entry->flags &= ~BTRFS_FSC_TRIMMING_BITMAP; + ctl->discard_extents[0] -= entry->bitmap_extents; } } @@ -3443,7 +3522,7 @@ static int trim_bitmaps(struct btrfs_block_group_cache *block_group, * if BTRFS_FSC_TRIMMED is set on a bitmap. 
*/ if (ret2 && !minlen) - end_trimming_bitmap(entry); + end_trimming_bitmap(ctl, entry); else entry->flags &= ~BTRFS_FSC_TRIMMING_BITMAP; spin_unlock(&ctl->tree_lock); diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h index 450ea01ea0c7..855f42dc15cd 100644 --- a/fs/btrfs/free-space-cache.h +++ b/fs/btrfs/free-space-cache.h @@ -16,6 +16,7 @@ struct btrfs_free_space { u64 max_extent_size; unsigned long *bitmap; struct list_head list; + s32 bitmap_extents; u32 flags; }; @@ -39,6 +40,7 @@ struct btrfs_free_space_ctl { int total_bitmaps; int unit; u64 start; + s32 discard_extents[2]; const struct btrfs_free_space_op *op; void *private; struct mutex cache_writeout_mutex; diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index f6d3c80f2e28..14c6910128f1 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -11,6 +11,7 @@ #include #include "ctree.h" +#include "discard.h" #include "disk-io.h" #include "transaction.h" #include "sysfs.h" @@ -470,6 +471,22 @@ static const struct attribute *allocation_attrs[] = { NULL, }; +static ssize_t btrfs_discard_extents_show(struct kobject *kobj, + struct kobj_attribute *a, + char *buf) +{ + struct btrfs_fs_info *fs_info = to_fs_info(kobj->parent); + + return snprintf(buf, PAGE_SIZE, "%d\n", + atomic_read(&fs_info->discard_ctl.discard_extents)); +} +BTRFS_ATTR(discard, discard_extents, btrfs_discard_extents_show); + +static const struct attribute *discard_attrs[] = { + BTRFS_ATTR_PTR(discard, discard_extents), + NULL, +}; + static ssize_t btrfs_label_show(struct kobject *kobj, struct kobj_attribute *a, char *buf) { @@ -727,6 +744,12 @@ void btrfs_sysfs_remove_mounted(struct btrfs_fs_info *fs_info) { btrfs_reset_fs_info_ptr(fs_info); + if (fs_info->discard_kobj) { + sysfs_remove_files(fs_info->discard_kobj, discard_attrs); + kobject_del(fs_info->discard_kobj); + kobject_put(fs_info->discard_kobj); + } + if (fs_info->space_info_kobj) { sysfs_remove_files(fs_info->space_info_kobj, allocation_attrs); 
kobject_del(fs_info->space_info_kobj); @@ -1093,6 +1116,16 @@ int btrfs_sysfs_add_mounted(struct btrfs_fs_info *fs_info) if (error) goto failure; + fs_info->discard_kobj = kobject_create_and_add("discard", fsid_kobj); + if (!fs_info->discard_kobj) { + error = -ENOMEM; + goto failure; + } + + error = sysfs_create_files(fs_info->discard_kobj, discard_attrs); + if (error) + goto failure; + return 0; failure: btrfs_sysfs_remove_mounted(fs_info); From patchwork Mon Oct 7 20:17:40 2019
From: Dennis Zhou To: David Sterba , Chris Mason , Josef Bacik , Omar Sandoval Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou Subject: [PATCH 09/19] btrfs: keep track of discardable_bytes Date: Mon, 7 Oct 2019 16:17:40 -0400 Keep track of this metric so that we can understand how far ahead of or behind the discarding rate we are.
Signed-off-by: Dennis Zhou --- fs/btrfs/ctree.h | 1 + fs/btrfs/discard.c | 1 + fs/btrfs/discard.h | 7 +++++++ fs/btrfs/free-space-cache.c | 32 +++++++++++++++++++++++++------- fs/btrfs/free-space-cache.h | 1 + fs/btrfs/sysfs.c | 12 ++++++++++++ 6 files changed, 47 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 43e515939b9c..8479ab037812 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -448,6 +448,7 @@ struct btrfs_discard_ctl { struct btrfs_block_group_cache *cache; struct list_head discard_list[BTRFS_NR_DISCARD_LISTS]; atomic_t discard_extents; + atomic64_t discardable_bytes; }; /* delayed seq elem */ diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c index 0544eb6717d4..75a2ff14b3c0 100644 --- a/fs/btrfs/discard.c +++ b/fs/btrfs/discard.c @@ -300,6 +300,7 @@ void btrfs_discard_init(struct btrfs_fs_info *fs_info) INIT_LIST_HEAD(&discard_ctl->discard_list[i]); atomic_set(&discard_ctl->discard_extents, 0); + atomic64_set(&discard_ctl->discardable_bytes, 0); } void btrfs_discard_cleanup(struct btrfs_fs_info *fs_info) diff --git a/fs/btrfs/discard.h b/fs/btrfs/discard.h index 85939d62521e..d55a9a9f8ad8 100644 --- a/fs/btrfs/discard.h +++ b/fs/btrfs/discard.h @@ -77,6 +77,7 @@ void btrfs_discard_update_discardable(struct btrfs_block_group_cache *cache, { struct btrfs_discard_ctl *discard_ctl; s32 extents_delta; + s64 bytes_delta; if (!cache || !btrfs_test_opt(cache->fs_info, DISCARD_ASYNC)) return; @@ -88,6 +89,12 @@ void btrfs_discard_update_discardable(struct btrfs_block_group_cache *cache, atomic_add(extents_delta, &discard_ctl->discard_extents); ctl->discard_extents[1] = ctl->discard_extents[0]; } + + bytes_delta = ctl->discardable_bytes[0] - ctl->discardable_bytes[1]; + if (bytes_delta) { + atomic64_add(bytes_delta, &discard_ctl->discardable_bytes); + ctl->discardable_bytes[1] = ctl->discardable_bytes[0]; + } } #endif diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index 
6c2bebfd206f..54f3c8325858 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -814,6 +814,7 @@ static int __load_free_space_cache(struct btrfs_root *root, struct inode *inode, goto free_cache; e->bitmap_extents = count_bitmap_extents(ctl, e); ctl->discard_extents[0] += e->bitmap_extents; + ctl->discardable_bytes[0] += e->bytes; } io_ctl_drop_pages(&io_ctl); @@ -1636,8 +1637,10 @@ __unlink_free_space(struct btrfs_free_space_ctl *ctl, rb_erase(&info->offset_index, &ctl->free_space_offset); ctl->free_extents--; - if (!info->bitmap && !btrfs_free_space_trimmed(info)) + if (!info->bitmap && !btrfs_free_space_trimmed(info)) { ctl->discard_extents[0]--; + ctl->discardable_bytes[0] -= info->bytes; + } } static void unlink_free_space(struct btrfs_free_space_ctl *ctl, @@ -1658,8 +1661,10 @@ static int link_free_space(struct btrfs_free_space_ctl *ctl, if (ret) return ret; - if (!info->bitmap && !btrfs_free_space_trimmed(info)) + if (!info->bitmap && !btrfs_free_space_trimmed(info)) { ctl->discard_extents[0]++; + ctl->discardable_bytes[0] += info->bytes; + } ctl->free_space += info->bytes; ctl->free_extents++; @@ -1738,8 +1743,10 @@ static inline void __bitmap_clear_bits(struct btrfs_free_space_ctl *ctl, extent_delta++; info->bitmap_extents += extent_delta; - if (!btrfs_free_space_trimmed(info)) + if (!btrfs_free_space_trimmed(info)) { ctl->discard_extents[0] += extent_delta; + ctl->discardable_bytes[0] -= bytes; + } } static void bitmap_clear_bits(struct btrfs_free_space_ctl *ctl, @@ -1774,8 +1781,10 @@ static void bitmap_set_bits(struct btrfs_free_space_ctl *ctl, extent_delta--; info->bitmap_extents += extent_delta; - if (!btrfs_free_space_trimmed(info)) + if (!btrfs_free_space_trimmed(info)) { ctl->discard_extents[0] += extent_delta; + ctl->discardable_bytes[0] += bytes; + } } /* @@ -2042,8 +2051,10 @@ static u64 add_bytes_to_bitmap(struct btrfs_free_space_ctl *ctl, u64 end; if (!(flags & BTRFS_FSC_TRIMMED)) { - if (btrfs_free_space_trimmed(info)) 
+ if (btrfs_free_space_trimmed(info)) { ctl->discard_extents[0] += info->bitmap_extents; + ctl->discardable_bytes[0] += info->bytes; + } info->flags &= ~(BTRFS_FSC_TRIMMED | BTRFS_FSC_TRIMMING_BITMAP); } @@ -2657,15 +2668,19 @@ __btrfs_return_cluster_to_free_space( bitmap = (entry->bitmap != NULL); if (!bitmap) { /* merging treats extents as if they were new */ - if (!btrfs_free_space_trimmed(entry)) + if (!btrfs_free_space_trimmed(entry)) { ctl->discard_extents[0]--; + ctl->discardable_bytes[0] -= entry->bytes; + } try_merge_free_space(ctl, entry, false); steal_from_bitmap(ctl, entry, false); /* as we insert directly, update these statistics */ - if (!btrfs_free_space_trimmed(entry)) + if (!btrfs_free_space_trimmed(entry)) { ctl->discard_extents[0]++; + ctl->discardable_bytes[0] += entry->bytes; + } } tree_insert_offset(&ctl->free_space_offset, entry->offset, &entry->offset_index, bitmap); @@ -2950,6 +2965,8 @@ u64 btrfs_alloc_from_cluster(struct btrfs_block_group_cache *block_group, spin_lock(&ctl->tree_lock); ctl->free_space -= bytes; + if (!entry->bitmap && !btrfs_free_space_trimmed(entry)) + ctl->discardable_bytes[0] -= bytes; if (entry->bytes == 0) { ctl->free_extents--; if (entry->bitmap) { @@ -3467,6 +3484,7 @@ static void end_trimming_bitmap(struct btrfs_free_space_ctl *ctl, entry->flags |= BTRFS_FSC_TRIMMED; entry->flags &= ~BTRFS_FSC_TRIMMING_BITMAP; ctl->discard_extents[0] -= entry->bitmap_extents; + ctl->discardable_bytes[0] -= entry->bytes; } } diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h index 855f42dc15cd..c5cce44b03af 100644 --- a/fs/btrfs/free-space-cache.h +++ b/fs/btrfs/free-space-cache.h @@ -41,6 +41,7 @@ struct btrfs_free_space_ctl { int unit; u64 start; s32 discard_extents[2]; + s64 discardable_bytes[2]; const struct btrfs_free_space_op *op; void *private; struct mutex cache_writeout_mutex; diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index 14c6910128f1..a2852706ec6c 100644 --- a/fs/btrfs/sysfs.c +++ 
b/fs/btrfs/sysfs.c @@ -482,8 +482,20 @@ static ssize_t btrfs_discard_extents_show(struct kobject *kobj, } BTRFS_ATTR(discard, discard_extents, btrfs_discard_extents_show); +static ssize_t btrfs_discardable_bytes_show(struct kobject *kobj, + struct kobj_attribute *a, + char *buf) +{ + struct btrfs_fs_info *fs_info = to_fs_info(kobj->parent); + + return snprintf(buf, PAGE_SIZE, "%lld\n", + atomic64_read(&fs_info->discard_ctl.discardable_bytes)); +} +BTRFS_ATTR(discard, discardable_bytes, btrfs_discardable_bytes_show); + static const struct attribute *discard_attrs[] = { BTRFS_ATTR_PTR(discard, discard_extents), + BTRFS_ATTR_PTR(discard, discardable_bytes), NULL, }; From patchwork Mon Oct 7 20:17:41 2019
From: Dennis Zhou To: David Sterba , Chris Mason , Josef Bacik , Omar Sandoval Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou Subject: [PATCH 10/19] btrfs: calculate discard delay based on number of extents Date: Mon, 7 Oct 2019 16:17:41 -0400 Message-Id: <37690bf17c3b3c9f20137fb186c7af4021bb664b.1570479299.git.dennis@kernel.org> Use the number of discardable extents to help guide our discard delay interval. This value is reevaluated every transaction commit.
Signed-off-by: Dennis Zhou Reviewed-by: Josef Bacik --- fs/btrfs/ctree.h | 2 ++ fs/btrfs/discard.c | 31 +++++++++++++++++++++++++++++-- fs/btrfs/discard.h | 3 +++ fs/btrfs/extent-tree.c | 4 +++- fs/btrfs/sysfs.c | 30 ++++++++++++++++++++++++++++++ 5 files changed, 67 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 8479ab037812..b0823961d049 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -449,6 +449,8 @@ struct btrfs_discard_ctl { struct list_head discard_list[BTRFS_NR_DISCARD_LISTS]; atomic_t discard_extents; atomic64_t discardable_bytes; + atomic_t delay; + atomic_t iops_limit; }; /* delayed seq elem */ diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c index 75a2ff14b3c0..c7afb5f8240d 100644 --- a/fs/btrfs/discard.c +++ b/fs/btrfs/discard.c @@ -15,6 +15,11 @@ #define BTRFS_DISCARD_DELAY (300ULL * NSEC_PER_SEC) +/* target discard delay in milliseconds */ +#define BTRFS_DISCARD_TARGET_MSEC (6 * 60 * 60ULL * MSEC_PER_SEC) +#define BTRFS_DISCARD_MAX_DELAY (10000UL) +#define BTRFS_DISCARD_MAX_IOPS (10UL) + static struct list_head * btrfs_get_discard_list(struct btrfs_discard_ctl *discard_ctl, struct btrfs_block_group_cache *cache) @@ -170,10 +175,12 @@ void btrfs_discard_schedule_work(struct btrfs_discard_ctl *discard_ctl, cache = find_next_cache(discard_ctl, now); if (cache) { - u64 delay = 0; + u64 delay = atomic_read(&discard_ctl->delay); if (now < cache->discard_delay) - delay = nsecs_to_jiffies(cache->discard_delay - now); + delay = max_t(u64, delay, + nsecs_to_jiffies(cache->discard_delay - + now)); mod_delayed_work(discard_ctl->discard_workers, &discard_ctl->work, @@ -232,6 +239,24 @@ static void btrfs_discard_workfn(struct work_struct *work) btrfs_discard_schedule_work(discard_ctl, false); } +void btrfs_discard_calc_delay(struct btrfs_discard_ctl *discard_ctl) +{ + s32 discard_extents = atomic_read(&discard_ctl->discard_extents); + s32 iops_limit; + unsigned long delay; + + if (!discard_extents) + return; + + 
iops_limit = atomic_read(&discard_ctl->iops_limit); + if (iops_limit) + iops_limit = MSEC_PER_SEC / iops_limit; + + delay = BTRFS_DISCARD_TARGET_MSEC / discard_extents; + delay = clamp_t(s32, delay, iops_limit, BTRFS_DISCARD_MAX_DELAY); + atomic_set(&discard_ctl->delay, msecs_to_jiffies(delay)); +} + void btrfs_discard_punt_unused_bgs_list(struct btrfs_fs_info *fs_info) { struct btrfs_block_group_cache *cache, *next; @@ -301,6 +326,8 @@ void btrfs_discard_init(struct btrfs_fs_info *fs_info) atomic_set(&discard_ctl->discard_extents, 0); atomic64_set(&discard_ctl->discardable_bytes, 0); + atomic_set(&discard_ctl->delay, BTRFS_DISCARD_MAX_DELAY); + atomic_set(&discard_ctl->iops_limit, BTRFS_DISCARD_MAX_IOPS); } void btrfs_discard_cleanup(struct btrfs_fs_info *fs_info) diff --git a/fs/btrfs/discard.h b/fs/btrfs/discard.h index d55a9a9f8ad8..acaf56f63b1c 100644 --- a/fs/btrfs/discard.h +++ b/fs/btrfs/discard.h @@ -7,6 +7,8 @@ #define BTRFS_DISCARD_H #include +#include +#include #include #include "ctree.h" @@ -39,6 +41,7 @@ void btrfs_discard_cancel_work(struct btrfs_discard_ctl *discard_ctl, struct btrfs_block_group_cache *cache); void btrfs_discard_schedule_work(struct btrfs_discard_ctl *discard_ctl, bool override); +void btrfs_discard_calc_delay(struct btrfs_discard_ctl *discard_ctl); void btrfs_discard_resume(struct btrfs_fs_info *fs_info); void btrfs_discard_stop(struct btrfs_fs_info *fs_info); void btrfs_discard_init(struct btrfs_fs_info *fs_info); diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index ff42e4abb01d..ab0d46da3771 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -2920,8 +2920,10 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans) cond_resched(); } - if (btrfs_test_opt(fs_info, DISCARD_ASYNC)) + if (btrfs_test_opt(fs_info, DISCARD_ASYNC)) { + btrfs_discard_calc_delay(&fs_info->discard_ctl); btrfs_discard_schedule_work(&fs_info->discard_ctl, true); + } /* * Transaction is finished. 
We don't need the lock anymore. We diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index a2852706ec6c..b9a62e470316 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -493,9 +493,39 @@ static ssize_t btrfs_discardable_bytes_show(struct kobject *kobj, } BTRFS_ATTR(discard, discardable_bytes, btrfs_discardable_bytes_show); +static ssize_t btrfs_discard_iops_limit_show(struct kobject *kobj, + struct kobj_attribute *a, + char *buf) +{ + struct btrfs_fs_info *fs_info = to_fs_info(kobj->parent); + + return snprintf(buf, PAGE_SIZE, "%d\n", + atomic_read(&fs_info->discard_ctl.iops_limit)); +} + +static ssize_t btrfs_discard_iops_limit_store(struct kobject *kobj, + struct kobj_attribute *a, + const char *buf, size_t len) +{ + struct btrfs_fs_info *fs_info = to_fs_info(kobj->parent); + s32 iops_limit; + int ret; + + ret = kstrtos32(buf, 10, &iops_limit); + if (ret || iops_limit < 0) + return -EINVAL; + + atomic_set(&fs_info->discard_ctl.iops_limit, iops_limit); + + return len; +} +BTRFS_ATTR_RW(discard, iops_limit, btrfs_discard_iops_limit_show, + btrfs_discard_iops_limit_store); + static const struct attribute *discard_attrs[] = { BTRFS_ATTR_PTR(discard, discard_extents), BTRFS_ATTR_PTR(discard, discardable_bytes), + BTRFS_ATTR_PTR(discard, iops_limit), NULL, }; From patchwork Mon Oct 7 20:17:42 2019
From: Dennis Zhou
To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
Cc: kernel-team@fb.com,
linux-btrfs@vger.kernel.org, Dennis Zhou
Subject: [PATCH 11/19] btrfs: add bps discard rate limit
Date: Mon, 7 Oct 2019 16:17:42 -0400
Message-Id: <17152a4b1f9a0719623af4ef98e5e8670dd70799.1570479299.git.dennis@kernel.org>

Provide an ability to rate limit based on bytes per second (bps) in addition to the iops delay calculated from the number of discardable extents.

Signed-off-by: Dennis Zhou --- fs/btrfs/ctree.h | 2 ++ fs/btrfs/discard.c | 11 +++++++++++ fs/btrfs/sysfs.c | 30 ++++++++++++++++++++++++++++++ 3 files changed, 43 insertions(+) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index b0823961d049..e81f699347e0 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -447,10 +447,12 @@ struct btrfs_discard_ctl { spinlock_t lock; struct btrfs_block_group_cache *cache; struct list_head discard_list[BTRFS_NR_DISCARD_LISTS]; + u64 prev_discard; atomic_t discard_extents; atomic64_t discardable_bytes; atomic_t delay; atomic_t iops_limit; + atomic64_t bps_limit; }; /* delayed seq elem */ diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c index c7afb5f8240d..072c73f48297 100644 --- a/fs/btrfs/discard.c +++ b/fs/btrfs/discard.c @@ -176,6 +176,13 @@ void btrfs_discard_schedule_work(struct btrfs_discard_ctl *discard_ctl, cache = find_next_cache(discard_ctl, now); if (cache) { u64 delay = atomic_read(&discard_ctl->delay); + s64 bps_limit = atomic64_read(&discard_ctl->bps_limit); + + if (bps_limit) + delay = max_t(u64, delay, + msecs_to_jiffies(MSEC_PER_SEC * + discard_ctl->prev_discard / + bps_limit)); if (now < cache->discard_delay) delay = max_t(u64, delay, @@ -213,6 +220,8 @@ static void btrfs_discard_workfn(struct work_struct *work) btrfs_trim_block_group(cache, &trimmed, cache->discard_cursor, btrfs_block_group_end(cache), 0, true); + discard_ctl->prev_discard = trimmed; + if
(cache->discard_cursor >= btrfs_block_group_end(cache)) { if (btrfs_discard_bitmaps(cache)) { remove_from_discard_list(discard_ctl, cache); @@ -324,10 +333,12 @@ void btrfs_discard_init(struct btrfs_fs_info *fs_info) for (i = 0; i < BTRFS_NR_DISCARD_LISTS; i++) INIT_LIST_HEAD(&discard_ctl->discard_list[i]); + discard_ctl->prev_discard = 0; atomic_set(&discard_ctl->discard_extents, 0); atomic64_set(&discard_ctl->discardable_bytes, 0); atomic_set(&discard_ctl->delay, BTRFS_DISCARD_MAX_DELAY); atomic_set(&discard_ctl->iops_limit, BTRFS_DISCARD_MAX_IOPS); + atomic64_set(&discard_ctl->bps_limit, 0); } void btrfs_discard_cleanup(struct btrfs_fs_info *fs_info) diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index b9a62e470316..6fc4d644401b 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -522,10 +522,40 @@ static ssize_t btrfs_discard_iops_limit_store(struct kobject *kobj, BTRFS_ATTR_RW(discard, iops_limit, btrfs_discard_iops_limit_show, btrfs_discard_iops_limit_store); +static ssize_t btrfs_discard_bps_limit_show(struct kobject *kobj, + struct kobj_attribute *a, + char *buf) +{ + struct btrfs_fs_info *fs_info = to_fs_info(kobj->parent); + + return snprintf(buf, PAGE_SIZE, "%lld\n", + atomic64_read(&fs_info->discard_ctl.bps_limit)); +} + +static ssize_t btrfs_discard_bps_limit_store(struct kobject *kobj, + struct kobj_attribute *a, + const char *buf, size_t len) +{ + struct btrfs_fs_info *fs_info = to_fs_info(kobj->parent); + s64 bps_limit; + int ret; + + ret = kstrtos64(buf, 10, &bps_limit); + if (ret || bps_limit < 0) + return -EINVAL; + + atomic64_set(&fs_info->discard_ctl.bps_limit, bps_limit); + + return len; +} +BTRFS_ATTR_RW(discard, bps_limit, btrfs_discard_bps_limit_show, + btrfs_discard_bps_limit_store); + static const struct attribute *discard_attrs[] = { BTRFS_ATTR_PTR(discard, discard_extents), BTRFS_ATTR_PTR(discard, discardable_bytes), BTRFS_ATTR_PTR(discard, iops_limit), + BTRFS_ATTR_PTR(discard, bps_limit), NULL, }; From patchwork Mon Oct 7 
20:17:43 2019
X-Patchwork-Submitter: Dennis Zhou
X-Patchwork-Id: 11178465
From: Dennis Zhou
To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org
Subject: [PATCH 12/19] btrfs: limit max discard size for async discard
Date: Mon, 7 Oct 2019 16:17:43 -0400
Message-Id: <7eed2e0ebdc4cc4a7e31c6fa7a180f10158fba0f.1570479299.git.dennis@kernel.org>

Throttle the maximum size of a discard so that we can provide an upper bound for the rate of async discard. While the block layer is able to split discards into appropriately sized ones, we want to account more accurately for the rate at which we are consuming NCQ slots, as well as limit the upper bound of work for a single discard.
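The interaction between the size cap and the rate limits can be sketched in userspace C (an illustration of the heuristic, not the kernel code; `next_delay_msec` is a hypothetical helper standing in for the logic in `btrfs_discard_schedule_work`):

```c
#include <assert.h>
#include <stdint.h>

#define MSEC_PER_SEC           1000ULL
#define BTRFS_DISCARD_MAX_SIZE (64ULL << 20) /* 64M cap per discard */

/*
 * The iops limit converts to a fixed per-discard delay; the bps limit
 * stretches that delay based on how many bytes the previous discard
 * actually trimmed. Because each discard is capped at 64M,
 * prev_discard_bytes (and so the delay) is bounded.
 */
static uint64_t next_delay_msec(uint64_t iops_delay_msec,
                                uint64_t prev_discard_bytes,
                                uint64_t bps_limit)
{
    uint64_t delay = iops_delay_msec;

    if (bps_limit) {
        /* wait long enough that prev_bytes / delay <= bps_limit */
        uint64_t bps_delay = MSEC_PER_SEC * prev_discard_bytes / bps_limit;

        if (bps_delay > delay)
            delay = bps_delay;
    }
    return delay;
}
```

For example, after trimming a full 64M extent with a 64M/s bps limit, the worker would wait a full second before the next discard, regardless of a smaller iops-derived delay.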
Signed-off-by: Dennis Zhou Reviewed-by: Josef Bacik --- fs/btrfs/discard.h | 4 ++++ fs/btrfs/free-space-cache.c | 47 +++++++++++++++++++++++++++---------- 2 files changed, 39 insertions(+), 12 deletions(-) diff --git a/fs/btrfs/discard.h b/fs/btrfs/discard.h index acaf56f63b1c..898dd92dbf8f 100644 --- a/fs/btrfs/discard.h +++ b/fs/btrfs/discard.h @@ -8,6 +8,7 @@ #include #include +#include #include #include @@ -15,6 +16,9 @@ #include "block-group.h" #include "free-space-cache.h" +/* discard size limits */ +#define BTRFS_DISCARD_MAX_SIZE (SZ_64M) + /* discard flags */ #define BTRFS_DISCARD_RESET_CURSOR (1UL << 0) #define BTRFS_DISCARD_BITMAPS (1UL << 1) diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index 54f3c8325858..ce33803a45b2 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -3399,19 +3399,39 @@ static int trim_no_bitmap(struct btrfs_block_group_cache *block_group, if (entry->offset >= end) goto out_unlock; - extent_start = entry->offset; - extent_bytes = entry->bytes; - extent_flags = entry->flags; - start = max(start, extent_start); - bytes = min(extent_start + extent_bytes, end) - start; - if (bytes < minlen) { - spin_unlock(&ctl->tree_lock); - mutex_unlock(&ctl->cache_writeout_mutex); - goto next; - } + if (async) { + start = extent_start = entry->offset; + bytes = extent_bytes = entry->bytes; + extent_flags = entry->flags; + if (bytes < minlen) { + spin_unlock(&ctl->tree_lock); + mutex_unlock(&ctl->cache_writeout_mutex); + goto next; + } + unlink_free_space(ctl, entry); + if (bytes > BTRFS_DISCARD_MAX_SIZE) { + bytes = extent_bytes = BTRFS_DISCARD_MAX_SIZE; + entry->offset += BTRFS_DISCARD_MAX_SIZE; + entry->bytes -= BTRFS_DISCARD_MAX_SIZE; + link_free_space(ctl, entry); + } else { + kmem_cache_free(btrfs_free_space_cachep, entry); + } + } else { + extent_start = entry->offset; + extent_bytes = entry->bytes; + extent_flags = entry->flags; + start = max(start, extent_start); + bytes = min(extent_start + 
extent_bytes, end) - start; + if (bytes < minlen) { + spin_unlock(&ctl->tree_lock); + mutex_unlock(&ctl->cache_writeout_mutex); + goto next; + } - unlink_free_space(ctl, entry); - kmem_cache_free(btrfs_free_space_cachep, entry); + unlink_free_space(ctl, entry); + kmem_cache_free(btrfs_free_space_cachep, entry); + } spin_unlock(&ctl->tree_lock); trim_entry.start = extent_start; @@ -3567,6 +3587,9 @@ static int trim_bitmaps(struct btrfs_block_group_cache *block_group, goto next; } + if (async && bytes > BTRFS_DISCARD_MAX_SIZE) + bytes = BTRFS_DISCARD_MAX_SIZE; + bitmap_clear_bits(ctl, entry, start, bytes); if (entry->bytes == 0) free_bitmap(ctl, entry);

From patchwork Mon Oct 7 20:17:44 2019
X-Patchwork-Submitter: Dennis Zhou
X-Patchwork-Id: 11178469
From: Dennis Zhou
To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org
Subject: [PATCH 13/19] btrfs: have multiple discard lists
Date: Mon, 7 Oct 2019 16:17:44 -0400
Message-Id: <87b5ef751e8febd481065e475bd53b276e670ff6.1570479299.git.dennis@kernel.org>

Non-block-group-destruction discarding currently has only a single list with no minimum discard length. This can lead to caravanning more meaningful discards behind a heavily fragmented block group.
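The filtered-list idea can be sketched as follows (a simplified model, not the kernel structures; `discard_index_after_free` is a hypothetical helper condensing the promotion check from `btrfs_discard_check_filter`):

```c
#include <assert.h>
#include <stdint.h>

/*
 * List 0 is reserved for fully-unused block groups; lists 1 and 2 only
 * issue discards of at least 1M and 32K respectively. A block group
 * that has been demoted to the small-extent list is promoted back to
 * list 1 when a large enough extent is freed, so big discards are not
 * caravanned behind a fragmented group.
 */
#define BTRFS_NR_DISCARD_LISTS   3
#define BTRFS_DISCARD_MAX_FILTER (1ULL << 20)  /* 1M  */
#define BTRFS_DISCARD_MIN_FILTER (32ULL << 10) /* 32K */

static const uint64_t discard_minlen[BTRFS_NR_DISCARD_LISTS] = {
    0, BTRFS_DISCARD_MAX_FILTER, BTRFS_DISCARD_MIN_FILTER,
};

/* index a block group should sit at after freeing an extent of @bytes */
static int discard_index_after_free(int cur_index, uint64_t bytes)
{
    if (cur_index > 1 && bytes >= BTRFS_DISCARD_MAX_FILTER)
        return 1; /* promote: a big extent is worth discarding soon */
    return cur_index;
}
```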
This adds support for multiple lists with minimum discard lengths to prevent the caravan effect. We promote block groups back up when we free an extent that exceeds the BTRFS_DISCARD_MAX_FILTER size. Currently we support only 2 filtered lists, with filters of 1MB and 32KB respectively. Signed-off-by: Dennis Zhou --- fs/btrfs/ctree.h | 2 +- fs/btrfs/discard.c | 60 +++++++++++++++++++++++++++++++++---- fs/btrfs/discard.h | 4 +++ fs/btrfs/free-space-cache.c | 37 +++++++++++++++-------- fs/btrfs/free-space-cache.h | 2 +- 5 files changed, 85 insertions(+), 20 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index e81f699347e0..b5608f8dc41a 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -439,7 +439,7 @@ struct btrfs_full_stripe_locks_tree { }; /* discard control */ -#define BTRFS_NR_DISCARD_LISTS 2 +#define BTRFS_NR_DISCARD_LISTS 3 struct btrfs_discard_ctl { struct workqueue_struct *discard_workers; diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c index 072c73f48297..296cbffc5957 100644 --- a/fs/btrfs/discard.c +++ b/fs/btrfs/discard.c @@ -20,6 +20,10 @@ #define BTRFS_DISCARD_MAX_DELAY (10000UL) #define BTRFS_DISCARD_MAX_IOPS (10UL) +/* monotonically decreasing filters after 0 */ +static int discard_minlen[BTRFS_NR_DISCARD_LISTS] = {0, + BTRFS_DISCARD_MAX_FILTER, BTRFS_DISCARD_MIN_FILTER}; + static struct list_head * btrfs_get_discard_list(struct btrfs_discard_ctl *discard_ctl, struct btrfs_block_group_cache *cache) @@ -120,7 +124,7 @@ find_next_cache(struct btrfs_discard_ctl *discard_ctl, u64 now) } static struct btrfs_block_group_cache * -peek_discard_list(struct btrfs_discard_ctl *discard_ctl) +peek_discard_list(struct btrfs_discard_ctl *discard_ctl, int *discard_index) { struct btrfs_block_group_cache *cache; u64 now = ktime_get_ns(); @@ -132,6 +136,7 @@ peek_discard_list(struct btrfs_discard_ctl *discard_ctl) if (cache && now > cache->discard_delay) { discard_ctl->cache = cache; + *discard_index = cache->discard_index; if (cache->discard_index == 0 &&
cache->free_space_ctl->free_space != cache->key.offset) { __btrfs_add_to_discard_list(discard_ctl, cache); @@ -150,6 +155,36 @@ peek_discard_list(struct btrfs_discard_ctl *discard_ctl) return cache; } +void btrfs_discard_check_filter(struct btrfs_block_group_cache *cache, + u64 bytes) +{ + struct btrfs_discard_ctl *discard_ctl; + + if (!cache || !btrfs_test_opt(cache->fs_info, DISCARD_ASYNC)) + return; + + discard_ctl = &cache->fs_info->discard_ctl; + + if (cache && cache->discard_index > 1 && + bytes >= BTRFS_DISCARD_MAX_FILTER) { + remove_from_discard_list(discard_ctl, cache); + cache->discard_index = 1; + btrfs_add_to_discard_list(discard_ctl, cache); + } +} + +static void btrfs_update_discard_index(struct btrfs_discard_ctl *discard_ctl, + struct btrfs_block_group_cache *cache) +{ + cache->discard_index++; + if (cache->discard_index == BTRFS_NR_DISCARD_LISTS) { + cache->discard_index = 1; + return; + } + + btrfs_add_to_discard_list(discard_ctl, cache); +} + void btrfs_discard_cancel_work(struct btrfs_discard_ctl *discard_ctl, struct btrfs_block_group_cache *cache) { @@ -202,23 +237,34 @@ static void btrfs_discard_workfn(struct work_struct *work) { struct btrfs_discard_ctl *discard_ctl; struct btrfs_block_group_cache *cache; + int discard_index = 0; u64 trimmed = 0; + u64 minlen = 0; discard_ctl = container_of(work, struct btrfs_discard_ctl, work.work); again: - cache = peek_discard_list(discard_ctl); + cache = peek_discard_list(discard_ctl, &discard_index); if (!cache || !btrfs_run_discard_work(discard_ctl)) return; - if (btrfs_discard_bitmaps(cache)) + minlen = discard_minlen[discard_index]; + + if (btrfs_discard_bitmaps(cache)) { + u64 maxlen = 0; + + if (discard_index) + maxlen = discard_minlen[discard_index - 1]; + btrfs_trim_block_group_bitmaps(cache, &trimmed, cache->discard_cursor, btrfs_block_group_end(cache), - 0, true); - else + minlen, maxlen, true); + } else { btrfs_trim_block_group(cache, &trimmed, cache->discard_cursor, - 
btrfs_block_group_end(cache), 0, true); + btrfs_block_group_end(cache), + minlen, true); + } discard_ctl->prev_discard = trimmed; @@ -231,6 +277,8 @@ static void btrfs_discard_workfn(struct work_struct *work) cache->key.offset) btrfs_add_to_discard_free_list(discard_ctl, cache); + else + btrfs_update_discard_index(discard_ctl, cache); } else { cache->discard_cursor = cache->key.objectid; cache->discard_flags |= BTRFS_DISCARD_BITMAPS; diff --git a/fs/btrfs/discard.h b/fs/btrfs/discard.h index 898dd92dbf8f..1daa8da4a1b5 100644 --- a/fs/btrfs/discard.h +++ b/fs/btrfs/discard.h @@ -18,6 +18,8 @@ /* discard size limits */ #define BTRFS_DISCARD_MAX_SIZE (SZ_64M) +#define BTRFS_DISCARD_MAX_FILTER (SZ_1M) +#define BTRFS_DISCARD_MIN_FILTER (SZ_32K) /* discard flags */ #define BTRFS_DISCARD_RESET_CURSOR (1UL << 0) @@ -39,6 +41,8 @@ void btrfs_add_to_discard_list(struct btrfs_discard_ctl *discard_ctl, struct btrfs_block_group_cache *cache); void btrfs_add_to_discard_free_list(struct btrfs_discard_ctl *discard_ctl, struct btrfs_block_group_cache *cache); +void btrfs_discard_check_filter(struct btrfs_block_group_cache *cache, + u64 bytes); void btrfs_discard_punt_unused_bgs_list(struct btrfs_fs_info *fs_info); void btrfs_discard_cancel_work(struct btrfs_discard_ctl *discard_ctl, diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index ce33803a45b2..ed35dc090df6 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -2471,6 +2471,7 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info, if (ret) kmem_cache_free(btrfs_free_space_cachep, info); out: + btrfs_discard_check_filter(cache, bytes); btrfs_discard_update_discardable(cache, ctl); spin_unlock(&ctl->tree_lock); @@ -3409,7 +3410,13 @@ static int trim_no_bitmap(struct btrfs_block_group_cache *block_group, goto next; } unlink_free_space(ctl, entry); - if (bytes > BTRFS_DISCARD_MAX_SIZE) { + /* + * Let bytes = BTRFS_MAX_DISCARD_SIZE + X. 
+ * If X < BTRFS_DISCARD_MIN_FILTER, we won't trim X when + * we come back around. So trim it now. + */ + if (bytes > (BTRFS_DISCARD_MAX_SIZE + + BTRFS_DISCARD_MIN_FILTER)) { bytes = extent_bytes = BTRFS_DISCARD_MAX_SIZE; entry->offset += BTRFS_DISCARD_MAX_SIZE; entry->bytes -= BTRFS_DISCARD_MAX_SIZE; @@ -3510,7 +3517,7 @@ static void end_trimming_bitmap(struct btrfs_free_space_ctl *ctl, static int trim_bitmaps(struct btrfs_block_group_cache *block_group, u64 *total_trimmed, u64 start, u64 end, u64 minlen, - bool async) + u64 maxlen, bool async) { struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl; struct btrfs_free_space *entry; @@ -3535,7 +3542,7 @@ static int trim_bitmaps(struct btrfs_block_group_cache *block_group, } entry = tree_search_offset(ctl, offset, 1, 0); - if (!entry || (async && start == offset && + if (!entry || (async && minlen && start == offset && btrfs_free_space_trimmed(entry))) { spin_unlock(&ctl->tree_lock); mutex_unlock(&ctl->cache_writeout_mutex); @@ -3556,10 +3563,10 @@ static int trim_bitmaps(struct btrfs_block_group_cache *block_group, ret2 = search_bitmap(ctl, entry, &start, &bytes, false); if (ret2 || start >= end) { /* - * This keeps the invariant that all bytes are trimmed - * if BTRFS_FSC_TRIMMED is set on a bitmap. + * We lossily consider a bitmap trimmed if we only skip + * over regions <= BTRFS_DISCARD_MIN_FILTER. */ - if (ret2 && !minlen) + if (ret2 && minlen <= BTRFS_DISCARD_MIN_FILTER) end_trimming_bitmap(ctl, entry); else entry->flags &= ~BTRFS_FSC_TRIMMING_BITMAP; @@ -3580,14 +3587,19 @@ static int trim_bitmaps(struct btrfs_block_group_cache *block_group, } bytes = min(bytes, end - start); - if (bytes < minlen) { - entry->flags &= ~BTRFS_FSC_TRIMMING_BITMAP; + if (bytes < minlen || (async && maxlen && bytes > maxlen)) { spin_unlock(&ctl->tree_lock); mutex_unlock(&ctl->cache_writeout_mutex); goto next; } - if (async && bytes > BTRFS_DISCARD_MAX_SIZE) + /* + * Let bytes = BTRFS_MAX_DISCARD_SIZE + X. 
+ * If X < BTRFS_DISCARD_MIN_FILTER, we won't trim X when we come + * back around. So trim it now. + */ + if (async && bytes > (BTRFS_DISCARD_MAX_SIZE + + BTRFS_DISCARD_MIN_FILTER)) bytes = BTRFS_DISCARD_MAX_SIZE; bitmap_clear_bits(ctl, entry, start, bytes); @@ -3694,7 +3706,7 @@ int btrfs_trim_block_group(struct btrfs_block_group_cache *block_group, if (ret || async) goto out; - ret = trim_bitmaps(block_group, trimmed, start, end, minlen, false); + ret = trim_bitmaps(block_group, trimmed, start, end, minlen, 0, false); /* if we ended in the middle of a bitmap, reset the trimming flag */ if (end % (BITS_PER_BITMAP * ctl->unit)) reset_trimming_bitmap(ctl, offset_to_bitmap(ctl, end)); @@ -3705,7 +3717,7 @@ int btrfs_trim_block_group(struct btrfs_block_group_cache *block_group, int btrfs_trim_block_group_bitmaps(struct btrfs_block_group_cache *block_group, u64 *trimmed, u64 start, u64 end, u64 minlen, - bool async) + u64 maxlen, bool async) { int ret; @@ -3719,7 +3731,8 @@ int btrfs_trim_block_group_bitmaps(struct btrfs_block_group_cache *block_group, btrfs_get_block_group_trimming(block_group); spin_unlock(&block_group->lock); - ret = trim_bitmaps(block_group, trimmed, start, end, minlen, async); + ret = trim_bitmaps(block_group, trimmed, start, end, minlen, maxlen, + async); btrfs_put_block_group_trimming(block_group); return ret; diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h index c5cce44b03af..90abf922f0ba 100644 --- a/fs/btrfs/free-space-cache.h +++ b/fs/btrfs/free-space-cache.h @@ -132,7 +132,7 @@ int btrfs_trim_block_group(struct btrfs_block_group_cache *block_group, bool async); int btrfs_trim_block_group_bitmaps(struct btrfs_block_group_cache *block_group, u64 *trimmed, u64 start, u64 end, u64 minlen, - bool async); + u64 maxlen, bool async); /* Support functions for running our sanity tests */ #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS From patchwork Mon Oct 7 20:17:45 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
X-Patchwork-Submitter: Dennis Zhou
X-Patchwork-Id: 11178467
From: Dennis Zhou
To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org
Subject: [PATCH 14/19] btrfs: only keep track of data extents for async discard
Date: Mon, 7 Oct 2019 16:17:45 -0400
Message-Id: <679c631d04f50a54f011c6317b99d96798a3ca4d.1570479299.git.dennis@kernel.org>

As mentioned earlier, discarding data can be done either by issuing an explicit discard or implicitly by reusing the LBA. Metadata chunks see much more frequent reuse simply because they are metadata. So instead of explicitly discarding metadata blocks, just leave them be and let implicit discarding by reuse handle them.
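The data-only filter reduces to a flag check, sketched here (the struct is a stand-in for `btrfs_block_group_cache`; the flag value matches the `BTRFS_BLOCK_GROUP_DATA` bit, and `should_queue_for_discard` is a hypothetical condensation of the checks the patch adds):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BTRFS_BLOCK_GROUP_DATA (1ULL << 0)

struct block_group { uint64_t flags; };

static bool is_data_block_group(const struct block_group *bg)
{
    return bg->flags & BTRFS_BLOCK_GROUP_DATA;
}

/*
 * Only data block groups are queued for async discard; metadata block
 * groups are skipped and left to implicit discard via LBA reuse.
 */
static bool should_queue_for_discard(const struct block_group *bg,
                                     bool discard_async_enabled)
{
    return discard_async_enabled && is_data_block_group(bg);
}
```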
Signed-off-by: Dennis Zhou Reviewed-by: Josef Bacik --- fs/btrfs/block-group.h | 6 ++++++ fs/btrfs/discard.c | 8 +++++++- fs/btrfs/discard.h | 3 ++- 3 files changed, 15 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h index b59e6a8ed73d..7739099e974a 100644 --- a/fs/btrfs/block-group.h +++ b/fs/btrfs/block-group.h @@ -169,6 +169,12 @@ u64 btrfs_block_group_end(struct btrfs_block_group_cache *cache) return (cache->key.objectid + cache->key.offset); } +static inline +bool btrfs_is_block_group_data(struct btrfs_block_group_cache *cache) +{ + return (cache->flags & BTRFS_BLOCK_GROUP_DATA); +} + #ifdef CONFIG_BTRFS_DEBUG static inline int btrfs_should_fragment_free_space( struct btrfs_block_group_cache *block_group) diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c index 296cbffc5957..0e4d5a22c661 100644 --- a/fs/btrfs/discard.c +++ b/fs/btrfs/discard.c @@ -50,6 +50,9 @@ static void __btrfs_add_to_discard_list(struct btrfs_discard_ctl *discard_ctl, void btrfs_add_to_discard_list(struct btrfs_discard_ctl *discard_ctl, struct btrfs_block_group_cache *cache) { + if (!btrfs_is_block_group_data(cache)) + return; + spin_lock(&discard_ctl->lock); __btrfs_add_to_discard_list(discard_ctl, cache); @@ -139,7 +142,10 @@ peek_discard_list(struct btrfs_discard_ctl *discard_ctl, int *discard_index) *discard_index = cache->discard_index; if (cache->discard_index == 0 && cache->free_space_ctl->free_space != cache->key.offset) { - __btrfs_add_to_discard_list(discard_ctl, cache); + if (btrfs_is_block_group_data(cache)) + __btrfs_add_to_discard_list(discard_ctl, cache); + else + list_del_init(&cache->discard_list); goto again; } if (btrfs_discard_reset_cursor(cache)) { diff --git a/fs/btrfs/discard.h b/fs/btrfs/discard.h index 1daa8da4a1b5..552daa7251df 100644 --- a/fs/btrfs/discard.h +++ b/fs/btrfs/discard.h @@ -90,7 +90,8 @@ void btrfs_discard_update_discardable(struct btrfs_block_group_cache *cache, s32 extents_delta; s64 bytes_delta; - if 
(!cache || !btrfs_test_opt(cache->fs_info, DISCARD_ASYNC)) + if (!cache || !btrfs_test_opt(cache->fs_info, DISCARD_ASYNC) || + !btrfs_is_block_group_data(cache)) return; discard_ctl = &cache->fs_info->discard_ctl;

From patchwork Mon Oct 7 20:17:46 2019
X-Patchwork-Submitter: Dennis Zhou
X-Patchwork-Id: 11178471
0nFX8epiTzXwp3Pr0ymK/EUiu2jLzaoHtoBDirHenSUi10myxD17+oMM+OrulACgaZ/6 xJGouliyEbtS72PsYaueu+r+BL+5pxUxUvrmXbpBif0s83Mtks+OCCMfsLZSzt4quJXx EopZGrOrnKVT3ZMJ5r+q4DzJ9jSn/BJ9wtPJoBE2XDOV8d/FA+IFf+thTS2+uF4pcmRh Rlk3bPy9Z/myNACtN5HHLVnw8OZRUuP3j9YPKrnN2G+c3hMjPWl6518dQe4DaFxojN5M jjKg== X-Gm-Message-State: APjAAAU6rAian8wZCQbAl9ZCypkAlSZCk3WiEVNqmr8arvqF4db+uDVW 5Q29WqsWuOi/xGwLfruDTj4= X-Google-Smtp-Source: APXvYqyVid1DQ637aoxSO6T4UHyz0+tHRWlO6WzbTiHs/H1xQdWg1tBxZoUoPV+2S7Vs7LNBtxjVZQ== X-Received: by 2002:ac8:43cc:: with SMTP id w12mr22481392qtn.301.1570479490571; Mon, 07 Oct 2019 13:18:10 -0700 (PDT) Received: from dennisz-mbp.thefacebook.com ([163.114.130.128]) by smtp.gmail.com with ESMTPSA id k2sm6904005qtm.42.2019.10.07.13.18.09 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Mon, 07 Oct 2019 13:18:09 -0700 (PDT) From: Dennis Zhou To: David Sterba , Chris Mason , Josef Bacik , Omar Sandoval Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou Subject: [PATCH 15/19] btrfs: load block_groups into discard_list on mount Date: Mon, 7 Oct 2019 16:17:46 -0400 Message-Id: <31ce602fac88f25567a0b3e89037693ec962c1c7.1570479299.git.dennis@kernel.org> X-Mailer: git-send-email 2.13.5 In-Reply-To: References: In-Reply-To: References: Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Async discard doesn't remember the discard state of a block_group when unmounting or when we crash. So, any block_group that is not fully used may have undiscarded regions. However, free space caches are read in on demand. Let the discard worker read in the free space cache so we can proceed with discarding rather than wait for the block_group to be used. This prevents us from indefinitely deferring discards until that particular block_group is reused. 
Signed-off-by: Dennis Zhou
---
 fs/btrfs/block-group.c |  2 ++
 fs/btrfs/discard.c     | 15 +++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 73e5a9384491..684959c96c3f 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1859,6 +1859,8 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
 						&info->discard_ctl, cache);
 			else
 				btrfs_mark_bg_unused(cache);
+		} else if (btrfs_test_opt(info, DISCARD_ASYNC)) {
+			btrfs_add_to_discard_list(&info->discard_ctl, cache);
 		}
 	}
 
diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
index 0e4d5a22c661..d99ba31e6f3b 100644
--- a/fs/btrfs/discard.c
+++ b/fs/btrfs/discard.c
@@ -246,6 +246,7 @@ static void btrfs_discard_workfn(struct work_struct *work)
 	int discard_index = 0;
 	u64 trimmed = 0;
 	u64 minlen = 0;
+	int ret;
 
 	discard_ctl = container_of(work, struct btrfs_discard_ctl, work.work);
 
@@ -254,6 +255,19 @@ static void btrfs_discard_workfn(struct work_struct *work)
 	if (!cache || !btrfs_run_discard_work(discard_ctl))
 		return;
 
+	if (!btrfs_block_group_cache_done(cache)) {
+		ret = btrfs_cache_block_group(cache, 0);
+		if (ret) {
+			remove_from_discard_list(discard_ctl, cache);
+			goto out;
+		}
+		ret = btrfs_wait_block_group_cache_done(cache);
+		if (ret) {
+			remove_from_discard_list(discard_ctl, cache);
+			goto out;
+		}
+	}
+
 	minlen = discard_minlen[discard_index];
 
 	if (btrfs_discard_bitmaps(cache)) {
@@ -291,6 +305,7 @@ static void btrfs_discard_workfn(struct work_struct *work)
 		}
 	}
 
+out:
 	spin_lock(&discard_ctl->lock);
 	discard_ctl->cache = NULL;
 	spin_unlock(&discard_ctl->lock);

From patchwork Mon Oct 7 20:17:47 2019
X-Patchwork-Submitter: Dennis Zhou
X-Patchwork-Id: 11178479
From: Dennis Zhou
To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou
Subject: [PATCH 16/19] btrfs: keep track of discard reuse stats
Date: Mon, 7 Oct 2019 16:17:47 -0400
Message-Id: <60e557d71cb58574edbc2c429534fbfefd55df48.1570479299.git.dennis@kernel.org>
X-Mailing-List: linux-btrfs@vger.kernel.org

Keep track of how much we are discarding and how often we are reusing
with async discard.

Signed-off-by: Dennis Zhou
Reviewed-by: Josef Bacik
---
 fs/btrfs/ctree.h            |  3 +++
 fs/btrfs/discard.c          |  5 +++++
 fs/btrfs/free-space-cache.c | 10 ++++++++++
 fs/btrfs/sysfs.c            | 36 ++++++++++++++++++++++++++++++++++++
 4 files changed, 54 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index b5608f8dc41a..2f52b29ff74c 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -453,6 +453,9 @@ struct btrfs_discard_ctl {
 	atomic_t delay;
 	atomic_t iops_limit;
 	atomic64_t bps_limit;
+	atomic64_t discard_extent_bytes;
+	atomic64_t discard_bitmap_bytes;
+	atomic64_t discard_bytes_saved;
 };
 
 /* delayed seq elem */
diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
index d99ba31e6f3b..f0088ca19d28 100644
--- a/fs/btrfs/discard.c
+++ b/fs/btrfs/discard.c
@@ -280,10 +280,12 @@ static void btrfs_discard_workfn(struct work_struct *work)
 					cache->discard_cursor,
 					btrfs_block_group_end(cache),
 					minlen, maxlen, true);
+		atomic64_add(trimmed, &discard_ctl->discard_bitmap_bytes);
 	} else {
 		btrfs_trim_block_group(cache, &trimmed, cache->discard_cursor,
 				       btrfs_block_group_end(cache),
 				       minlen, true);
+		atomic64_add(trimmed, &discard_ctl->discard_extent_bytes);
 	}
 
 	discard_ctl->prev_discard = trimmed;
@@ -408,6 +410,9 @@ void btrfs_discard_init(struct btrfs_fs_info *fs_info)
 	atomic_set(&discard_ctl->delay, BTRFS_DISCARD_MAX_DELAY);
 	atomic_set(&discard_ctl->iops_limit, BTRFS_DISCARD_MAX_IOPS);
 	atomic64_set(&discard_ctl->bps_limit, 0);
+	atomic64_set(&discard_ctl->discard_extent_bytes, 0);
+	atomic64_set(&discard_ctl->discard_bitmap_bytes, 0);
+	atomic64_set(&discard_ctl->discard_bytes_saved, 0);
 }
 
 void btrfs_discard_cleanup(struct btrfs_fs_info *fs_info)
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index ed35dc090df6..480119016c0d 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2773,6 +2773,8 @@ u64 btrfs_find_space_for_alloc(struct btrfs_block_group_cache *block_group,
 			       u64 *max_extent_size)
 {
 	struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
+	struct btrfs_discard_ctl *discard_ctl =
+		&block_group->fs_info->discard_ctl;
 	struct btrfs_free_space *entry = NULL;
 	u64 bytes_search = bytes + empty_size;
 	u64 ret = 0;
@@ -2797,6 +2799,9 @@ u64 btrfs_find_space_for_alloc(struct btrfs_block_group_cache *block_group,
 		align_gap = entry->offset;
 		align_gap_flags = entry->flags;
 
+		if (!btrfs_free_space_trimmed(entry))
+			atomic64_add(bytes, &discard_ctl->discard_bytes_saved);
+
 		entry->offset = offset + bytes;
 		WARN_ON(entry->bytes < bytes + align_gap_len);
 
@@ -2901,6 +2906,8 @@ u64 btrfs_alloc_from_cluster(struct btrfs_block_group_cache *block_group,
 			     u64 min_start, u64 *max_extent_size)
 {
 	struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
+	struct btrfs_discard_ctl *discard_ctl =
+		&block_group->fs_info->discard_ctl;
 	struct btrfs_free_space *entry = NULL;
 	struct rb_node *node;
 	u64 ret = 0;
@@ -2965,6 +2972,9 @@ u64 btrfs_alloc_from_cluster(struct btrfs_block_group_cache *block_group,
 
 	spin_lock(&ctl->tree_lock);
 
+	if (!btrfs_free_space_trimmed(entry))
+		atomic64_add(bytes, &discard_ctl->discard_bytes_saved);
+
 	ctl->free_space -= bytes;
 	if (!entry->bitmap && !btrfs_free_space_trimmed(entry))
 		ctl->discardable_bytes[0] -= bytes;
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 6fc4d644401b..29a290d75492 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -551,11 +551,47 @@ static ssize_t btrfs_discard_bps_limit_store(struct kobject *kobj,
 BTRFS_ATTR_RW(discard, bps_limit, btrfs_discard_bps_limit_show,
 	      btrfs_discard_bps_limit_store);
 
+static ssize_t btrfs_discard_extent_bytes_show(struct kobject *kobj,
+					       struct kobj_attribute *a,
+					       char *buf)
+{
+	struct btrfs_fs_info *fs_info = to_fs_info(kobj->parent);
+
+	return snprintf(buf, PAGE_SIZE, "%lld\n",
+		atomic64_read(&fs_info->discard_ctl.discard_extent_bytes));
+}
+BTRFS_ATTR(discard, discard_extent_bytes, btrfs_discard_extent_bytes_show);
+
+static ssize_t btrfs_discard_bitmap_bytes_show(struct kobject *kobj,
+					       struct kobj_attribute *a,
+					       char *buf)
+{
+	struct btrfs_fs_info *fs_info = to_fs_info(kobj->parent);
+
+	return snprintf(buf, PAGE_SIZE, "%lld\n",
+		atomic64_read(&fs_info->discard_ctl.discard_bitmap_bytes));
+}
+BTRFS_ATTR(discard, discard_bitmap_bytes, btrfs_discard_bitmap_bytes_show);
+
+static ssize_t btrfs_discard_bytes_saved_show(struct kobject *kobj,
+					      struct kobj_attribute *a,
+					      char *buf)
+{
+	struct btrfs_fs_info *fs_info = to_fs_info(kobj->parent);
+
+	return snprintf(buf, PAGE_SIZE, "%lld\n",
+		atomic64_read(&fs_info->discard_ctl.discard_bytes_saved));
+}
+BTRFS_ATTR(discard, discard_bytes_saved, btrfs_discard_bytes_saved_show);
+
 static const struct attribute *discard_attrs[] = {
 	BTRFS_ATTR_PTR(discard, discard_extents),
 	BTRFS_ATTR_PTR(discard, discardable_bytes),
 	BTRFS_ATTR_PTR(discard, iops_limit),
 	BTRFS_ATTR_PTR(discard, bps_limit),
+	BTRFS_ATTR_PTR(discard, discard_extent_bytes),
+	BTRFS_ATTR_PTR(discard, discard_bitmap_bytes),
+	BTRFS_ATTR_PTR(discard, discard_bytes_saved),
 	NULL,
 };

From patchwork Mon Oct 7 20:17:48 2019
X-Patchwork-Submitter: Dennis Zhou
X-Patchwork-Id: 11178475
From: Dennis Zhou
To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou
Subject: [PATCH 17/19] btrfs: add async discard header
Date: Mon, 7 Oct 2019 16:17:48 -0400
Message-Id: <497ce83624b2d2947ce85a8381f39123bd4e7a53.1570479299.git.dennis@kernel.org>
X-Mailing-List: linux-btrfs@vger.kernel.org

Give a brief overview for how async discard is implemented.

Signed-off-by: Dennis Zhou
Reviewed-by: Josef Bacik
---
 fs/btrfs/discard.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
index f0088ca19d28..61e341685acd 100644
--- a/fs/btrfs/discard.c
+++ b/fs/btrfs/discard.c
@@ -1,6 +1,40 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
  * Copyright (C) 2019 Facebook.  All rights reserved.
+ *
+ * This contains the logic to handle async discard.
+ *
+ * Async discard manages trimming of free space outside of transaction commit.
+ * Discarding is done by managing the block_groups on a LRU list based on free
+ * space recency. Two passes are used to first prioritize discarding extents
+ * and then allow for trimming in the bitmap the best opportunity to coalesce.
+ * The block_groups are maintained on multiple lists to allow for multiple
+ * passes with different discard filter requirements. A delayed work item is
+ * used to manage discarding with timeout determined by a max of the delay
+ * incurred by the iops rate limit, byte rate limit, and the timeout of max
+ * delay of BTRFS_DISCARD_MAX_DELAY.
+ *
+ * The first list is special to manage discarding of fully free block groups.
+ * This is necessary because we issue a final trim for a full free block group
+ * after forgetting it. When a block group becomes unused, instead of directly
+ * being added to the unused_bgs list, we add it to this first list. Then
+ * from there, if it becomes fully discarded, we place it onto the unused_bgs
+ * list.
+ *
+ * The in-memory free space cache serves as the backing state for discard.
+ * Consequently this means there is no persistence. We opt to load all the
+ * block groups in as not discarded, so the mount case degenerates to the
+ * crashing case.
+ *
+ * As the free space cache uses bitmaps, there exists a tradeoff between
+ * ease/efficiency for find_free_extent() and the accuracy of discard state.
+ * Here we opt to let untrimmed regions merge with everything while only letting
+ * trimmed regions merge with other trimmed regions. This can cause
+ * overtrimming, but the coalescing benefit seems to be worth it. Additionally,
+ * bitmap state is tracked as a whole. If we're able to fully trim a bitmap,
+ * the trimmed flag is set on the bitmap. Otherwise, if an allocation comes in,
+ * this resets the state and we will retry trimming the whole bitmap. This is a
+ * tradeoff between discard state accuracy and the cost of accounting.
  */
 
 #include

From patchwork Mon Oct 7 20:17:49 2019
X-Patchwork-Submitter: Dennis Zhou
X-Patchwork-Id: 11178473
From: Dennis Zhou
To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou
Subject: [PATCH 18/19] btrfs: increase the metadata allowance for the free_space_cache
Date: Mon, 7 Oct 2019 16:17:49 -0400
Message-Id: <4b171367793ccdcd722e787e5ae0f4f547ed5c43.1570479299.git.dennis@kernel.org>
X-Mailing-List: linux-btrfs@vger.kernel.org

Currently, there is no way for the free space cache to recover from
being serviced by purely bitmaps because the extent threshold is set to
0 in recalculate_thresholds() when we surpass the metadata allowance.

This adds a recovery mechanism by keeping large extents out of the
bitmaps and increases the metadata upper bound to 64KB. The recovery
mechanism bypasses this upper bound, thus making it a soft upper bound.
But, with the bypass being 1MB or greater, it shouldn't add unbounded
overhead.
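[Editor's note: the soft-cap arithmetic described above can be modeled in a few lines of userspace C. This is an illustrative sketch only — the names `soft_cache_cap`, `SZ_1G`, and `SZ_64K` mirror the kernel's constants but this is not kernel code, and the real `recalculate_thresholds()` additionally derives an extent-entry threshold from the remaining budget.]

```c
#include <assert.h>
#include <stdint.h>

#define SZ_1G  (1024ULL * 1024 * 1024)
#define SZ_64K (64ULL * 1024)

/*
 * Model of the new soft metadata allowance: permit
 * MAX_CACHE_BYTES_PER_GIG (raised to 64K by this patch) of free space
 * cache memory per 1GiB of block group size, with a flat 64K floor for
 * block groups smaller than 1GiB.
 */
uint64_t soft_cache_cap(uint64_t bg_size)
{
	if (bg_size < SZ_1G)
		return SZ_64K;
	return SZ_64K * (bg_size / SZ_1G);
}
```

Because extents >= 1MB are forced out of the bitmaps entirely, this cap is only advisory: bitmap usage can exceed it, which is why the commit message calls it a soft upper bound.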
Signed-off-by: Dennis Zhou
Reviewed-by: Josef Bacik
---
 fs/btrfs/free-space-cache.c | 26 +++++++++++---------------
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 480119016c0d..a0941d281a63 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -24,7 +24,8 @@
 #include "discard.h"
 
 #define BITS_PER_BITMAP		(PAGE_SIZE * 8UL)
-#define MAX_CACHE_BYTES_PER_GIG	SZ_32K
+#define MAX_CACHE_BYTES_PER_GIG	SZ_64K
+#define FORCE_EXTENT_THRESHOLD	SZ_1M
 
 struct btrfs_trim_range {
 	u64 start;
@@ -1686,26 +1687,17 @@ static void recalculate_thresholds(struct btrfs_free_space_ctl *ctl)
 	ASSERT(ctl->total_bitmaps <= max_bitmaps);
 
 	/*
-	 * The goal is to keep the total amount of memory used per 1gb of space
-	 * at or below 32k, so we need to adjust how much memory we allow to be
-	 * used by extent based free space tracking
+	 * We are trying to keep the total amount of memory used per 1gb of
+	 * space to be MAX_CACHE_BYTES_PER_GIG.  However, with a reclamation
+	 * mechanism of pulling extents >= FORCE_EXTENT_THRESHOLD out of
+	 * bitmaps, we may end up using more memory than this.
	 */
 	if (size < SZ_1G)
 		max_bytes = MAX_CACHE_BYTES_PER_GIG;
 	else
 		max_bytes = MAX_CACHE_BYTES_PER_GIG * div_u64(size, SZ_1G);
 
-	/*
-	 * we want to account for 1 more bitmap than what we have so we can make
-	 * sure we don't go over our overall goal of MAX_CACHE_BYTES_PER_GIG as
-	 * we add more bitmaps.
-	 */
-	bitmap_bytes = (ctl->total_bitmaps + 1) * ctl->unit;
-
-	if (bitmap_bytes >= max_bytes) {
-		ctl->extents_thresh = 0;
-		return;
-	}
+	bitmap_bytes = ctl->total_bitmaps * ctl->unit;
 
 	/*
 	 * we want the extent entry threshold to always be at most 1/2 the max
@@ -2086,6 +2078,10 @@ static bool use_bitmap(struct btrfs_free_space_ctl *ctl,
 		forced = true;
 #endif
 
+	/* this is a way to reclaim large regions from the bitmaps */
+	if (!forced && info->bytes >= FORCE_EXTENT_THRESHOLD)
+		return false;
+
 	/*
 	 * If we are below the extents threshold then we can add this as an
 	 * extent, and don't have to deal with the bitmap

From patchwork Mon Oct 7 20:17:50 2019
X-Patchwork-Submitter: Dennis Zhou
X-Patchwork-Id: 11178477
From: Dennis Zhou
To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org, Dennis Zhou
Subject: [PATCH 19/19] btrfs: make smaller extents more likely to go into bitmaps
Date: Mon, 7 Oct 2019 16:17:50 -0400
Message-Id: <03e95dd7b652034541d964d8b9617ec029765575.1570479299.git.dennis@kernel.org>
X-Mailing-List: linux-btrfs@vger.kernel.org

It's less than ideal for small extents to eat into our extent budget,
so force extents <= 32KB into the bitmaps save for the first handful.
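[Editor's note: the heuristic this patch tweaks can be sketched as a standalone C predicate. All names here are illustrative, not the kernel's; the real `use_bitmap()` takes the free space ctl and entry structs and has additional early-outs for forced fragmentation and tiny block groups.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of the tweaked decision: extents of at most 8 sectors (32KB
 * with 4KB sectors, up from 4 sectors / 16KB) are steered into
 * bitmaps, unless the cache still has plenty of extent slots free
 * (free_extents * 3 <= extents_thresh, up from * 2), so the first
 * handful of small extents still get cheap extent entries.
 */
bool small_extent_goes_into_bitmap(uint64_t bytes, uint64_t sectorsize,
				   uint64_t free_extents,
				   uint64_t extents_thresh)
{
	if (bytes <= sectorsize * 8) {
		/* the first handful still become extent entries */
		if (free_extents * 3 <= extents_thresh)
			return false;
		return true;
	}
	/* larger extents are handled by the other use_bitmap() checks */
	return false;
}
```

Raising both constants makes small extents more likely to land in bitmaps sooner, preserving the (now soft-capped) extent budget from patch 18 for larger regions.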
Signed-off-by: Dennis Zhou
Reviewed-by: Josef Bacik
---
 fs/btrfs/free-space-cache.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index a0941d281a63..505091940580 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2094,8 +2094,8 @@ static bool use_bitmap(struct btrfs_free_space_ctl *ctl,
	 * of cache left then go ahead an dadd them, no sense in adding
	 * the overhead of a bitmap if we don't have to.
	 */
-	if (info->bytes <= fs_info->sectorsize * 4) {
-		if (ctl->free_extents * 2 <= ctl->extents_thresh)
+	if (info->bytes <= fs_info->sectorsize * 8) {
+		if (ctl->free_extents * 3 <= ctl->extents_thresh)
 			return false;
 	} else {
 		return false;