From patchwork Thu Oct 8 20:48:45 2020
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 11824559
From: Josef Bacik
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v2 01/11] btrfs: add a trace point for reserve tickets
Date: Thu, 8 Oct 2020 16:48:45 -0400
Message-Id: <2c9e83b67b44db093fd8d854f484e478bc2abef6.1602189832.git.josef@toxicpanda.com>

While debugging an ENOSPC-related performance problem I needed to see the
time difference between the start and end of a reserve ticket, so add a
trace point to report when we handle a reserve ticket.
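To make the measurement concrete, here is a minimal userspace-style sketch of how the wait time can be derived from such events. This is an illustrative model, not kernel code: the struct and helper names are invented for the example, and it mirrors the patch's scheme of emitting the raw start_ns and filtering zero entries in post-processing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one emitted event: start_ns is 0 when the
 * tracepoint was enabled after the reservation had already started,
 * and handle_ns is the timestamp at which the ticket was handled. */
struct ticket_event {
	uint64_t start_ns;  /* 0 => enabled mid-flight, must be filtered */
	uint64_t handle_ns; /* when the ticket was handled */
};

/* Compute a ticket's wait time, filtering the bogus start_ns == 0
 * entries; returns 0 for entries that should be ignored. */
static uint64_t ticket_wait_ns(const struct ticket_event *ev)
{
	if (ev->start_ns == 0 || ev->handle_ns < ev->start_ns)
		return 0;
	return ev->handle_ns - ev->start_ns;
}
```

In practice the same subtraction would be done by bpftrace or a trace post-processor, keyed on the raw start_ns field the tracepoint exposes.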
I opted to spit out start_ns itself without calculating the difference
because there could be a gap between enabling the tracepoint and setting
start_ns. Doing it this way allows us to filter on a start_ns of 0 so we
don't get bogus entries, and we can easily calculate the time difference
with bpftrace or something else.

Signed-off-by: Josef Bacik
Reviewed-by: Nikolay Borisov
---
 fs/btrfs/space-info.c        | 10 +++++++++-
 include/trace/events/btrfs.h | 29 +++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 64099565ab8f..f1a525251c2a 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1224,6 +1224,7 @@ static void wait_reserve_ticket(struct btrfs_fs_info *fs_info,
 static int handle_reserve_ticket(struct btrfs_fs_info *fs_info,
 				 struct btrfs_space_info *space_info,
 				 struct reserve_ticket *ticket,
+				 u64 start_ns, u64 orig_bytes,
 				 enum btrfs_reserve_flush_enum flush)
 {
 	int ret;
@@ -1279,6 +1280,8 @@ static int handle_reserve_ticket(struct btrfs_fs_info *fs_info,
 	 * space wasn't reserved at all).
 	 */
 	ASSERT(!(ticket->bytes == 0 && ticket->error));
+	trace_btrfs_reserve_ticket(fs_info, space_info->flags, orig_bytes,
+				   start_ns, flush, ticket->error);

 	return ret;
 }
@@ -1312,6 +1315,7 @@ static int __reserve_bytes(struct btrfs_fs_info *fs_info,
 {
 	struct work_struct *async_work;
 	struct reserve_ticket ticket;
+	u64 start_ns = 0;
 	u64 used;
 	int ret = 0;
 	bool pending_tickets;
@@ -1364,6 +1368,9 @@ static int __reserve_bytes(struct btrfs_fs_info *fs_info,
 		space_info->reclaim_size += ticket.bytes;
 		init_waitqueue_head(&ticket.wait);
 		ticket.steal = (flush == BTRFS_RESERVE_FLUSH_ALL_STEAL);
+		if (trace_btrfs_reserve_ticket_enabled())
+			start_ns = ktime_get_ns();
+
 		if (flush == BTRFS_RESERVE_FLUSH_ALL ||
 		    flush == BTRFS_RESERVE_FLUSH_ALL_STEAL ||
 		    flush == BTRFS_RESERVE_FLUSH_DATA) {
@@ -1400,7 +1407,8 @@ static int __reserve_bytes(struct btrfs_fs_info *fs_info,
 	if (!ret || flush == BTRFS_RESERVE_NO_FLUSH)
 		return ret;

-	return handle_reserve_ticket(fs_info, space_info, &ticket, flush);
+	return handle_reserve_ticket(fs_info, space_info, &ticket, start_ns,
+				     orig_bytes, flush);
 }

 /**
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index ecd24c719de4..eb348656839f 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -2025,6 +2025,35 @@ TRACE_EVENT(btrfs_convert_extent_bit,
 		  __print_flags(__entry->clear_bits, "|", EXTENT_FLAGS))
 );

+TRACE_EVENT(btrfs_reserve_ticket,
+	TP_PROTO(const struct btrfs_fs_info *fs_info, u64 flags, u64 bytes,
+		 u64 start_ns, int flush, int error),
+
+	TP_ARGS(fs_info, flags, bytes, start_ns, flush, error),
+
+	TP_STRUCT__entry_btrfs(
+		__field(	u64,	flags		)
+		__field(	u64,	bytes		)
+		__field(	u64,	start_ns	)
+		__field(	int,	flush		)
+		__field(	int,	error		)
+	),
+
+	TP_fast_assign_btrfs(fs_info,
+		__entry->flags		= flags;
+		__entry->bytes		= bytes;
+		__entry->start_ns	= start_ns;
+		__entry->flush		= flush;
+		__entry->error		= error;
+	),
+
+	TP_printk_btrfs("flags=%s bytes=%llu start_ns=%llu flush=%s error=%d",
+		  __print_flags(__entry->flags, "|", BTRFS_GROUP_FLAGS),
+		  __entry->bytes, __entry->start_ns,
+		  __print_symbolic(__entry->flush, FLUSH_ACTIONS),
+		  __entry->error)
+);
+
 DECLARE_EVENT_CLASS(btrfs_sleep_tree_lock,

 	TP_PROTO(const struct extent_buffer *eb, u64 start_ns),

From patchwork Thu Oct 8 20:48:46 2020
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 11824563
From: Josef Bacik
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v2 02/11] btrfs: track ordered bytes instead of just dio ordered bytes
Date: Thu, 8 Oct 2020 16:48:46 -0400

We track dio_bytes because the shrink delalloc code needs to know if we
have more DIO in flight than we have normal buffered IO. The reason is
that we can't "flush" DIO; we just have to wait on the ordered extents
to finish. However, this is true of all ordered extents: if we have more
ordered space outstanding than dirty pages, we should be waiting on
ordered extents. Technically we are already OK on this front, because we
always do a FLUSH_DELALLOC_WAIT loop, but I want to use the ordered
counter in the preemptive flushing code as well, so change this to count
all ordered bytes instead of just DIO ordered bytes.
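As a rough model of the policy the counter feeds: this is an illustrative sketch, not the kernel code, and the helper name is invented. It captures the decision shrink_delalloc() makes with the comparison described above, which after this patch covers all ordered bytes rather than just the DIO subset.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the "wait on ordered extents instead of flushing" decision.
 * Ordered extents cannot be flushed, only waited on, so when the ordered
 * space outstanding exceeds the dirty delalloc bytes, flushing delalloc
 * would mostly waste time and waiting is the productive choice. */
static bool should_wait_on_ordered(uint64_t delalloc_bytes,
				   uint64_t ordered_bytes)
{
	return ordered_bytes > delalloc_bytes;
}
```

With only dio_bytes counted, buffered ordered extents never tipped this comparison; counting all ordered bytes makes the heuristic reflect everything that can only be waited on.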
Signed-off-by: Josef Bacik
---
 fs/btrfs/ctree.h        |  2 +-
 fs/btrfs/disk-io.c      |  8 ++++----
 fs/btrfs/ordered-data.c | 13 ++++++-------
 fs/btrfs/space-info.c   | 13 +++++++------
 4 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index aac3d6f4e35b..e817b3b3483d 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -790,7 +790,7 @@ struct btrfs_fs_info {
 	/* used to keep from writing metadata until there is a nice batch */
 	struct percpu_counter dirty_metadata_bytes;
 	struct percpu_counter delalloc_bytes;
-	struct percpu_counter dio_bytes;
+	struct percpu_counter ordered_bytes;
 	s32 dirty_metadata_batch;
 	s32 delalloc_batch;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 764001609a15..61bb3321efaa 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1466,7 +1466,7 @@ void btrfs_free_fs_info(struct btrfs_fs_info *fs_info)
 {
 	percpu_counter_destroy(&fs_info->dirty_metadata_bytes);
 	percpu_counter_destroy(&fs_info->delalloc_bytes);
-	percpu_counter_destroy(&fs_info->dio_bytes);
+	percpu_counter_destroy(&fs_info->ordered_bytes);
 	percpu_counter_destroy(&fs_info->dev_replace.bio_counter);
 	btrfs_free_csum_hash(fs_info);
 	btrfs_free_stripe_hash_table(fs_info);
@@ -2748,7 +2748,7 @@ static int init_mount_fs_info(struct btrfs_fs_info *fs_info, struct super_block
 	sb->s_blocksize = BTRFS_BDEV_BLOCKSIZE;
 	sb->s_blocksize_bits = blksize_bits(BTRFS_BDEV_BLOCKSIZE);

-	ret = percpu_counter_init(&fs_info->dio_bytes, 0, GFP_KERNEL);
+	ret = percpu_counter_init(&fs_info->ordered_bytes, 0, GFP_KERNEL);
 	if (ret)
 		return ret;
@@ -4055,9 +4055,9 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
 			       percpu_counter_sum(&fs_info->delalloc_bytes));
 	}

-	if (percpu_counter_sum(&fs_info->dio_bytes))
+	if (percpu_counter_sum(&fs_info->ordered_bytes))
 		btrfs_info(fs_info, "at unmount dio bytes count %lld",
-			   percpu_counter_sum(&fs_info->dio_bytes));
+			   percpu_counter_sum(&fs_info->ordered_bytes));

 	btrfs_sysfs_remove_mounted(fs_info);
 	btrfs_sysfs_remove_fsid(fs_info->fs_devices);
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 87bac9ecdf4c..9a277a475a1c 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -202,11 +202,11 @@ static int __btrfs_add_ordered_extent(struct btrfs_inode *inode, u64 file_offset
 	if (type != BTRFS_ORDERED_IO_DONE && type != BTRFS_ORDERED_COMPLETE)
 		set_bit(type, &entry->flags);

-	if (dio) {
-		percpu_counter_add_batch(&fs_info->dio_bytes, num_bytes,
-					 fs_info->delalloc_batch);
+	percpu_counter_add_batch(&fs_info->ordered_bytes, num_bytes,
+				 fs_info->delalloc_batch);
+
+	if (dio)
 		set_bit(BTRFS_ORDERED_DIRECT, &entry->flags);
-	}

 	/* one ref for the tree */
 	refcount_set(&entry->refs, 1);
@@ -480,9 +480,8 @@ void btrfs_remove_ordered_extent(struct btrfs_inode *btrfs_inode,
 		btrfs_delalloc_release_metadata(btrfs_inode, entry->num_bytes,
 						false);

-	if (test_bit(BTRFS_ORDERED_DIRECT, &entry->flags))
-		percpu_counter_add_batch(&fs_info->dio_bytes, -entry->num_bytes,
-					 fs_info->delalloc_batch);
+	percpu_counter_add_batch(&fs_info->ordered_bytes, -entry->num_bytes,
+				 fs_info->delalloc_batch);

 	tree = &btrfs_inode->ordered_tree;
 	spin_lock_irq(&tree->lock);
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index f1a525251c2a..96d40f8df246 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -489,7 +489,7 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info,
 {
 	struct btrfs_trans_handle *trans;
 	u64 delalloc_bytes;
-	u64 dio_bytes;
+	u64 ordered_bytes;
 	u64 items;
 	long time_left;
 	int loops;
@@ -513,8 +513,8 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info,
 	delalloc_bytes = percpu_counter_sum_positive(
 						&fs_info->delalloc_bytes);
-	dio_bytes = percpu_counter_sum_positive(&fs_info->dio_bytes);
-	if (delalloc_bytes == 0 && dio_bytes == 0) {
+	ordered_bytes = percpu_counter_sum_positive(&fs_info->ordered_bytes);
+	if (delalloc_bytes == 0 && ordered_bytes == 0) {
 		if (trans)
 			return;
 		if (wait_ordered)
@@ -527,11 +527,11 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info,
 	 * ordered extents, otherwise we'll waste time trying to flush delalloc
 	 * that likely won't give us the space back we need.
 	 */
-	if (dio_bytes > delalloc_bytes)
+	if (ordered_bytes > delalloc_bytes)
 		wait_ordered = true;

 	loops = 0;
-	while ((delalloc_bytes || dio_bytes) && loops < 3) {
+	while ((delalloc_bytes || ordered_bytes) && loops < 3) {
 		btrfs_start_delalloc_roots(fs_info, items);
 		loops++;
@@ -553,7 +553,8 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info,
 		delalloc_bytes = percpu_counter_sum_positive(
 						&fs_info->delalloc_bytes);
-		dio_bytes = percpu_counter_sum_positive(&fs_info->dio_bytes);
+		ordered_bytes = percpu_counter_sum_positive(
+						&fs_info->ordered_bytes);
 	}
 }

From patchwork Thu Oct 8 20:48:47 2020
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 11824565
From: Josef Bacik
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v2 03/11] btrfs: introduce a FORCE_COMMIT_TRANS flush operation
Date: Thu, 8 Oct 2020 16:48:47 -0400

Solely for preemptive flushing, we want to be able to force a
transaction commit without any of the ambiguity of
may_commit_transaction(). may_commit_transaction() checks tickets and
such, whereas in preemptive flushing we already know a commit will be
helpful, so add this operation to keep the code clean and
straightforward.

Signed-off-by: Josef Bacik
---
 fs/btrfs/ctree.h             | 1 +
 fs/btrfs/space-info.c        | 8 ++++++++
 include/trace/events/btrfs.h | 3 ++-
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index e817b3b3483d..84c5db91dc44 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2654,6 +2654,7 @@ enum btrfs_flush_state {
 	ALLOC_CHUNK_FORCE	=	8,
 	RUN_DELAYED_IPUTS	=	9,
 	COMMIT_TRANS		=	10,
+	FORCE_COMMIT_TRANS	=	11,
 };

 int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 96d40f8df246..a215470c1887 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -737,6 +737,14 @@ static void flush_space(struct btrfs_fs_info *fs_info,
 	case COMMIT_TRANS:
 		ret = may_commit_transaction(fs_info, space_info);
 		break;
+	case FORCE_COMMIT_TRANS:
+		trans = btrfs_join_transaction(root);
+		if (IS_ERR(trans)) {
+			ret = PTR_ERR(trans);
+			break;
+		}
+		ret = btrfs_commit_transaction(trans);
+		break;
 	default:
 		ret = -ENOSPC;
 		break;
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index eb348656839f..0a3d35d952c4 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -99,7 +99,8 @@ struct btrfs_space_info;
 	EM( ALLOC_CHUNK,		"ALLOC_CHUNK")		\
 	EM( ALLOC_CHUNK_FORCE,		"ALLOC_CHUNK_FORCE")	\
 	EM( RUN_DELAYED_IPUTS,		"RUN_DELAYED_IPUTS")	\
-	EMe(COMMIT_TRANS,		"COMMIT_TRANS")
+	EM(COMMIT_TRANS,		"COMMIT_TRANS")		\
+	EMe(FORCE_COMMIT_TRANS,		"FORCE_COMMIT_TRANS")

 /*
  * First define the enums in the above macros to be exported to userspace via

From patchwork Thu Oct 8 20:48:48 2020
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 11824567
From: Josef Bacik
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v2 04/11] btrfs: improve preemptive background space flushing
Date: Thu, 8 Oct 2020 16:48:48 -0400

Currently, if we ever have to flush space because we do not have enough,
we allocate a ticket and attach it to the space_info, and then
systematically flush things in the file system that hold space
reservations until our space is reclaimed. However, this has a latency
cost: we must go to sleep and wait for the flushing to make progress
before we are woken up and allowed to continue doing our work.

In order to address that we used to kick off the async worker to flush
space preemptively, so that we could hopefully be reclaiming space
before any tasks needed to stop and wait for space to be reclaimed. When
I introduced the ticketed ENOSPC stuff this broke slightly because we
were using tickets to indicate whether we were done flushing: no
tickets, no more flushing. This meant that we essentially never
preemptively flushed.

This caused a write performance regression that Nikolay noticed in an
unrelated patch that removed the committing of the transaction during
btrfs_end_transaction(). Before that patch, btrfs_end_transaction()
would see that we were low on space and commit the transaction. This was
bad because in this particular case you could end up with thousands and
thousands of transactions being committed during the 5 minute
reproducer.
With the patch to remove this behavior you got much more sane
transaction commits, but we ended up slower because we would write for a
while, flush, write for a while, flush again.

To address this we need to reinstate a preemptive flushing mechanism.
However, it is distinctly different from our ticketed flushing in that
it doesn't have tickets to base its decisions on. Instead of bolting
this logic onto our existing flushing work, add another worker to handle
this preemptive flushing. Here we will attempt to be slightly
intelligent about the things that we flush, attempting to balance
between whichever pool is taking up the most space.

Signed-off-by: Josef Bacik
---
 fs/btrfs/ctree.h      |   1 +
 fs/btrfs/disk-io.c    |   1 +
 fs/btrfs/space-info.c | 101 +++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 101 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 84c5db91dc44..d72469ea7c87 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -922,6 +922,7 @@ struct btrfs_fs_info {
 	/* Used to reclaim the metadata space in the background. */
 	struct work_struct async_reclaim_work;
 	struct work_struct async_data_reclaim_work;
+	struct work_struct preempt_reclaim_work;

 	spinlock_t unused_bgs_lock;
 	struct list_head unused_bgs;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 61bb3321efaa..0b2b3a4a2b47 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4003,6 +4003,7 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
 	cancel_work_sync(&fs_info->async_reclaim_work);
 	cancel_work_sync(&fs_info->async_data_reclaim_work);
+	cancel_work_sync(&fs_info->preempt_reclaim_work);

 	/* Cancel or finish ongoing discard work */
 	btrfs_discard_cleanup(fs_info);
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index a215470c1887..be054c5b39f6 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -996,6 +996,101 @@ static void btrfs_async_reclaim_metadata_space(struct work_struct *work)
 	} while (flush_state <= COMMIT_TRANS);
 }

+/*
+ * This handles pre-flushing of metadata space before we get to the point that
+ * we need to start blocking people on tickets. The logic here is different
+ * from the other flush paths because it doesn't rely on tickets to tell us how
+ * much we need to flush, instead it attempts to keep us below the 80% full
+ * watermark of space by flushing whichever reservation pool is currently the
+ * largest.
+ */
+static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work)
+{
+	struct btrfs_fs_info *fs_info;
+	struct btrfs_space_info *space_info;
+	struct btrfs_block_rsv *delayed_block_rsv;
+	struct btrfs_block_rsv *delayed_refs_rsv;
+	struct btrfs_block_rsv *global_rsv;
+	struct btrfs_block_rsv *trans_rsv;
+	u64 used;
+
+	fs_info = container_of(work, struct btrfs_fs_info,
+			       preempt_reclaim_work);
+	space_info = btrfs_find_space_info(fs_info, BTRFS_BLOCK_GROUP_METADATA);
+	delayed_block_rsv = &fs_info->delayed_block_rsv;
+	delayed_refs_rsv = &fs_info->delayed_refs_rsv;
+	global_rsv = &fs_info->global_block_rsv;
+	trans_rsv = &fs_info->trans_block_rsv;
+
+	spin_lock(&space_info->lock);
+	used = btrfs_space_info_used(space_info, true);
+	while (need_do_async_reclaim(fs_info, space_info, used)) {
+		enum btrfs_reserve_flush_enum flush;
+		u64 delalloc_size = 0;
+		u64 to_reclaim, block_rsv_size;
+		u64 global_rsv_size = global_rsv->reserved;
+
+		/*
+		 * We don't have a precise counter for the metadata being
+		 * reserved for delalloc, so we'll approximate it by subtracting
+		 * out the block rsv's space from the bytes_may_use. If that
+		 * amount is higher than the individual reserves, then we can
+		 * assume it's tied up in delalloc reservations.
+		 */
+		block_rsv_size = global_rsv_size +
+			delayed_block_rsv->reserved +
+			delayed_refs_rsv->reserved +
+			trans_rsv->reserved;
+		if (block_rsv_size < space_info->bytes_may_use)
+			delalloc_size = space_info->bytes_may_use -
+				block_rsv_size;
+		spin_unlock(&space_info->lock);
+
+		/*
+		 * We don't want to include the global_rsv in our calculation,
+		 * because that's space we can't touch. Subtract it from the
+		 * block_rsv_size for the next checks.
+		 */
+		block_rsv_size -= global_rsv_size;
+
+		/*
+		 * We really want to avoid flushing delalloc too much, as it
+		 * could result in poor allocation patterns, so only flush it if
+		 * it's larger than the rest of the pools combined.
+		 */
+		if (delalloc_size > block_rsv_size) {
+			to_reclaim = delalloc_size;
+			flush = FLUSH_DELALLOC;
+		} else if (space_info->bytes_pinned >
+			   (delayed_block_rsv->reserved +
+			    delayed_refs_rsv->reserved)) {
+			to_reclaim = space_info->bytes_pinned;
+			flush = FORCE_COMMIT_TRANS;
+		} else if (delayed_block_rsv->reserved >
+			   delayed_refs_rsv->reserved) {
+			to_reclaim = delayed_block_rsv->reserved;
+			flush = FLUSH_DELAYED_ITEMS_NR;
+		} else {
+			to_reclaim = delayed_refs_rsv->reserved;
+			flush = FLUSH_DELAYED_REFS_NR;
+		}
+
+		/*
+		 * We don't want to reclaim everything, just a portion, so scale
+		 * down the to_reclaim by 1/4. If it takes us down to 0,
+		 * reclaim 1 items worth.
+		 */
+		to_reclaim >>= 2;
+		if (!to_reclaim)
+			to_reclaim = btrfs_calc_insert_metadata_size(fs_info, 1);
+		flush_space(fs_info, space_info, to_reclaim, flush);
+		cond_resched();
+		spin_lock(&space_info->lock);
+		used = btrfs_space_info_used(space_info, true);
+	}
+	spin_unlock(&space_info->lock);
+}
+
 /*
  * FLUSH_DELALLOC_WAIT:
  *   Space is freed from flushing delalloc in one of two ways.
@@ -1122,6 +1217,8 @@ void btrfs_init_async_reclaim_work(struct btrfs_fs_info *fs_info)
 {
 	INIT_WORK(&fs_info->async_reclaim_work, btrfs_async_reclaim_metadata_space);
 	INIT_WORK(&fs_info->async_data_reclaim_work, btrfs_async_reclaim_data_space);
+	INIT_WORK(&fs_info->preempt_reclaim_work,
+		  btrfs_preempt_reclaim_metadata_space);
 }

 static const enum btrfs_flush_state priority_flush_states[] = {
@@ -1405,11 +1502,11 @@ static int __reserve_bytes(struct btrfs_fs_info *fs_info,
 	 */
 	if (!test_bit(BTRFS_FS_LOG_RECOVERING, &fs_info->flags) &&
 	    need_do_async_reclaim(fs_info, space_info, used) &&
-	    !work_busy(&fs_info->async_reclaim_work)) {
+	    !work_busy(&fs_info->preempt_reclaim_work)) {
 		trace_btrfs_trigger_flush(fs_info, space_info->flags,
 					  orig_bytes, flush, "preempt");
 		queue_work(system_unbound_wq,
-			   &fs_info->async_reclaim_work);
+			   &fs_info->preempt_reclaim_work);
 	}
 	spin_unlock(&space_info->lock);

From patchwork Thu Oct 8 20:48:49 2020
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 11824569
-0400 Received: from mail-qk1-x742.google.com (mail-qk1-x742.google.com [IPv6:2607:f8b0:4864:20::742]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 360E5C0613D2 for ; Thu, 8 Oct 2020 13:49:08 -0700 (PDT) Received: by mail-qk1-x742.google.com with SMTP id a23so8424567qkg.13 for ; Thu, 08 Oct 2020 13:49:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=toxicpanda-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=x92GZ9vI+s90nNZj2+FRkrNJVlyuJkfTrWIDkz/tp/g=; b=Ib0tRl+E3ZexekmcVx1CkTqLO5drrNx7nM6QHsgblFAdwCoeS9YYuomFUsybOHjOQ4 hudOGIViUthSie/3wg+3WtvmL79Pjpn36EqkZdYw687vWvo3L3KcuvKpWkil0rf9WPaL hNbi/Ry/wPt9Usw3vzznc3N9KJKLbIadjUSX3uLLD/NfKwtovH220+A9ZV0AEm8ILEjo 65pIBhgDbQpD/wFL2ygkJZIq3mJ+HVbmA++P6Ym6C9+LBrYEG4yYlstKks4QxOEy0xwf tv/kTicdzi0h49MUHprPzYZY+B1FS4UZ2elk7+KSCXmAhZ163Dw9JerjO3CFCGN2zlY0 Zkeg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=x92GZ9vI+s90nNZj2+FRkrNJVlyuJkfTrWIDkz/tp/g=; b=ehRZwbs4wSpyYqDCOBz7BgkNP9cg8WbEl7/q49lAHXH79l6aC0I2MSqzA980GzaK87 7KSV6PgtHGvIMdryNt1ywAMfiT/3mHHndojvOsFMnTdPf7qkbfw07svMYe7W14+T50NA j8mhaza4LELfLFS9qhFdctWQIRzK8a4vsETvZM7PlzdtK88rhoRClT7mMGTiyvl1ofM7 W5k6nF0/saqygeoKxnEa6jt6B7j5V0fudJ3ECOfwtAnHYzTrkjptFLnvvHQPAffr/H08 eeDD84dcBi0BIXMhEZmN9XoIKp5fKlzPEznjw0iX49W0pHC8R9FdDJIoGGTRsDGX8uyh 3V6Q== X-Gm-Message-State: AOAM533jngK0zIH+GC7SbOfWYECLlYEywqqN28E6fUjpUuACAOCTCjY2 Qj6y7okRbgdOHudXbH+W7eBy0DmXYQ/Rcfa8 X-Google-Smtp-Source: ABdhPJxJlqsiomOIoyd/NNUkD1ZWyvX8QgWjs6oHLxbK7KlgqwscU5ngrHmbMdYJ9Karvfxhejvlvw== X-Received: by 2002:a37:4bc5:: with SMTP id y188mr6037844qka.429.1602190146920; Thu, 08 Oct 2020 13:49:06 -0700 (PDT) Received: from localhost (cpe-174-109-172-136.nc.res.rr.com. 
[174.109.172.136]) by smtp.gmail.com with ESMTPSA id h47sm4922615qtc.80.2020.10.08.13.49.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 08 Oct 2020 13:49:06 -0700 (PDT) From: Josef Bacik To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Cc: Nikolay Borisov Subject: [PATCH v2 05/11] btrfs: rename need_do_async_reclaim Date: Thu, 8 Oct 2020 16:48:49 -0400 Message-Id: <2acc05d33ceb7cff539d4a4d8065c530c8e4433e.1602189832.git.josef@toxicpanda.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org All of our normal flushing is asynchronous reclaim, so this helper is poorly named. This is more checking if we need to preemptively flush space, so rename it to need_preemptive_reclaim. Reviewed-by: Nikolay Borisov Signed-off-by: Josef Bacik --- fs/btrfs/space-info.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c index be054c5b39f6..f16abb214825 100644 --- a/fs/btrfs/space-info.c +++ b/fs/btrfs/space-info.c @@ -804,9 +804,9 @@ btrfs_calc_reclaim_metadata_size(struct btrfs_fs_info *fs_info, return to_reclaim; } -static inline int need_do_async_reclaim(struct btrfs_fs_info *fs_info, - struct btrfs_space_info *space_info, - u64 used) +static inline bool need_preemptive_reclaim(struct btrfs_fs_info *fs_info, + struct btrfs_space_info *space_info, + u64 used) { u64 thresh = div_factor_fine(space_info->total_bytes, 98); @@ -1024,7 +1024,7 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work) spin_lock(&space_info->lock); used = btrfs_space_info_used(space_info, true); - while (need_do_async_reclaim(fs_info, space_info, used)) { + while (need_preemptive_reclaim(fs_info, space_info, used)) { enum btrfs_reserve_flush_enum flush; u64 delalloc_size = 0; u64 to_reclaim, block_rsv_size; @@ -1501,7 +1501,7 @@ static int __reserve_bytes(struct btrfs_fs_info *fs_info, * the 
	 * async reclaim as we will panic.
	 */
 	if (!test_bit(BTRFS_FS_LOG_RECOVERING, &fs_info->flags) &&
-	    need_do_async_reclaim(fs_info, space_info, used) &&
+	    need_preemptive_reclaim(fs_info, space_info, used) &&
 	    !work_busy(&fs_info->preempt_reclaim_work)) {
 		trace_btrfs_trigger_flush(fs_info, space_info->flags,
 					  orig_bytes, flush, "preempt");

From patchwork Thu Oct 8 20:48:50 2020
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 11824571
From: Josef Bacik
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Cc: Nikolay Borisov
Subject: [PATCH v2 06/11] btrfs: check reclaim_size in need_preemptive_reclaim
Date: Thu, 8 Oct 2020 16:48:50 -0400

If we're flushing space for tickets then we have space_info->reclaim_size
set and we do not need to do background reclaim.

Reviewed-by: Nikolay Borisov
Signed-off-by: Josef Bacik
---
 fs/btrfs/space-info.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index f16abb214825..7770372a892b 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -814,6 +814,13 @@ static inline bool need_preemptive_reclaim(struct btrfs_fs_info *fs_info,
 	if ((space_info->bytes_used + space_info->bytes_reserved) >= thresh)
 		return 0;
 
+	/*
+	 * We have tickets queued, bail so we don't compete with the async
+	 * flushers.
+	 */
+	if (space_info->reclaim_size)
+		return 0;
+
 	if (!btrfs_calc_reclaim_metadata_size(fs_info, space_info))
 		return 0;

From patchwork Thu Oct 8 20:48:51 2020
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 11824575
From: Josef Bacik
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v2 07/11] btrfs: rework btrfs_calc_reclaim_metadata_size
Date: Thu, 8 Oct 2020 16:48:51 -0400
Message-Id: <2017283776449f9c59db05f301e7929e0a8db0bf.1602189832.git.josef@toxicpanda.com>

Currently btrfs_calc_reclaim_metadata_size does two things: it returns
the space currently required for flushing by the tickets, and if there
are no tickets it calculates a value for the preemptive flushing.
However, for the normal ticketed flushing we really only care about the
space required for the tickets.  We will accidentally come in and flush
one time, but as soon as we see there are no tickets we bail out of our
flushing.

Fix this by making btrfs_calc_reclaim_metadata_size really only tell us
what is required for flushing if we have people waiting on space.  Then
move the preemptive flushing logic into need_preemptive_reclaim().  We
ignore btrfs_calc_reclaim_metadata_size() in need_preemptive_reclaim()
because if we are in this path then we made our reservation and there
are no tickets currently pending, so we do not need to check it; we
simply do the fuzzy logic to check whether we're getting low on space.

Signed-off-by: Josef Bacik
Reviewed-by: Nikolay Borisov
---
 fs/btrfs/space-info.c | 44 ++++++++++++++++++++-----------------------
 1 file changed, 20 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 7770372a892b..82cc3985a4b6 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -761,7 +761,6 @@ btrfs_calc_reclaim_metadata_size(struct btrfs_fs_info *fs_info,
 {
 	u64 used;
 	u64 avail;
-	u64 expected;
 	u64 to_reclaim = space_info->reclaim_size;
 
 	lockdep_assert_held(&space_info->lock);
@@ -779,28 +778,6 @@ btrfs_calc_reclaim_metadata_size(struct btrfs_fs_info *fs_info,
 	if (space_info->total_bytes + avail < used)
 		to_reclaim += used - (space_info->total_bytes + avail);
 
-	if (to_reclaim)
-		return to_reclaim;
-
-	to_reclaim = min_t(u64, num_online_cpus() * SZ_1M, SZ_16M);
-	if (btrfs_can_overcommit(fs_info, space_info, to_reclaim,
-				 BTRFS_RESERVE_FLUSH_ALL))
-		return 0;
-
-	used = btrfs_space_info_used(space_info, true);
-
-	if (btrfs_can_overcommit(fs_info, space_info, SZ_1M,
-				 BTRFS_RESERVE_FLUSH_ALL))
-		expected = div_factor_fine(space_info->total_bytes, 95);
-	else
-		expected = div_factor_fine(space_info->total_bytes, 90);
-
-	if (used > expected)
-		to_reclaim = used - expected;
-	else
-		to_reclaim = 0;
-	to_reclaim = min(to_reclaim, space_info->bytes_may_use +
-			 space_info->bytes_reserved);
 	return to_reclaim;
 }
 
@@ -809,6 +786,7 @@ static inline bool need_preemptive_reclaim(struct btrfs_fs_info *fs_info,
 					   u64 used)
 {
 	u64 thresh = div_factor_fine(space_info->total_bytes, 98);
+	u64 to_reclaim, expected;
 
 	/* If we're just plain full then async reclaim just slows us down. */
 	if ((space_info->bytes_used + space_info->bytes_reserved) >= thresh)
@@ -821,7 +799,25 @@ static inline bool need_preemptive_reclaim(struct btrfs_fs_info *fs_info,
 	if (space_info->reclaim_size)
 		return 0;
 
-	if (!btrfs_calc_reclaim_metadata_size(fs_info, space_info))
+	to_reclaim = min_t(u64, num_online_cpus() * SZ_1M, SZ_16M);
+	if (btrfs_can_overcommit(fs_info, space_info, to_reclaim,
+				 BTRFS_RESERVE_FLUSH_ALL))
+		return 0;
+
+	used = btrfs_space_info_used(space_info, true);
+	if (btrfs_can_overcommit(fs_info, space_info, SZ_1M,
+				 BTRFS_RESERVE_FLUSH_ALL))
+		expected = div_factor_fine(space_info->total_bytes, 95);
+	else
+		expected = div_factor_fine(space_info->total_bytes, 90);
+
+	if (used > expected)
+		to_reclaim = used - expected;
+	else
+		to_reclaim = 0;
+	to_reclaim = min(to_reclaim, space_info->bytes_may_use +
+			 space_info->bytes_reserved);
+	if (!to_reclaim)
 		return 0;
 
 	return (used >= thresh && !btrfs_fs_closing(fs_info) &&

From patchwork Thu Oct 8 20:48:52 2020
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 11824573
From: Josef Bacik
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v2 08/11] btrfs: simplify the logic in need_preemptive_flushing
Date: Thu, 8 Oct 2020 16:48:52 -0400
Message-Id: <6602110b46e619a85bddfa0d19e936e303120bfc.1602189832.git.josef@toxicpanda.com>

A lot of this was added all in one go with no explanation, and is a bit
unwieldy and confusing.  Simplify the logic to start preemptive flushing
if we've reserved more than half of our available free space.

Signed-off-by: Josef Bacik
---
 fs/btrfs/space-info.c | 73 ++++++++++++++++++++++++++++---------------
 1 file changed, 48 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 82cc3985a4b6..4dfd99846534 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -782,11 +782,11 @@ btrfs_calc_reclaim_metadata_size(struct btrfs_fs_info *fs_info,
 }
 
 static inline bool need_preemptive_reclaim(struct btrfs_fs_info *fs_info,
-					   struct btrfs_space_info *space_info,
-					   u64 used)
+					   struct btrfs_space_info *space_info)
 {
+	u64 ordered, delalloc;
 	u64 thresh = div_factor_fine(space_info->total_bytes, 98);
-	u64 to_reclaim, expected;
+	u64 used;
 
 	/* If we're just plain full then async reclaim just slows us down.
	 */
 	if ((space_info->bytes_used + space_info->bytes_reserved) >= thresh)
@@ -799,26 +799,52 @@ static inline bool need_preemptive_reclaim(struct btrfs_fs_info *fs_info,
 	if (space_info->reclaim_size)
 		return 0;
 
-	to_reclaim = min_t(u64, num_online_cpus() * SZ_1M, SZ_16M);
-	if (btrfs_can_overcommit(fs_info, space_info, to_reclaim,
-				 BTRFS_RESERVE_FLUSH_ALL))
-		return 0;
+	/*
+	 * If we have over half of the free space occupied by reservations or
+	 * pinned then we want to start flushing.
+	 *
+	 * We do not do the traditional thing here, which is to say
+	 *
+	 *   if (used >= ((total_bytes + avail) >> 1))
+	 *           return 1;
+	 *
+	 * because this doesn't quite work how we want.  If we had more than 50%
+	 * of the space_info used by bytes_used and we had 0 available we'd just
+	 * constantly run the background flusher.  Instead we want it to kick in
+	 * if our reclaimable space exceeds 50% of our available free space.
+	 */
+	thresh = calc_available_free_space(fs_info, space_info,
+					   BTRFS_RESERVE_FLUSH_ALL);
+	thresh += (space_info->total_bytes - space_info->bytes_used -
+		   space_info->bytes_reserved - space_info->bytes_readonly);
+	thresh >>= 1;
 
-	used = btrfs_space_info_used(space_info, true);
-	if (btrfs_can_overcommit(fs_info, space_info, SZ_1M,
-				 BTRFS_RESERVE_FLUSH_ALL))
-		expected = div_factor_fine(space_info->total_bytes, 95);
-	else
-		expected = div_factor_fine(space_info->total_bytes, 90);
+	used = space_info->bytes_pinned;
 
-	if (used > expected)
-		to_reclaim = used - expected;
+	/*
+	 * If we have more ordered bytes than delalloc bytes then we're either
+	 * doing a lot of DIO, or we simply don't have a lot of delalloc waiting
+	 * around.  Preemptive flushing is only useful in that it can free up
+	 * space before tickets need to wait for things to finish.  In the case
+	 * of ordered extents, preemptively waiting on ordered extents gets us
+	 * nothing, if our reservations are tied up in ordered extents we'll
+	 * simply have to slow down writers by forcing them to wait on ordered
+	 * extents.
+	 *
+	 * In the case that ordered is larger than delalloc, only include the
+	 * block reserves that we would actually be able to directly reclaim
+	 * from.  In this case if we're heavy on metadata operations this will
+	 * clearly be heavy enough to warrant preemptive flushing.  In the case
+	 * of heavy DIO or ordered reservations, preemptive flushing will just
+	 * waste time and cause us to slow down.
+	 */
+	ordered = percpu_counter_sum_positive(&fs_info->ordered_bytes);
+	delalloc = percpu_counter_sum_positive(&fs_info->delalloc_bytes);
+	if (ordered >= delalloc)
+		used += fs_info->delayed_refs_rsv.reserved +
+			fs_info->delayed_block_rsv.reserved;
 	else
-		to_reclaim = 0;
-	to_reclaim = min(to_reclaim, space_info->bytes_may_use +
-			 space_info->bytes_reserved);
-	if (!to_reclaim)
-		return 0;
+		used += space_info->bytes_may_use;
 
 	return (used >= thresh && !btrfs_fs_closing(fs_info) &&
 		!test_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state));
@@ -1015,7 +1041,6 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work)
 	struct btrfs_block_rsv *delayed_refs_rsv;
 	struct btrfs_block_rsv *global_rsv;
 	struct btrfs_block_rsv *trans_rsv;
-	u64 used;
 
 	fs_info = container_of(work, struct btrfs_fs_info,
 			       preempt_reclaim_work);
@@ -1026,8 +1051,7 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work)
 	trans_rsv = &fs_info->trans_block_rsv;
 
 	spin_lock(&space_info->lock);
-	used = btrfs_space_info_used(space_info, true);
-	while (need_preemptive_reclaim(fs_info, space_info, used)) {
+	while (need_preemptive_reclaim(fs_info, space_info)) {
 		enum btrfs_reserve_flush_enum flush;
 		u64 delalloc_size = 0;
 		u64 to_reclaim, block_rsv_size;
@@ -1089,7 +1113,6 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work)
 		flush_space(fs_info, space_info, to_reclaim, flush);
 		cond_resched();
 		spin_lock(&space_info->lock);
-		used = btrfs_space_info_used(space_info, true);
 	}
 	spin_unlock(&space_info->lock);
 }
@@ -1504,7 +1527,7 @@ static int __reserve_bytes(struct btrfs_fs_info *fs_info,
 	 * the async reclaim as we will panic.
 	 */
 	if (!test_bit(BTRFS_FS_LOG_RECOVERING, &fs_info->flags) &&
-	    need_preemptive_reclaim(fs_info, space_info, used) &&
+	    need_preemptive_reclaim(fs_info, space_info) &&
 	    !work_busy(&fs_info->preempt_reclaim_work)) {
 		trace_btrfs_trigger_flush(fs_info, space_info->flags,
 					  orig_bytes, flush, "preempt");

From patchwork Thu Oct 8 20:48:53 2020
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 11824577
From: Josef Bacik
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Cc: Nikolay Borisov
Subject: [PATCH v2 09/11] btrfs: implement space clamping for preemptive flushing
Date: Thu, 8 Oct 2020 16:48:53 -0400
Message-Id: <501bfeaa3dc6f4c59dd6062f6108bab974316b85.1602189832.git.josef@toxicpanda.com>

Starting preemptive flushing at 50% of available free space is a good
start, but some workloads are particularly abusive and can quickly
overwhelm the preemptive flushing code and drive us into using tickets.

Handle this by clamping down on our threshold for starting and
continuing to run preemptive flushing.  This is particularly important
for the overcommit case, as we can really drive the file system into
overages and then it becomes more difficult to pull it back as we start
to actually fill up the file system.

The clamping is essentially 2^CLAMP, but we start at 1 so whatever we
calculate for overcommit is the baseline.
Reviewed-by: Nikolay Borisov
Signed-off-by: Josef Bacik
---
 fs/btrfs/space-info.c | 40 ++++++++++++++++++++++++++++++++++++++--
 fs/btrfs/space-info.h |  3 +++
 2 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 4dfd99846534..71bebb60f0ce 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -206,6 +206,7 @@ static int create_space_info(struct btrfs_fs_info *info, u64 flags)
 	INIT_LIST_HEAD(&space_info->ro_bgs);
 	INIT_LIST_HEAD(&space_info->tickets);
 	INIT_LIST_HEAD(&space_info->priority_tickets);
+	space_info->clamp = 1;
 
 	ret = btrfs_sysfs_add_space_info_type(info, space_info);
 	if (ret)
@@ -811,13 +812,13 @@ static inline bool need_preemptive_reclaim(struct btrfs_fs_info *fs_info,
 	 * because this doesn't quite work how we want.  If we had more than 50%
 	 * of the space_info used by bytes_used and we had 0 available we'd just
 	 * constantly run the background flusher.  Instead we want it to kick in
-	 * if our reclaimable space exceeds 50% of our available free space.
+	 * if our reclaimable space exceeds our clamped free space.
 	 */
 	thresh = calc_available_free_space(fs_info, space_info,
 					   BTRFS_RESERVE_FLUSH_ALL);
 	thresh += (space_info->total_bytes - space_info->bytes_used -
 		   space_info->bytes_reserved - space_info->bytes_readonly);
-	thresh >>= 1;
+	thresh >>= space_info->clamp;
 
 	used = space_info->bytes_pinned;
 
@@ -1041,6 +1042,7 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work)
 	struct btrfs_block_rsv *delayed_refs_rsv;
 	struct btrfs_block_rsv *global_rsv;
 	struct btrfs_block_rsv *trans_rsv;
+	int loops = 0;
 
 	fs_info = container_of(work, struct btrfs_fs_info,
 			       preempt_reclaim_work);
@@ -1057,6 +1059,8 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work)
 		u64 to_reclaim, block_rsv_size;
 		u64 global_rsv_size = global_rsv->reserved;
 
+		loops++;
+
 		/*
 		 * We don't have a precise counter for the metadata being
 		 * reserved for delalloc, so we'll approximate it by subtracting
@@ -1114,6 +1118,10 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work)
 		cond_resched();
 		spin_lock(&space_info->lock);
 	}
+
+	/* We only went through once, back off our clamping. */
+	if (loops == 1 && !space_info->reclaim_size)
+		space_info->clamp = max(1, space_info->clamp - 1);
 	spin_unlock(&space_info->lock);
 }
 
@@ -1427,6 +1435,26 @@ static inline bool is_normal_flushing(enum btrfs_reserve_flush_enum flush)
 		(flush == BTRFS_RESERVE_FLUSH_ALL_STEAL);
 }
 
+static inline void maybe_clamp_preempt(struct btrfs_fs_info *fs_info,
+				       struct btrfs_space_info *space_info)
+{
+	u64 ordered, delalloc;
+
+	ordered = percpu_counter_sum_positive(&fs_info->ordered_bytes);
+	delalloc = percpu_counter_sum_positive(&fs_info->delalloc_bytes);
+
+	/*
+	 * If we're heavy on ordered operations then clamping won't help us.
+	 * We need to clamp specifically to keep up with dirty'ing buffered
+	 * writers, because there's not a 1:1 correlation of writing delalloc
+	 * and freeing space, like there is with flushing delayed refs or
+	 * delayed nodes.  If we're already more ordered than delalloc then
+	 * we're keeping up, otherwise we aren't and should probably clamp.
+	 */
+	if (ordered < delalloc)
+		space_info->clamp = min(space_info->clamp + 1, 8);
+}
+
 /**
  * reserve_metadata_bytes - try to reserve bytes from the block_rsv's space
  * @root - the root we're allocating for
@@ -1519,6 +1547,14 @@ static int __reserve_bytes(struct btrfs_fs_info *fs_info,
 			list_add_tail(&ticket.list,
 				      &space_info->priority_tickets);
 		}
+
+		/*
+		 * We were forced to add a reserve ticket, so our preemptive
+		 * flushing is unable to keep up.  Clamp down on the threshold
+		 * for the preemptive flushing in order to keep up with the
+		 * workload.
+		 */
+		maybe_clamp_preempt(fs_info, space_info);
 	} else if (!ret && space_info->flags & BTRFS_BLOCK_GROUP_METADATA) {
 		used += orig_bytes;
 		/*
diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
index 5646393b928c..bcbbfae131f6 100644
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -22,6 +22,9 @@ struct btrfs_space_info {
 				   the space info if we had an ENOSPC in the
 				   allocator. */
 
+	int clamp;		/* Used to scale our threshold for preemptive
+				   flushing.
+				   */
+
 	unsigned int full:1;	/* indicates that we cannot allocate any more
 				   chunks for this space */
 	unsigned int chunk_alloc:1;	/* set if we are allocating a chunk */

From patchwork Thu Oct 8 20:48:54 2020
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 11824579
From: Josef Bacik
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Cc: Nikolay Borisov
Subject: [PATCH v2 10/11] btrfs: adjust the flush trace point to include the source
Date: Thu, 8 Oct 2020 16:48:54 -0400

Since we have normal ticketed flushing and preemptive flushing, adjust
the tracepoint so that we know the source of the flushing action to make
it easier to debug problems.
Reviewed-by: Nikolay Borisov
Signed-off-by: Josef Bacik
---
 fs/btrfs/space-info.c        | 20 ++++++++++++--------
 include/trace/events/btrfs.h | 10 ++++++----
 2 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 71bebb60f0ce..c5fc90dd8378 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -669,7 +669,7 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
  */
 static void flush_space(struct btrfs_fs_info *fs_info,
 		       struct btrfs_space_info *space_info, u64 num_bytes,
-		       int state)
+		       int state, bool for_preempt)
 {
 	struct btrfs_root *root = fs_info->extent_root;
 	struct btrfs_trans_handle *trans;
@@ -752,7 +752,7 @@ static void flush_space(struct btrfs_fs_info *fs_info,
 	}
 
 	trace_btrfs_flush_space(fs_info, space_info->flags, num_bytes, state,
-				ret);
+				ret, for_preempt);
 	return;
 }
 
@@ -978,7 +978,8 @@ static void btrfs_async_reclaim_metadata_space(struct work_struct *work)
 	flush_state = FLUSH_DELAYED_ITEMS_NR;
 	do {
-		flush_space(fs_info, space_info, to_reclaim, flush_state);
+		flush_space(fs_info, space_info, to_reclaim, flush_state,
+			    false);
 		spin_lock(&space_info->lock);
 		if (list_empty(&space_info->tickets)) {
 			space_info->flush = 0;
@@ -1114,7 +1115,7 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work)
 		to_reclaim >>= 2;
 		if (!to_reclaim)
 			to_reclaim = btrfs_calc_insert_metadata_size(fs_info, 1);
-		flush_space(fs_info, space_info, to_reclaim, flush);
+		flush_space(fs_info, space_info, to_reclaim, flush, true);
 		cond_resched();
 		spin_lock(&space_info->lock);
 	}
@@ -1205,7 +1206,8 @@ static void btrfs_async_reclaim_data_space(struct work_struct *work)
 		spin_unlock(&space_info->lock);
 
 	while (!space_info->full) {
-		flush_space(fs_info, space_info, U64_MAX, ALLOC_CHUNK_FORCE);
+		flush_space(fs_info, space_info, U64_MAX, ALLOC_CHUNK_FORCE,
+			    false);
 		spin_lock(&space_info->lock);
 		if (list_empty(&space_info->tickets)) {
 			space_info->flush = 0;
@@ -1218,7 +1220,7 @@ static void btrfs_async_reclaim_data_space(struct work_struct *work)
 	while (flush_state < ARRAY_SIZE(data_flush_states)) {
 		flush_space(fs_info, space_info, U64_MAX,
-			    data_flush_states[flush_state]);
+			    data_flush_states[flush_state], false);
 		spin_lock(&space_info->lock);
 		if (list_empty(&space_info->tickets)) {
 			space_info->flush = 0;
@@ -1291,7 +1293,8 @@ static void priority_reclaim_metadata_space(struct btrfs_fs_info *fs_info,
 	flush_state = 0;
 	do {
-		flush_space(fs_info, space_info, to_reclaim, states[flush_state]);
+		flush_space(fs_info, space_info, to_reclaim, states[flush_state],
+			    false);
 		flush_state++;
 		spin_lock(&space_info->lock);
 		if (ticket->bytes == 0) {
@@ -1307,7 +1310,8 @@ static void priority_reclaim_data_space(struct btrfs_fs_info *fs_info,
 					struct reserve_ticket *ticket)
 {
 	while (!space_info->full) {
-		flush_space(fs_info, space_info, U64_MAX, ALLOC_CHUNK_FORCE);
+		flush_space(fs_info, space_info, U64_MAX, ALLOC_CHUNK_FORCE,
+			    false);
 		spin_lock(&space_info->lock);
 		if (ticket->bytes == 0) {
 			spin_unlock(&space_info->lock);
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 0a3d35d952c4..6d93637bae02 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -1112,15 +1112,16 @@ TRACE_EVENT(btrfs_trigger_flush,
 TRACE_EVENT(btrfs_flush_space,
 
 	TP_PROTO(const struct btrfs_fs_info *fs_info, u64 flags, u64 num_bytes,
-		 int state, int ret),
+		 int state, int ret, int for_preempt),
 
-	TP_ARGS(fs_info, flags, num_bytes, state, ret),
+	TP_ARGS(fs_info, flags, num_bytes, state, ret, for_preempt),
 
 	TP_STRUCT__entry_btrfs(
 		__field(	u64,	flags		)
 		__field(	u64,	num_bytes	)
 		__field(	int,	state		)
 		__field(	int,	ret		)
+		__field(	int,	for_preempt	)
 	),
 
 	TP_fast_assign_btrfs(fs_info,
@@ -1128,15 +1129,16 @@ TRACE_EVENT(btrfs_flush_space,
 		__entry->num_bytes	=	num_bytes;
 		__entry->state		=	state;
 		__entry->ret		=	ret;
+		__entry->for_preempt	=	for_preempt;
 	),
 
-	TP_printk_btrfs("state=%d(%s) flags=%llu(%s) num_bytes=%llu ret=%d",
+	
TP_printk_btrfs("state=%d(%s) flags=%llu(%s) num_bytes=%llu ret=%d for_preempt=%d",
 		  __entry->state,
 		  __print_symbolic(__entry->state, FLUSH_STATES),
 		  __entry->flags,
 		  __print_flags((unsigned long)__entry->flags, "|",
 				BTRFS_GROUP_FLAGS),
-		  __entry->num_bytes, __entry->ret)
+		  __entry->num_bytes, __entry->ret, __entry->for_preempt)
 );
 
 DECLARE_EVENT_CLASS(btrfs__reserved_extent,

From patchwork Thu Oct 8 20:48:55 2020
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 11824581
From: Josef Bacik
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v2 11/11] btrfs: add a trace class for dumping the current ENOSPC state
Date: Thu, 8 Oct 2020 16:48:55 -0400
Message-Id: <1c3db31ef6d948a07236e63bc7b7b09f4671fbc5.1602189832.git.josef@toxicpanda.com>

Often when I'm debugging ENOSPC related issues I have to resort to
printing the entire ENOSPC state with trace_printk() in different spots.
This gets pretty annoying, so add a trace state that does this for us.
Then add a trace point at the end of preemptive flushing so you can see
the state of the space_info when we decide to exit preemptive flushing.
This helped me figure out we weren't kicking in the preemptive flushing
soon enough.

Signed-off-by: Josef Bacik
Reviewed-by: Nikolay Borisov
---
 fs/btrfs/space-info.c        |  1 +
 include/trace/events/btrfs.h | 62 ++++++++++++++++++++++++++++++++++++
 2 files changed, 63 insertions(+)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index c5fc90dd8378..a4ffe8618126 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1123,6 +1123,7 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work)
 	/* We only went through once, back off our clamping.
 	 */
 	if (loops == 1 && !space_info->reclaim_size)
 		space_info->clamp = max(1, space_info->clamp - 1);
+	trace_btrfs_done_preemptive_reclaim(fs_info, space_info);
 	spin_unlock(&space_info->lock);
 }
 
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 6d93637bae02..74b466dc20ac 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -2028,6 +2028,68 @@ TRACE_EVENT(btrfs_convert_extent_bit,
 		  __print_flags(__entry->clear_bits, "|", EXTENT_FLAGS))
 );
 
+DECLARE_EVENT_CLASS(btrfs_dump_space_info,
+	TP_PROTO(const struct btrfs_fs_info *fs_info,
+		 const struct btrfs_space_info *sinfo),
+
+	TP_ARGS(fs_info, sinfo),
+
+	TP_STRUCT__entry_btrfs(
+		__field(	u64,	flags			)
+		__field(	u64,	total_bytes		)
+		__field(	u64,	bytes_used		)
+		__field(	u64,	bytes_pinned		)
+		__field(	u64,	bytes_reserved		)
+		__field(	u64,	bytes_may_use		)
+		__field(	u64,	bytes_readonly		)
+		__field(	u64,	reclaim_size		)
+		__field(	int,	clamp			)
+		__field(	u64,	global_reserved		)
+		__field(	u64,	trans_reserved		)
+		__field(	u64,	delayed_refs_reserved	)
+		__field(	u64,	delayed_reserved	)
+		__field(	u64,	free_chunk_space	)
+	),
+
+	TP_fast_assign_btrfs(fs_info,
+		__entry->flags			=	sinfo->flags;
+		__entry->total_bytes		=	sinfo->total_bytes;
+		__entry->bytes_used		=	sinfo->bytes_used;
+		__entry->bytes_pinned		=	sinfo->bytes_pinned;
+		__entry->bytes_reserved		=	sinfo->bytes_reserved;
+		__entry->bytes_may_use		=	sinfo->bytes_may_use;
+		__entry->bytes_readonly		=	sinfo->bytes_readonly;
+		__entry->reclaim_size		=	sinfo->reclaim_size;
+		__entry->clamp			=	sinfo->clamp;
+		__entry->global_reserved	=	fs_info->global_block_rsv.reserved;
+		__entry->trans_reserved		=	fs_info->trans_block_rsv.reserved;
+		__entry->delayed_refs_reserved	=	fs_info->delayed_refs_rsv.reserved;
+		__entry->delayed_reserved	=	fs_info->delayed_block_rsv.reserved;
+		__entry->free_chunk_space	=	atomic64_read(&fs_info->free_chunk_space);
+	),
+
+	TP_printk_btrfs("flags=%s total_bytes=%llu bytes_used=%llu "
+			"bytes_pinned=%llu bytes_reserved=%llu "
+			"bytes_may_use=%llu bytes_readonly=%llu "
+			"reclaim_size=%llu clamp=%d global_reserved=%llu "
+			"trans_reserved=%llu delayed_refs_reserved=%llu "
+			"delayed_reserved=%llu chunk_free_space=%llu",
+		__print_flags(__entry->flags, "|", BTRFS_GROUP_FLAGS),
+		__entry->total_bytes, __entry->bytes_used,
+		__entry->bytes_pinned, __entry->bytes_reserved,
+		__entry->bytes_may_use, __entry->bytes_readonly,
+		__entry->reclaim_size, __entry->clamp,
+		__entry->global_reserved, __entry->trans_reserved,
+		__entry->delayed_refs_reserved,
+		__entry->delayed_reserved, __entry->free_chunk_space)
+);
+
+DEFINE_EVENT(btrfs_dump_space_info, btrfs_done_preemptive_reclaim,
+	TP_PROTO(const struct btrfs_fs_info *fs_info,
+		 const struct btrfs_space_info *sinfo),
+	TP_ARGS(fs_info, sinfo)
+);
+
 TRACE_EVENT(btrfs_reserve_ticket,
 	TP_PROTO(const struct btrfs_fs_info *fs_info, u64 flags, u64 bytes,
 		 u64 start_ns, int flush, int error),