From patchwork Tue Jun 29 13:59:17 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Josef Bacik X-Patchwork-Id: 12349987 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D7346C11F6A for ; Tue, 29 Jun 2021 13:59:32 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C221E61DAD for ; Tue, 29 Jun 2021 13:59:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234237AbhF2OB6 (ORCPT ); Tue, 29 Jun 2021 10:01:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39008 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233888AbhF2OB4 (ORCPT ); Tue, 29 Jun 2021 10:01:56 -0400 Received: from mail-qt1-x82c.google.com (mail-qt1-x82c.google.com [IPv6:2607:f8b0:4864:20::82c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 313C7C061768 for ; Tue, 29 Jun 2021 06:59:29 -0700 (PDT) Received: by mail-qt1-x82c.google.com with SMTP id g3so11868277qth.11 for ; Tue, 29 Jun 2021 06:59:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=toxicpanda-com.20150623.gappssmtp.com; s=20150623; h=from:to:subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; bh=Z9+9p3DkUHErCibT+iydqnILLuZw6VNnyzs7xEbrTYU=; b=bGKWxbPYJooGpEobmKjOr2YirWVM4n0aMrkaE4Lb6UR+moWowxKut7DBR8/i4rnDNx HCn0221xyUjqQjoX7hpCkG2PJqURYDrlk3BV5KHKaSFmBqGJC+hsPjrDSMVCOw3Vt9WY 
T4YnQYJaFw26WmABTL6UdUqNoZG6xYJ3xcrPzd9Vxf5W9sJ1PKZvEznLAIWHB9Dz7cp3 tZrUBzDTy2DRXu82pez7ik3A1BgCDvfBFm3wJ8o+US7sSWR2YZRTLTUeTuh9YJcx9nWo k+BmQQw4XxkZdHxl+QeuGmyazgFds6R+WhlWHq/0twrKdgbabdqYpPCuycClaQd9p0Le b4XQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Z9+9p3DkUHErCibT+iydqnILLuZw6VNnyzs7xEbrTYU=; b=bfeWrOqFsTA7JUIZA9cq0u4tb1/UxovUsFdwmo+modfZ3xER69h/zQe3fu+1GUf69j fW0cKrU7OA5OUqM5t0T6Q6zscI8OzfNxhgdT8cjMHWRdRtK1AVJZ+hoVps9nX9uk30eA yOV/FpBY/oZToBcERAjbJiu6DLolYSb7UG4X3n86+dlMK1cEGh4QCY93IUQqE1j8S7Hb s9VLt9RlNSRrbAYLr0/Z4DgsgYoUa4NkNjKivNdF/R1ZaATBy+/RkyqtkRQphHW1HR27 IyVJllt01V0ab/4KSC9ewOs57eKeO2yZSmOhvORU9bPndESX0Ho3JB7p3nkPsNs1LYWF n7eA== X-Gm-Message-State: AOAM5319gFDtBL4UooH8Wgfo1LhlNNiwLrnqXpdQTcIuUeJCMx8fc/IW opqwWPTO0sY2HqqmlU6NtxzxCkb2TM+cPw== X-Google-Smtp-Source: ABdhPJyASv4gh5GHQYK0u1rZ7RfJV7UwQ3QSHEOdHtTxbtbrlU+WlVBoJ7EoSjVE4y9Z9xfC8nFvbA== X-Received: by 2002:a05:622a:11cd:: with SMTP id n13mr26919751qtk.233.1624975167940; Tue, 29 Jun 2021 06:59:27 -0700 (PDT) Received: from localhost (cpe-174-109-172-136.nc.res.rr.com. 
[174.109.172.136]) by smtp.gmail.com with ESMTPSA id v20sm5861554qto.89.2021.06.29.06.59.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Jun 2021 06:59:27 -0700 (PDT) From: Josef Bacik To: linux-btrfs@vger.kernel.org, kernel-team@fb.com, linux-fsdevel@vger.kernel.org Subject: [PATCH v2 1/8] btrfs: enable a tracepoint when we fail tickets Date: Tue, 29 Jun 2021 09:59:17 -0400 Message-Id: <196e7895350732ab509b4003427c95fce89b0d9c.1624974951.git.josef@toxicpanda.com> X-Mailer: git-send-email 2.26.3 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org

When debugging early ENOSPC problems it was useful to have a tracepoint where we failed all tickets, so I could check the state of the ENOSPC counters at failure time to validate my fixes. This adds the tracepoint so you can easily get that information.

Signed-off-by: Josef Bacik
--- fs/btrfs/space-info.c | 2 ++ include/trace/events/btrfs.h | 6 ++++++ 2 files changed, 8 insertions(+) diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c index 392997376a1c..af161eb808a2 100644 --- a/fs/btrfs/space-info.c +++ b/fs/btrfs/space-info.c @@ -825,6 +825,8 @@ static bool maybe_fail_all_tickets(struct btrfs_fs_info *fs_info, struct reserve_ticket *ticket; u64 tickets_id = space_info->tickets_id; + trace_btrfs_fail_all_tickets(fs_info, space_info); + if (btrfs_test_opt(fs_info, ENOSPC_DEBUG)) { btrfs_info(fs_info, "cannot satisfy tickets, dumping space info"); __btrfs_dump_space_info(fs_info, space_info); diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h index c7237317a8b9..3d81ba8c37b9 100644 --- a/include/trace/events/btrfs.h +++ b/include/trace/events/btrfs.h @@ -2098,6 +2098,12 @@ DEFINE_EVENT(btrfs_dump_space_info, btrfs_done_preemptive_reclaim, TP_ARGS(fs_info, sinfo) ); +DEFINE_EVENT(btrfs_dump_space_info, btrfs_fail_all_tickets, + TP_PROTO(const struct btrfs_fs_info *fs_info, + const struct btrfs_space_info *sinfo), 
+ TP_ARGS(fs_info, sinfo) +); + TRACE_EVENT(btrfs_reserve_ticket, TP_PROTO(const struct btrfs_fs_info *fs_info, u64 flags, u64 bytes, u64 start_ns, int flush, int error), From patchwork Tue Jun 29 13:59:18 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Josef Bacik X-Patchwork-Id: 12349989 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AFE80C11F66 for ; Tue, 29 Jun 2021 13:59:34 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8EC1861DAD for ; Tue, 29 Jun 2021 13:59:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234254AbhF2OB7 (ORCPT ); Tue, 29 Jun 2021 10:01:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39022 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234242AbhF2OB6 (ORCPT ); Tue, 29 Jun 2021 10:01:58 -0400 Received: from mail-qv1-xf32.google.com (mail-qv1-xf32.google.com [IPv6:2607:f8b0:4864:20::f32]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BA87AC061767 for ; Tue, 29 Jun 2021 06:59:30 -0700 (PDT) Received: by mail-qv1-xf32.google.com with SMTP id m15so11088007qvc.9 for ; Tue, 29 Jun 2021 06:59:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=toxicpanda-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; 
bh=ggfC/maK0oO8BzruPYnLm8oCOora3TE0jM1VJ+qaIJo=; b=HfdW7y98J8v5ntVSubohXh43BZv1hOr3OFFjW5nxbxWbd2EzmdkqbawEK7Q2Kd4k2p SfwGkyTmRWH1EzHb5uEWykLU266lFJ2buXyaUUD/hm1nExRYeTpwWwxOvHTbrlXyRkJi S7vITwdxfBzzh2W7pPNvk3kgIJloU9UK0GdxhyyFDH6q8q2tCxX1N6wTDWiFQSORmg/4 A8C3X6wZ/OkAoQPmxl46p660VPzGNK85G1wwI+NoueO3mi1nB+moKUshfzO0J6csNnVP zQGb5NNa/Jhg6XAH2TFAUY8VL4rx45N3cwmUicxPRN7iBiYPlBvLiu8KPZdwZolONl4o HRCA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=ggfC/maK0oO8BzruPYnLm8oCOora3TE0jM1VJ+qaIJo=; b=VOcwfufFr/sicZgmVICOtZGDI7YIxUOQhKPlhXiB4Ir6p6jXI+JYJxO4LobNzXyBSP Y0O5IP9o1jV84oq+Uuq/Cqt+eqxPEFVzUP9Cvjygv6M36z75lPVZvslu4zXd38ySvcsY kPRlB4vccD8+QKdylYoziu6XPO7kRxs65vFyx6JsTHcyBWR72Fu82pJHr19+tquUTYmz 8HozXLKT3SgMIS+DmA1vd+B7edFLBi858MMH+WHHYgOMYCAyUMxOtHBXK/FsR0hCq2eE rYQ61v3uHGUAyPLDzXgze0KnziRsOm+k6AKSndfgvX4Yyje4A8KOj0Zl+dL+1peIMhZH LWng== X-Gm-Message-State: AOAM532ztM8yjPQhWJ70IWfHtIsGgIsZJfTsUs5Wum0B/a90GtSQtUJ3 n8DqoxKgxBbftYBnJFASLjL6mdlLC+OIhQ== X-Google-Smtp-Source: ABdhPJy+9cZCI3J+HC03GFHjVLLx9o/q3oGEO97xtR+qebLtTeZ5pp/4sMxe+BsiMOgl84RjhFl9rg== X-Received: by 2002:a0c:f492:: with SMTP id i18mr1435016qvm.51.1624975169472; Tue, 29 Jun 2021 06:59:29 -0700 (PDT) Received: from localhost (cpe-174-109-172-136.nc.res.rr.com. 
[174.109.172.136]) by smtp.gmail.com with ESMTPSA id c65sm5900346qkg.84.2021.06.29.06.59.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Jun 2021 06:59:28 -0700 (PDT) From: Josef Bacik To: linux-btrfs@vger.kernel.org, kernel-team@fb.com, linux-fsdevel@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH v2 2/8] btrfs: handle shrink_delalloc pages calculation differently Date: Tue, 29 Jun 2021 09:59:18 -0400 Message-Id: <670bdb9693668dd0662a3c3db8a954df1aa966e4.1624974951.git.josef@toxicpanda.com> X-Mailer: git-send-email 2.26.3 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org

We have been hitting some early ENOSPC issues in production with more recent kernels, and I tracked it down to us simply not flushing delalloc as aggressively as we should be. With tracing I was seeing us failing all tickets with all of the block rsvs at or around 0, with very little pinned space, but still around 120MiB of outstanding bytes_may_use. Upon further investigation I saw that we were flushing around 14 pages per shrink call for delalloc, despite having around 2GiB of delalloc outstanding.

Consider the example of an 8-way machine, all CPUs trying to create a file in parallel, which at the time of this commit requires 5 items. Assuming a 16k leaf size, we have 10MiB of total metadata reclaim size waiting on reservations. Now assume we have 128MiB of delalloc outstanding. With our current math we would set items to 20, and then set to_reclaim to 20 * 256k, or 5MiB. Assuming that we went through this loop all 3 times, for both FLUSH_DELALLOC and FLUSH_DELALLOC_WAIT, and then did the full loop twice, we'd only flush 60MiB of the 128MiB delalloc space. This could leave a fair bit of delalloc reservations still hanging around by the time we go to ENOSPC out all the remaining tickets.

Fix this in two ways.
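The arithmetic in the example above can be checked with a small sketch. This is a standalone model in plain C, not kernel code: the function and macro names are hypothetical, and the inputs (20 items, 256KiB per item, 3 loops, 2 flush states, 2 full cycles, 128MiB of delalloc) are taken directly from the example.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_KIB 1024ULL
#define SKETCH_MIB (1024ULL * SKETCH_KIB)

/* Bytes targeted by one shrink pass under the old math: items * 256KiB. */
static uint64_t old_bytes_per_pass(uint64_t items)
{
	assert(items > 0);	/* the example always flushes at least one item */
	return items * 256 * SKETCH_KIB;
}

/* Upper bound on delalloc flushed across every pass before tickets fail. */
static uint64_t old_total_flushed(uint64_t items, unsigned int loops,
				  unsigned int states, unsigned int cycles)
{
	uint64_t passes = (uint64_t)loops * states * cycles;

	return old_bytes_per_pass(items) * passes;
}

/* Delalloc still outstanding after those passes (clamped at zero). */
static uint64_t old_delalloc_left(uint64_t delalloc, uint64_t flushed)
{
	return flushed >= delalloc ? 0 : delalloc - flushed;
}
```

With the example's numbers, old_total_flushed(20, 3, 2, 2) comes to 60MiB, so old_delalloc_left() reports 68MiB of the 128MiB still outstanding when tickets start failing.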
First, change the calculations to be a fraction of the total delalloc bytes on the system. Prior to this change we were calculating based on dirty inodes, so our math made more sense; now it's completely unrelated to what we're actually doing. Second, add a FLUSH_DELALLOC_FULL state that we hold off on until we've gone through the flush states at least once. This will empty the system of all delalloc so we're sure to be truly out of space when we start failing tickets.

I'm tagging stable 5.10 and forward, because this is where we started using the page stuff heavily again. This affects earlier kernel versions as well, but would be a pain to backport to them as the flushing mechanisms aren't the same.

CC: stable@vger.kernel.org # 5.10+
Signed-off-by: Josef Bacik
--- fs/btrfs/ctree.h | 9 +++++---- fs/btrfs/space-info.c | 35 ++++++++++++++++++++++++++--------- include/trace/events/btrfs.h | 1 + 3 files changed, 32 insertions(+), 13 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index d7ef4d7d2c1a..232ff1a49ca6 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -2783,10 +2783,11 @@ enum btrfs_flush_state { FLUSH_DELAYED_REFS = 4, FLUSH_DELALLOC = 5, FLUSH_DELALLOC_WAIT = 6, - ALLOC_CHUNK = 7, - ALLOC_CHUNK_FORCE = 8, - RUN_DELAYED_IPUTS = 9, - COMMIT_TRANS = 10, + FLUSH_DELALLOC_FULL = 7, + ALLOC_CHUNK = 8, + ALLOC_CHUNK_FORCE = 9, + RUN_DELAYED_IPUTS = 10, + COMMIT_TRANS = 11, }; int btrfs_subvolume_reserve_metadata(struct btrfs_root *root, diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c index af161eb808a2..0c539a94c6d9 100644 --- a/fs/btrfs/space-info.c +++ b/fs/btrfs/space-info.c @@ -494,6 +494,9 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info, long time_left; int loops; + delalloc_bytes = percpu_counter_sum_positive(&fs_info->delalloc_bytes); + ordered_bytes = percpu_counter_sum_positive(&fs_info->ordered_bytes); + /* Calc the number of the pages we need flush for space reservation */ if (to_reclaim == U64_MAX) { items = 
U64_MAX; @@ -501,19 +504,21 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info, /* * to_reclaim is set to however much metadata we need to * reclaim, but reclaiming that much data doesn't really track - * exactly, so increase the amount to reclaim by 2x in order to - * make sure we're flushing enough delalloc to hopefully reclaim - * some metadata reservations. + * exactly. What we really want to do is reclaim full inode's + * worth of reservations, however that's not available to us + * here. We will take a fraction of the delalloc bytes for our + * flushing loops and hope for the best. Delalloc will expand + * the amount we write to cover an entire dirty extent, which + * will reclaim the metadata reservation for that range. If + * it's not enough subsequent flush stages will be more + * aggressive. */ + to_reclaim = max(to_reclaim, delalloc_bytes >> 3); items = calc_reclaim_items_nr(fs_info, to_reclaim) * 2; - to_reclaim = items * EXTENT_SIZE_PER_ITEM; } trans = (struct btrfs_trans_handle *)current->journal_info; - delalloc_bytes = percpu_counter_sum_positive( - &fs_info->delalloc_bytes); - ordered_bytes = percpu_counter_sum_positive(&fs_info->ordered_bytes); if (delalloc_bytes == 0 && ordered_bytes == 0) return; @@ -596,8 +601,11 @@ static void flush_space(struct btrfs_fs_info *fs_info, break; case FLUSH_DELALLOC: case FLUSH_DELALLOC_WAIT: + case FLUSH_DELALLOC_FULL: + if (state == FLUSH_DELALLOC_FULL) + num_bytes = U64_MAX; shrink_delalloc(fs_info, space_info, num_bytes, - state == FLUSH_DELALLOC_WAIT, for_preempt); + state != FLUSH_DELALLOC, for_preempt); break; case FLUSH_DELAYED_REFS_NR: case FLUSH_DELAYED_REFS: @@ -907,6 +915,14 @@ static void btrfs_async_reclaim_metadata_space(struct work_struct *work) commit_cycles--; } + /* + * We do not want to empty the system of delalloc unless we're + * under heavy pressure, so allow one trip through the flushing + * logic before we start doing a FLUSH_DELALLOC_FULL. 
+ */ + if (flush_state == FLUSH_DELALLOC_FULL && !commit_cycles) + flush_state++; + /* * We don't want to force a chunk allocation until we've tried * pretty hard to reclaim space. Think of the case where we @@ -1070,7 +1086,7 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work) * so if we now have space to allocate do the force chunk allocation. */ static const enum btrfs_flush_state data_flush_states[] = { - FLUSH_DELALLOC_WAIT, + FLUSH_DELALLOC_FULL, RUN_DELAYED_IPUTS, COMMIT_TRANS, ALLOC_CHUNK_FORCE, @@ -1159,6 +1175,7 @@ static const enum btrfs_flush_state evict_flush_states[] = { FLUSH_DELAYED_REFS, FLUSH_DELALLOC, FLUSH_DELALLOC_WAIT, + FLUSH_DELALLOC_FULL, ALLOC_CHUNK, COMMIT_TRANS, }; diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h index 3d81ba8c37b9..ddf5c250726c 100644 --- a/include/trace/events/btrfs.h +++ b/include/trace/events/btrfs.h @@ -94,6 +94,7 @@ struct btrfs_space_info; EM( FLUSH_DELAYED_ITEMS, "FLUSH_DELAYED_ITEMS") \ EM( FLUSH_DELALLOC, "FLUSH_DELALLOC") \ EM( FLUSH_DELALLOC_WAIT, "FLUSH_DELALLOC_WAIT") \ + EM( FLUSH_DELALLOC_FULL, "FLUSH_DELALLOC_FULL") \ EM( FLUSH_DELAYED_REFS_NR, "FLUSH_DELAYED_REFS_NR") \ EM( FLUSH_DELAYED_REFS, "FLUSH_ELAYED_REFS") \ EM( ALLOC_CHUNK, "ALLOC_CHUNK") \ From patchwork Tue Jun 29 13:59:19 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Josef Bacik X-Patchwork-Id: 12349991 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 572F6C11F66 for ; 
Tue, 29 Jun 2021 13:59:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3D86561DAC for ; Tue, 29 Jun 2021 13:59:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234273AbhF2OCC (ORCPT ); Tue, 29 Jun 2021 10:02:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39030 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234250AbhF2OB7 (ORCPT ); Tue, 29 Jun 2021 10:01:59 -0400 Received: from mail-qt1-x829.google.com (mail-qt1-x829.google.com [IPv6:2607:f8b0:4864:20::829]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3D2BCC061766 for ; Tue, 29 Jun 2021 06:59:32 -0700 (PDT) Received: by mail-qt1-x829.google.com with SMTP id w13so406165qtc.0 for ; Tue, 29 Jun 2021 06:59:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=toxicpanda-com.20150623.gappssmtp.com; s=20150623; h=from:to:subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; bh=aZs7tDGE5F1qLj1oIyCH3Juwy10djVLpnl4d0mbqTOU=; b=sYOM6GcIRtRtoRqfCEwniRDKBmn0n9Ipb8TayyFPHtzL6RUMfJchrZfjbSmBQfpeic 8GAfkWLjrw5wxYotsVBl5i0DlQq+fXoIzpNu1yMpPfiEKsLrtKICSt8oAuq7bph4uz5W dmBM/L5ModVgJZjDEEyfj7JxiWLg4fFT56fm4/S7xGTWWOO+5knmyaFfr7Unq7OtZzZE /iZ7OLb5iWyO2O7flnKlSGY3LjRlAdiiXP0k7O5xPnPfifJlEEit/NjsBf9fl9KoU6d7 mf4MhCClRG36MtM4oUjJIuU38wZ3zYC3Y+T+eZ9PtN/162Y8g2ahRIFBeqb6iwwBC0YY 5w+Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=aZs7tDGE5F1qLj1oIyCH3Juwy10djVLpnl4d0mbqTOU=; b=M4ykwU+LeWnDEpv+YClIayxUB6Lak66SY+FgwsCJ8x/DLoPiOQ7hZdQilZyTV5Eyb2 IbFsiyLetPjjHY8fQ50Jlzm410iOiSuPuZ3F9cQFS8pYqeFgB5OZ53XVXd4x1Wyqw9sp vEYjdJ+H+uvqIetBCeu/PTLgPZ2XcuL/nyZIec1YKqtkZKVDekFJxiIKTTFefhx6D3Cg 
EHvRwjZT7KnMzc51WoBVpnIHA4AK14vcBXJ8bDox8chLEgK2rQyWCBSKLBZldH6RF9kK UbCvVMb0BR4CtiUa4LB7EAfZXVvjykD8t42wr5NLmWxdl+9l/pH3z61hJ/CM78ktLF4Q H4hw== X-Gm-Message-State: AOAM531NRb4JmYA8IhD2+4bdoEMd3jMDaMdNsspQ/N4TXU6zlZUBwtYj dxQE5u5/2uP/grcWRZjI2ZlFROx1eH5REw== X-Google-Smtp-Source: ABdhPJxcwKD1lAamg2/QGQENbjhHQr/NHXSrfidJtydKyX2PZp+yRApsIKpmieQVKsvwPRR17v3dUA== X-Received: by 2002:a05:622a:18a2:: with SMTP id v34mr14891629qtc.255.1624975171014; Tue, 29 Jun 2021 06:59:31 -0700 (PDT) Received: from localhost (cpe-174-109-172-136.nc.res.rr.com. [174.109.172.136]) by smtp.gmail.com with ESMTPSA id p14sm11846857qkh.128.2021.06.29.06.59.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Jun 2021 06:59:30 -0700 (PDT) From: Josef Bacik To: linux-btrfs@vger.kernel.org, kernel-team@fb.com, linux-fsdevel@vger.kernel.org Subject: [PATCH v2 3/8] btrfs: wait on async extents when flushing delalloc Date: Tue, 29 Jun 2021 09:59:19 -0400 Message-Id: <0ee87e54d0f14f0628d146e09fef34db2ce73e03.1624974951.git.josef@toxicpanda.com> X-Mailer: git-send-email 2.26.3 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org I've been debugging an early ENOSPC problem in production and finally root caused it to this problem. When we switched to the per-inode in 38d715f494f2 ("btrfs: use btrfs_start_delalloc_roots in shrink_delalloc") I pulled out the async extent handling, because we were doing the correct thing by calling filemap_flush() if we had async extents set. This would properly wait on any async extents by locking the page in the second flush, thus making sure our ordered extents were properly set up. However when I switched us back to page based flushing, I used sync_inode(), which allows us to pass in our own wbc. The problem here is that sync_inode() is smarter than the filemap_* helpers, it tries to avoid calling writepages at all. 
This means that our second call could skip calling do_writepages altogether, and thus not wait on the pagelock for the async helpers. This means we could come back before any ordered extents were created and then simply continue on in our flushing mechanisms and ENOSPC out when we have plenty of space to use. Fix this by putting back the async pages logic in shrink_delalloc. This allows us to bulk write out everything that we need to, and then we can wait in one place for the async helpers to catch up, and then wait on any ordered extents that are created. Fixes: e076ab2a2ca7 ("btrfs: shrink delalloc pages instead of full inodes") Signed-off-by: Josef Bacik --- fs/btrfs/inode.c | 4 ---- fs/btrfs/space-info.c | 40 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 40 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index e6eb20987351..b1f02e3fea5d 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -9714,10 +9714,6 @@ static int start_delalloc_inodes(struct btrfs_root *root, &work->work); } else { ret = sync_inode(inode, wbc); - if (!ret && - test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT, - &BTRFS_I(inode)->runtime_flags)) - ret = sync_inode(inode, wbc); btrfs_add_delayed_iput(inode); if (ret || wbc->nr_to_write <= 0) goto out; diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c index 0c539a94c6d9..f140a89a3cdd 100644 --- a/fs/btrfs/space-info.c +++ b/fs/btrfs/space-info.c @@ -534,9 +534,49 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info, while ((delalloc_bytes || ordered_bytes) && loops < 3) { u64 temp = min(delalloc_bytes, to_reclaim) >> PAGE_SHIFT; long nr_pages = min_t(u64, temp, LONG_MAX); + int async_pages; btrfs_start_delalloc_roots(fs_info, nr_pages, true); + /* + * We need to make sure any outstanding async pages are now + * processed before we continue. This is because things like + * sync_inode() try to be smart and skip writing if the inode is + * marked clean. 
We don't use filemap_fwrite for flushing + * because we want to control how many pages we write out at a + * time, thus this is the only safe way to make sure we've + * waited for outstanding compressed workers to have started + * their jobs and thus have ordered extents set up properly. + * + * This exists because we do not want to wait for each + * individual inode to finish its async work, we simply want to + * start the IO on everybody, and then come back here and wait + * for all of the async work to catch up. Once we're done with + * that we know we'll have ordered extents for everything and we + * can decide if we wait for that or not. + * + * If we choose to replace this in the future, make absolutely + * sure that the proper waiting is being done in the async case, + * as there have been bugs in that area before. + */ + async_pages = atomic_read(&fs_info->async_delalloc_pages); + if (!async_pages) + goto skip_async; + + /* + * We don't want to wait forever, if we wrote less pages in this + * loop than we have outstanding, only wait for that number of + * pages, otherwise we can wait for all async pages to finish + * before continuing. 
+ */ + if (async_pages > nr_pages) + async_pages -= nr_pages; + else + async_pages = 0; + wait_event(fs_info->async_submit_wait, + atomic_read(&fs_info->async_delalloc_pages) <= + async_pages); +skip_async: loops++; if (wait_ordered && !trans) { btrfs_wait_ordered_roots(fs_info, items, 0, (u64)-1); From patchwork Tue Jun 29 13:59:20 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Josef Bacik X-Patchwork-Id: 12349993 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5B8DCC11F67 for ; Tue, 29 Jun 2021 13:59:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4357461DAB for ; Tue, 29 Jun 2021 13:59:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234282AbhF2OCF (ORCPT ); Tue, 29 Jun 2021 10:02:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39040 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234267AbhF2OCB (ORCPT ); Tue, 29 Jun 2021 10:02:01 -0400 Received: from mail-qv1-xf35.google.com (mail-qv1-xf35.google.com [IPv6:2607:f8b0:4864:20::f35]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EDDA1C061766 for ; Tue, 29 Jun 2021 06:59:33 -0700 (PDT) Received: by mail-qv1-xf35.google.com with SMTP id d2so2022710qvh.2 for ; Tue, 29 Jun 2021 06:59:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=toxicpanda-com.20150623.gappssmtp.com; s=20150623; 
h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=a9FEw/mlX+aDER0g2+babK847oC7xUdGfaZoTgPtJkQ=; b=Vlu3QuZ/rcv1xG90SJOASIDYhc2os7rbaqE0gb/XIf13j6jJXigfLghnHp+VcLlk5M +tmvgfHOkRuZhKSaypYBZner2O9PouhzWp4Pbp0R8744onQtvFjNmxv+jtkOEpNpINUh IPLgvO+5cR2jdgah4o9rHhZOV/LNI8SRMBHGf5XTOsAYaWEo1FMixPj0agbqVTyvJ3TZ cZ9we6mk41WOE5bTxZrsTPwirhDKZLl0jwfxGmUByfvaS2CgigSJjO+RJ0ntlC1X1XHK vNI3YFKbTyMoBTUKdiay7uoGhGp4NfXHv5bRGEilufRQ5jodbAUH0rk9gbPl7ZW+xMVU +Isw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=a9FEw/mlX+aDER0g2+babK847oC7xUdGfaZoTgPtJkQ=; b=YrIu2o6gZe5tUSbZ9rlm+0Jw5GuhIogwe0TvsltzMzOLPiXXctuLEpUtPtQyQNKQ9T j9RSSj3cLDd5ypW/h9nLRo70zE7g9JMFlnauxvKFXG6bm4+KKX071eY8yuMpXq3mGXxc rEGYZ2CJwVjIDeZBqqQjrDAbzeUbLSn4ttmUh/2s5I4XYtILqDat97x+5yKdGiEAUhyC nwMs3StkG6zFXBjlezm6jr9MemaaQV8+OrByflCuk+2eZSmLmXE+rqH76/qGqQg4c1vq 2PYB8YlOQ9WBS1s3dg0FAkWPc2qa9CTkhsoEp3jIsDYclFgW1zqMYPjwTq+H8qBpl7Yu 0Shw== X-Gm-Message-State: AOAM532Zl8zi46XyZb9ETkQyzU7nnuyxJhn2szdycWbnlCn3Lksh6Zdz aTAI4iM4UOg0ORHHT9qE3TMHdFGyqQOKSA== X-Google-Smtp-Source: ABdhPJyF58CM+mZKvSxOBHUHW30ZVBaVWPKC/J3naBDg5RYwEHvP7MHU/j1+v/3D4JGphUHUncY2UA== X-Received: by 2002:a05:6214:16d2:: with SMTP id d18mr11440756qvz.34.1624975172734; Tue, 29 Jun 2021 06:59:32 -0700 (PDT) Received: from localhost (cpe-174-109-172-136.nc.res.rr.com. 
[174.109.172.136]) by smtp.gmail.com with ESMTPSA id b21sm4936518qkh.55.2021.06.29.06.59.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Jun 2021 06:59:32 -0700 (PDT) From: Josef Bacik To: linux-btrfs@vger.kernel.org, kernel-team@fb.com, linux-fsdevel@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH v2 4/8] btrfs: wake up async_delalloc_pages waiters after submit Date: Tue, 29 Jun 2021 09:59:20 -0400 Message-Id: <54425f6e0ece01f5d579e1bcc0aab22a988c301f.1624974951.git.josef@toxicpanda.com> X-Mailer: git-send-email 2.26.3 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org We use the async_delalloc_pages mechanism to make sure that we've completed our async work before trying to continue our delalloc flushing. The reason for this is we need to see any ordered extents that were created by our delalloc flushing. However we're waking up before we do the submit work, which is before we create the ordered extents. This is a pretty wide race window where we could potentially think there are no ordered extents and thus exit shrink_delalloc prematurely. Fix this by waking us up after we've done the work to create ordered extents. 
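The ordering problem described above can be modelled with a deterministic, single-threaded sketch. This is only an illustration under stated assumptions: struct fs_model, enum wake_order and ordered_seen_at_wakeup() are hypothetical names, a single async chunk is assumed to produce exactly one ordered extent, and the real code uses atomics and a wait queue rather than plain fields.

```c
#include <assert.h>

/* Hypothetical stand-in for the two counters involved; not the kernel structs. */
struct fs_model {
	long async_delalloc_pages;	/* pages still owned by async workers */
	long ordered_extents;		/* ordered extents created so far */
};

enum wake_order {
	WAKE_BEFORE_SUBMIT,	/* old ordering: drop the counter, then submit */
	WAKE_AFTER_SUBMIT,	/* fixed ordering: submit, then drop the counter */
};

/*
 * Run one async chunk through the model and return how many ordered extents
 * a waiter would observe at the instant the page counter reaches zero, which
 * is the earliest point at which it can be woken.
 */
static long ordered_seen_at_wakeup(enum wake_order order, long nr_pages)
{
	struct fs_model fs = { .async_delalloc_pages = nr_pages,
			       .ordered_extents = 0 };
	long seen;

	if (order == WAKE_BEFORE_SUBMIT) {
		fs.async_delalloc_pages -= nr_pages;
		seen = fs.ordered_extents;	/* waiter may look here */
		fs.ordered_extents++;		/* submit happens too late */
	} else {
		fs.ordered_extents++;		/* submit creates the ordered extent */
		fs.async_delalloc_pages -= nr_pages;
		seen = fs.ordered_extents;	/* waiter can only look now */
	}

	assert(fs.async_delalloc_pages == 0);
	return seen;
}
```

In this model the old ordering lets the waiter see zero ordered extents at wake-up, while the fixed ordering guarantees it sees the one that was just created.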
cc: stable@vger.kernel.org
Signed-off-by: Josef Bacik
Reviewed-by: Nikolay Borisov
---
 fs/btrfs/inode.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b1f02e3fea5d..e388153c4ae4 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1290,11 +1290,6 @@ static noinline void async_cow_submit(struct btrfs_work *work)
 	nr_pages = (async_chunk->end - async_chunk->start + PAGE_SIZE) >>
 		PAGE_SHIFT;
 
-	/* atomic_sub_return implies a barrier */
-	if (atomic_sub_return(nr_pages, &fs_info->async_delalloc_pages) <
-	    5 * SZ_1M)
-		cond_wake_up_nomb(&fs_info->async_submit_wait);
-
 	/*
 	 * ->inode could be NULL if async_chunk_start has failed to compress,
 	 * in which case we don't have anything to submit, yet we need to
@@ -1303,6 +1298,11 @@ static noinline void async_cow_submit(struct btrfs_work *work)
 	 */
 	if (async_chunk->inode)
 		submit_compressed_extents(async_chunk);
+
+	/* atomic_sub_return implies a barrier */
+	if (atomic_sub_return(nr_pages, &fs_info->async_delalloc_pages) <
+	    5 * SZ_1M)
+		cond_wake_up_nomb(&fs_info->async_submit_wait);
 }
 
 static noinline void async_cow_free(struct btrfs_work *work)

From patchwork Tue Jun 29 13:59:21 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 12349995
From: Josef Bacik
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com, linux-fsdevel@vger.kernel.org
Subject: [PATCH v2 5/8] fs: add a filemap_fdatawrite_wbc helper
Date: Tue, 29 Jun 2021 09:59:21 -0400
X-Mailer: git-send-email 2.26.3
Precedence: bulk
X-Mailing-List: linux-btrfs@vger.kernel.org

Btrfs sometimes needs to flush dirty pages on a bunch of dirty inodes in
order to reclaim metadata reservations.  Unfortunately most helpers in
this area are too smart for us:

1) The normal filemap_fdata* helpers only take range and sync modes, and
   don't give any indication of how much was written, so we can only
   flush full inodes, which isn't what we want in most cases.
2) The normal writeback path requires us to have the s_umount sem held,
   but we can't unconditionally take it in this path because we could
   deadlock.
3) The normal writeback path also skips inodes with I_SYNC set if we
   write with WB_SYNC_NONE.  This isn't the behavior we want under heavy
   ENOSPC pressure; we want to actually make sure the pages are under
   writeback before returning, and if another thread is in the middle of
   writing the file we may return before they're under writeback and
   miss our ordered extents and not properly wait for completion.
4) sync_inode() uses the normal writeback path and has the same problem
   as #3.

What we really want is to call do_writepages() with our wbc.
This way we can make sure that writeback is actually started on the
pages, and we can control how many pages are written as a whole as we
write many inodes using the same wbc.  Accomplish this with a new helper
that does just that so we can use it for our ENOSPC flushing
infrastructure.

Signed-off-by: Josef Bacik
Reviewed-by: Nikolay Borisov
---
 include/linux/fs.h |  2 ++
 mm/filemap.c       | 35 ++++++++++++++++++++++++++---------
 2 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index c3c88fdb9b2a..aace07f88b73 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2886,6 +2886,8 @@ extern int filemap_fdatawrite_range(struct address_space *mapping,
 				loff_t start, loff_t end);
 extern int filemap_check_errors(struct address_space *mapping);
 extern void __filemap_set_wb_err(struct address_space *mapping, int err);
+extern int filemap_fdatawrite_wbc(struct address_space *mapping,
+				  struct writeback_control *wbc);
 
 static inline int filemap_write_and_wait(struct address_space *mapping)
 {
diff --git a/mm/filemap.c b/mm/filemap.c
index 66f7e9fdfbc4..8395eafc178b 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -376,6 +376,31 @@ static int filemap_check_and_keep_errors(struct address_space *mapping)
 		return -ENOSPC;
 	return 0;
 }
+/**
+ * filemap_fdatawrite_wbc - start writeback on mapping dirty pages in range
+ * @mapping: address space structure to write
+ * @wbc: the writeback_control controlling the writeout
+ *
+ * Call writepages on the mapping using the provided wbc to control the
+ * writeout.
+ *
+ * Return: %0 on success, negative error code otherwise.
+ */
+int filemap_fdatawrite_wbc(struct address_space *mapping,
+			   struct writeback_control *wbc)
+{
+	int ret;
+
+	if (!mapping_can_writeback(mapping) ||
+	    !mapping_tagged(mapping, PAGECACHE_TAG_DIRTY))
+		return 0;
+
+	wbc_attach_fdatawrite_inode(wbc, mapping->host);
+	ret = do_writepages(mapping, wbc);
+	wbc_detach_inode(wbc);
+	return ret;
+}
+EXPORT_SYMBOL(filemap_fdatawrite_wbc);
 
 /**
  * __filemap_fdatawrite_range - start writeback on mapping dirty pages in range
@@ -397,7 +422,6 @@ static int filemap_check_and_keep_errors(struct address_space *mapping)
 int __filemap_fdatawrite_range(struct address_space *mapping, loff_t start,
 				loff_t end, int sync_mode)
 {
-	int ret;
 	struct writeback_control wbc = {
 		.sync_mode = sync_mode,
 		.nr_to_write = LONG_MAX,
@@ -405,14 +429,7 @@ int __filemap_fdatawrite_range(struct address_space *mapping, loff_t start,
 		.range_end = end,
 	};
 
-	if (!mapping_can_writeback(mapping) ||
-	    !mapping_tagged(mapping, PAGECACHE_TAG_DIRTY))
-		return 0;
-
-	wbc_attach_fdatawrite_inode(&wbc, mapping->host);
-	ret = do_writepages(mapping, &wbc);
-	wbc_detach_inode(&wbc);
-	return ret;
+	return filemap_fdatawrite_wbc(mapping, &wbc);
 }
 
 static inline int __filemap_fdatawrite(struct address_space *mapping,

From patchwork Tue Jun 29 13:59:22 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 12349997
From: Josef Bacik
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com, linux-fsdevel@vger.kernel.org
Subject: [PATCH v2 6/8] btrfs: use the filemap_fdatawrite_wbc helper for delalloc shrinking
Date: Tue, 29 Jun 2021 09:59:22 -0400
Message-Id: <2acb56dd851d31d7b5547099821f0cbf6dfb5d29.1624974951.git.josef@toxicpanda.com>
X-Mailer: git-send-email 2.26.3
Precedence: bulk
X-Mailing-List: linux-btrfs@vger.kernel.org

sync_inode() has some holes that can cause problems if we're under heavy
ENOSPC pressure.  If there's writeback running on a separate thread,
sync_inode() will skip writing the inode altogether.  What we really
want is to make sure writeback has been started on all the pages to make
sure we can see the ordered extents and wait on them if appropriate.
Switch to this new helper which will allow us to accomplish this and
avoid ENOSPC'ing early.
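
For illustration only, not part of the patch: a minimal userspace sketch
of why handing one wbc to every inode bounds the total number of pages
written.  All model_* names are hypothetical stand-ins rather than kernel
APIs, and the model tracks only the nr_to_write arithmetic; the real
helper passes the wbc to do_writepages(), whose behaviour is more
involved than this.

```c
#include <stddef.h>

/* Userspace stand-in for the single writeback_control field modelled here. */
struct model_wbc {
	long nr_to_write;	/* remaining page budget, shared across inodes */
};

/* Hypothetical stand-in for an inode: only a count of dirty pages. */
struct model_inode {
	long dirty_pages;
};

/*
 * Models the budget effect of filemap_fdatawrite_wbc(): start writeback
 * on as many dirty pages as the shared budget allows and charge them to
 * the wbc.  This simplified model never fails, so it always returns 0.
 */
static int model_fdatawrite_wbc(struct model_inode *inode,
				struct model_wbc *wbc)
{
	long n = inode->dirty_pages;

	if (n == 0)
		return 0;	/* nothing dirty, nothing to do */
	if (n > wbc->nr_to_write)
		n = wbc->nr_to_write;
	inode->dirty_pages -= n;
	wbc->nr_to_write -= n;
	return 0;
}

/*
 * Same loop shape as the start_delalloc_inodes() hunk: stop on error or
 * once the shared budget is used up.  Returns the number of inodes visited.
 */
static size_t model_flush_inodes(struct model_inode *inodes, size_t count,
				 struct model_wbc *wbc)
{
	size_t i;

	for (i = 0; i < count; i++) {
		int ret = model_fdatawrite_wbc(&inodes[i], wbc);

		if (ret || wbc->nr_to_write <= 0)
			return i + 1;
	}
	return count;
}
```

With a budget of 120 pages and inodes holding 100, 50 and 70 dirty
pages, the model writes all of the first inode, 20 pages of the second,
and never reaches the third.
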

Signed-off-by: Josef Bacik
Reviewed-by: Nikolay Borisov
---
 fs/btrfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e388153c4ae4..b25c84aba743 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9713,7 +9713,7 @@ static int start_delalloc_inodes(struct btrfs_root *root,
 			btrfs_queue_work(root->fs_info->flush_workers,
 					 &work->work);
 		} else {
-			ret = sync_inode(inode, wbc);
+			ret = filemap_fdatawrite_wbc(inode->i_mapping, wbc);
 			btrfs_add_delayed_iput(inode);
 			if (ret || wbc->nr_to_write <= 0)
 				goto out;

From patchwork Tue Jun 29 13:59:23 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 12349999
From: Josef Bacik
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com, linux-fsdevel@vger.kernel.org
Subject: [PATCH v2 7/8] 9p: migrate from sync_inode to filemap_fdatawrite_wbc
Date: Tue, 29 Jun 2021 09:59:23 -0400
Message-Id: <16ad65c145645b0ade200b45ecbf1b14f3e8c1c0.1624974951.git.josef@toxicpanda.com>
X-Mailer: git-send-email 2.26.3
Precedence: bulk
X-Mailing-List: linux-btrfs@vger.kernel.org

We're going to remove sync_inode, so migrate to filemap_fdatawrite_wbc
instead.

Signed-off-by: Josef Bacik
---
 fs/9p/vfs_file.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
index 59c32c9b799f..6b64e8391f30 100644
--- a/fs/9p/vfs_file.c
+++ b/fs/9p/vfs_file.c
@@ -625,12 +625,7 @@ static void v9fs_mmap_vm_close(struct vm_area_struct *vma)
 	p9_debug(P9_DEBUG_VFS, "9p VMA close, %p, flushing", vma);
 
 	inode = file_inode(vma->vm_file);
-
-	if (!mapping_can_writeback(inode->i_mapping))
-		wbc.nr_to_write = 0;
-
-	might_sleep();
-	sync_inode(inode, &wbc);
+	filemap_fdatawrite_wbc(inode->i_mapping, &wbc);
 }
 
 

From patchwork Tue Jun 29 13:59:24 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 12350001
From: Josef Bacik
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com, linux-fsdevel@vger.kernel.org
Subject: [PATCH v2 8/8] fs: kill sync_inode
Date: Tue, 29 Jun 2021 09:59:24 -0400
Message-Id: <9e3df65d3e9c7f35acc5434188ec2eaea669c363.1624974951.git.josef@toxicpanda.com>
X-Mailer: git-send-email 2.26.3
Precedence: bulk
X-Mailing-List: linux-btrfs@vger.kernel.org

Now that all users of sync_inode() have been deleted, remove
sync_inode().

Signed-off-by: Josef Bacik
---
 fs/fs-writeback.c  | 19 +------------------
 include/linux/fs.h |  1 -
 2 files changed, 1 insertion(+), 19 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index e91980f49388..706dad22f735 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2608,23 +2608,6 @@ int write_inode_now(struct inode *inode, int sync)
 }
 EXPORT_SYMBOL(write_inode_now);
 
-/**
- * sync_inode - write an inode and its pages to disk.
- * @inode: the inode to sync
- * @wbc: controls the writeback mode
- *
- * sync_inode() will write an inode and its pages to disk.  It will also
- * correctly update the inode on its superblock's dirty inode lists and will
- * update inode->i_state.
- *
- * The caller must have a ref on the inode.
- */
-int sync_inode(struct inode *inode, struct writeback_control *wbc)
-{
-	return writeback_single_inode(inode, wbc);
-}
-EXPORT_SYMBOL(sync_inode);
-
 /**
  * sync_inode_metadata - write an inode to disk
  * @inode: the inode to sync
@@ -2641,6 +2624,6 @@ int sync_inode_metadata(struct inode *inode, int wait)
 		.nr_to_write = 0, /* metadata-only */
 	};
 
-	return sync_inode(inode, &wbc);
+	return writeback_single_inode(inode, &wbc);
 }
 EXPORT_SYMBOL(sync_inode_metadata);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index aace07f88b73..7c33e5414747 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2458,7 +2458,6 @@ static inline void file_accessed(struct file *file)
 
 extern int file_modified(struct file *file);
 
-int sync_inode(struct inode *inode, struct writeback_control *wbc);
 int sync_inode_metadata(struct inode *inode, int wait);
 
 struct file_system_type {