From patchwork Mon Jun 28 15:37:06 2021
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 12348257
From: Josef Bacik
To: linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 1/6] btrfs: enable a tracepoint when we fail tickets
Date: Mon, 28 Jun 2021 11:37:06 -0400
Message-Id: <196e7895350732ab509b4003427c95fce89b0d9c.1624894102.git.josef@toxicpanda.com>

When debugging early ENOSPC problems it was useful to have a tracepoint
where we failed all tickets, so I could check the state of the ENOSPC
counters at failure time to validate my fixes.  This adds the tracepoint
so you can easily get that information.

Signed-off-by: Josef Bacik
---
 fs/btrfs/space-info.c        | 2 ++
 include/trace/events/btrfs.h | 6 ++++++
 2 files changed, 8 insertions(+)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 392997376a1c..af161eb808a2 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -825,6 +825,8 @@ static bool maybe_fail_all_tickets(struct btrfs_fs_info *fs_info,
     struct reserve_ticket *ticket;
     u64 tickets_id = space_info->tickets_id;
 
+    trace_btrfs_fail_all_tickets(fs_info, space_info);
+
     if (btrfs_test_opt(fs_info, ENOSPC_DEBUG)) {
         btrfs_info(fs_info, "cannot satisfy tickets, dumping space info");
         __btrfs_dump_space_info(fs_info, space_info);
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index c7237317a8b9..3d81ba8c37b9 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -2098,6 +2098,12 @@ DEFINE_EVENT(btrfs_dump_space_info, btrfs_done_preemptive_reclaim,
     TP_ARGS(fs_info, sinfo)
 );
 
+DEFINE_EVENT(btrfs_dump_space_info, btrfs_fail_all_tickets,
+    TP_PROTO(const struct btrfs_fs_info *fs_info,
+         const struct btrfs_space_info *sinfo),
+    TP_ARGS(fs_info, sinfo)
+);
+
 TRACE_EVENT(btrfs_reserve_ticket,
     TP_PROTO(const struct btrfs_fs_info *fs_info, u64 flags, u64 bytes,
          u64 start_ns, int flush, int error),
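
For anyone wanting to reproduce that kind of dump, the new event shows up
under the btrfs trace system like the other space-info events.  A minimal
userspace sketch of one way to enable it and stream the output follows; it
is only an illustration, not part of the patch, and it assumes tracefs is
mounted at /sys/kernel/tracing (echo 1 > .../enable plus cat trace_pipe
does the same thing from a shell):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Paths assume tracefs is mounted at /sys/kernel/tracing. */
#define EVENT_ENABLE \
	"/sys/kernel/tracing/events/btrfs/btrfs_fail_all_tickets/enable"
#define TRACE_PIPE	"/sys/kernel/tracing/trace_pipe"

int main(void)
{
	char buf[4096];
	ssize_t n;
	int fd;

	/* Turn the new event on. */
	fd = open(EVENT_ENABLE, O_WRONLY);
	if (fd < 0) {
		perror("open enable");
		return 1;
	}
	if (write(fd, "1", 1) != 1)
		perror("write enable");
	close(fd);

	/* Stream events; each line dumps the space_info counters at failure time. */
	fd = open(TRACE_PIPE, O_RDONLY);
	if (fd < 0) {
		perror("open trace_pipe");
		return 1;
	}
	while ((n = read(fd, buf, sizeof(buf))) > 0)
		fwrite(buf, 1, n, stdout);
	close(fd);
	return 0;
}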

From patchwork Mon Jun 28 15:37:07 2021
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 12348259
From: Josef Bacik
To: linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, kernel-team@fb.com
Cc: stable@vger.kernel.org
Subject: [PATCH 2/6] btrfs: handle shrink_delalloc pages calculation differently
Date: Mon, 28 Jun 2021 11:37:07 -0400
Message-Id: <670bdb9693668dd0662a3c3db8a954df1aa966e4.1624894102.git.josef@toxicpanda.com>

We have been hitting some early ENOSPC issues in production with more
recent kernels, and I tracked it down to us simply not flushing delalloc
as aggressively as we should be.  With tracing I was seeing us failing
all tickets with all of the block rsvs at or around 0, with very little
pinned space, but still around 120MiB of outstanding bytes_may_use.
Upon further investigation I saw that we were flushing around 14 pages
per shrink call for delalloc, despite having around 2GiB of delalloc
outstanding.

Consider the example of an 8-way machine, all CPUs trying to create a
file in parallel, which at the time of this commit requires 5 items to
do.  Assuming a 16k leaf size, we have 10MiB of total metadata reclaim
size waiting on reservations.  Now assume we have 128MiB of delalloc
outstanding.  With our current math we would set items to 20, and then
set to_reclaim to 20 * 256k, or 5MiB.

Assuming that we went through this loop all 3 times, for both
FLUSH_DELALLOC and FLUSH_DELALLOC_WAIT, and then did the full loop
twice, we'd only flush 60MiB of the 128MiB delalloc space.  This could
leave a fair bit of delalloc reservations still hanging around by the
time we go to ENOSPC out all the remaining tickets.

Fix this in two ways.  First, change the calculations to be a fraction
of the total delalloc bytes on the system.  Prior to this change we were
calculating based on dirty inodes, so our math made more sense; now it's
just completely unrelated to what we're actually doing.  Second, add a
FLUSH_DELALLOC_FULL state that we hold off on until we've gone through
the flush states at least once.  This will empty the system of all
delalloc so we're sure to be truly out of space when we start failing
tickets.

I'm tagging stable 5.10 and forward, because this is where we started
using the page stuff heavily again.  This affects earlier kernel
versions as well, but would be a pain to backport to them as the
flushing mechanisms aren't the same.

CC: stable@vger.kernel.org # 5.10+
Signed-off-by: Josef Bacik
---
 fs/btrfs/ctree.h             |  9 +++++----
 fs/btrfs/space-info.c        | 35 ++++++++++++++++++++++++++---------
 include/trace/events/btrfs.h |  1 +
 3 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index d7ef4d7d2c1a..232ff1a49ca6 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2783,10 +2783,11 @@ enum btrfs_flush_state {
     FLUSH_DELAYED_REFS  = 4,
     FLUSH_DELALLOC      = 5,
     FLUSH_DELALLOC_WAIT = 6,
-    ALLOC_CHUNK         = 7,
-    ALLOC_CHUNK_FORCE   = 8,
-    RUN_DELAYED_IPUTS   = 9,
-    COMMIT_TRANS        = 10,
+    FLUSH_DELALLOC_FULL = 7,
+    ALLOC_CHUNK         = 8,
+    ALLOC_CHUNK_FORCE   = 9,
+    RUN_DELAYED_IPUTS   = 10,
+    COMMIT_TRANS        = 11,
 };
 
 int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index af161eb808a2..0c539a94c6d9 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -494,6 +494,9 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info,
     long time_left;
     int loops;
 
+    delalloc_bytes = percpu_counter_sum_positive(&fs_info->delalloc_bytes);
+    ordered_bytes = percpu_counter_sum_positive(&fs_info->ordered_bytes);
+
     /* Calc the number of the pages we need flush for space reservation */
     if (to_reclaim == U64_MAX) {
         items = U64_MAX;
@@ -501,19 +504,21 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info,
         /*
          * to_reclaim is set to however much metadata we need to
          * reclaim, but reclaiming that much data doesn't really track
-         * exactly, so increase the amount to reclaim by 2x in order to
-         * make sure we're flushing enough delalloc to hopefully reclaim
-         * some metadata reservations.
+         * exactly. What we really want to do is reclaim full inode's
+         * worth of reservations, however that's not available to us
+         * here. We will take a fraction of the delalloc bytes for our
+         * flushing loops and hope for the best. Delalloc will expand
+         * the amount we write to cover an entire dirty extent, which
+         * will reclaim the metadata reservation for that range. If
+         * it's not enough subsequent flush stages will be more
+         * aggressive.
          */
+        to_reclaim = max(to_reclaim, delalloc_bytes >> 3);
         items = calc_reclaim_items_nr(fs_info, to_reclaim) * 2;
-        to_reclaim = items * EXTENT_SIZE_PER_ITEM;
     }
 
     trans = (struct btrfs_trans_handle *)current->journal_info;
 
-    delalloc_bytes = percpu_counter_sum_positive(
-                        &fs_info->delalloc_bytes);
-    ordered_bytes = percpu_counter_sum_positive(&fs_info->ordered_bytes);
     if (delalloc_bytes == 0 && ordered_bytes == 0)
         return;
@@ -596,8 +601,11 @@ static void flush_space(struct btrfs_fs_info *fs_info,
         break;
     case FLUSH_DELALLOC:
     case FLUSH_DELALLOC_WAIT:
+    case FLUSH_DELALLOC_FULL:
+        if (state == FLUSH_DELALLOC_FULL)
+            num_bytes = U64_MAX;
         shrink_delalloc(fs_info, space_info, num_bytes,
-                state == FLUSH_DELALLOC_WAIT, for_preempt);
+                state != FLUSH_DELALLOC, for_preempt);
         break;
     case FLUSH_DELAYED_REFS_NR:
     case FLUSH_DELAYED_REFS:
@@ -907,6 +915,14 @@ static void btrfs_async_reclaim_metadata_space(struct work_struct *work)
             commit_cycles--;
         }
 
+        /*
+         * We do not want to empty the system of delalloc unless we're
+         * under heavy pressure, so allow one trip through the flushing
+         * logic before we start doing a FLUSH_DELALLOC_FULL.
+         */
+        if (flush_state == FLUSH_DELALLOC_FULL && !commit_cycles)
+            flush_state++;
+
         /*
          * We don't want to force a chunk allocation until we've tried
          * pretty hard to reclaim space.  Think of the case where we
@@ -1070,7 +1086,7 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work)
  * so if we now have space to allocate do the force chunk allocation.
  */
 static const enum btrfs_flush_state data_flush_states[] = {
-    FLUSH_DELALLOC_WAIT,
+    FLUSH_DELALLOC_FULL,
     RUN_DELAYED_IPUTS,
     COMMIT_TRANS,
     ALLOC_CHUNK_FORCE,
@@ -1159,6 +1175,7 @@ static const enum btrfs_flush_state evict_flush_states[] = {
     FLUSH_DELAYED_REFS,
     FLUSH_DELALLOC,
     FLUSH_DELALLOC_WAIT,
+    FLUSH_DELALLOC_FULL,
    ALLOC_CHUNK,
     COMMIT_TRANS,
 };
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 3d81ba8c37b9..ddf5c250726c 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -94,6 +94,7 @@ struct btrfs_space_info;
     EM( FLUSH_DELAYED_ITEMS,    "FLUSH_DELAYED_ITEMS")      \
     EM( FLUSH_DELALLOC,         "FLUSH_DELALLOC")           \
     EM( FLUSH_DELALLOC_WAIT,    "FLUSH_DELALLOC_WAIT")      \
+    EM( FLUSH_DELALLOC_FULL,    "FLUSH_DELALLOC_FULL")      \
     EM( FLUSH_DELAYED_REFS_NR,  "FLUSH_DELAYED_REFS_NR")    \
     EM( FLUSH_DELAYED_REFS,     "FLUSH_ELAYED_REFS")        \
     EM( ALLOC_CHUNK,            "ALLOC_CHUNK")              \
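
Plugging the commit message's numbers into the old and new sizing shows
the difference per flushing pass.  A small standalone sketch of just the
arithmetic (the constants are local stand-ins mirroring the kernel's
EXTENT_SIZE_PER_ITEM and the example values above; this is not the kernel
code):

#include <stdio.h>
#include <stdint.h>

#define SZ_1M			(1024ULL * 1024ULL)
/* Stand-in for the kernel's EXTENT_SIZE_PER_ITEM (256KiB). */
#define EXTENT_SIZE_PER_ITEM	(256ULL * 1024ULL)

static uint64_t max_u64(uint64_t a, uint64_t b)
{
	return a > b ? a : b;
}

int main(void)
{
	uint64_t delalloc_bytes = 128 * SZ_1M;	/* outstanding delalloc */
	uint64_t to_reclaim = 10 * SZ_1M;	/* metadata waiting on tickets */
	uint64_t items = 20;			/* value from the commit message */

	/* Old sizing: a fixed multiple of the item count. */
	uint64_t old_target = items * EXTENT_SIZE_PER_ITEM;

	/* New sizing: never less than 1/8th of the outstanding delalloc. */
	uint64_t new_target = max_u64(to_reclaim, delalloc_bytes >> 3);

	printf("old per-pass target: %llu MiB\n",
	       (unsigned long long)(old_target / SZ_1M));	/* 5 MiB */
	printf("new per-pass target: %llu MiB\n",
	       (unsigned long long)(new_target / SZ_1M));	/* 16 MiB */
	return 0;
}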

From patchwork Mon Jun 28 15:37:08 2021
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 12348261
From: Josef Bacik
To: linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 3/6] btrfs: wait on async extents when flushing delalloc
Date: Mon, 28 Jun 2021 11:37:08 -0400
Message-Id: <0ee87e54d0f14f0628d146e09fef34db2ce73e03.1624894102.git.josef@toxicpanda.com>

I've been debugging an early ENOSPC problem in production and finally
root caused it to this problem.  When we switched to per-inode flushing
in 38d715f494f2 ("btrfs: use btrfs_start_delalloc_roots in
shrink_delalloc") I pulled out the async extent handling, because we
were doing the correct thing by calling filemap_flush() if we had async
extents set.  This would properly wait on any async extents by locking
the page in the second flush, thus making sure our ordered extents were
properly set up.

However when I switched us back to page based flushing, I used
sync_inode(), which allows us to pass in our own wbc.  The problem here
is that sync_inode() is smarter than the filemap_* helpers; it tries to
avoid calling writepages at all.
This means that our second call could skip calling do_writepages
altogether, and thus not wait on the pagelock for the async helpers.
This means we could come back before any ordered extents were created
and then simply continue on in our flushing mechanisms and ENOSPC out
when we have plenty of space to use.

Fix this by putting back the async pages logic in shrink_delalloc.  This
allows us to bulk write out everything that we need to, and then we can
wait in one place for the async helpers to catch up, and then wait on
any ordered extents that are created.

Fixes: e076ab2a2ca7 ("btrfs: shrink delalloc pages instead of full inodes")
Signed-off-by: Josef Bacik
---
 fs/btrfs/inode.c      |  4 ----
 fs/btrfs/space-info.c | 40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e6eb20987351..b1f02e3fea5d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9714,10 +9714,6 @@ static int start_delalloc_inodes(struct btrfs_root *root,
                      &work->work);
         } else {
             ret = sync_inode(inode, wbc);
-            if (!ret &&
-                test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT,
-                     &BTRFS_I(inode)->runtime_flags))
-                ret = sync_inode(inode, wbc);
             btrfs_add_delayed_iput(inode);
             if (ret || wbc->nr_to_write <= 0)
                 goto out;
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 0c539a94c6d9..f140a89a3cdd 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -534,9 +534,49 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info,
     while ((delalloc_bytes || ordered_bytes) && loops < 3) {
         u64 temp = min(delalloc_bytes, to_reclaim) >> PAGE_SHIFT;
         long nr_pages = min_t(u64, temp, LONG_MAX);
+        int async_pages;
 
         btrfs_start_delalloc_roots(fs_info, nr_pages, true);
 
+        /*
+         * We need to make sure any outstanding async pages are now
+         * processed before we continue.  This is because things like
+         * sync_inode() try to be smart and skip writing if the inode is
+         * marked clean.  We don't use filemap_fwrite for flushing
+         * because we want to control how many pages we write out at a
+         * time, thus this is the only safe way to make sure we've
+         * waited for outstanding compressed workers to have started
+         * their jobs and thus have ordered extents set up properly.
+         *
+         * This exists because we do not want to wait for each
+         * individual inode to finish its async work, we simply want to
+         * start the IO on everybody, and then come back here and wait
+         * for all of the async work to catch up.  Once we're done with
+         * that we know we'll have ordered extents for everything and we
+         * can decide if we wait for that or not.
+         *
+         * If we choose to replace this in the future, make absolutely
+         * sure that the proper waiting is being done in the async case,
+         * as there have been bugs in that area before.
+         */
+        async_pages = atomic_read(&fs_info->async_delalloc_pages);
+        if (!async_pages)
+            goto skip_async;
+
+        /*
+         * We don't want to wait forever, if we wrote less pages in this
+         * loop than we have outstanding, only wait for that number of
+         * pages, otherwise we can wait for all async pages to finish
+         * before continuing.
+         */
+        if (async_pages > nr_pages)
+            async_pages -= nr_pages;
+        else
+            async_pages = 0;
+        wait_event(fs_info->async_submit_wait,
+               atomic_read(&fs_info->async_delalloc_pages) <=
+               async_pages);
+skip_async:
         loops++;
         if (wait_ordered && !trans) {
             btrfs_wait_ordered_roots(fs_info, items, 0, (u64)-1);

From patchwork Mon Jun 28 15:37:09 2021
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 12348263
From: Josef Bacik
To: linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, kernel-team@fb.com
Cc: stable@vger.kernel.org
Subject: [PATCH 4/6] btrfs: wake up async_delalloc_pages waiters after submit
Date: Mon, 28 Jun 2021 11:37:09 -0400
Message-Id: <54425f6e0ece01f5d579e1bcc0aab22a988c301f.1624894102.git.josef@toxicpanda.com>

We use the async_delalloc_pages mechanism to make sure that we've
completed our async work before trying to continue our delalloc
flushing.  The reason for this is we need to see any ordered extents
that were created by our delalloc flushing.  However we're waking up
before we do the submit work, which is before we create the ordered
extents.  This is a pretty wide race window where we could potentially
think there are no ordered extents and thus exit shrink_delalloc
prematurely.  Fix this by waking us up after we've done the work to
create ordered extents.

cc: stable@vger.kernel.org
Signed-off-by: Josef Bacik
---
 fs/btrfs/inode.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b1f02e3fea5d..e388153c4ae4 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1290,11 +1290,6 @@ static noinline void async_cow_submit(struct btrfs_work *work)
     nr_pages = (async_chunk->end - async_chunk->start + PAGE_SIZE) >>
         PAGE_SHIFT;
 
-    /* atomic_sub_return implies a barrier */
-    if (atomic_sub_return(nr_pages, &fs_info->async_delalloc_pages) <
-        5 * SZ_1M)
-        cond_wake_up_nomb(&fs_info->async_submit_wait);
-
     /*
      * ->inode could be NULL if async_chunk_start has failed to compress,
      * in which case we don't have anything to submit, yet we need to
@@ -1303,6 +1298,11 @@ static noinline void async_cow_submit(struct btrfs_work *work)
      */
     if (async_chunk->inode)
         submit_compressed_extents(async_chunk);
+
+    /* atomic_sub_return implies a barrier */
+    if (atomic_sub_return(nr_pages, &fs_info->async_delalloc_pages) <
+        5 * SZ_1M)
+        cond_wake_up_nomb(&fs_info->async_submit_wait);
 }
 
 static noinline void async_cow_free(struct btrfs_work *work)
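
The rule the patch restores is the usual publish-then-wake ordering:
finish the work the waiter cares about before dropping the counter it
sleeps on.  A tiny userspace analogue of that pattern, only to illustrate
why the wakeup has to come after the submit work (compile with -lpthread;
this is not the kernel code):

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int pending = 1;		/* analogue of fs_info->async_delalloc_pages */
static int ordered_created;	/* analogue of "ordered extents exist" */

static void *worker(void *arg)
{
	pthread_mutex_lock(&lock);
	/*
	 * Do the submit work first (create the ordered extents), and only
	 * then drop the pending count and wake the waiter.  Waking before
	 * this assignment is the race the patch closes.
	 */
	ordered_created = 1;
	pending--;
	pthread_cond_signal(&cond);
	pthread_mutex_unlock(&lock);
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, worker, NULL);

	pthread_mutex_lock(&lock);
	while (pending > 0)
		pthread_cond_wait(&cond, &lock);
	/* Because the wakeup happens after the work, this is guaranteed. */
	printf("ordered_created = %d\n", ordered_created);
	pthread_mutex_unlock(&lock);

	pthread_join(t, NULL);
	return 0;
}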

From patchwork Mon Jun 28 15:37:10 2021
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 12348265
From: Josef Bacik
To: linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 5/6] fs: add a filemap_fdatawrite_wbc helper
Date: Mon, 28 Jun 2021 11:37:10 -0400
Message-Id:

Btrfs sometimes needs to flush dirty pages on a bunch of dirty inodes in
order to reclaim metadata reservations.  Unfortunately most helpers in
this area are too smart for us:

1) The normal filemap_fdata* helpers only take range and sync modes, and
   don't give any indication of how much was written, so we can only
   flush full inodes, which isn't what we want in most cases.
2) The normal writeback path requires us to have the s_umount sem held,
   but we can't unconditionally take it in this path because we could
   deadlock.
3) The normal writeback path also skips inodes with I_SYNC set if we
   write with WB_SYNC_NONE.  This isn't the behavior we want under heavy
   ENOSPC pressure; we want to actually make sure the pages are under
   writeback before returning, and if another thread is in the middle of
   writing the file we may return before they're under writeback and
   miss our ordered extents and not properly wait for completion.
4) sync_inode() uses the normal writeback path and has the same problem
   as #3.

What we really want is to call do_writepages() with our wbc.  This way
we can make sure that writeback is actually started on the pages, and
we can control how many pages are written as a whole as we write many
inodes using the same wbc.  Accomplish this with a new helper that does
just that so we can use it for our ENOSPC flushing infrastructure.

Signed-off-by: Josef Bacik
---
 include/linux/fs.h |  2 ++
 mm/filemap.c       | 35 ++++++++++++++++++++++++++---------
 2 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index c3c88fdb9b2a..aace07f88b73 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2886,6 +2886,8 @@ extern int filemap_fdatawrite_range(struct address_space *mapping,
                 loff_t start, loff_t end);
 extern int filemap_check_errors(struct address_space *mapping);
 extern void __filemap_set_wb_err(struct address_space *mapping, int err);
+extern int filemap_fdatawrite_wbc(struct address_space *mapping,
+                  struct writeback_control *wbc);
 
 static inline int filemap_write_and_wait(struct address_space *mapping)
 {
diff --git a/mm/filemap.c b/mm/filemap.c
index 66f7e9fdfbc4..8395eafc178b 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -376,6 +376,31 @@ static int filemap_check_and_keep_errors(struct address_space *mapping)
         return -ENOSPC;
     return 0;
 }
 
+/**
+ * filemap_fdatawrite_wbc - start writeback on mapping dirty pages in range
+ * @mapping:    address space structure to write
+ * @wbc:        the writeback_control controlling the writeout
+ *
+ * Call writepages on the mapping using the provided wbc to control the
+ * writeout.
+ *
+ * Return: %0 on success, negative error code otherwise.
+ */
+int filemap_fdatawrite_wbc(struct address_space *mapping,
+               struct writeback_control *wbc)
+{
+    int ret;
+
+    if (!mapping_can_writeback(mapping) ||
+        !mapping_tagged(mapping, PAGECACHE_TAG_DIRTY))
+        return 0;
+
+    wbc_attach_fdatawrite_inode(wbc, mapping->host);
+    ret = do_writepages(mapping, wbc);
+    wbc_detach_inode(wbc);
+    return ret;
+}
+EXPORT_SYMBOL(filemap_fdatawrite_wbc);
+
 /**
  * __filemap_fdatawrite_range - start writeback on mapping dirty pages in range
@@ -397,7 +422,6 @@ static int filemap_check_and_keep_errors(struct address_space *mapping)
 int __filemap_fdatawrite_range(struct address_space *mapping, loff_t start,
                    loff_t end, int sync_mode)
 {
-    int ret;
     struct writeback_control wbc = {
         .sync_mode = sync_mode,
         .nr_to_write = LONG_MAX,
@@ -405,14 +429,7 @@ int __filemap_fdatawrite_range(struct address_space *mapping, loff_t start,
         .range_end = end,
     };
 
-    if (!mapping_can_writeback(mapping) ||
-        !mapping_tagged(mapping, PAGECACHE_TAG_DIRTY))
-        return 0;
-
-    wbc_attach_fdatawrite_inode(&wbc, mapping->host);
-    ret = do_writepages(mapping, &wbc);
-    wbc_detach_inode(&wbc);
-    return ret;
+    return filemap_fdatawrite_wbc(mapping, &wbc);
 }
 
 static inline int __filemap_fdatawrite(struct address_space *mapping,
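
For context, the calling convention the helper is designed around: the
filesystem builds one struct writeback_control, shares its nr_to_write
budget across many inodes, and hands each mapping to the helper, which
guarantees do_writepages() is actually invoked.  A hedged kernel-style
sketch of such a caller (the 2048-page budget and the function itself
are made up for illustration; patch 6 shows the real btrfs call site):

#include <linux/fs.h>
#include <linux/writeback.h>

/*
 * Sketch: start writeback on a set of inodes while sharing one page
 * budget across all of them, the way an ENOSPC flusher wants to.
 */
static int flush_some_inodes(struct inode **inodes, int nr_inodes)
{
	struct writeback_control wbc = {
		.sync_mode	= WB_SYNC_NONE,
		.nr_to_write	= 2048,		/* arbitrary shared budget */
		.range_start	= 0,
		.range_end	= LLONG_MAX,
	};
	int i, ret = 0;

	for (i = 0; i < nr_inodes; i++) {
		/*
		 * do_writepages() is called under the hood, so writeback is
		 * really started, and wbc.nr_to_write shrinks as pages go out.
		 */
		ret = filemap_fdatawrite_wbc(inodes[i]->i_mapping, &wbc);
		if (ret || wbc.nr_to_write <= 0)
			break;
	}
	return ret;
}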

From patchwork Mon Jun 28 15:37:11 2021
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 12348267
From: Josef Bacik
To: linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 6/6] btrfs: use the filemap_fdatawrite_wbc helper for delalloc shrinking
Date: Mon, 28 Jun 2021 11:37:11 -0400
Message-Id: <2acb56dd851d31d7b5547099821f0cbf6dfb5d29.1624894102.git.josef@toxicpanda.com>

sync_inode() has some holes that can cause problems if we're under heavy
ENOSPC pressure.  If there's writeback running on a separate thread
sync_inode() will skip writing the inode altogether.

What we really want is to make sure writeback has been started on all
the pages to make sure we can see the ordered extents and wait on them
if appropriate.  Switch to this new helper which will allow us to
accomplish this and avoid ENOSPC'ing early.

Signed-off-by: Josef Bacik
---
 fs/btrfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e388153c4ae4..b25c84aba743 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9713,7 +9713,7 @@ static int start_delalloc_inodes(struct btrfs_root *root,
             btrfs_queue_work(root->fs_info->flush_workers,
                      &work->work);
         } else {
-            ret = sync_inode(inode, wbc);
+            ret = filemap_fdatawrite_wbc(inode->i_mapping, wbc);
             btrfs_add_delayed_iput(inode);
             if (ret || wbc->nr_to_write <= 0)
                 goto out;