diff mbox

[RESEND] Btrfs: fix early ENOSPC due to delalloc

Message ID 33ac56aa74691de6109709168939cfe9f56cbd70.1500588286.git.osandov@fb.com (mailing list archive)
State New, archived
Headers show

Commit Message

Omar Sandoval July 20, 2017, 10:10 p.m. UTC
From: Omar Sandoval <osandov@fb.com>

If a lot of metadata is reserved for outstanding delayed allocations, we
rely on shrink_delalloc() to reclaim metadata space in order to fulfill
reservation tickets. However, shrink_delalloc() has a shortcut where if
it determines that space can be overcommitted, it will stop early. This
made sense before the ticketed enospc system, but now it means that
shrink_delalloc() will often not reclaim enough space to fulfill any
tickets, leading to an early ENOSPC. (Reservation tickets don't care
about being able to overcommit, they need every byte accounted for.)

Fix it by getting rid of the shortcut so that shrink_delalloc() reclaims
all of the metadata it is supposed to. This fixes early ENOSPCs we were
seeing when doing a btrfs receive to populate a new filesystem, as well
as early ENOSPCs Christoph saw when doing a big cp -r onto Btrfs.

Fixes: 957780eb2788 ("Btrfs: introduce ticketed enospc infrastructure")
Tested-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>
Cc: stable@vger.kernel.org
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
Resending with the fixes tag, Cc: stable, Christoph's tested-by, and
some info about how Christoph hit the same issue. Nikolay also said it
helped slightly with some of his ENOSPC problems. Based on v4.13-rc1.
Can we get this in 4.13 and applied to stable?

 fs/btrfs/extent-tree.c | 4 ----
 1 file changed, 4 deletions(-)

Comments

Josef Bacik July 21, 2017, 7:10 p.m. UTC | #1
On Thu, Jul 20, 2017 at 03:10:35PM -0700, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> If a lot of metadata is reserved for outstanding delayed allocations, we
> rely on shrink_delalloc() to reclaim metadata space in order to fulfill
> reservation tickets. However, shrink_delalloc() has a shortcut where if
> it determines that space can be overcommitted, it will stop early. This
> made sense before the ticketed enospc system, but now it means that
> shrink_delalloc() will often not reclaim enough space to fulfill any
> tickets, leading to an early ENOSPC. (Reservation tickets don't care
> about being able to overcommit, they need every byte accounted for.)
> 
> Fix it by getting rid of the shortcut so that shrink_delalloc() reclaims
> all of the metadata it is supposed to. This fixes early ENOSPCs we were
> seeing when doing a btrfs receive to populate a new filesystem, as well
> as early ENOSPCs Christoph saw when doing a big cp -r onto Btrfs.
> 
> Fixes: 957780eb2788 ("Btrfs: introduce ticketed enospc infrastructure")
> Tested-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>
> Cc: stable@vger.kernel.org
> Signed-off-by: Omar Sandoval <osandov@fb.com>

Reviewed-by: Josef Bacik <jbacik@fb.com>

Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Adam Borowski July 21, 2017, 10 p.m. UTC | #2
On Thu, Jul 20, 2017 at 03:10:35PM -0700, Omar Sandoval wrote:
> If a lot of metadata is reserved for outstanding delayed allocations, we
> rely on shrink_delalloc() to reclaim metadata space in order to fulfill
> reservation tickets. However, shrink_delalloc() has a shortcut where if
> it determines that space can be overcommitted, it will stop early. This
> made sense before the ticketed enospc system, but now it means that
> shrink_delalloc() will often not reclaim enough space to fulfill any
> tickets, leading to an early ENOSPC.

This happens a lot (like, 1/4 to 1/3 tries) when populating a freshly made
small filesystem, that makes running tests I've been recently doing (like
those degraded raid corruptions) really unfun.  These unexplained random
ENOSPCes were driving me mad — thanks for explaining those!  Now my tests
properly corrupt data as they should :þ.
diff mbox

Patch

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 375f8c728d91..5ef0cf399667 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4825,10 +4825,6 @@  static void shrink_delalloc(struct btrfs_fs_info *fs_info, u64 to_reclaim,
 		else
 			flush = BTRFS_RESERVE_NO_FLUSH;
 		spin_lock(&space_info->lock);
-		if (can_overcommit(fs_info, space_info, orig, flush, false)) {
-			spin_unlock(&space_info->lock);
-			break;
-		}
 		if (list_empty(&space_info->tickets) &&
 		    list_empty(&space_info->priority_tickets)) {
 			spin_unlock(&space_info->lock);