
[1/1] xfs: fix severe performance problems when fstrimming a subset of an AG

Message ID 171150385535.3220448.4852463781154330350.stgit@frogsfrogsfrogs (mailing list archive)
State Superseded

Commit Message

Darrick J. Wong March 27, 2024, 2:07 a.m. UTC
From: Darrick J. Wong <djwong@kernel.org>

XFS issues discard IOs while holding the free space btree and the AGF
buffers locked.  If the discard IOs are slow, this can lead to long
stalls for every other thread trying to access that AG.  On a 10TB high
performance flash storage device with a severely fragmented free space
btree in every AG, this results in many threads tripping the hangcheck
warnings while waiting for the AGF.  This happens even after we've run
fstrim a few times and waited for the nvme namespace utilization
counters to stabilize.

Strace for the entire 100TB looks like:
ioctl(3, FITRIM, {start=0x0, len=10995116277760, minlen=0}) = 0 <686.209839>

Reducing the size of the FITRIM requests to a single AG at a time
produces lower times for each individual call, but even this isn't quite
acceptable, because the lock hold times are still high enough to cause
stall warnings:

Strace for the first 4x 1TB AGs looks like (2):
ioctl(3, FITRIM, {start=0x0, len=1099511627776, minlen=0}) = 0 <68.352033>
ioctl(3, FITRIM, {start=0x10000000000, len=1099511627776, minlen=0}) = 0 <68.760323>
ioctl(3, FITRIM, {start=0x20000000000, len=1099511627776, minlen=0}) = 0 <67.235226>
ioctl(3, FITRIM, {start=0x30000000000, len=1099511627776, minlen=0}) = 0 <69.465744>

The fstrim code has to synchronize discards with block allocations, so
we must hold the AGF lock while issuing discard IOs.  Breaking up the
calls into smaller start/len segments ought to reduce the lock hold time
and allow other threads a chance to make progress.  Unfortunately, the
current fstrim implementation handles this poorly because it walks the
entire free space by length index (cntbt) and it's not clear if we can
cycle the AGF periodically to reduce latency because there's no
less-than btree lookup.

The first solution I thought of was to limit latency by scanning parts
of an AG at a time, but this doesn't solve the stalling problem when the
free space is heavily fragmented because each sub-AG scan has to walk
the entire cntbt to find free space that fits within the given range.
In fact, this dramatically increases the runtime!  This itself is a
problem, because sub-AG fstrim runtime is unnecessarily high.

For sub-AG scans, create a second implementation that will walk the
bnobt and perform the trims in block number order.  Since the cursor has
an obviously monotonically increasing value, it is easy to cycle the AGF
periodically to allow other threads to do work.  This implementation
avoids the worst problems of the original code, though it lacks the
desirable attribute of freeing the biggest chunks first.

On the other hand, this second implementation makes it much easier to
constrain the locking latency, and much easier to report fstrim progress
to anyone who's running xfs_scrub.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_discard.c |  172 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 169 insertions(+), 3 deletions(-)

Comments

Christoph Hellwig March 27, 2024, 11:35 a.m. UTC | #1
On Tue, Mar 26, 2024 at 07:07:58PM -0700, Darrick J. Wong wrote:
> periodically to allow other threads to do work.  This implementation
> avoids the worst problems of the original code, though it lacks the
> desirable attribute of freeing the biggest chunks first.

Do we really care much about freeing larger areas first?  I don't think
it really matters for FITRIM at all.

In other words, I suspect we're better off with only the by-bno
implementation.
Dave Chinner March 27, 2024, 10:15 p.m. UTC | #2
On Tue, Mar 26, 2024 at 07:07:58PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> XFS issues discard IOs while holding the free space btree and the AGF
> buffers locked.  If the discard IOs are slow, this can lead to long
> stalls for every other thread trying to access that AG.  On a 10TB high
> performance flash storage device with a severely fragmented free space
> btree in every AG, this results in many threads tripping the hangcheck
> warnings while waiting for the AGF.  This happens even after we've run
> fstrim a few times and waited for the nvme namespace utilization
> counters to stabilize.
> 
> Strace for the entire 100TB looks like:
> ioctl(3, FITRIM, {start=0x0, len=10995116277760, minlen=0}) = 0 <686.209839>
> 
> Reducing the size of the FITRIM requests to a single AG at a time
> produces lower times for each individual call, but even this isn't quite
> acceptable, because the lock hold times are still high enough to cause
> stall warnings:
> 
> Strace for the first 4x 1TB AGs looks like (2):
> ioctl(3, FITRIM, {start=0x0, len=1099511627776, minlen=0}) = 0 <68.352033>
> ioctl(3, FITRIM, {start=0x10000000000, len=1099511627776, minlen=0}) = 0 <68.760323>
> ioctl(3, FITRIM, {start=0x20000000000, len=1099511627776, minlen=0}) = 0 <67.235226>
> ioctl(3, FITRIM, {start=0x30000000000, len=1099511627776, minlen=0}) = 0 <69.465744>

Large FITRIM runtime is not actually an AGF locking latency problem
anymore....

> The fstrim code has to synchronize discards with block allocations, so
> we must hold the AGF lock while issuing discard IOs.

... because I fixed this problem in commit 89cfa899608f ("xfs:
reduce AGF hold times during fstrim operations").

> Breaking up the
> calls into smaller start/len segments ought to reduce the lock hold time
> and allow other threads a chance to make progress.  Unfortunately, the
> current fstrim implementation handles this poorly because it walks the
> entire free space by length index (cntbt) and it's not clear if we can
> cycle the AGF periodically to reduce latency because there's no
> less-than btree lookup.

That commit also fixed this problem.

> The first solution I thought of was to limit latency by scanning parts
> of an AG at a time,

It already does this.

> but this doesn't solve the stalling problem when the
> free space is heavily fragmented because each sub-AG scan has to walk
> the entire cntbt to find free space that fits within the given range.
> In fact, this dramatically increases the runtime!  This itself is a
> problem, because sub-AG fstrim runtime is unnecessarily high.

Ah, so this is a completely different problem to what you describe
above.  i.e. The problem with "sub-ag" fstrim is simply that finding
block range limited free extents is costly in terms of overall time
and CPU usage when you do it via the by-count btree instead of the
by-bno btree...


> For sub-AG scans, create a second implementation that will walk the
> bnobt and perform the trims in block number order.  Since the cursor has
> an obviously monotonically increasing value, it is easy to cycle the AGF
> periodically to allow other threads to do work.  This implementation
> avoids the worst problems of the original code, though it lacks the
> desirable attribute of freeing the biggest chunks first.

Ok, it's fine to do a by-bno search in this case...

> On the other hand, this second implementation will be much easier to
> constrain the locking latency, and makes it much easier to report fstrim
> progress to anyone who's running xfs_scrub.

... but it doesn't change the locking latency of fstrim at all.
Locks are still only held for batches of 100 free extent lookups...

I think a commit message update is necessary. :)

> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/xfs_discard.c |  172 +++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 169 insertions(+), 3 deletions(-)
> 
> 
> diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
> index 268bb734dc0a8..ee7a8759091eb 100644
> --- a/fs/xfs/xfs_discard.c
> +++ b/fs/xfs/xfs_discard.c
> @@ -157,9 +157,9 @@ xfs_trim_gather_extents(
>  	uint64_t		*blocks_trimmed)
>  {
>  	struct xfs_mount	*mp = pag->pag_mount;
> -	struct xfs_trans	*tp;
>  	struct xfs_btree_cur	*cur;
>  	struct xfs_buf		*agbp;
> +	struct xfs_trans	*tp;
>  	int			error;
>  	int			i;
>  	int			batch = 100;
> @@ -292,6 +292,160 @@ xfs_trim_gather_extents(
>  	return error;
>  }
>  
> +/* Trim the free space in this AG by block number. */
> +static inline int
> +xfs_trim_gather_bybno(
> +	struct xfs_perag	*pag,
> +	xfs_daddr_t		start,
> +	xfs_daddr_t		end,
> +	xfs_daddr_t		minlen,
> +	struct xfs_alloc_rec_incore *tcur,
> +	struct xfs_busy_extents	*extents,
> +	uint64_t		*blocks_trimmed)
> +{

I'd prefer that we don't copy-n-paste-n-subtly-modify code like
this.  There's very little different between the two gather cases -
the initial cursor setup and the loop exit criteria - so they should
be easy to use a common core loop.

> +	struct xfs_mount	*mp = pag->pag_mount;
> +	struct xfs_trans	*tp;
> +	struct xfs_btree_cur	*cur;
> +	struct xfs_buf		*agbp;
> +	xfs_daddr_t		end_daddr;
> +	xfs_agnumber_t		agno = pag->pag_agno;
> +	xfs_agblock_t		start_agbno;
> +	xfs_agblock_t		end_agbno;
> +	xfs_extlen_t		minlen_fsb = XFS_BB_TO_FSB(mp, minlen);
> +	int			i;
> +	int			batch = 100;
> +	int			error;
> +
> +	start = max(start, XFS_AGB_TO_DADDR(mp, agno, 0));
> +	start_agbno = xfs_daddr_to_agbno(mp, start);
> +
> +	end_daddr = XFS_AGB_TO_DADDR(mp, agno, pag->block_count);
> +	end = min(end, end_daddr - 1);
> +	end_agbno = xfs_daddr_to_agbno(mp, end);

I think this is the wrong place to do this.  I agree that it is
sensible to use ag-constrained agbnos here, but we should do it
properly and not make the code unnecessarily difficult to maintain
by using unconstrained daddrs in one gather function and constrained
agbnos in another. 

Here's a completely untested, uncompiled version of this by-bno
search I just wrote up to demonstrate that if we pass ag-confined
agbnos from the top level, we don't need to duplicate this gather
code at all or do trim range constraining inside the gather
functions...

-Dave.

diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
index 268bb734dc0a..0c949dfc097a 100644
--- a/fs/xfs/xfs_discard.c
+++ b/fs/xfs/xfs_discard.c
@@ -145,14 +145,18 @@ xfs_discard_extents(
 	return error;
 }
 
+struct xfs_trim_cur {
+	xfs_agblock_t	start;
+	xfs_extlen_t	count;
+	xfs_agblock_t	end;
+	xfs_extlen_t	minlen;
+	bool		by_len;
+};
 
 static int
 xfs_trim_gather_extents(
 	struct xfs_perag	*pag,
-	xfs_daddr_t		start,
-	xfs_daddr_t		end,
-	xfs_daddr_t		minlen,
-	struct xfs_alloc_rec_incore *tcur,
+	struct xfs_trim_cur	*tcur,
 	struct xfs_busy_extents	*extents,
 	uint64_t		*blocks_trimmed)
 {
@@ -179,21 +183,21 @@ xfs_trim_gather_extents(
 	if (error)
 		goto out_trans_cancel;
 
-	cur = xfs_cntbt_init_cursor(mp, tp, agbp, pag);
 
-	/*
-	 * Look up the extent length requested in the AGF and start with it.
-	 */
-	if (tcur->ar_startblock == NULLAGBLOCK)
-		error = xfs_alloc_lookup_ge(cur, 0, tcur->ar_blockcount, &i);
-	else
+	if (tcur->by_len) {
+		cur = xfs_bnobt_init_cursor(mp, tp, agbp, pag);
+		error = xfs_alloc_lookup_ge(cur, tcur->ar_startblock,
+				0, &i);
+	} else {
+		cur = xfs_cntbt_init_cursor(mp, tp, agbp, pag);
 		error = xfs_alloc_lookup_le(cur, tcur->ar_startblock,
 				tcur->ar_blockcount, &i);
+	}
 	if (error)
 		goto out_del_cursor;
 	if (i == 0) {
 		/* nothing of that length left in the AG, we are done */
-		tcur->ar_blockcount = 0;
+		tcur->count = 0;
 		goto out_del_cursor;
 	}
 
@@ -221,25 +225,8 @@ xfs_trim_gather_extents(
 			 * Update the cursor to point at this extent so we
 			 * restart the next batch from this extent.
 			 */
-			tcur->ar_startblock = fbno;
-			tcur->ar_blockcount = flen;
-			break;
-		}
-
-		/*
-		 * use daddr format for all range/len calculations as that is
-		 * the format the range/len variables are supplied in by
-		 * userspace.
-		 */
-		dbno = XFS_AGB_TO_DADDR(mp, pag->pag_agno, fbno);
-		dlen = XFS_FSB_TO_BB(mp, flen);
-
-		/*
-		 * Too small?  Give up.
-		 */
-		if (dlen < minlen) {
-			trace_xfs_discard_toosmall(mp, pag->pag_agno, fbno, flen);
-			tcur->ar_blockcount = 0;
+			tcur->start = fbno;
+			tcur->count = flen;
 			break;
 		}
 
@@ -248,10 +235,35 @@ xfs_trim_gather_extents(
 		 * supposed to discard skip it.  Do not bother to trim
 		 * down partially overlapping ranges for now.
 		 */
-		if (dbno + dlen < start || dbno > end) {
+		if (fbno + flen < tcur->start) {
 			trace_xfs_discard_exclude(mp, pag->pag_agno, fbno, flen);
 			goto next_extent;
 		}
+		if (fbno > tcur->end) {
+			trace_xfs_discard_exclude(mp, pag->pag_agno, fbno, flen);
+			if (tcur->by_len) {
+				tcur->count = 0;
+				break;
+			}
+			goto next_extent;
+		}
+
+		/* Trim the extent returned to the range we want. */
+		if (fbno < tcur->start) {
+			flen -= tcur->start - fbno;
+			fbno = tcur->start;
+		}
+		if (fbno + flen > tcur->end + 1)
+			flen = tcur->end - fbno + 1;
+
+		/*
+		 * Too small?  Give up.
+		 */
+		if (flen < minlen) {
+			trace_xfs_discard_toosmall(mp, pag->pag_agno, fbno, flen);
+			tcur->count = 0;
+			break;
+		}
 
 		/*
 		 * If any blocks in the range are still busy, skip the
@@ -266,7 +278,10 @@ xfs_trim_gather_extents(
 				&extents->extent_list);
 		*blocks_trimmed += flen;
 next_extent:
-		error = xfs_btree_decrement(cur, 0, &i);
+		if (tcur->by_len)
+			error = xfs_btree_increment(cur, 0, &i);
+		else
+			error = xfs_btree_decrement(cur, 0, &i);
 		if (error)
 			break;
 
@@ -276,7 +291,7 @@ xfs_trim_gather_extents(
 		 * is no more extents to search.
 		 */
 		if (i == 0)
-			tcur->ar_blockcount = 0;
+			tcur->count = 0;
 	}
 
 	/*
@@ -306,17 +321,22 @@ xfs_trim_should_stop(void)
 static int
 xfs_trim_extents(
 	struct xfs_perag	*pag,
-	xfs_daddr_t		start,
-	xfs_daddr_t		end,
-	xfs_daddr_t		minlen,
+	xfs_agblock_t		start,
+	xfs_agblock_t		end,
+	xfs_extlen_t		minlen,
 	uint64_t		*blocks_trimmed)
 {
-	struct xfs_alloc_rec_incore tcur = {
-		.ar_blockcount = pag->pagf_longest,
-		.ar_startblock = NULLAGBLOCK,
+	struct xfs_trim_cur	tcur = {
+		.start		= start,
+		.count		= pag->pagf_longest,
+		.end		= end,
+		.minlen		= minlen,
 	};
 	int			error = 0;
 
+	if (start != 0 || end != pag->block_count)
+		tcur.by_len = true;
+
 	do {
 		struct xfs_busy_extents	*extents;
 
@@ -330,8 +350,8 @@ xfs_trim_extents(
 		extents->owner = extents;
 		INIT_LIST_HEAD(&extents->extent_list);
 
-		error = xfs_trim_gather_extents(pag, start, end, minlen,
-				&tcur, extents, blocks_trimmed);
+		error = xfs_trim_gather_extents(pag, &tcur, extents,
+					blocks_trimmed);
 		if (error) {
 			kfree(extents);
 			break;
@@ -354,7 +374,7 @@ xfs_trim_extents(
 		if (xfs_trim_should_stop())
 			break;
 
-	} while (tcur.ar_blockcount != 0);
+	} while (tcur.count != 0);
 
 	return error;
 
@@ -378,8 +398,10 @@ xfs_ioc_trim(
 	unsigned int		granularity =
 		bdev_discard_granularity(mp->m_ddev_targp->bt_bdev);
 	struct fstrim_range	range;
-	xfs_daddr_t		start, end, minlen;
+	xfs_daddr_t		start, end;
+	xfs_extlen_t		minlen;
 	xfs_agnumber_t		agno;
+	xfs_agblock_t		start_agbno, end_agbno;
 	uint64_t		blocks_trimmed = 0;
 	int			error, last_error = 0;
 
@@ -399,7 +421,8 @@ xfs_ioc_trim(
 		return -EFAULT;
 
 	range.minlen = max_t(u64, granularity, range.minlen);
-	minlen = BTOBB(range.minlen);
+	minlen = XFS_B_TO_FSB(mp, range.minlen);
+
 	/*
 	 * Truncating down the len isn't actually quite correct, but using
 	 * BBTOB would mean we trivially get overflows for values
@@ -415,12 +438,23 @@ xfs_ioc_trim(
 	start = BTOBB(range.start);
 	end = start + BTOBBT(range.len) - 1;
 
-	if (end > XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks) - 1)
-		end = XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks) - 1;
+	start_agno = xfs_daddr_to_agno(mp, start);
+	start_agbno = xfs_daddr_to_agbno(mp, start);
+	end_agno = xfs_daddr_to_agno(mp, end);
+	end_agbno = xfs_daddr_to_agbno(mp, end);
 
-	agno = xfs_daddr_to_agno(mp, start);
-	for_each_perag_range(mp, agno, xfs_daddr_to_agno(mp, end), pag) {
-		error = xfs_trim_extents(pag, start, end, minlen,
+	if (end_agno >= mp->m_sb.sb_agcount ||
+	    !xfs_verify_agno_agbno(mp, end_agno, end_agbno)) {
+		end_agno = mp->m_sb.sb_agcount - 1;
+		end_agbno = xfs_ag_block_count(mp, end_agno);
+	}
+
+	for_each_perag_range(mp, start_agno, end_agno, pag) {
+		xfs_agblock_t end = mp->m_sb.sb_agblocks;
+
+		if (start_agno == end_agno)
+			end = end_agbno;
+		error = xfs_trim_extents(pag, start_agbno, end, minlen,
 					  &blocks_trimmed);
 		if (error)
 			last_error = error;
@@ -429,6 +463,7 @@ xfs_ioc_trim(
 			xfs_perag_rele(pag);
 			break;
 		}
+		start_agbno = 0;
 	}
 
 	if (last_error)
Darrick J. Wong March 29, 2024, 9:35 p.m. UTC | #3
On Wed, Mar 27, 2024 at 04:35:12AM -0700, Christoph Hellwig wrote:
> On Tue, Mar 26, 2024 at 07:07:58PM -0700, Darrick J. Wong wrote:
> > periodically to allow other threads to do work.  This implementation
> > avoids the worst problems of the original code, though it lacks the
> > desirable attribute of freeing the biggest chunks first.
> 
> Do we really care much about freeing larger area first?  I don't think
> it really matters for FITRIM at all.
> 
> In other words, I suspect we're better off with only the by-bno
> implementation.

Welll... you could argue that if the underlying "thin provisioning" is
actually just an xfs file, that punching tons of tiny holes in that file
could increase the height of the bmbt.  In that case, you'd want to
reduce the chances of the punch failing with ENOSPC by punching out
larger ranges to free up more blocks.

--D
Darrick J. Wong March 29, 2024, 10:51 p.m. UTC | #4
On Thu, Mar 28, 2024 at 09:15:25AM +1100, Dave Chinner wrote:
> On Tue, Mar 26, 2024 at 07:07:58PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > XFS issues discard IOs while holding the free space btree and the AGF
> > buffers locked.  If the discard IOs are slow, this can lead to long
> > stalls for every other thread trying to access that AG.  On a 10TB high
> > performance flash storage device with a severely fragmented free space
> > btree in every AG, this results in many threads tripping the hangcheck
> > warnings while waiting for the AGF.  This happens even after we've run
> > fstrim a few times and waited for the nvme namespace utilization
> > counters to stabilize.
> > 
> > Strace for the entire 100TB looks like:
> > ioctl(3, FITRIM, {start=0x0, len=10995116277760, minlen=0}) = 0 <686.209839>
> > 
> > Reducing the size of the FITRIM requests to a single AG at a time
> > produces lower times for each individual call, but even this isn't quite
> > acceptable, because the lock hold times are still high enough to cause
> > stall warnings:
> > 
> > Strace for the first 4x 1TB AGs looks like (2):
> > ioctl(3, FITRIM, {start=0x0, len=1099511627776, minlen=0}) = 0 <68.352033>
> > ioctl(3, FITRIM, {start=0x10000000000, len=1099511627776, minlen=0}) = 0 <68.760323>
> > ioctl(3, FITRIM, {start=0x20000000000, len=1099511627776, minlen=0}) = 0 <67.235226>
> > ioctl(3, FITRIM, {start=0x30000000000, len=1099511627776, minlen=0}) = 0 <69.465744>
> 
> Large FITRIM runtime is not actually an AGF locking latency problem
> anymore....
> 
> > The fstrim code has to synchronize discards with block allocations, so
> > we must hold the AGF lock while issuing discard IOs.
> 
> ... because I fixed this problem in commit 89cfa899608f ("xfs:
> reduce AGF hold times during fstrim operations").
> 
> > Breaking up the
> > calls into smaller start/len segments ought to reduce the lock hold time
> > and allow other threads a chance to make progress.  Unfortunately, the
> > current fstrim implementation handles this poorly because it walks the
> > entire free space by length index (cntbt) and it's not clear if we can
> > cycle the AGF periodically to reduce latency because there's no
> > less-than btree lookup.
> 
> That commit also fixed this problem.

> > The first solution I thought of was to limit latency by scanning parts
> > of an AG at a time,
> 
> It already does this.

Yeah.  I sent this patch to the list 10 months ago, but nobody ever
responded.  I sent it again at the end of last year, and still nothing.

I guess I forgot to update the commit message on this patch after you
fixed the AGF hold times, but there's still the problem that asking to
trim a subset of an AG causes XFS to walk the entire by-length btree
just to select out the records that actually fit within the criteria.

> > but this doesn't solve the stalling problem when the
> > free space is heavily fragmented because each sub-AG scan has to walk
> > the entire cntbt to find free space that fits within the given range.
> > In fact, this dramatically increases the runtime!  This itself is a
> > problem, because sub-AG fstrim runtime is unnecessarily high.
> 
> Ah, so this is a completely different problem to what you describe
> above.  i.e. The problem with "sub-ag" fstrim is simply that finding
> block range limited free extents is costly in terms of overall time
> and CPU usage when you do it via the by-count btree instead of the
> by-bno btree...

Yes.

> > For sub-AG scans, create a second implementation that will walk the
> > bnobt and perform the trims in block number order.  Since the cursor has
> > an obviously monotonically increasing value, it is easy to cycle the AGF
> > periodically to allow other threads to do work.  This implementation
> > avoids the worst problems of the original code, though it lacks the
> > desirable attribute of freeing the biggest chunks first.
> 
> Ok, it's fine to do a by-bno search in this case...
> 
> > On the other hand, this second implementation will be much easier to
> > constrain the locking latency, and makes it much easier to report fstrim
> > progress to anyone who's running xfs_scrub.
> 
> ... but It doesn't change the locking latency of fstrim at all.
> Locks are still only held for batches of 100 free extent lookups...
> 
> I think a commit message update is necessary. :)

xfs: fix performance problems when fstrimming a subset of a fragmented AG

On a 10TB filesystem where the free space in each AG is heavily
fragmented, I noticed some very high runtimes on a FITRIM call for the
entire filesystem.  xfs_scrub likes to report progress information on
each phase of the scrub, which means that a strace for the entire
filesystem:

ioctl(3, FITRIM, {start=0x0, len=10995116277760, minlen=0}) = 0 <686.209839>

shows that scrub is uncommunicative for the entire duration.  Reducing
the size of the FITRIM requests to a single AG at a time produces lower
times for each individual call, but even this isn't quite acceptable,
because the time between progress reports is still very high:

Strace for the first 4x 1TB AGs looks like (2):
ioctl(3, FITRIM, {start=0x0, len=1099511627776, minlen=0}) = 0 <68.352033>
ioctl(3, FITRIM, {start=0x10000000000, len=1099511627776, minlen=0}) = 0 <68.760323>
ioctl(3, FITRIM, {start=0x20000000000, len=1099511627776, minlen=0}) = 0 <67.235226>
ioctl(3, FITRIM, {start=0x30000000000, len=1099511627776, minlen=0}) = 0 <69.465744>

I then had the idea to limit the length parameter of each call to a
smallish amount (~11GB) so that we could report progress relatively
quickly, but much to my surprise, each FITRIM call still took ~68
seconds!

Unfortunately, the by-length fstrim implementation handles this poorly
because it walks the entire free space by length index (cntbt), which is
a very inefficient way to walk a subset of the blocks of an AG.

Therefore, create a second implementation that will walk the bnobt and
perform the trims in block number order.  This implementation avoids the
worst problems of the original code, though it lacks the desirable
attribute of freeing the biggest chunks first.

On the other hand, this second implementation makes it much easier to
constrain the system call latency, and much easier to report fstrim
progress to anyone who's running xfs_scrub.

> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  fs/xfs/xfs_discard.c |  172 +++++++++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 169 insertions(+), 3 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
> > index 268bb734dc0a8..ee7a8759091eb 100644
> > --- a/fs/xfs/xfs_discard.c
> > +++ b/fs/xfs/xfs_discard.c
> > @@ -157,9 +157,9 @@ xfs_trim_gather_extents(
> >  	uint64_t		*blocks_trimmed)
> >  {
> >  	struct xfs_mount	*mp = pag->pag_mount;
> > -	struct xfs_trans	*tp;
> >  	struct xfs_btree_cur	*cur;
> >  	struct xfs_buf		*agbp;
> > +	struct xfs_trans	*tp;
> >  	int			error;
> >  	int			i;
> >  	int			batch = 100;
> > @@ -292,6 +292,160 @@ xfs_trim_gather_extents(
> >  	return error;
> >  }
> >  
> > +/* Trim the free space in this AG by block number. */
> > +static inline int
> > +xfs_trim_gather_bybno(
> > +	struct xfs_perag	*pag,
> > +	xfs_daddr_t		start,
> > +	xfs_daddr_t		end,
> > +	xfs_daddr_t		minlen,
> > +	struct xfs_alloc_rec_incore *tcur,
> > +	struct xfs_busy_extents	*extents,
> > +	uint64_t		*blocks_trimmed)
> > +{
> 
> I'd prefer that we don't copy-n-paste-n-subtly-modify code like
> this.  There's very little different between the two gather cases -
> the initial cursor setup and the loop exit criteria - so they should
> be easy to use a common core loop.

<nod>

> > +	struct xfs_mount	*mp = pag->pag_mount;
> > +	struct xfs_trans	*tp;
> > +	struct xfs_btree_cur	*cur;
> > +	struct xfs_buf		*agbp;
> > +	xfs_daddr_t		end_daddr;
> > +	xfs_agnumber_t		agno = pag->pag_agno;
> > +	xfs_agblock_t		start_agbno;
> > +	xfs_agblock_t		end_agbno;
> > +	xfs_extlen_t		minlen_fsb = XFS_BB_TO_FSB(mp, minlen);
> > +	int			i;
> > +	int			batch = 100;
> > +	int			error;
> > +
> > +	start = max(start, XFS_AGB_TO_DADDR(mp, agno, 0));
> > +	start_agbno = xfs_daddr_to_agbno(mp, start);
> > +
> > +	end_daddr = XFS_AGB_TO_DADDR(mp, agno, pag->block_count);
> > +	end = min(end, end_daddr - 1);
> > +	end_agbno = xfs_daddr_to_agbno(mp, end);
> 
> I think this is the wrong place to do this.  I agree that it is
> sensible to use ag-constrained agbnos here, but we should do it
> properly and not make the code unnecessarily difficult to maintain
> by using unconstrained daddrs in one gather function and constrained
> agbnos in another. 
> 
> Here's a completely untested, uncompiled version of this by-bno
> search I just wrote up to demonstrate that if we pass ag-confined
> agbnos from the top level, we don't need to duplicate this gather
> code at all or do trim range constraining inside the gather
> functions...

Mostly looks ok, but let's dig in--

> -Dave.
> 
> diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
> index 268bb734dc0a..0c949dfc097a 100644
> --- a/fs/xfs/xfs_discard.c
> +++ b/fs/xfs/xfs_discard.c
> @@ -145,14 +145,18 @@ xfs_discard_extents(
>  	return error;
>  }
>  
> +struct xfs_trim_cur {
> +	xfs_agblock_t	start;
> +	xfs_extlen_t	count;
> +	xfs_agblock_t	end;
> +	xfs_extlen_t	minlen;
> +	bool		by_len;
> +};
>  
>  static int
>  xfs_trim_gather_extents(
>  	struct xfs_perag	*pag,
> -	xfs_daddr_t		start,
> -	xfs_daddr_t		end,
> -	xfs_daddr_t		minlen,
> -	struct xfs_alloc_rec_incore *tcur,
> +	struct xfs_trim_cur	*tcur,
>  	struct xfs_busy_extents	*extents,
>  	uint64_t		*blocks_trimmed)
>  {
> @@ -179,21 +183,21 @@ xfs_trim_gather_extents(
>  	if (error)
>  		goto out_trans_cancel;
>  
> -	cur = xfs_cntbt_init_cursor(mp, tp, agbp, pag);
>  
> -	/*
> -	 * Look up the extent length requested in the AGF and start with it.
> -	 */
> -	if (tcur->ar_startblock == NULLAGBLOCK)
> -		error = xfs_alloc_lookup_ge(cur, 0, tcur->ar_blockcount, &i);
> -	else
> +	if (tcur->by_len) {
> +		cur = xfs_bnobt_init_cursor(mp, tp, agbp, pag);

I'm confused: why are we searching the by-bno btree if "by_len" is set?
Granted, the logic for setting by_len triggers only for FITRIMs of
subsets of an AG, so this functions correctly...

> +		error = xfs_alloc_lookup_ge(cur, tcur->ar_startblock,
> +				0, &i);

...insofar as this skips a free extent that starts before and ends after
tcur->start.

> +	} else {
> +		cur = xfs_cntbt_init_cursor(mp, tp, agbp, pag);
>  		error = xfs_alloc_lookup_le(cur, tcur->ar_startblock,
>  				tcur->ar_blockcount, &i);
> +	}

How about this initialization logic:

	if (tcur->by_bno) {
		/* sub-AG discard request always starts at tcur->start */
		cur = xfs_bnobt_init_cursor(mp, tp, agbp, pag);
		error = xfs_alloc_lookup_le(cur, tcur->start, 0, &i);
	} else if (tcur->start == NULLAGBLOCK) {
		/* first time through a by-len starts with max length */
		cur = xfs_cntbt_init_cursor(mp, tp, agbp, pag);
		error = xfs_alloc_lookup_ge(cur, 0, tcur->count, &i);
	} else {
		/* nth time through a by-len starts where we left off */
		cur = xfs_cntbt_init_cursor(mp, tp, agbp, pag);
		error = xfs_alloc_lookup_le(cur, tcur->start, tcur->count, &i);
	}

>  	if (error)
>  		goto out_del_cursor;
>  	if (i == 0) {
>  		/* nothing of that length left in the AG, we are done */
> -		tcur->ar_blockcount = 0;
> +		tcur->count = 0;
>  		goto out_del_cursor;
>  	}
>  
> @@ -221,25 +225,8 @@ xfs_trim_gather_extents(
>  			 * Update the cursor to point at this extent so we
>  			 * restart the next batch from this extent.
>  			 */
> -			tcur->ar_startblock = fbno;
> -			tcur->ar_blockcount = flen;
> -			break;
> -		}
> -
> -		/*
> -		 * use daddr format for all range/len calculations as that is
> -		 * the format the range/len variables are supplied in by
> -		 * userspace.
> -		 */
> -		dbno = XFS_AGB_TO_DADDR(mp, pag->pag_agno, fbno);
> -		dlen = XFS_FSB_TO_BB(mp, flen);
> -
> -		/*
> -		 * Too small?  Give up.
> -		 */
> -		if (dlen < minlen) {
> -			trace_xfs_discard_toosmall(mp, pag->pag_agno, fbno, flen);
> -			tcur->ar_blockcount = 0;
> +			tcur->start = fbno;
> +			tcur->count = flen;
>  			break;
>  		}
>  
> @@ -248,10 +235,35 @@ xfs_trim_gather_extents(
>  		 * supposed to discard skip it.  Do not bother to trim
>  		 * down partially overlapping ranges for now.
>  		 */
> -		if (dbno + dlen < start || dbno > end) {
> +		if (fbno + flen < tcur->start) {
>  			trace_xfs_discard_exclude(mp, pag->pag_agno, fbno, flen);
>  			goto next_extent;
>  		}
> +		if (fbno > tcur->end) {
> +			trace_xfs_discard_exclude(mp, pag->pag_agno, fbno, flen);
> +			if (tcur->by_len) {
> +				tcur->count = 0;
> +				break;
> +			}
> +			goto next_extent;
> +		}
> +
> +		/* Trim the extent returned to the range we want. */
> +		if (fbno < tcur->start) {
> +			flen -= tcur->start - fbno;
> +			fbno = tcur->start;
> +		}
> +		if (fbno + flen > tcur->end + 1)
> +			flen = tcur->end - fbno + 1;
> +
> +		/*
> +		 * Too small?  Give up.
> +		 */
> +		if (flen < minlen) {
> +			trace_xfs_discard_toosmall(mp, pag->pag_agno, fbno, flen);
> +			tcur->count = 0;
> +			break;
> +		}

For a by-bno search, this logic skips the entire rest of the AG after
the first free extent that's smaller than tcur->minlen.  Instead, it
should goto next_extent, yes?

>  
>  		/*
>  		 * If any blocks in the range are still busy, skip the
> @@ -266,7 +278,10 @@ xfs_trim_gather_extents(
>  				&extents->extent_list);
>  		*blocks_trimmed += flen;
>  next_extent:
> -		error = xfs_btree_decrement(cur, 0, &i);
> +		if (tcur->by_len)
> +			error = xfs_btree_increment(cur, 0, &i);
> +		else
> +			error = xfs_btree_decrement(cur, 0, &i);
>  		if (error)
>  			break;
>  
> @@ -276,7 +291,7 @@ xfs_trim_gather_extents(
>  		 * is no more extents to search.
>  		 */
>  		if (i == 0)
> -			tcur->ar_blockcount = 0;
> +			tcur->count = 0;
>  	}
>  
>  	/*
> @@ -306,17 +321,22 @@ xfs_trim_should_stop(void)
>  static int
>  xfs_trim_extents(
>  	struct xfs_perag	*pag,
> -	xfs_daddr_t		start,
> -	xfs_daddr_t		end,
> -	xfs_daddr_t		minlen,
> +	xfs_agblock_t		start,
> +	xfs_agblock_t		end,
> +	xfs_extlen_t		minlen,
>  	uint64_t		*blocks_trimmed)
>  {
> -	struct xfs_alloc_rec_incore tcur = {
> -		.ar_blockcount = pag->pagf_longest,
> -		.ar_startblock = NULLAGBLOCK,
> +	struct xfs_trim_cur	tcur = {
> +		.start		= start,
> +		.count		= pag->pagf_longest,
> +		.end		= end,
> +		.minlen		= minlen,
>  	};
>  	int			error = 0;
>  
> +	if (start != 0 || end != pag->block_count)
> +		tcur.by_len = true;
> +
>  	do {
>  		struct xfs_busy_extents	*extents;
>  
> @@ -330,8 +350,8 @@ xfs_trim_extents(
>  		extents->owner = extents;
>  		INIT_LIST_HEAD(&extents->extent_list);
>  
> -		error = xfs_trim_gather_extents(pag, start, end, minlen,
> -				&tcur, extents, blocks_trimmed);
> +		error = xfs_trim_gather_extents(pag, &tcur, extents,
> +					blocks_trimmed);
>  		if (error) {
>  			kfree(extents);
>  			break;
> @@ -354,7 +374,7 @@ xfs_trim_extents(
>  		if (xfs_trim_should_stop())
>  			break;
>  
> -	} while (tcur.ar_blockcount != 0);
> +	} while (tcur.count != 0);
>  
>  	return error;
>  
> @@ -378,8 +398,10 @@ xfs_ioc_trim(
>  	unsigned int		granularity =
>  		bdev_discard_granularity(mp->m_ddev_targp->bt_bdev);
>  	struct fstrim_range	range;
> -	xfs_daddr_t		start, end, minlen;
> +	xfs_daddr_t		start, end;
> +	xfs_extlen_t		minlen;
>  	xfs_agnumber_t		agno;
> +	xfs_agblock_t		start_agbno, end_agbno;
>  	uint64_t		blocks_trimmed = 0;
>  	int			error, last_error = 0;
>  
> @@ -399,7 +421,8 @@ xfs_ioc_trim(
>  		return -EFAULT;
>  
>  	range.minlen = max_t(u64, granularity, range.minlen);
> -	minlen = BTOBB(range.minlen);
> +	minlen = XFS_B_TO_FSB(mp, range.minlen);
> +
>  	/*
>  	 * Truncating down the len isn't actually quite correct, but using
>  	 * BBTOB would mean we trivially get overflows for values
> @@ -415,12 +438,23 @@ xfs_ioc_trim(
>  	start = BTOBB(range.start);
>  	end = start + BTOBBT(range.len) - 1;
>  
> -	if (end > XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks) - 1)
> -		end = XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks) - 1;

Couldn't this simply be:

	end = min_t(xfs_daddr_t, start + BTOBBT(range.len) - 1,
		    XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks) - 1);


--D

> +	start_agno = xfs_daddr_to_agno(mp, start);
> +	start_agbno = xfs_daddr_to_agbno(mp, start);
> +	end_agno = xfs_daddr_to_agno(mp, end);
> +	end_agbno = xfs_daddr_to_agbno(mp, end);
>  
> -	agno = xfs_daddr_to_agno(mp, start);
> -	for_each_perag_range(mp, agno, xfs_daddr_to_agno(mp, end), pag) {
> -		error = xfs_trim_extents(pag, start, end, minlen,
> +	if (end_agno >= mp->m_sb.sb_agcount ||
> +	    !xfs_verify_agno_agbno(mp, end_agno, end_agbno)) {
> +		end_agno = mp->m_sb.sb_agcount - 1;
> +		end_agbno = xfs_ag_block_count(mp, end_agno);
> +	}
> +
> +	for_each_perag_range(mp, start_agno, end_agno, pag) {
> +		xfs_agblock_t end = mp->m_sb.sb_agblocks;
> +
> +		if (start_agno == end_agno)
> +			end = end_agbno;
> +		error = xfs_trim_extents(pag, start_agbno, end, minlen,
>  					  &blocks_trimmed);
>  		if (error)
>  			last_error = error;
> @@ -429,6 +463,7 @@ xfs_ioc_trim(
>  			xfs_perag_rele(pag);
>  			break;
>  		}
> +		start_agbno = 0;
>  	}
>  
>  	if (last_error)
> -- 
> Dave Chinner
> david@fromorbit.com
>
Christoph Hellwig March 30, 2024, 5:38 a.m. UTC | #5
On Fri, Mar 29, 2024 at 02:35:20PM -0700, Darrick J. Wong wrote:
> Welll... you could argue that if the underlying "thin provisioning" is
> actually just an xfs file, that punching tons of tiny holes in that file
> could increase the height of the bmbt.  In that case, you'd want to
> reduce the chances of the punch failing with ENOSPC by punching out
> larger ranges to free up more blocks.

That is a somewhat reasonable line of thought.  But is an underprovisioned
device near ENOSPC really the target here vs an SSD?  Either way,
if we have good reasons for by-cnt except for it being a little simpler
I can live with keeping it.  When in doubt I just prefer to have simple code
and one implementation instead of two.
Dave Chinner March 30, 2024, 9:15 p.m. UTC | #6
On Fri, Mar 29, 2024 at 02:35:20PM -0700, Darrick J. Wong wrote:
> On Wed, Mar 27, 2024 at 04:35:12AM -0700, Christoph Hellwig wrote:
> > On Tue, Mar 26, 2024 at 07:07:58PM -0700, Darrick J. Wong wrote:
> > > periodically to allow other threads to do work.  This implementation
> > > avoids the worst problems of the original code, though it lacks the
> > > desirable attribute of freeing the biggest chunks first.
> > 
> > Do we really care much about freeing larger area first?  I don't think
> > it really matters for FITRIM at all.
> > 
> > In other words, I suspect we're better off with only the by-bno
> > implementation.
> 
> Welll... you could argue that if the underlying "thin provisioning" is
> actually just an xfs file, that punching tons of tiny holes in that file
> could increase the height of the bmbt.  In that case, you'd want to
> reduce the chances of the punch failing with ENOSPC by punching out
> larger ranges to free up more blocks.

That's true, but it's a consideration for dm-thinp, too. I've always
considered FITRIM as an optimisation for dm-thinp systems (rather
than a hardware device optimisation), and so "large extents first"
makes a whole lot more sense from that perspective.

That said, there's another reason for by-size searches being the
default behaviour: FITRIM has a minimum length control parameter.
Hence for fragmented filesystems where there are few extents larger
than the minimum length, we needed to avoid doing an exhaustive
search of the by-bno tree to find extents longer than the minimum
length....

-Dave.
Dave Chinner March 30, 2024, 9:51 p.m. UTC | #7
On Fri, Mar 29, 2024 at 03:51:49PM -0700, Darrick J. Wong wrote:
> On Thu, Mar 28, 2024 at 09:15:25AM +1100, Dave Chinner wrote:
> > On Tue, Mar 26, 2024 at 07:07:58PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
....
> > I think a commit message update is necessary. :)
> 
> xfs: fix performance problems when fstrimming a subset of a fragmented AG
> 
> On a 10TB filesystem where the free space in each AG is heavily
> fragmented, I noticed some very high runtimes on a FITRIM call for the
> entire filesystem.  xfs_scrub likes to report progress information on
> each phase of the scrub, which means that a strace for the entire
> filesystem:
> 
> ioctl(3, FITRIM, {start=0x0, len=10995116277760, minlen=0}) = 0 <686.209839>
> 
> shows that scrub is uncommunicative for the entire duration.  Reducing
> the size of the FITRIM requests to a single AG at a time produces lower
> times for each individual call, but even this isn't quite acceptable,
> because the time between progress reports are still very high:
> 
> Strace for the first 4x 1TB AGs looks like (2):
> ioctl(3, FITRIM, {start=0x0, len=1099511627776, minlen=0}) = 0 <68.352033>
> ioctl(3, FITRIM, {start=0x10000000000, len=1099511627776, minlen=0}) = 0 <68.760323>
> ioctl(3, FITRIM, {start=0x20000000000, len=1099511627776, minlen=0}) = 0 <67.235226>
> ioctl(3, FITRIM, {start=0x30000000000, len=1099511627776, minlen=0}) = 0 <69.465744>
> 
> I then had the idea to limit the length parameter of each call to a
> smallish amount (~11GB) so that we could report progress relatively
> quickly, but much to my surprise, each FITRIM call still took ~68
> seconds!
> 
> Unfortunately, the by-length fstrim implementation handles this poorly
> because it walks the entire free space by length index (cntbt), which is
> a very inefficient way to walk a subset of the blocks of an AG.
> 
> Therefore, create a second implementation that will walk the bnobt and
> perform the trims in block number order.  This implementation avoids the
> worst problems of the original code, though it lacks the desirable
> attribute of freeing the biggest chunks first.
> 
> On the other hand, this second implementation will be much easier to
> constrain the system call latency, and makes it much easier to report
> fstrim progress to anyone who's running xfs_scrub.

Much better :)

> > Here's a completely untested, uncompiled version of this by-bno
> > search I just wrote up to demonstrate that if we pass ag-confined
> > agbnos from the top level, we don't need to duplicate this gather
> > code at all or do trim range constraining inside the gather
> > functions...
> 
> Mostly looks ok, but let's dig in--
> 
> > -Dave.
> > 
> > diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
> > index 268bb734dc0a..0c949dfc097a 100644
> > --- a/fs/xfs/xfs_discard.c
> > +++ b/fs/xfs/xfs_discard.c
> > @@ -145,14 +145,18 @@ xfs_discard_extents(
> >  	return error;
> >  }
> >  
> > +struct xfs_trim_cur {
> > +	xfs_agblock_t	start;
> > +	xfs_extlen_t	count;
> > +	xfs_agblock_t	end;
> > +	xfs_extlen_t	minlen;
> > +	bool		by_len;
> > +};
> >  
> >  static int
> >  xfs_trim_gather_extents(
> >  	struct xfs_perag	*pag,
> > -	xfs_daddr_t		start,
> > -	xfs_daddr_t		end,
> > -	xfs_daddr_t		minlen,
> > -	struct xfs_alloc_rec_incore *tcur,
> > +	struct xfs_trim_cur	*tcur,
> >  	struct xfs_busy_extents	*extents,
> >  	uint64_t		*blocks_trimmed)
> >  {
> > @@ -179,21 +183,21 @@ xfs_trim_gather_extents(
> >  	if (error)
> >  		goto out_trans_cancel;
> >  
> > -	cur = xfs_cntbt_init_cursor(mp, tp, agbp, pag);
> >  
> > -	/*
> > -	 * Look up the extent length requested in the AGF and start with it.
> > -	 */
> > -	if (tcur->ar_startblock == NULLAGBLOCK)
> > -		error = xfs_alloc_lookup_ge(cur, 0, tcur->ar_blockcount, &i);
> > -	else
> > +	if (tcur->by_len) {
> > +		cur = xfs_bnobt_init_cursor(mp, tp, agbp, pag);
> 
> I'm confused, why are we searching the by-bno btree if "by_len" is set?

Because I changed the logic half way through writing it and forgot
to change the variable name....

> Granted the logic for setting by_len triggers only for FITRIMs of
> subsets of an AG so this functions correctly...
> 
> > +		error = xfs_alloc_lookup_ge(cur, tcur->ar_startblock,
> > +				0, &i);
> 
> ...insofar as this skips a free extent that starts before and ends after
> tcur->start.

Right - did I mention this wasn't even compiled? :)

> 
> > +	} else {
> > +		cur = xfs_cntbt_init_cursor(mp, tp, agbp, pag);
> >  		error = xfs_alloc_lookup_le(cur, tcur->ar_startblock,
> >  				tcur->ar_blockcount, &i);
> > +	}
> 
> How about this initialization logic:
> 
> 	if (tcur->by_bno) {
> 		/* sub-AG discard request always starts at tcur->start */
> 		cur = xfs_bnobt_init_cursor(mp, tp, agbp, pag);
> 		error = xfs_alloc_lookup_le(cur, tcur->start, 0, &i);

OK.

> 	} else if (tcur->start == NULLAGBLOCK) {
> 		/* first time through a by-len starts with max length */
> 		cur = xfs_cntbt_init_cursor(mp, tp, agbp, pag);
> 		error = xfs_alloc_lookup_ge(cur, 0, tcur->count, &i);
> 	} else {
> 		/* nth time through a by-len starts where we left off */
> 		cur = xfs_cntbt_init_cursor(mp, tp, agbp, pag);
> 		error = xfs_alloc_lookup_le(cur, tcur->start, tcur->count, &i);
> 	}

... but that is what I was explicitly trying to avoid because the
initial search on a by-count config is for tcur->count =
pag->pagf_longest.

IOWs, there is no "greater" sized free extent than tcur->count,
every free extent *must* be less than or equal to pag->pagf_longest.
The _ge() search is done because the start block of zero is less
than every agbno in the tree. Hence we have to do a _ge() search to
"find" the pag->pagf_longest extent based on start block criteria.

If we pass in a tcur->start = NULLAGBLOCK (0xffffffff) or
pag->pag_blocks, then every agbno is "less than" the start block, and
so it should return the extent at the right most edge of the tree
with a _le() search regardless of where in the AG it is located.

Hence the initial search can also be a _le() search and we don't
have to special case it - the by-count tree has two components in
its search key and if we set the initial values of both components
correctly one search works for all cases...
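A toy model makes that two-component key concrete.  A sorted array and a
linear scan stand in for the cntbt here (this illustrates only the search
semantics, not the kernel's btree code): a _le() lookup keyed on
(pagf_longest, max-agbno) lands on the right-most record, so the initial
search needs no special casing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct rec { uint32_t len; uint32_t bno; };

/* Compare two (len, bno) keys - the cntbt's two search components. */
static int key_cmp(uint32_t alen, uint32_t abno, uint32_t blen, uint32_t bbno)
{
	if (alen != blen)
		return alen < blen ? -1 : 1;
	if (abno != bbno)
		return abno < bbno ? -1 : 1;
	return 0;
}

/*
 * Index of the right-most record with key <= (len, bno), or -1 if none.
 * recs[] must be sorted by (len, bno), as the by-count tree is.
 */
static int lookup_le(const struct rec *recs, size_t n,
		     uint32_t len, uint32_t bno)
{
	int found = -1;

	for (size_t i = 0; i < n; i++)
		if (key_cmp(recs[i].len, recs[i].bno, len, bno) <= 0)
			found = (int)i;
	return found;
}
```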

> > @@ -248,10 +235,35 @@ xfs_trim_gather_extents(
> >  		 * supposed to discard skip it.  Do not bother to trim
> >  		 * down partially overlapping ranges for now.
> >  		 */
> > -		if (dbno + dlen < start || dbno > end) {
> > +		if (fbno + flen < tcur->start) {
> >  			trace_xfs_discard_exclude(mp, pag->pag_agno, fbno, flen);
> >  			goto next_extent;
> >  		}
> > +		if (fbno > tcur->end) {
> > +			trace_xfs_discard_exclude(mp, pag->pag_agno, fbno, flen);
> > +			if (tcur->by_len) {
> > +				tcur->count = 0;
> > +				break;
> > +			}
> > +			goto next_extent;
> > +		}
> > +
> > +		/* Trim the extent returned to the range we want. */
> > +		if (fbno < tcur->start) {
> > +			flen -= tcur->start - fbno;
> > +			fbno = tcur->start;
> > +		}
> > +		if (fbno + flen > tcur->end + 1)
> > +			flen = tcur->end - fbno + 1;
> > +
> > +		/*
> > +		 * Too small?  Give up.
> > +		 */
> > +		if (flen < minlen) {
> > +			trace_xfs_discard_toosmall(mp, pag->pag_agno, fbno, flen);
> > +			tcur->count = 0;
> > +			break;
> > +		}
> 
> For a by-bno search, this logic skips the entire rest of the AG after
> the first free extent that's smaller than tcur->minlen.  Instead, it
> should goto next_extent, yes?

Yes.

> > @@ -415,12 +438,23 @@ xfs_ioc_trim(
> >  	start = BTOBB(range.start);
> >  	end = start + BTOBBT(range.len) - 1;
> >  
> > -	if (end > XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks) - 1)
> > -		end = XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks) - 1;
> 
> Couldn't this simply be:
> 
> 	end = min_t(xfs_daddr_t, start + BTOBBT(range.len) - 1,
> 		    XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks) - 1);

Yes.

-Dave.
Darrick J. Wong March 31, 2024, 10:44 p.m. UTC | #8
On Sun, Mar 31, 2024 at 08:15:49AM +1100, Dave Chinner wrote:
> On Fri, Mar 29, 2024 at 02:35:20PM -0700, Darrick J. Wong wrote:
> > On Wed, Mar 27, 2024 at 04:35:12AM -0700, Christoph Hellwig wrote:
> > > On Tue, Mar 26, 2024 at 07:07:58PM -0700, Darrick J. Wong wrote:
> > > > periodically to allow other threads to do work.  This implementation
> > > > avoids the worst problems of the original code, though it lacks the
> > > > desirable attribute of freeing the biggest chunks first.
> > > 
> > > Do we really care much about freeing larger area first?  I don't think
> > > it really matters for FITRIM at all.
> > > 
> > > In other words, I suspect we're better off with only the by-bno
> > > implementation.
> > 
> > Welll... you could argue that if the underlying "thin provisioning" is
> > actually just an xfs file, that punching tons of tiny holes in that file
> > could increase the height of the bmbt.  In that case, you'd want to
> > reduce the chances of the punch failing with ENOSPC by punching out
> > larger ranges to free up more blocks.
> 
> That's true, but it's a consideration for dm-thinp, too. I've always
> considered FITRIM as an optimisation for dm-thinp systems (rather
> than a hardware device optimisation), and so "large extents first"
> make a whole lot more sense from that perspective.
> 
> That said, there's another reason for by-size searches being the
> default behaviour: FITRIM has a minimum length control parameter.
> Hence for fragmented filesystems where there are few extents larger
> than the minimum length, we needed to avoid doing an exhaustive
> search of the by-bno tree to find extents longer than the minimum
> length....

Ooooh, that's a /very/ good point.  xfs_scrub also now constructs a
histogram of free space extent lengths to see if it can reduce the
runtime of phase 8 (FITRIM) by setting minlen to something large enough
to reduce runtime by 4-5x while not missing more than a few percent of
the free space.
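As a rough sketch of that heuristic (the function name, the power-of-two
cutoffs, and the coverage threshold are all illustrative assumptions here,
not xfs_scrub's actual algorithm): scan the extent-length histogram and take
the largest minlen that still keeps most of the free space eligible for
trimming.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Pick the largest minlen cutoff that still leaves at least keep_pct
 * percent of the free space covered by extents of at least that length.
 * A hypothetical sketch: real code would bucket lengths into a histogram
 * once instead of rescanning, and pick thresholds differently.
 */
static uint64_t pick_minlen(const uint32_t *lens, size_t n, unsigned keep_pct)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += lens[i];

	/* try power-of-two minlen cutoffs, largest first */
	for (uint32_t minlen = 1u << 30; minlen > 1; minlen >>= 1) {
		uint64_t covered = 0;

		for (size_t i = 0; i < n; i++)
			if (lens[i] >= minlen)
				covered += lens[i];
		if (covered * 100 >= total * keep_pct)
			return minlen;
	}
	return 1;
}
```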

--D
Darrick J. Wong March 31, 2024, 10:44 p.m. UTC | #9
On Sun, Mar 31, 2024 at 08:51:23AM +1100, Dave Chinner wrote:
> On Fri, Mar 29, 2024 at 03:51:49PM -0700, Darrick J. Wong wrote:
> > On Thu, Mar 28, 2024 at 09:15:25AM +1100, Dave Chinner wrote:
> > > On Tue, Mar 26, 2024 at 07:07:58PM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <djwong@kernel.org>
> ....
> > > I think a commit message update is necessary. :)
> > 
> > xfs: fix performance problems when fstrimming a subset of a fragmented AG
> > 
> > On a 10TB filesystem where the free space in each AG is heavily
> > fragmented, I noticed some very high runtimes on a FITRIM call for the
> > entire filesystem.  xfs_scrub likes to report progress information on
> > each phase of the scrub, which means that a strace for the entire
> > filesystem:
> > 
> > ioctl(3, FITRIM, {start=0x0, len=10995116277760, minlen=0}) = 0 <686.209839>
> > 
> > shows that scrub is uncommunicative for the entire duration.  Reducing
> > the size of the FITRIM requests to a single AG at a time produces lower
> > times for each individual call, but even this isn't quite acceptable,
> > because the time between progress reports are still very high:
> > 
> > Strace for the first 4x 1TB AGs looks like (2):
> > ioctl(3, FITRIM, {start=0x0, len=1099511627776, minlen=0}) = 0 <68.352033>
> > ioctl(3, FITRIM, {start=0x10000000000, len=1099511627776, minlen=0}) = 0 <68.760323>
> > ioctl(3, FITRIM, {start=0x20000000000, len=1099511627776, minlen=0}) = 0 <67.235226>
> > ioctl(3, FITRIM, {start=0x30000000000, len=1099511627776, minlen=0}) = 0 <69.465744>
> > 
> > I then had the idea to limit the length parameter of each call to a
> > smallish amount (~11GB) so that we could report progress relatively
> > quickly, but much to my surprise, each FITRIM call still took ~68
> > seconds!
> > 
> > Unfortunately, the by-length fstrim implementation handles this poorly
> > because it walks the entire free space by length index (cntbt), which is
> > a very inefficient way to walk a subset of the blocks of an AG.
> > 
> > Therefore, create a second implementation that will walk the bnobt and
> > perform the trims in block number order.  This implementation avoids the
> > worst problems of the original code, though it lacks the desirable
> > attribute of freeing the biggest chunks first.
> > 
> > On the other hand, this second implementation will be much easier to
> > constrain the system call latency, and makes it much easier to report
> > fstrim progress to anyone who's running xfs_scrub.
> 
> Much better :)
> 
> > > Here's a completely untested, uncompiled version of this by-bno
> > > search I just wrote up to demonstrate that if we pass ag-confined
> > > agbnos from the top level, we don't need to duplicate this gather
> > > code at all or do trim range constraining inside the gather
> > > functions...
> > 
> > Mostly looks ok, but let's dig in--
> > 
> > > -Dave.
> > > 
> > > diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
> > > index 268bb734dc0a..0c949dfc097a 100644
> > > --- a/fs/xfs/xfs_discard.c
> > > +++ b/fs/xfs/xfs_discard.c
> > > @@ -145,14 +145,18 @@ xfs_discard_extents(
> > >  	return error;
> > >  }
> > >  
> > > +struct xfs_trim_cur {
> > > +	xfs_agblock_t	start;
> > > +	xfs_extlen_t	count;
> > > +	xfs_agblock_t	end;
> > > +	xfs_extlen_t	minlen;
> > > +	bool		by_len;
> > > +};
> > >  
> > >  static int
> > >  xfs_trim_gather_extents(
> > >  	struct xfs_perag	*pag,
> > > -	xfs_daddr_t		start,
> > > -	xfs_daddr_t		end,
> > > -	xfs_daddr_t		minlen,
> > > -	struct xfs_alloc_rec_incore *tcur,
> > > +	struct xfs_trim_cur	*tcur,
> > >  	struct xfs_busy_extents	*extents,
> > >  	uint64_t		*blocks_trimmed)
> > >  {
> > > @@ -179,21 +183,21 @@ xfs_trim_gather_extents(
> > >  	if (error)
> > >  		goto out_trans_cancel;
> > >  
> > > -	cur = xfs_cntbt_init_cursor(mp, tp, agbp, pag);
> > >  
> > > -	/*
> > > -	 * Look up the extent length requested in the AGF and start with it.
> > > -	 */
> > > -	if (tcur->ar_startblock == NULLAGBLOCK)
> > > -		error = xfs_alloc_lookup_ge(cur, 0, tcur->ar_blockcount, &i);
> > > -	else
> > > +	if (tcur->by_len) {
> > > +		cur = xfs_bnobt_init_cursor(mp, tp, agbp, pag);
> > 
> > I'm confused, why are we searching the by-bno btree if "by_len" is set?
> 
> Because I changed the logic half way through writing it and forgot
> to change the variable name....
> 
> > Granted the logic for setting by_len triggers only for FITRIMs of
> > subsets of an AG so this functions correctly...
> > 
> > > +		error = xfs_alloc_lookup_ge(cur, tcur->ar_startblock,
> > > +				0, &i);
> > 
> > ...insofar as this skips a free extent that starts before and ends after
> > tcur->start.
> 
> Right - did I mention this wasn't even compiled? :)
> 
> > 
> > > +	} else {
> > > +		cur = xfs_cntbt_init_cursor(mp, tp, agbp, pag);
> > >  		error = xfs_alloc_lookup_le(cur, tcur->ar_startblock,
> > >  				tcur->ar_blockcount, &i);
> > > +	}
> > 
> > How about this initialization logic:
> > 
> > 	if (tcur->by_bno) {
> > 		/* sub-AG discard request always starts at tcur->start */
> > 		cur = xfs_bnobt_init_cursor(mp, tp, agbp, pag);
> > 		error = xfs_alloc_lookup_le(cur, tcur->start, 0, &i);
> 
> OK.
> 
> > 	} else if (tcur->start == NULLAGBLOCK) {
> > 		/* first time through a by-len starts with max length */
> > 		cur = xfs_cntbt_init_cursor(mp, tp, agbp, pag);
> > 		error = xfs_alloc_lookup_ge(cur, 0, tcur->count, &i);
> > 	} else {
> > 		/* nth time through a by-len starts where we left off */
> > 		cur = xfs_cntbt_init_cursor(mp, tp, agbp, pag);
> > 		error = xfs_alloc_lookup_le(cur, tcur->start, tcur->count, &i);
> > 	}
> 
> ... but that is what I was explicilty trying to avoid because the
> initial search on a by-count config is for tcur->count =
> pag->pagf_longest.
> 
> IOWs, there is no "greater" sized free extent than tcur->count,
> every free extent *must* be less than or equal to pag->pagf_longest.
> The _ge() search is done because the start block of zero is less
> than every agbno in the tree. Hence we have to do a _ge() search to
> "find" the pag->pagf_longest extent based on start block criteria.
> 
> If we pass in a tcur->start = NULLAGBLOCK (0xffffffff) or
> pag->pag_blocks then every agbno is "less than" the start block, and
> so it should return the extent at the right most edge of the tree
> with a _le() search regardless of where in the AG it is located.
> 
> Hence the initial search can also be a _le() search and we don't
> have to special case it - the by-count tree has two components in
> it's search key and if we set the initial values of both components
> correctly one search works for all cases...
> 
> > > @@ -248,10 +235,35 @@ xfs_trim_gather_extents(
> > >  		 * supposed to discard skip it.  Do not bother to trim
> > >  		 * down partially overlapping ranges for now.
> > >  		 */
> > > -		if (dbno + dlen < start || dbno > end) {
> > > +		if (fbno + flen < tcur->start) {
> > >  			trace_xfs_discard_exclude(mp, pag->pag_agno, fbno, flen);
> > >  			goto next_extent;
> > >  		}
> > > +		if (fbno > tcur->end) {
> > > +			trace_xfs_discard_exclude(mp, pag->pag_agno, fbno, flen);
> > > +			if (tcur->by_len) {
> > > +				tcur->count = 0;
> > > +				break;
> > > +			}
> > > +			goto next_extent;
> > > +		}
> > > +
> > > +		/* Trim the extent returned to the range we want. */
> > > +		if (fbno < tcur->start) {
> > > +			flen -= tcur->start - fbno;
> > > +			fbno = tcur->start;
> > > +		}
> > > +		if (fbno + flen > tcur->end + 1)
> > > +			flen = tcur->end - fbno + 1;
> > > +
> > > +		/*
> > > +		 * Too small?  Give up.
> > > +		 */
> > > +		if (flen < minlen) {
> > > +			trace_xfs_discard_toosmall(mp, pag->pag_agno, fbno, flen);
> > > +			tcur->count = 0;
> > > +			break;
> > > +		}
> > 
> > For a by-bno search, this logic skips the entire rest of the AG after
> > the first free extent that's smaller than tcur->minlen.  Instead, it
> > should goto next_extent, yes?
> 
> Yes.
> 
> > > @@ -415,12 +438,23 @@ xfs_ioc_trim(
> > >  	start = BTOBB(range.start);
> > >  	end = start + BTOBBT(range.len) - 1;
> > >  
> > > -	if (end > XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks) - 1)
> > > -		end = XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks) - 1;
> > 
> > Couldn't this simply be:
> > 
> > 	end = min_t(xfs_daddr_t, start + BTOBBT(range.len) - 1,
> > 		    XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks) - 1);
> 
> Yes.

How's this for a replacement patch, then?

--D

Subject: xfs: fix performance problems when fstrimming a subset of a fragmented AG

On a 10TB filesystem where the free space in each AG is heavily
fragmented, I noticed some very high runtimes on a FITRIM call for the
entire filesystem.  xfs_scrub likes to report progress information on
each phase of the scrub, which means that a strace for the entire
filesystem:

ioctl(3, FITRIM, {start=0x0, len=10995116277760, minlen=0}) = 0 <686.209839>

shows that scrub is uncommunicative for the entire duration.  Reducing
the size of the FITRIM requests to a single AG at a time produces lower
times for each individual call, but even this isn't quite acceptable,
because the time between progress reports are still very high:

Strace for the first 4x 1TB AGs looks like (2):
ioctl(3, FITRIM, {start=0x0, len=1099511627776, minlen=0}) = 0 <68.352033>
ioctl(3, FITRIM, {start=0x10000000000, len=1099511627776, minlen=0}) = 0 <68.760323>
ioctl(3, FITRIM, {start=0x20000000000, len=1099511627776, minlen=0}) = 0 <67.235226>
ioctl(3, FITRIM, {start=0x30000000000, len=1099511627776, minlen=0}) = 0 <69.465744>

I then had the idea to limit the length parameter of each call to a
smallish amount (~11GB) so that we could report progress relatively
quickly, but much to my surprise, each FITRIM call still took ~68
seconds!

Unfortunately, the by-length fstrim implementation handles this poorly
because it walks the entire free space by length index (cntbt), which is
a very inefficient way to walk a subset of the blocks of an AG.

Therefore, create a second implementation that will walk the bnobt and
perform the trims in block number order.  This implementation avoids the
worst problems of the original code, though it lacks the desirable
attribute of freeing the biggest chunks first.

On the other hand, this second implementation will be much easier to
constrain the system call latency, and makes it much easier to report
fstrim progress to anyone who's running xfs_scrub.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_discard.c |  153 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 93 insertions(+), 60 deletions(-)

diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
index 268bb734dc0a8..25fe3b932b5a6 100644
--- a/fs/xfs/xfs_discard.c
+++ b/fs/xfs/xfs_discard.c
@@ -145,14 +145,18 @@ xfs_discard_extents(
 	return error;
 }
 
+struct xfs_trim_cur {
+	xfs_agblock_t	start;
+	xfs_extlen_t	count;
+	xfs_agblock_t	end;
+	xfs_extlen_t	minlen;
+	bool		by_bno;
+};
 
 static int
 xfs_trim_gather_extents(
 	struct xfs_perag	*pag,
-	xfs_daddr_t		start,
-	xfs_daddr_t		end,
-	xfs_daddr_t		minlen,
-	struct xfs_alloc_rec_incore *tcur,
+	struct xfs_trim_cur	*tcur,
 	struct xfs_busy_extents	*extents,
 	uint64_t		*blocks_trimmed)
 {
@@ -179,21 +183,26 @@ xfs_trim_gather_extents(
 	if (error)
 		goto out_trans_cancel;
 
-	cur = xfs_cntbt_init_cursor(mp, tp, agbp, pag);
-
-	/*
-	 * Look up the extent length requested in the AGF and start with it.
-	 */
-	if (tcur->ar_startblock == NULLAGBLOCK)
-		error = xfs_alloc_lookup_ge(cur, 0, tcur->ar_blockcount, &i);
-	else
-		error = xfs_alloc_lookup_le(cur, tcur->ar_startblock,
-				tcur->ar_blockcount, &i);
+	if (tcur->by_bno) {
+		/* sub-AG discard request always starts at tcur->start */
+		cur = xfs_bnobt_init_cursor(mp, tp, agbp, pag);
+		error = xfs_alloc_lookup_le(cur, tcur->start, 0, &i);
+		if (!error && !i)
+			error = xfs_alloc_lookup_ge(cur, tcur->start, 0, &i);
+	} else if (tcur->start == 0) {
+		/* first time through a by-len starts with max length */
+		cur = xfs_cntbt_init_cursor(mp, tp, agbp, pag);
+		error = xfs_alloc_lookup_ge(cur, 0, tcur->count, &i);
+	} else {
+		/* nth time through a by-len starts where we left off */
+		cur = xfs_cntbt_init_cursor(mp, tp, agbp, pag);
+		error = xfs_alloc_lookup_le(cur, tcur->start, tcur->count, &i);
+	}
 	if (error)
 		goto out_del_cursor;
 	if (i == 0) {
 		/* nothing of that length left in the AG, we are done */
-		tcur->ar_blockcount = 0;
+		tcur->count = 0;
 		goto out_del_cursor;
 	}
 
@@ -204,8 +213,6 @@ xfs_trim_gather_extents(
 	while (i) {
 		xfs_agblock_t	fbno;
 		xfs_extlen_t	flen;
-		xfs_daddr_t	dbno;
-		xfs_extlen_t	dlen;
 
 		error = xfs_alloc_get_rec(cur, &fbno, &flen, &i);
 		if (error)
@@ -221,37 +228,45 @@ xfs_trim_gather_extents(
 			 * Update the cursor to point at this extent so we
 			 * restart the next batch from this extent.
 			 */
-			tcur->ar_startblock = fbno;
-			tcur->ar_blockcount = flen;
-			break;
-		}
-
-		/*
-		 * use daddr format for all range/len calculations as that is
-		 * the format the range/len variables are supplied in by
-		 * userspace.
-		 */
-		dbno = XFS_AGB_TO_DADDR(mp, pag->pag_agno, fbno);
-		dlen = XFS_FSB_TO_BB(mp, flen);
-
-		/*
-		 * Too small?  Give up.
-		 */
-		if (dlen < minlen) {
-			trace_xfs_discard_toosmall(mp, pag->pag_agno, fbno, flen);
-			tcur->ar_blockcount = 0;
+			tcur->start = fbno;
+			tcur->count = flen;
 			break;
 		}
 
 		/*
 		 * If the extent is entirely outside of the range we are
-		 * supposed to discard skip it.  Do not bother to trim
-		 * down partially overlapping ranges for now.
+		 * supposed to discard, skip it.  Do not bother to trim down
+		 * overlapping ranges for now.
 		 */
-		if (dbno + dlen < start || dbno > end) {
+		if (fbno + flen < tcur->start) {
 			trace_xfs_discard_exclude(mp, pag->pag_agno, fbno, flen);
 			goto next_extent;
 		}
+		if (fbno > tcur->end) {
+			trace_xfs_discard_exclude(mp, pag->pag_agno, fbno, flen);
+			if (tcur->by_bno) {
+				tcur->count = 0;
+				break;
+			}
+			goto next_extent;
+		}
+
+		/* Trim the extent returned to the range we want. */
+		if (fbno < tcur->start) {
+			flen -= tcur->start - fbno;
+			fbno = tcur->start;
+		}
+		if (fbno + flen > tcur->end + 1)
+			flen = tcur->end - fbno + 1;
+
+		/* Too small?  Give up. */
+		if (flen < tcur->minlen) {
+			trace_xfs_discard_toosmall(mp, pag->pag_agno, fbno, flen);
+			if (tcur->by_bno)
+				goto next_extent;
+			tcur->count = 0;
+			break;
+		}
 
 		/*
 		 * If any blocks in the range are still busy, skip the
@@ -266,7 +281,10 @@ xfs_trim_gather_extents(
 				&extents->extent_list);
 		*blocks_trimmed += flen;
 next_extent:
-		error = xfs_btree_decrement(cur, 0, &i);
+		if (tcur->by_bno)
+			error = xfs_btree_increment(cur, 0, &i);
+		else
+			error = xfs_btree_decrement(cur, 0, &i);
 		if (error)
 			break;
 
@@ -276,7 +294,7 @@ xfs_trim_gather_extents(
 		 * is no more extents to search.
 		 */
 		if (i == 0)
-			tcur->ar_blockcount = 0;
+			tcur->count = 0;
 	}
 
 	/*
@@ -306,17 +324,22 @@ xfs_trim_should_stop(void)
 static int
 xfs_trim_extents(
 	struct xfs_perag	*pag,
-	xfs_daddr_t		start,
-	xfs_daddr_t		end,
-	xfs_daddr_t		minlen,
+	xfs_agblock_t		start,
+	xfs_agblock_t		end,
+	xfs_extlen_t		minlen,
 	uint64_t		*blocks_trimmed)
 {
-	struct xfs_alloc_rec_incore tcur = {
-		.ar_blockcount = pag->pagf_longest,
-		.ar_startblock = NULLAGBLOCK,
+	struct xfs_trim_cur	tcur = {
+		.start		= start,
+		.count		= pag->pagf_longest,
+		.end		= end,
+		.minlen		= minlen,
 	};
 	int			error = 0;
 
+	if (start != 0 || end != pag->block_count)
+		tcur.by_bno = true;
+
 	do {
 		struct xfs_busy_extents	*extents;
 
@@ -330,8 +353,8 @@ xfs_trim_extents(
 		extents->owner = extents;
 		INIT_LIST_HEAD(&extents->extent_list);
 
-		error = xfs_trim_gather_extents(pag, start, end, minlen,
-				&tcur, extents, blocks_trimmed);
+		error = xfs_trim_gather_extents(pag, &tcur, extents,
+				blocks_trimmed);
 		if (error) {
 			kfree(extents);
 			break;
@@ -354,7 +377,7 @@ xfs_trim_extents(
 		if (xfs_trim_should_stop())
 			break;
 
-	} while (tcur.ar_blockcount != 0);
+	} while (tcur.count != 0);
 
 	return error;
 
@@ -378,8 +401,10 @@ xfs_ioc_trim(
 	unsigned int		granularity =
 		bdev_discard_granularity(mp->m_ddev_targp->bt_bdev);
 	struct fstrim_range	range;
-	xfs_daddr_t		start, end, minlen;
-	xfs_agnumber_t		agno;
+	xfs_daddr_t		start, end;
+	xfs_extlen_t		minlen;
+	xfs_agnumber_t		start_agno, end_agno;
+	xfs_agblock_t		start_agbno, end_agbno;
 	uint64_t		blocks_trimmed = 0;
 	int			error, last_error = 0;
 
@@ -399,7 +424,8 @@ xfs_ioc_trim(
 		return -EFAULT;
 
 	range.minlen = max_t(u64, granularity, range.minlen);
-	minlen = BTOBB(range.minlen);
+	minlen = XFS_B_TO_FSB(mp, range.minlen);
+
 	/*
 	 * Truncating down the len isn't actually quite correct, but using
 	 * BBTOB would mean we trivially get overflows for values
@@ -413,15 +439,21 @@ xfs_ioc_trim(
 		return -EINVAL;
 
 	start = BTOBB(range.start);
-	end = start + BTOBBT(range.len) - 1;
+	end = min_t(xfs_daddr_t, start + BTOBBT(range.len),
+		    XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks)) - 1;
 
-	if (end > XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks) - 1)
-		end = XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks) - 1;
+	start_agno = xfs_daddr_to_agno(mp, start);
+	start_agbno = xfs_daddr_to_agbno(mp, start);
+	end_agno = xfs_daddr_to_agno(mp, end);
+	end_agbno = xfs_daddr_to_agbno(mp, end);
 
-	agno = xfs_daddr_to_agno(mp, start);
-	for_each_perag_range(mp, agno, xfs_daddr_to_agno(mp, end), pag) {
-		error = xfs_trim_extents(pag, start, end, minlen,
-					  &blocks_trimmed);
+	for_each_perag_range(mp, start_agno, end_agno, pag) {
+		xfs_agblock_t	agend = pag->block_count;
+
+		if (start_agno == end_agno)
+			agend = end_agbno;
+		error = xfs_trim_extents(pag, start_agbno, agend, minlen,
+				&blocks_trimmed);
 		if (error)
 			last_error = error;
 
@@ -429,6 +461,7 @@ xfs_ioc_trim(
 			xfs_perag_rele(pag);
 			break;
 		}
+		start_agbno = 0;
 	}
 
 	if (last_error)
Dave Chinner April 1, 2024, 10:12 p.m. UTC | #10
On Sun, Mar 31, 2024 at 03:44:45PM -0700, Darrick J. Wong wrote:
> 
> How's this for a replacement patch, then?

Looks OK to me.

-Dave.

Patch

diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
index 268bb734dc0a8..ee7a8759091eb 100644
--- a/fs/xfs/xfs_discard.c
+++ b/fs/xfs/xfs_discard.c
@@ -157,9 +157,9 @@  xfs_trim_gather_extents(
 	uint64_t		*blocks_trimmed)
 {
 	struct xfs_mount	*mp = pag->pag_mount;
-	struct xfs_trans	*tp;
 	struct xfs_btree_cur	*cur;
 	struct xfs_buf		*agbp;
+	struct xfs_trans	*tp;
 	int			error;
 	int			i;
 	int			batch = 100;
@@ -292,6 +292,160 @@  xfs_trim_gather_extents(
 	return error;
 }
 
+/* Trim the free space in this AG by block number. */
+static inline int
+xfs_trim_gather_bybno(
+	struct xfs_perag	*pag,
+	xfs_daddr_t		start,
+	xfs_daddr_t		end,
+	xfs_daddr_t		minlen,
+	struct xfs_alloc_rec_incore *tcur,
+	struct xfs_busy_extents	*extents,
+	uint64_t		*blocks_trimmed)
+{
+	struct xfs_mount	*mp = pag->pag_mount;
+	struct xfs_trans	*tp;
+	struct xfs_btree_cur	*cur;
+	struct xfs_buf		*agbp;
+	xfs_daddr_t		end_daddr;
+	xfs_agnumber_t		agno = pag->pag_agno;
+	xfs_agblock_t		start_agbno;
+	xfs_agblock_t		end_agbno;
+	xfs_extlen_t		minlen_fsb = XFS_BB_TO_FSB(mp, minlen);
+	int			i;
+	int			batch = 100;
+	int			error;
+
+	start = max(start, XFS_AGB_TO_DADDR(mp, agno, 0));
+	start_agbno = xfs_daddr_to_agbno(mp, start);
+
+	end_daddr = XFS_AGB_TO_DADDR(mp, agno, pag->block_count);
+	end = min(end, end_daddr - 1);
+	end_agbno = xfs_daddr_to_agbno(mp, end);
+
+	error = xfs_trans_alloc_empty(mp, &tp);
+	if (error)
+		return error;
+
+	error = xfs_alloc_read_agf(pag, tp, 0, &agbp);
+	if (error)
+		goto out_trans_cancel;
+
+	cur = xfs_bnobt_init_cursor(mp, tp, agbp, pag);
+
+	/*
+	 * If this is our first time, look for any extent crossing start_agbno.
+	 * Otherwise, continue at the next extent after wherever we left off.
+	 */
+	if (tcur->ar_startblock == NULLAGBLOCK) {
+		error = xfs_alloc_lookup_le(cur, start_agbno, 0, &i);
+		if (error)
+			goto out_del_cursor;
+
+		/*
+		 * If we didn't find anything at or below start_agbno,
+		 * increment the cursor to see if there's another record above
+		 * it.
+		 */
+		if (!i)
+			error = xfs_btree_increment(cur, 0, &i);
+	} else {
+		error = xfs_alloc_lookup_ge(cur, tcur->ar_startblock, 0, &i);
+	}
+	if (error)
+		goto out_del_cursor;
+	if (!i) {
+		/* nothing left in the AG, we are done */
+		tcur->ar_blockcount = 0;
+		goto out_del_cursor;
+	}
+
+	/* Loop the entire range that was asked for. */
+	while (i) {
+		xfs_agblock_t	fbno;
+		xfs_extlen_t	flen;
+
+		error = xfs_alloc_get_rec(cur, &fbno, &flen, &i);
+		if (error)
+			break;
+		if (XFS_IS_CORRUPT(mp, i != 1)) {
+			xfs_btree_mark_sick(cur);
+			error = -EFSCORRUPTED;
+			break;
+		}
+
+		if (--batch <= 0) {
+			/*
+			 * Update the cursor to point at this extent so we
+			 * restart the next batch from this extent.
+			 */
+			tcur->ar_startblock = fbno;
+			tcur->ar_blockcount = flen;
+			break;
+		}
+
+		/* Exit on extents entirely outside of the range. */
+		if (fbno >= end_agbno) {
+			tcur->ar_blockcount = 0;
+			break;
+		}
+		if (fbno + flen < start_agbno)
+			goto next_extent;
+
+		/* Trim the extent returned to the range we want. */
+		if (fbno < start_agbno) {
+			flen -= start_agbno - fbno;
+			fbno = start_agbno;
+		}
+		if (fbno + flen > end_agbno + 1)
+			flen = end_agbno - fbno + 1;
+
+		/* Ignore too small. */
+		if (flen < minlen_fsb) {
+			trace_xfs_discard_toosmall(mp, agno, fbno, flen);
+			goto next_extent;
+		}
+
+		/*
+		 * If any blocks in the range are still busy, skip the
+		 * discard and try again the next time.
+		 */
+		if (xfs_extent_busy_search(mp, pag, fbno, flen)) {
+			trace_xfs_discard_busy(mp, agno, fbno, flen);
+			goto next_extent;
+		}
+
+		xfs_extent_busy_insert_discard(pag, fbno, flen,
+				&extents->extent_list);
+		*blocks_trimmed += flen;
+next_extent:
+		error = xfs_btree_increment(cur, 0, &i);
+		if (error)
+			break;
+
+		/*
+	 * If there are no more records in the tree, we are done. Set the
+		 * cursor block count to 0 to indicate to the caller that there
+		 * are no more extents to search.
+		 */
+		if (i == 0)
+			tcur->ar_blockcount = 0;
+	}
+
+	/*
+	 * If there was an error, release all the gathered busy extents because
+	 * we aren't going to issue a discard on them any more.
+	 */
+	if (error)
+		xfs_extent_busy_clear(mp, &extents->extent_list, false);
+
+out_del_cursor:
+	xfs_btree_del_cursor(cur, error);
+out_trans_cancel:
+	xfs_trans_cancel(tp);
+	return error;
+}
+
 static bool
 xfs_trim_should_stop(void)
 {
@@ -315,8 +469,15 @@  xfs_trim_extents(
 		.ar_blockcount = pag->pagf_longest,
 		.ar_startblock = NULLAGBLOCK,
 	};
+	struct xfs_mount	*mp = pag->pag_mount;
+	bool			by_len = true;
 	int			error = 0;
 
+	/* Are we only trimming part of this AG? */
+	if (start > XFS_AGB_TO_DADDR(mp, pag->pag_agno, 0) ||
+	    end < XFS_AGB_TO_DADDR(mp, pag->pag_agno, pag->block_count - 1))
+		by_len = false;
+
 	do {
 		struct xfs_busy_extents	*extents;
 
@@ -330,8 +491,13 @@  xfs_trim_extents(
 		extents->owner = extents;
 		INIT_LIST_HEAD(&extents->extent_list);
 
-		error = xfs_trim_gather_extents(pag, start, end, minlen,
-				&tcur, extents, blocks_trimmed);
+		if (by_len)
+			error = xfs_trim_gather_extents(pag, start, end,
+					minlen, &tcur, extents,
+					blocks_trimmed);
+		else
+			error = xfs_trim_gather_bybno(pag, start, end, minlen,
+					&tcur, extents, blocks_trimmed);
 		if (error) {
 			kfree(extents);
 			break;