From patchwork Fri Dec 30 22:19:20 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13085931 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AD0A2C46467 for ; Sat, 31 Dec 2022 03:23:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231539AbiLaDXG (ORCPT ); Fri, 30 Dec 2022 22:23:06 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38012 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236371AbiLaDWn (ORCPT ); Fri, 30 Dec 2022 22:22:43 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 08E1512AB2 for ; Fri, 30 Dec 2022 19:22:43 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 993D461D65 for ; Sat, 31 Dec 2022 03:22:42 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 06298C433D2; Sat, 31 Dec 2022 03:22:41 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672456962; bh=jxPD6AMno68mYlUWqi8+37x51cTNTquJcuYDcCQEENc=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=nWzffMecSfaZWRyNFl7Q9eXdGHSgVT0DyONJnkept8YjsQs5rshEUNje5pIcEkm4H 6xdOiYvnkMts8BtKSNeARSZDgTMgZ6RlMTH4z1bB6kcgEcVTSG5Y5PC58jyTMt9oQk N40d104lhGqYrUJ08Vr7RTb44aY7MhmMVIXtr12Di3wjk8bFOt6RA8zghh1IKlhY9U WrSX1McLL9z5/6vRBdFQRBlZH1IJukGcowKKLQzEMgWCdBTtposkExFsFWvKVjuRao USIp6umat6Fvk8tzpNDir3SaZQkNOckKWs/AfNWNlFrsz4NkyOOZyi/SsYnog0yPYa dziS35kT6KcEQ== Subject: [PATCH 1/3] xfs: only free posteof blocks 
on first close From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Date: Fri, 30 Dec 2022 14:19:20 -0800 Message-ID: <167243876038.726374.1959841327619200697.stgit@magnolia> In-Reply-To: <167243876021.726374.15071907725836376245.stgit@magnolia> References: <167243876021.726374.15071907725836376245.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Darrick J. Wong Certain workloads fragment files on XFS very badly, such as a software package that creates a number of threads, each of which repeatedly runs the sequence: open a file, perform a synchronous write, and close the file. This pattern defeats the speculative preallocation mechanism. We work around this problem by only deleting posteof blocks the /first/ time a file is closed, which preserves the behavior that unpacking a tarball lays out files one after the other with no gaps. Signed-off-by: Darrick J. Wong --- fs/xfs/xfs_inode.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index d50cbd0eb260..f0e44c96b769 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -1381,9 +1381,7 @@ xfs_release( if (error) goto out_unlock; - /* delalloc blocks after truncation means it really is dirty */ - if (ip->i_delayed_blks) - xfs_iflags_set(ip, XFS_IDIRTY_RELEASE); + xfs_iflags_set(ip, XFS_IDIRTY_RELEASE); } out_unlock: From patchwork Fri Dec 30 22:19:20 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13085930 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7E69DC4332F for ; Sat, 31 Dec 2022 03:23:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236366AbiLaDXH (ORCPT ); Fri, 30 Dec 2022 22:23:07 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38044 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236379AbiLaDW7 (ORCPT ); Fri, 30 Dec 2022 22:22:59 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 88CD912A91 for ; Fri, 30 Dec 2022 19:22:58 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 26A4161D66 for ; Sat, 31 Dec 2022 03:22:58 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 8047EC433EF; Sat, 31 Dec 2022 03:22:57 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672456977; bh=vLc+GxpPbpk24+X4SakA0dNbcvVmem6Wbki191LYqKo=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=esgB8vS00t67D8RbnYhCS5350GEanPUnizIhF2v0aCtYcPTF79cz8YAiofJzTTbwf FyGIqyFW7tE0/YIFkP99zqGqYWZ+QpKBYQk9IF8IAENuiIZslPA0ECbpr8jbpdAkMi F5TM0HcxBqfAWSSNgd+ViP12YGnmKMtlkFD5/lKFUJtFkqA3W7XGoB8pI021AqxM/g e5jlMQJ73lbCOi0rA3OFyU7KI++lGEkF4HH3Lsszg+qUWRpWLDFw6vy9fAiwYLqu13 T58hvdPkbn2XhveN1N/UmVThj9YlzF+youM/QxCtxRSxBV5xa4IuERoTUdIGqhC/nx LH5slGppJFSkQ== Subject: [PATCH 2/3] xfs: don't free EOF blocks on read close From: "Darrick J. 
Wong" To: djwong@kernel.org Cc: Dave Chinner , linux-xfs@vger.kernel.org Date: Fri, 30 Dec 2022 14:19:20 -0800 Message-ID: <167243876052.726374.3477350707567259751.stgit@magnolia> In-Reply-To: <167243876021.726374.15071907725836376245.stgit@magnolia> References: <167243876021.726374.15071907725836376245.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Dave Chinner When we have a workload that does open/read/close in parallel with other allocation, the file becomes rapidly fragmented. This is due to close() calling xfs_release() and removing the speculative preallocation beyond EOF. The existing open/*/close heuristic in xfs_release() does not catch this as a sync writer does not leave delayed allocation blocks allocated on the inode for later writeback that can be detected in xfs_release() and hence XFS_IDIRTY_RELEASE never gets set. In xfs_file_release(), we know more about the released file context, and so we need to communicate some of the details to xfs_release() so it can do the right thing here and skip EOF block truncation. This defers the EOF block cleanup for synchronous write contexts to the background EOF block cleaner which will clean up within a few minutes. Before: Test 1: sync write fragmentation counts /mnt/scratch/file.0: 919 /mnt/scratch/file.1: 916 /mnt/scratch/file.2: 919 /mnt/scratch/file.3: 920 /mnt/scratch/file.4: 920 /mnt/scratch/file.5: 921 /mnt/scratch/file.6: 916 /mnt/scratch/file.7: 918 After: Test 1: sync write fragmentation counts /mnt/scratch/file.0: 24 /mnt/scratch/file.1: 24 /mnt/scratch/file.2: 11 /mnt/scratch/file.3: 24 /mnt/scratch/file.4: 3 /mnt/scratch/file.5: 24 /mnt/scratch/file.6: 24 /mnt/scratch/file.7: 23 Signed-off-by: Dave Chinner [darrick: wordsmithing, fix commit message] Signed-off-by: Darrick J. Wong Reviewed-by: Darrick J. 
Wong --- fs/xfs/xfs_file.c | 14 ++++++++++++-- fs/xfs/xfs_inode.c | 9 +++++---- fs/xfs/xfs_inode.h | 2 +- 3 files changed, 18 insertions(+), 7 deletions(-) diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index e172ca1b18df..87e836e1aeb3 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -1381,12 +1381,22 @@ xfs_dir_open( return error; } +/* + * When we release the file, we don't want it to trim EOF blocks if it is a + * readonly context. This avoids open/read/close workloads from removing + * EOF blocks that other writers depend upon to reduce fragmentation. + */ STATIC int xfs_file_release( struct inode *inode, - struct file *filp) + struct file *file) { - return xfs_release(XFS_I(inode)); + bool free_eof_blocks = true; + + if ((file->f_mode & (FMODE_WRITE | FMODE_READ)) == FMODE_READ) + free_eof_blocks = false; + + return xfs_release(XFS_I(inode), free_eof_blocks); } STATIC int diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index f0e44c96b769..763f07867325 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -1311,10 +1311,11 @@ xfs_itruncate_extents_flags( int xfs_release( - xfs_inode_t *ip) + struct xfs_inode *ip, + bool want_free_eofblocks) { - xfs_mount_t *mp = ip->i_mount; - int error = 0; + struct xfs_mount *mp = ip->i_mount; + int error = 0; if (!S_ISREG(VFS_I(ip)->i_mode) || (VFS_I(ip)->i_mode == 0)) return 0; @@ -1356,7 +1357,7 @@ xfs_release( * another chance to drop them once the last reference to the inode is * dropped, so we'll never leak blocks permanently. 
*/ - if (!xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) + if (!want_free_eofblocks || !xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) return 0; if (xfs_can_free_eofblocks(ip, false)) { diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index 32a1d114dfaf..4ab0a63da367 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -493,7 +493,7 @@ enum layout_break_reason { #define XFS_INHERIT_GID(pip) \ (xfs_has_grpid((pip)->i_mount) || (VFS_I(pip)->i_mode & S_ISGID)) -int xfs_release(struct xfs_inode *ip); +int xfs_release(struct xfs_inode *ip, bool can_free_eofblocks); void xfs_inactive(struct xfs_inode *ip); int xfs_lookup(struct xfs_inode *dp, const struct xfs_name *name, struct xfs_inode **ipp, struct xfs_name *ci_name); From patchwork Fri Dec 30 22:19:20 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13085933 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 60445C4167B for ; Sat, 31 Dec 2022 03:23:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236371AbiLaDXi (ORCPT ); Fri, 30 Dec 2022 22:23:38 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38066 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236373AbiLaDXQ (ORCPT ); Fri, 30 Dec 2022 22:23:16 -0500 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CE6F612A91 for ; Fri, 30 Dec 2022 19:23:15 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 8A2DEB81E72 for ; Sat, 31 Dec 2022 03:23:14 
+0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 444D7C433EF; Sat, 31 Dec 2022 03:23:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672456993; bh=zCHdghJRa9K1y39/R6qkELcMJeb3gZRbO4QXFg8nNzg=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=JbQJXyy5RRYbw/FsOJqePJnvrjitVQFJTWOJCvRiCvVm9wXE7p6rGXJb8I5QyrGvn BZ8ZosCkN5fTfx0V361aQ646VuSbBIVjTkFIHHZ3wqLMg3flNSZaBrTGkzbaLdU4ta M9qdGuP4eO5e7I/gm5Fu+WUuvRowgZ/Iq2FSPw1+XT1cAQofRIiwDbjmYqOs1dxtEm SOXXGtr69aBH9UuZ0qbaFZwwo0/orA6kO81ygiSQLOLl75Z1AyBfvkncdb5fMzhinJ +q+V0W5peObaByiJ81IIQ/rqpUNeFjSILp4F0zLycG/qxqOjrqhzq/kVRxdEJBjv7c RJ01yrj0zkm/g== Subject: [PATCH 3/3] xfs: Don't free EOF blocks on close when extent size hints are set From: "Darrick J. Wong" To: djwong@kernel.org Cc: Dave Chinner , linux-xfs@vger.kernel.org Date: Fri, 30 Dec 2022 14:19:20 -0800 Message-ID: <167243876065.726374.4890051106492069344.stgit@magnolia> In-Reply-To: <167243876021.726374.15071907725836376245.stgit@magnolia> References: <167243876021.726374.15071907725836376245.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Dave Chinner When we have a workload that does open/write/close on files with extent size hints set in parallel with other allocation, the file becomes rapidly fragmented. This is due to close() calling xfs_release() and removing the preallocated extent beyond EOF. This occurs for both buffered and direct writes that append to files with extent size hints. The existing open/write/close heuristic in xfs_release() does not catch this as writes to files using extent size hints do not use delayed allocation and hence do not leave delayed allocation blocks allocated on the inode that can be detected in xfs_release(). Hence XFS_IDIRTY_RELEASE never gets set. In xfs_file_release(), we can tell whether the inode has extent size hints set and skip EOF block truncation. 
We add this check to xfs_can_free_eofblocks() so that the post-EOF preallocated extent is treated like intentional preallocation and therefore persists unless directly removed by userspace. Before: Test 2: Extent size hint fragmentation counts /mnt/scratch/file.0: 1002 /mnt/scratch/file.1: 1002 /mnt/scratch/file.2: 1002 /mnt/scratch/file.3: 1002 /mnt/scratch/file.4: 1002 /mnt/scratch/file.5: 1002 /mnt/scratch/file.6: 1002 /mnt/scratch/file.7: 1002 After: Test 2: Extent size hint fragmentation counts /mnt/scratch/file.0: 4 /mnt/scratch/file.1: 4 /mnt/scratch/file.2: 4 /mnt/scratch/file.3: 4 /mnt/scratch/file.4: 4 /mnt/scratch/file.5: 4 /mnt/scratch/file.6: 4 /mnt/scratch/file.7: 4 Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong --- fs/xfs/xfs_bmap_util.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index a54ed26e1cc0..558951710404 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -710,12 +710,15 @@ xfs_can_free_eofblocks( return false; /* - * Do not free real preallocated or append-only files unless the file - * has delalloc blocks and we are forced to remove them. + * Do not free extent size hints, real preallocated or append-only files + * unless the file has delalloc blocks and we are forced to remove + * them. */ - if (ip->i_diflags & (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND)) + if (xfs_get_extsz_hint(ip) || + (ip->i_diflags & (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND))) { if (!force || ip->i_delayed_blks == 0) return false; + } /* * Do not try to free post-EOF blocks if EOF is beyond the end of the