From patchwork Thu Jun 20 07:21:18 2024
X-Patchwork-Submitter: Christoph Hellwig
X-Patchwork-Id: 13704895
From: Christoph Hellwig
To: Chandan Babu R
Cc: "Darrick J. Wong" , Dave Chinner , linux-xfs@vger.kernel.org
Subject: [PATCH 01/11] xfs: fix the contact address for the sysfs ABI documentation
Date: Thu, 20 Jun 2024 09:21:18 +0200
Message-ID: <20240620072146.530267-2-hch@lst.de>
In-Reply-To: <20240620072146.530267-1-hch@lst.de>
References: <20240620072146.530267-1-hch@lst.de>

oss.sgi.com is long dead, refer to the current linux-xfs list instead.
Signed-off-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
---
 Documentation/ABI/testing/sysfs-fs-xfs | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-fs-xfs b/Documentation/ABI/testing/sysfs-fs-xfs
index f704925f6fe93b..82d8e2f79834b5 100644
--- a/Documentation/ABI/testing/sysfs-fs-xfs
+++ b/Documentation/ABI/testing/sysfs-fs-xfs
@@ -1,7 +1,7 @@
 What:		/sys/fs/xfs//log/log_head_lsn
 Date:		July 2014
 KernelVersion:	3.17
-Contact:	xfs@oss.sgi.com
+Contact:	linux-xfs@vger.kernel.org
 Description:
 		The log sequence number (LSN) of the current head of the
 		log. The LSN is exported in "cycle:basic block" format.
@@ -10,7 +10,7 @@ Users:		xfstests
 What:		/sys/fs/xfs//log/log_tail_lsn
 Date:		July 2014
 KernelVersion:	3.17
-Contact:	xfs@oss.sgi.com
+Contact:	linux-xfs@vger.kernel.org
 Description:
 		The log sequence number (LSN) of the current tail of the
 		log. The LSN is exported in "cycle:basic block" format.
@@ -18,7 +18,7 @@ Description:
 What:		/sys/fs/xfs//log/reserve_grant_head
 Date:		July 2014
 KernelVersion:	3.17
-Contact:	xfs@oss.sgi.com
+Contact:	linux-xfs@vger.kernel.org
 Description:
 		The current state of the log reserve grant head. It
 		represents the total log reservation of all currently
@@ -29,7 +29,7 @@ Users:		xfstests
 What:		/sys/fs/xfs//log/write_grant_head
 Date:		July 2014
 KernelVersion:	3.17
-Contact:	xfs@oss.sgi.com
+Contact:	linux-xfs@vger.kernel.org
 Description:
 		The current state of the log write grant head. It
 		represents the total log reservation of all currently
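
An aside for readers of the ABI text above: the "cycle:basic block" format packs a 32-bit cycle count and a 32-bit basic block number into a single 64-bit LSN. The following standalone C sketch is not part of the patch; it simply mirrors the kernel's CYCLE_LSN()/BLOCK_LSN() helpers to show the encoding.

/*
 * Illustrative sketch only: how an XFS log sequence number packs a
 * cycle count and a basic block number, per the "cycle:basic block"
 * format mentioned in the ABI text above.
 */
#include <stdint.h>
#include <stdio.h>

typedef int64_t xfs_lsn_t;

#define CYCLE_LSN(lsn)	((uint32_t)((lsn) >> 32))
#define BLOCK_LSN(lsn)	((uint32_t)(lsn))

static xfs_lsn_t assign_lsn(uint32_t cycle, uint32_t block)
{
	return ((xfs_lsn_t)cycle << 32) | block;
}

int main(void)
{
	xfs_lsn_t lsn = assign_lsn(37, 8192);

	/* Prints "37:8192", the same form sysfs exposes. */
	printf("%u:%u\n", CYCLE_LSN(lsn), BLOCK_LSN(lsn));
	return 0;
}
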
From patchwork Thu Jun 20 07:21:19 2024
X-Patchwork-Submitter: Christoph Hellwig
X-Patchwork-Id: 13704896
From: Christoph Hellwig
To: Chandan Babu R
Cc: "Darrick J. Wong" , Dave Chinner , linux-xfs@vger.kernel.org, Dave Chinner
Subject: [PATCH 02/11] xfs: move and rename xfs_trans_committed_bulk
Date: Thu, 20 Jun 2024 09:21:19 +0200
Message-ID: <20240620072146.530267-3-hch@lst.de>
In-Reply-To: <20240620072146.530267-1-hch@lst.de>
References: <20240620072146.530267-1-hch@lst.de>

From: Dave Chinner

Ever since the CIL and delayed logging were introduced,
xfs_trans_committed_bulk() has been a purely CIL checkpoint completion
function and not a transaction commit completion function. Now that we
are adding log specific updates to this function, it really does not
have anything to do with the transaction subsystem - it is really log
and log item level functionality.

This should be part of the CIL code as it is the callback that moves log
items from the CIL checkpoint to the AIL. Move it and rename it to
xlog_cil_ail_insert().

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
---
 fs/xfs/xfs_log_cil.c    | 132 +++++++++++++++++++++++++++++++++++++++-
 fs/xfs/xfs_trans.c      | 129 ---------------------------------------
 fs/xfs/xfs_trans_priv.h |   3 -
 3 files changed, 131 insertions(+), 133 deletions(-)

diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index f51cbc6405c15a..141bde08bd6e3c 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -694,6 +694,136 @@ xlog_cil_insert_items(
 	}
 }
 
+static inline void
+xlog_cil_ail_insert_batch(
+	struct xfs_ail		*ailp,
+	struct xfs_ail_cursor	*cur,
+	struct xfs_log_item	**log_items,
+	int			nr_items,
+	xfs_lsn_t		commit_lsn)
+{
+	int	i;
+
+	spin_lock(&ailp->ail_lock);
+	/* xfs_trans_ail_update_bulk drops ailp->ail_lock */
+	xfs_trans_ail_update_bulk(ailp, cur, log_items, nr_items, commit_lsn);
+
+	for (i = 0; i < nr_items; i++) {
+		struct xfs_log_item *lip = log_items[i];
+
+		if (lip->li_ops->iop_unpin)
+			lip->li_ops->iop_unpin(lip, 0);
+	}
+}
+
+/*
+ * Take the checkpoint's log vector chain of items and insert the attached log
+ * items into the AIL. This uses bulk insertion techniques to minimise AIL lock
+ * traffic.
+ *
+ * If we are called with the aborted flag set, it is because a log write during
+ * a CIL checkpoint commit has failed. In this case, all the items in the
+ * checkpoint have already gone through iop_committed and iop_committing, which
+ * means that checkpoint commit abort handling is treated exactly the same as an
+ * iclog write error even though we haven't started any IO yet. Hence in this
+ * case all we need to do is iop_committed processing, followed by an
+ * iop_unpin(aborted) call.
+ *
+ * The AIL cursor is used to optimise the insert process. If commit_lsn is not
+ * at the end of the AIL, the insert cursor avoids the need to walk the AIL to
+ * find the insertion point on every xfs_log_item_batch_insert() call. This
+ * saves a lot of needless list walking and is a net win, even though it
+ * slightly increases that amount of AIL lock traffic to set it up and tear it
+ * down.
+ */
+static void
+xlog_cil_ail_insert(
+	struct xlog		*log,
+	struct list_head	*lv_chain,
+	xfs_lsn_t		commit_lsn,
+	bool			aborted)
+{
+#define LOG_ITEM_BATCH_SIZE	32
+	struct xfs_ail		*ailp = log->l_ailp;
+	struct xfs_log_item	*log_items[LOG_ITEM_BATCH_SIZE];
+	struct xfs_log_vec	*lv;
+	struct xfs_ail_cursor	cur;
+	int			i = 0;
+
+	spin_lock(&ailp->ail_lock);
+	xfs_trans_ail_cursor_last(ailp, &cur, commit_lsn);
+	spin_unlock(&ailp->ail_lock);
+
+	/* unpin all the log items */
+	list_for_each_entry(lv, lv_chain, lv_list) {
+		struct xfs_log_item	*lip = lv->lv_item;
+		xfs_lsn_t		item_lsn;
+
+		if (aborted)
+			set_bit(XFS_LI_ABORTED, &lip->li_flags);
+
+		if (lip->li_ops->flags & XFS_ITEM_RELEASE_WHEN_COMMITTED) {
+			lip->li_ops->iop_release(lip);
+			continue;
+		}
+
+		if (lip->li_ops->iop_committed)
+			item_lsn = lip->li_ops->iop_committed(lip, commit_lsn);
+		else
+			item_lsn = commit_lsn;
+
+		/* item_lsn of -1 means the item needs no further processing */
+		if (XFS_LSN_CMP(item_lsn, (xfs_lsn_t)-1) == 0)
+			continue;
+
+		/*
+		 * if we are aborting the operation, no point in inserting the
+		 * object into the AIL as we are in a shutdown situation.
+		 */
+		if (aborted) {
+			ASSERT(xlog_is_shutdown(ailp->ail_log));
+			if (lip->li_ops->iop_unpin)
+				lip->li_ops->iop_unpin(lip, 1);
+			continue;
+		}
+
+		if (item_lsn != commit_lsn) {
+
+			/*
+			 * Not a bulk update option due to unusual item_lsn.
+			 * Push into AIL immediately, rechecking the lsn once
+			 * we have the ail lock. Then unpin the item. This does
+			 * not affect the AIL cursor the bulk insert path is
+			 * using.
+			 */
+			spin_lock(&ailp->ail_lock);
+			if (XFS_LSN_CMP(item_lsn, lip->li_lsn) > 0)
+				xfs_trans_ail_update(ailp, lip, item_lsn);
+			else
+				spin_unlock(&ailp->ail_lock);
+			if (lip->li_ops->iop_unpin)
+				lip->li_ops->iop_unpin(lip, 0);
+			continue;
+		}
+
+		/* Item is a candidate for bulk AIL insert. */
+		log_items[i++] = lv->lv_item;
+		if (i >= LOG_ITEM_BATCH_SIZE) {
+			xlog_cil_ail_insert_batch(ailp, &cur, log_items,
+					LOG_ITEM_BATCH_SIZE, commit_lsn);
+			i = 0;
+		}
+	}
+
+	/* make sure we insert the remainder! */
+	if (i)
+		xlog_cil_ail_insert_batch(ailp, &cur, log_items, i, commit_lsn);
+
+	spin_lock(&ailp->ail_lock);
+	xfs_trans_ail_cursor_done(&cur);
+	spin_unlock(&ailp->ail_lock);
+}
+
 static void
 xlog_cil_free_logvec(
 	struct list_head	*lv_chain)
@@ -733,7 +863,7 @@ xlog_cil_committed(
 		spin_unlock(&ctx->cil->xc_push_lock);
 	}
 
-	xfs_trans_committed_bulk(ctx->cil->xc_log->l_ailp, &ctx->lv_chain,
+	xlog_cil_ail_insert(ctx->cil->xc_log, &ctx->lv_chain,
 					ctx->start_lsn, abort);
 
 	xfs_extent_busy_sort(&ctx->busy_extents.extent_list);
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 828da4ac4316d6..bdf3704dc30118 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -725,135 +725,6 @@ xfs_trans_free_items(
 	}
 }
 
-static inline void
-xfs_log_item_batch_insert(
-	struct xfs_ail		*ailp,
-	struct xfs_ail_cursor	*cur,
-	struct xfs_log_item	**log_items,
-	int			nr_items,
-	xfs_lsn_t		commit_lsn)
-{
-	int	i;
-
-	spin_lock(&ailp->ail_lock);
-	/* xfs_trans_ail_update_bulk drops ailp->ail_lock */
-	xfs_trans_ail_update_bulk(ailp, cur, log_items, nr_items, commit_lsn);
-
-	for (i = 0; i < nr_items; i++) {
-		struct xfs_log_item *lip = log_items[i];
-
-		if (lip->li_ops->iop_unpin)
-			lip->li_ops->iop_unpin(lip, 0);
-	}
-}
-
-/*
- * Bulk operation version of xfs_trans_committed that takes a log vector of
- * items to insert into the AIL. This uses bulk AIL insertion techniques to
- * minimise lock traffic.
- *
- * If we are called with the aborted flag set, it is because a log write during
- * a CIL checkpoint commit has failed. In this case, all the items in the
- * checkpoint have already gone through iop_committed and iop_committing, which
- * means that checkpoint commit abort handling is treated exactly the same
- * as an iclog write error even though we haven't started any IO yet. Hence in
- * this case all we need to do is iop_committed processing, followed by an
- * iop_unpin(aborted) call.
- *
- * The AIL cursor is used to optimise the insert process. If commit_lsn is not
- * at the end of the AIL, the insert cursor avoids the need to walk
- * the AIL to find the insertion point on every xfs_log_item_batch_insert()
- * call. This saves a lot of needless list walking and is a net win, even
- * though it slightly increases that amount of AIL lock traffic to set it up
- * and tear it down.
- */
-void
-xfs_trans_committed_bulk(
-	struct xfs_ail		*ailp,
-	struct list_head	*lv_chain,
-	xfs_lsn_t		commit_lsn,
-	bool			aborted)
-{
-#define LOG_ITEM_BATCH_SIZE	32
-	struct xfs_log_item	*log_items[LOG_ITEM_BATCH_SIZE];
-	struct xfs_log_vec	*lv;
-	struct xfs_ail_cursor	cur;
-	int			i = 0;
-
-	spin_lock(&ailp->ail_lock);
-	xfs_trans_ail_cursor_last(ailp, &cur, commit_lsn);
-	spin_unlock(&ailp->ail_lock);
-
-	/* unpin all the log items */
-	list_for_each_entry(lv, lv_chain, lv_list) {
-		struct xfs_log_item	*lip = lv->lv_item;
-		xfs_lsn_t		item_lsn;
-
-		if (aborted)
-			set_bit(XFS_LI_ABORTED, &lip->li_flags);
-
-		if (lip->li_ops->flags & XFS_ITEM_RELEASE_WHEN_COMMITTED) {
-			lip->li_ops->iop_release(lip);
-			continue;
-		}
-
-		if (lip->li_ops->iop_committed)
-			item_lsn = lip->li_ops->iop_committed(lip, commit_lsn);
-		else
-			item_lsn = commit_lsn;
-
-		/* item_lsn of -1 means the item needs no further processing */
-		if (XFS_LSN_CMP(item_lsn, (xfs_lsn_t)-1) == 0)
-			continue;
-
-		/*
-		 * if we are aborting the operation, no point in inserting the
-		 * object into the AIL as we are in a shutdown situation.
-		 */
-		if (aborted) {
-			ASSERT(xlog_is_shutdown(ailp->ail_log));
-			if (lip->li_ops->iop_unpin)
-				lip->li_ops->iop_unpin(lip, 1);
-			continue;
-		}
-
-		if (item_lsn != commit_lsn) {
-
-			/*
-			 * Not a bulk update option due to unusual item_lsn.
-			 * Push into AIL immediately, rechecking the lsn once
-			 * we have the ail lock. Then unpin the item. This does
-			 * not affect the AIL cursor the bulk insert path is
-			 * using.
-			 */
-			spin_lock(&ailp->ail_lock);
-			if (XFS_LSN_CMP(item_lsn, lip->li_lsn) > 0)
-				xfs_trans_ail_update(ailp, lip, item_lsn);
-			else
-				spin_unlock(&ailp->ail_lock);
-			if (lip->li_ops->iop_unpin)
-				lip->li_ops->iop_unpin(lip, 0);
-			continue;
-		}
-
-		/* Item is a candidate for bulk AIL insert. */
-		log_items[i++] = lv->lv_item;
-		if (i >= LOG_ITEM_BATCH_SIZE) {
-			xfs_log_item_batch_insert(ailp, &cur, log_items,
-					LOG_ITEM_BATCH_SIZE, commit_lsn);
-			i = 0;
-		}
-	}
-
-	/* make sure we insert the remainder! */
-	if (i)
-		xfs_log_item_batch_insert(ailp, &cur, log_items, i, commit_lsn);
-
-	spin_lock(&ailp->ail_lock);
-	xfs_trans_ail_cursor_done(&cur);
-	spin_unlock(&ailp->ail_lock);
-}
-
 /*
  * Sort transaction items prior to running precommit operations. This will
  * attempt to order the items such that they will always be locked in the same
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index d5400150358e85..52a45f0a5ef173 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -19,9 +19,6 @@ void	xfs_trans_add_item(struct xfs_trans *, struct xfs_log_item *);
 void	xfs_trans_del_item(struct xfs_log_item *);
 void	xfs_trans_unreserve_and_mod_sb(struct xfs_trans *tp);
 
-void	xfs_trans_committed_bulk(struct xfs_ail *ailp,
-				struct list_head *lv_chain,
-				xfs_lsn_t commit_lsn, bool aborted);
 /*
  * AIL traversal cursor.
  *
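
An aside on the code movement above: the batching scheme that xlog_cil_ail_insert() inherits from xfs_trans_committed_bulk() is a general amortisation pattern. Here is a standalone sketch of the same accumulate-and-flush structure, with invented names; it is illustration only, not kernel code.

/*
 * Standalone sketch of the batching pattern used by xlog_cil_ail_insert():
 * accumulate items into a fixed-size array and flush them in bulk so the
 * expensive operation (there, taking the AIL lock) is paid once per batch
 * instead of once per item.
 */
#include <stdio.h>

#define BATCH_SIZE 32

static void flush_batch(int *batch, int nr)
{
	/* Stand-in for the locked bulk insert. */
	printf("flushing %d items (first=%d)\n", nr, batch[0]);
}

static void insert_all(int *items, int count)
{
	int batch[BATCH_SIZE];
	int i = 0;

	for (int n = 0; n < count; n++) {
		batch[i++] = items[n];
		if (i >= BATCH_SIZE) {
			flush_batch(batch, BATCH_SIZE);
			i = 0;
		}
	}
	/* Make sure we flush the remainder, as the patch comment notes. */
	if (i)
		flush_batch(batch, i);
}
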
From patchwork Thu Jun 20 07:21:20 2024
X-Patchwork-Submitter: Christoph Hellwig
X-Patchwork-Id: 13704897
From: Christoph Hellwig
To: Chandan Babu R
Cc: "Darrick J. Wong" , Dave Chinner , linux-xfs@vger.kernel.org, Dave Chinner
Subject: [PATCH 03/11] xfs: AIL doesn't need manual pushing
Date: Thu, 20 Jun 2024 09:21:20 +0200
Message-ID: <20240620072146.530267-4-hch@lst.de>
In-Reply-To: <20240620072146.530267-1-hch@lst.de>
References: <20240620072146.530267-1-hch@lst.de>

From: Dave Chinner

We have a mechanism that checks the amount of log space remaining
available every time we make a transaction reservation. If the amount of
space is below a threshold (25% free) we push on the AIL to tell it to
do more work. To do this, we end up calculating the LSN that the AIL
needs to push to on every reservation and updating the push target for
the AIL with that new target LSN.

This is silly and expensive. The AIL is perfectly capable of calculating
the push target itself, and it will always be running when the AIL
contains objects.

What the target does is determine if the AIL needs to do any work before
it goes back to sleep. If we haven't run out of reservation space or
memory (or some other push all trigger), it will simply go back to sleep
for a while if there is more than 25% of the journal space free without
doing anything.

If there are items in the AIL at a lower LSN than the target, it will
try to push up to the target or to the point of getting stuck before
going back to sleep and trying again soon after.

Hence we can modify the AIL to calculate its own 25% push target before
it starts a push using the same reserve grant head based calculation as
is currently used, and remove all the places where we ask the AIL to
push to a new 25% free target. We can also drop the minimum free space
size of 256BBs from the calculation because 25% of a minimum sized log
is *always* going to be larger than 256BBs.

This does still require a manual push in certain circumstances. These
circumstances arise when the AIL is not full, but the reservation grants
consume the entirety of the free space in the log. In this case, we
still need to push on the AIL to free up space, so when we hit this
condition (i.e. a reservation going to sleep to wait on log space) we do
a single push to tell the AIL it should empty itself.
This will keep the AIL moving as new reservations come in and want more
space, rather than keep queuing them and having to push the AIL
repeatedly.

The reason for using the "push all" when grant space runs out is that we
can run out of grant space when there is more than 25% of the log free.
Small logs are notorious for this, and we have a hack in the log
callback code (xlog_state_set_callback()) where we push the AIL because
the *head* moved, to ensure that we kick the AIL when we consume space
in it, because that can push us over the "less than 25% available"
threshold that starts tail pushing back up again.

Hence when we run out of grant space and are going to sleep, we have to
consider that the grant space may be consuming almost all the log space
and there is almost nothing in the AIL. In this situation, the AIL pins
the tail and moving the tail forwards is the only way the grant space
will come available, so we have to force the AIL to push everything to
guarantee grant space will eventually be returned.

Hence triggering a "push all" just before sleeping removes all the nasty
corner cases we have in other parts of the code that work around the "we
didn't ask the AIL to push enough to free grant space" condition that
leads to log space hangs...

Signed-off-by: Dave Chinner
Reviewed-by: Darrick J. Wong
---
 fs/xfs/libxfs/xfs_defer.c |   4 +-
 fs/xfs/xfs_log.c          | 135 ++-----------------------------
 fs/xfs/xfs_log.h          |   1 -
 fs/xfs/xfs_log_priv.h     |   2 +
 fs/xfs/xfs_trans_ail.c    | 162 +++++++++++++++++---------------------
 fs/xfs/xfs_trans_priv.h   |  33 ++++++--
 6 files changed, 108 insertions(+), 229 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index 4a078e07e1a0a0..e2c8308d518b56 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -12,12 +12,14 @@
 #include "xfs_mount.h"
 #include "xfs_defer.h"
 #include "xfs_trans.h"
+#include "xfs_trans_priv.h"
 #include "xfs_buf_item.h"
 #include "xfs_inode.h"
 #include "xfs_inode_item.h"
 #include "xfs_trace.h"
 #include "xfs_icache.h"
 #include "xfs_log.h"
+#include "xfs_log_priv.h"
 #include "xfs_rmap.h"
 #include "xfs_refcount.h"
 #include "xfs_bmap.h"
@@ -556,7 +558,7 @@ xfs_defer_relog(
 	 * the log threshold once per call.
 	 */
 	if (threshold_lsn == NULLCOMMITLSN) {
-		threshold_lsn = xlog_grant_push_threshold(log, 0);
+		threshold_lsn = xfs_ail_push_target(log->l_ailp);
 		if (threshold_lsn == NULLCOMMITLSN)
 			break;
 	}
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 416c154949832c..235fcf6dc4eeb5 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -30,10 +30,6 @@ xlog_alloc_log(
 	struct xfs_buftarg	*log_target,
 	xfs_daddr_t		blk_offset,
 	int			num_bblks);
-STATIC int
-xlog_space_left(
-	struct xlog		*log,
-	atomic64_t		*head);
 STATIC void
 xlog_dealloc_log(
 	struct xlog		*log);
@@ -51,10 +47,6 @@ xlog_state_get_iclog_space(
 	struct xlog_ticket	*ticket,
 	int			*logoffsetp);
 STATIC void
-xlog_grant_push_ail(
-	struct xlog		*log,
-	int			need_bytes);
-STATIC void
 xlog_sync(
 	struct xlog		*log,
 	struct xlog_in_core	*iclog,
@@ -242,42 +234,15 @@ xlog_grant_head_wake(
 {
 	struct xlog_ticket	*tic;
 	int			need_bytes;
-	bool			woken_task = false;
 
 	list_for_each_entry(tic, &head->waiters, t_queue) {
-
-		/*
-		 * There is a chance that the size of the CIL checkpoints in
-		 * progress at the last AIL push target calculation resulted in
-		 * limiting the target to the log head (l_last_sync_lsn) at the
-		 * time. This may not reflect where the log head is now as the
-		 * CIL checkpoints may have completed.
-		 *
-		 * Hence when we are woken here, it may be that the head of the
-		 * log that has moved rather than the tail. As the tail didn't
-		 * move, there still won't be space available for the
-		 * reservation we require. However, if the AIL has already
-		 * pushed to the target defined by the old log head location, we
-		 * will hang here waiting for something else to update the AIL
-		 * push target.
-		 *
-		 * Therefore, if there isn't space to wake the first waiter on
-		 * the grant head, we need to push the AIL again to ensure the
-		 * target reflects both the current log tail and log head
-		 * position before we wait for the tail to move again.
-		 */
 		need_bytes = xlog_ticket_reservation(log, head, tic);
-		if (*free_bytes < need_bytes) {
-			if (!woken_task)
-				xlog_grant_push_ail(log, need_bytes);
+		if (*free_bytes < need_bytes)
 			return false;
-		}
 
 		*free_bytes -= need_bytes;
 		trace_xfs_log_grant_wake_up(log, tic);
 		wake_up_process(tic->t_task);
-		woken_task = true;
 	}
 
 	return true;
@@ -296,13 +261,15 @@ xlog_grant_head_wait(
 	do {
 		if (xlog_is_shutdown(log))
 			goto shutdown;
-		xlog_grant_push_ail(log, need_bytes);
 
 		__set_current_state(TASK_UNINTERRUPTIBLE);
 		spin_unlock(&head->lock);
 
 		XFS_STATS_INC(log->l_mp, xs_sleep_logspace);
 
+		/* Push on the AIL to free up all the log space. */
+		xfs_ail_push_all(log->l_ailp);
+
 		trace_xfs_log_grant_sleep(log, tic);
 		schedule();
 		trace_xfs_log_grant_wake(log, tic);
@@ -418,9 +385,6 @@ xfs_log_regrant(
 	 * of rolling transactions in the log easily.
 	 */
 	tic->t_tid++;
-
-	xlog_grant_push_ail(log, tic->t_unit_res);
-
 	tic->t_curr_res = tic->t_unit_res;
 	if (tic->t_cnt > 0)
 		return 0;
@@ -477,12 +441,7 @@ xfs_log_reserve(
 	ASSERT(*ticp == NULL);
 	tic = xlog_ticket_alloc(log, unit_bytes, cnt, permanent);
 	*ticp = tic;
-
-	xlog_grant_push_ail(log, tic->t_cnt ? tic->t_unit_res * tic->t_cnt
-					    : tic->t_unit_res);
-
 	trace_xfs_log_reserve(log, tic);
-
 	error = xlog_grant_head_check(log, &log->l_reserve_head, tic,
 				      &need_bytes);
 	if (error)
@@ -1330,7 +1289,7 @@ xlog_assign_tail_lsn(
 * shortcut invalidity asserts in this case so that we don't trigger them
 * falsely.
 */
-STATIC int
+int
xlog_space_left(
	struct xlog	*log,
	atomic64_t	*head)
@@ -1667,89 +1626,6 @@ xlog_alloc_log(
 	return ERR_PTR(error);
 }	/* xlog_alloc_log */
 
-/*
- * Compute the LSN that we'd need to push the log tail towards in order to have
- * (a) enough on-disk log space to log the number of bytes specified, (b) at
- * least 25% of the log space free, and (c) at least 256 blocks free. If the
- * log free space already meets all three thresholds, this function returns
- * NULLCOMMITLSN.
- */
-xfs_lsn_t
-xlog_grant_push_threshold(
-	struct xlog	*log,
-	int		need_bytes)
-{
-	xfs_lsn_t	threshold_lsn = 0;
-	xfs_lsn_t	last_sync_lsn;
-	int		free_blocks;
-	int		free_bytes;
-	int		threshold_block;
-	int		threshold_cycle;
-	int		free_threshold;
-
-	ASSERT(BTOBB(need_bytes) < log->l_logBBsize);
-
-	free_bytes = xlog_space_left(log, &log->l_reserve_head.grant);
-	free_blocks = BTOBBT(free_bytes);
-
-	/*
-	 * Set the threshold for the minimum number of free blocks in the
-	 * log to the maximum of what the caller needs, one quarter of the
-	 * log, and 256 blocks.
-	 */
-	free_threshold = BTOBB(need_bytes);
-	free_threshold = max(free_threshold, (log->l_logBBsize >> 2));
-	free_threshold = max(free_threshold, 256);
-	if (free_blocks >= free_threshold)
-		return NULLCOMMITLSN;
-
-	xlog_crack_atomic_lsn(&log->l_tail_lsn, &threshold_cycle,
-						&threshold_block);
-	threshold_block += free_threshold;
-	if (threshold_block >= log->l_logBBsize) {
-		threshold_block -= log->l_logBBsize;
-		threshold_cycle += 1;
-	}
-	threshold_lsn = xlog_assign_lsn(threshold_cycle,
-					threshold_block);
-	/*
-	 * Don't pass in an lsn greater than the lsn of the last
-	 * log record known to be on disk. Use a snapshot of the last sync lsn
-	 * so that it doesn't change between the compare and the set.
-	 */
-	last_sync_lsn = atomic64_read(&log->l_last_sync_lsn);
-	if (XFS_LSN_CMP(threshold_lsn, last_sync_lsn) > 0)
-		threshold_lsn = last_sync_lsn;
-
-	return threshold_lsn;
-}
-
-/*
- * Push the tail of the log if we need to do so to maintain the free log space
- * thresholds set out by xlog_grant_push_threshold. We may need to adopt a
- * policy which pushes on an lsn which is further along in the log once we
- * reach the high water mark. In this manner, we would be creating a low water
- * mark.
- */
-STATIC void
-xlog_grant_push_ail(
-	struct xlog	*log,
-	int		need_bytes)
-{
-	xfs_lsn_t	threshold_lsn;
-
-	threshold_lsn = xlog_grant_push_threshold(log, need_bytes);
-	if (threshold_lsn == NULLCOMMITLSN || xlog_is_shutdown(log))
-		return;
-
-	/*
-	 * Get the transaction layer to kick the dirty buffers out to
-	 * disk asynchronously. No point in trying to do this if
-	 * the filesystem is shutting down.
-	 */
-	xfs_ail_push(log->l_ailp, threshold_lsn);
-}
-
 /*
  * Stamp cycle number in every block
  */
@@ -2712,7 +2588,6 @@ xlog_state_set_callback(
 		return;
 
 	atomic64_set(&log->l_last_sync_lsn, header_lsn);
-	xlog_grant_push_ail(log, 0);
 }
 
 /*
diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h
index d69acf881153d0..67c539cc9305c7 100644
--- a/fs/xfs/xfs_log.h
+++ b/fs/xfs/xfs_log.h
@@ -156,7 +156,6 @@ int	xfs_log_quiesce(struct xfs_mount *mp);
 void	xfs_log_clean(struct xfs_mount *mp);
 bool	xfs_log_check_lsn(struct xfs_mount *, xfs_lsn_t);
 
-xfs_lsn_t xlog_grant_push_threshold(struct xlog *log, int need_bytes);
 bool	xlog_force_shutdown(struct xlog *log, uint32_t shutdown_flags);
 
 int xfs_attr_use_log_assist(struct xfs_mount *mp);
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index 40e22ec0fbe69a..0482b11965e248 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -575,6 +575,8 @@ xlog_assign_grant_head(atomic64_t *head, int cycle, int space)
 	atomic64_set(head, xlog_assign_grant_head_val(cycle, space));
 }
 
+int xlog_space_left(struct xlog *log, atomic64_t *head);
+
 /*
  * Committed Item List interfaces
  */
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index e4c343096f95a3..a6b6fca1d13852 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -134,25 +134,6 @@ xfs_ail_min_lsn(
 	return lsn;
 }
 
-/*
- * Return the maximum lsn held in the AIL, or zero if the AIL is empty.
- */
-static xfs_lsn_t
-xfs_ail_max_lsn(
-	struct xfs_ail		*ailp)
-{
-	xfs_lsn_t		lsn = 0;
-	struct xfs_log_item	*lip;
-
-	spin_lock(&ailp->ail_lock);
-	lip = xfs_ail_max(ailp);
-	if (lip)
-		lsn = lip->li_lsn;
-	spin_unlock(&ailp->ail_lock);
-
-	return lsn;
-}
-
 /*
  * The cursor keeps track of where our current traversal is up to by tracking
  * the next item in the list for us. However, for this to be safe, removing an
@@ -414,6 +395,56 @@ xfsaild_push_item(
 	return lip->li_ops->iop_push(lip, &ailp->ail_buf_list);
 }
 
+/*
+ * Compute the LSN that we'd need to push the log tail towards in order to have
+ * at least 25% of the log space free. If the log free space already meets this
+ * threshold, this function returns NULLCOMMITLSN.
+ */
+xfs_lsn_t
+__xfs_ail_push_target(
+	struct xfs_ail		*ailp)
+{
+	struct xlog		*log = ailp->ail_log;
+	xfs_lsn_t		threshold_lsn = 0;
+	xfs_lsn_t		last_sync_lsn;
+	int			free_blocks;
+	int			free_bytes;
+	int			threshold_block;
+	int			threshold_cycle;
+	int			free_threshold;
+
+	free_bytes = xlog_space_left(log, &log->l_reserve_head.grant);
+	free_blocks = BTOBBT(free_bytes);
+
+	/*
+	 * The threshold for the minimum number of free blocks is one quarter of
+	 * the entire log space.
+	 */
+	free_threshold = log->l_logBBsize >> 2;
+	if (free_blocks >= free_threshold)
+		return NULLCOMMITLSN;
+
+	xlog_crack_atomic_lsn(&log->l_tail_lsn, &threshold_cycle,
+						&threshold_block);
+	threshold_block += free_threshold;
+	if (threshold_block >= log->l_logBBsize) {
+		threshold_block -= log->l_logBBsize;
+		threshold_cycle += 1;
+	}
+	threshold_lsn = xlog_assign_lsn(threshold_cycle,
+					threshold_block);
+	/*
+	 * Don't pass in an lsn greater than the lsn of the last
+	 * log record known to be on disk. Use a snapshot of the last sync lsn
+	 * so that it doesn't change between the compare and the set.
+	 */
+	last_sync_lsn = atomic64_read(&log->l_last_sync_lsn);
+	if (XFS_LSN_CMP(threshold_lsn, last_sync_lsn) > 0)
+		threshold_lsn = last_sync_lsn;
+
+	return threshold_lsn;
+}
+
 static long
 xfsaild_push(
 	struct xfs_ail		*ailp)
@@ -454,21 +485,24 @@ xfsaild_push(
 	 * capture updates that occur after the sync push waiter has gone to
 	 * sleep.
 	 */
-	if (waitqueue_active(&ailp->ail_empty)) {
+	if (test_bit(XFS_AIL_OPSTATE_PUSH_ALL, &ailp->ail_opstate) ||
+	    waitqueue_active(&ailp->ail_empty)) {
 		lip = xfs_ail_max(ailp);
 		if (lip)
 			target = lip->li_lsn;
+		else
+			clear_bit(XFS_AIL_OPSTATE_PUSH_ALL, &ailp->ail_opstate);
 	} else {
-		/* barrier matches the ail_target update in xfs_ail_push() */
-		smp_rmb();
-		target = ailp->ail_target;
-		ailp->ail_target_prev = target;
+		target = __xfs_ail_push_target(ailp);
 	}
 
+	if (target == NULLCOMMITLSN)
+		goto out_done;
+
 	/* we're done if the AIL is empty or our push has reached the end */
 	lip = xfs_trans_ail_cursor_first(ailp, &cur, ailp->ail_last_pushed_lsn);
 	if (!lip)
-		goto out_done;
+		goto out_done_cursor;
 
 	XFS_STATS_INC(mp, xs_push_ail);
 
@@ -553,8 +587,9 @@ xfsaild_push(
 		lsn = lip->li_lsn;
 	}
 
-out_done:
+out_done_cursor:
 	xfs_trans_ail_cursor_done(&cur);
+out_done:
 	spin_unlock(&ailp->ail_lock);
 
 	if (xfs_buf_delwri_submit_nowait(&ailp->ail_buf_list))
@@ -603,7 +638,7 @@ xfsaild(
 	set_freezable();
 
 	while (1) {
-		if (tout && tout <= 20)
+		if (tout)
 			set_current_state(TASK_KILLABLE|TASK_FREEZABLE);
 		else
 			set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE);
@@ -639,21 +674,9 @@ xfsaild(
 			break;
 		}
 
+		/* Idle if the AIL is empty. */
 		spin_lock(&ailp->ail_lock);
-
-		/*
-		 * Idle if the AIL is empty and we are not racing with a target
-		 * update. We check the AIL after we set the task to a sleep
-		 * state to guarantee that we either catch an ail_target update
-		 * or that a wake_up resets the state to TASK_RUNNING.
-		 * Otherwise, we run the risk of sleeping indefinitely.
-		 *
-		 * The barrier matches the ail_target update in xfs_ail_push().
-		 */
-		smp_rmb();
-		if (!xfs_ail_min(ailp) &&
-		    ailp->ail_target == ailp->ail_target_prev &&
-		    list_empty(&ailp->ail_buf_list)) {
+		if (!xfs_ail_min(ailp) && list_empty(&ailp->ail_buf_list)) {
 			spin_unlock(&ailp->ail_lock);
 			schedule();
 			tout = 0;
@@ -675,56 +698,6 @@ xfsaild(
 	return 0;
 }
 
-/*
- * This routine is called to move the tail of the AIL forward. It does this by
- * trying to flush items in the AIL whose lsns are below the given
- * threshold_lsn.
- *
- * The push is run asynchronously in a workqueue, which means the caller needs
- * to handle waiting on the async flush for space to become available.
- * We don't want to interrupt any push that is in progress, hence we only queue
- * work if we set the pushing bit appropriately.
- *
- * We do this unlocked - we only need to know whether there is anything in the
- * AIL at the time we are called. We don't need to access the contents of
- * any of the objects, so the lock is not needed.
- */
-void
-xfs_ail_push(
-	struct xfs_ail		*ailp,
-	xfs_lsn_t		threshold_lsn)
-{
-	struct xfs_log_item	*lip;
-
-	lip = xfs_ail_min(ailp);
-	if (!lip || xlog_is_shutdown(ailp->ail_log) ||
-	    XFS_LSN_CMP(threshold_lsn, ailp->ail_target) <= 0)
-		return;
-
-	/*
-	 * Ensure that the new target is noticed in push code before it clears
-	 * the XFS_AIL_PUSHING_BIT.
-	 */
-	smp_wmb();
-	xfs_trans_ail_copy_lsn(ailp, &ailp->ail_target, &threshold_lsn);
-	smp_wmb();
-
-	wake_up_process(ailp->ail_task);
-}
-
-/*
- * Push out all items in the AIL immediately
- */
-void
-xfs_ail_push_all(
-	struct xfs_ail	*ailp)
-{
-	xfs_lsn_t	threshold_lsn = xfs_ail_max_lsn(ailp);
-
-	if (threshold_lsn)
-		xfs_ail_push(ailp, threshold_lsn);
-}
-
 /*
  * Push out all items in the AIL immediately and wait until the AIL is empty.
  */
@@ -829,6 +802,13 @@ xfs_trans_ail_update_bulk(
 	if (!list_empty(&tmp))
 		xfs_ail_splice(ailp, cur, &tmp, lsn);
 
+	/*
+	 * If this is the first insert, wake up the push daemon so it can
+	 * actively scan for items to push.
+	 */
+	if (!mlip)
+		wake_up_process(ailp->ail_task);
+
 	xfs_ail_update_finish(ailp, tail_lsn);
 }
 
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index 52a45f0a5ef173..9a131e7fae9467 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -52,16 +52,18 @@ struct xfs_ail {
 	struct xlog		*ail_log;
 	struct task_struct	*ail_task;
 	struct list_head	ail_head;
-	xfs_lsn_t		ail_target;
-	xfs_lsn_t		ail_target_prev;
 	struct list_head	ail_cursors;
 	spinlock_t		ail_lock;
 	xfs_lsn_t		ail_last_pushed_lsn;
 	int			ail_log_flush;
+	unsigned long		ail_opstate;
 	struct list_head	ail_buf_list;
 	wait_queue_head_t	ail_empty;
 };
 
+/* Push all items out of the AIL immediately. */
+#define XFS_AIL_OPSTATE_PUSH_ALL	0u
+
 /*
  * From xfs_trans_ail.c
  */
@@ -98,10 +100,29 @@ void xfs_ail_update_finish(struct xfs_ail *ailp, xfs_lsn_t old_lsn)
 			__releases(ailp->ail_lock);
 void xfs_trans_ail_delete(struct xfs_log_item *lip, int shutdown_type);
 
-void			xfs_ail_push(struct xfs_ail *, xfs_lsn_t);
-void			xfs_ail_push_all(struct xfs_ail *);
-void			xfs_ail_push_all_sync(struct xfs_ail *);
-struct xfs_log_item	*xfs_ail_min(struct xfs_ail *ailp);
+static inline void xfs_ail_push(struct xfs_ail *ailp)
+{
+	wake_up_process(ailp->ail_task);
+}
+
+static inline void xfs_ail_push_all(struct xfs_ail *ailp)
+{
+	if (!test_and_set_bit(XFS_AIL_OPSTATE_PUSH_ALL, &ailp->ail_opstate))
+		xfs_ail_push(ailp);
+}
+
+xfs_lsn_t		__xfs_ail_push_target(struct xfs_ail *ailp);
+static inline xfs_lsn_t xfs_ail_push_target(struct xfs_ail *ailp)
+{
+	xfs_lsn_t	lsn;
+
+	spin_lock(&ailp->ail_lock);
+	lsn = __xfs_ail_push_target(ailp);
+	spin_unlock(&ailp->ail_lock);
+	return lsn;
+}
+
+void			xfs_ail_push_all_sync(struct xfs_ail *ailp);
 xfs_lsn_t		xfs_ail_min_lsn(struct xfs_ail *ailp);
 
 struct xfs_log_item *	xfs_trans_ail_cursor_first(struct xfs_ail *ailp,
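
An aside on the push-target arithmetic that this patch centralises in __xfs_ail_push_target(): the threshold block is the tail block advanced by a quarter of the log size, wrapping into the next cycle when it runs off the end of the log. Here is a standalone sketch of that arithmetic with invented names; the kernel works in 512-byte basic blocks.

/*
 * Standalone sketch of the push-target calculation: advance the tail
 * LSN by a quarter of the log size in blocks, wrapping to the next
 * cycle when we run past the end of the log.
 */
#include <stdint.h>
#include <stdio.h>

typedef int64_t xfs_lsn_t;

static xfs_lsn_t push_target(uint32_t tail_cycle, uint32_t tail_block,
			     uint32_t log_size_blocks)
{
	uint32_t cycle = tail_cycle;
	uint32_t block = tail_block + (log_size_blocks >> 2);

	if (block >= log_size_blocks) {
		block -= log_size_blocks;	/* wrapped past the end... */
		cycle += 1;			/* ...into the next cycle */
	}
	return ((xfs_lsn_t)cycle << 32) | block;
}

int main(void)
{
	/* Tail at 5:7000 in an 8192-block log: target wraps to 6:856. */
	xfs_lsn_t t = push_target(5, 7000, 8192);

	printf("%u:%u\n", (uint32_t)(t >> 32), (uint32_t)t);
	return 0;
}
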
From patchwork Thu Jun 20 07:21:21 2024
X-Patchwork-Submitter: Christoph Hellwig
X-Patchwork-Id: 13704898
From: Christoph Hellwig
To: Chandan Babu R
Cc: "Darrick J. Wong" , Dave Chinner , linux-xfs@vger.kernel.org, Dave Chinner
Subject: [PATCH 04/11] xfs: background AIL push should target physical space
Date: Thu, 20 Jun 2024 09:21:21 +0200
Message-ID: <20240620072146.530267-5-hch@lst.de>
In-Reply-To: <20240620072146.530267-1-hch@lst.de>
References: <20240620072146.530267-1-hch@lst.de>

From: Dave Chinner

Currently the AIL attempts to keep 25% of the "log space" free, where
the current used space is tracked by the reserve grant head. That is, it
tracks the physical space used plus the amount reserved by transactions
in progress.

When we start tail pushing, we are trying to make space for new
reservations by writing back older metadata and the log is generally
physically full of dirty metadata, and reservations for modifications in
flight take up whatever space the AIL can physically free up.

Hence we don't really need to take into account the reservation space
that has been used - we just need to keep the log tail moving as fast as
we can to free up space for more reservations to be made. We know
exactly how much physical space the journal is consuming in the AIL
(i.e. max LSN - min LSN) so we can base push thresholds directly on this
state rather than have to look at grant head reservations to determine
how much to physically push out of the log.

This also allows code that needs to know if log items in the current
transaction need to be pushed or re-logged to simply sample the current
target - they don't need to calculate the current target themselves.
This avoids the need for any locking when doing such checks.

Further, moving to a physical target means we don't need the "push all
until empty" semantics that were introduced in the previous patch. We
can now test and clear the "push all" as a one-shot command to set the
target to the current head of the AIL. This allows the xfsaild to
maximise the use of log space right up to the point where conditions
indicate that the xfsaild is not keeping up with load and it needs to
work harder, and as soon as those constraints go away (i.e. external
code no longer needs everything pushed) the xfsaild will return to
maintaining the normal 25% free space thresholds.

Signed-off-by: Dave Chinner
Reviewed-by: Darrick J. Wong
---
 fs/xfs/libxfs/xfs_defer.c |   2 +-
 fs/xfs/xfs_log_priv.h     |  18 ++++++
 fs/xfs/xfs_trans_ail.c    | 116 +++++++++++++++++++------------------
 fs/xfs/xfs_trans_priv.h   |  11 +---
 4 files changed, 80 insertions(+), 67 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index e2c8308d518b56..40021849b42f0a 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -558,7 +558,7 @@ xfs_defer_relog(
 	 * the log threshold once per call.
 	 */
 	if (threshold_lsn == NULLCOMMITLSN) {
-		threshold_lsn = xfs_ail_push_target(log->l_ailp);
+		threshold_lsn = xfs_ail_get_push_target(log->l_ailp);
 		if (threshold_lsn == NULLCOMMITLSN)
 			break;
 	}
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index 0482b11965e248..971871b84d8436 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -625,6 +625,24 @@ xlog_wait(
 int xlog_wait_on_iclog(struct xlog_in_core *iclog)
 		__releases(iclog->ic_log->l_icloglock);
 
+/* Calculate the distance between two LSNs in bytes */
+static inline uint64_t
+xlog_lsn_sub(
+	struct xlog	*log,
+	xfs_lsn_t	high,
+	xfs_lsn_t	low)
+{
+	uint32_t	hi_cycle = CYCLE_LSN(high);
+	uint32_t	hi_block = BLOCK_LSN(high);
+	uint32_t	lo_cycle = CYCLE_LSN(low);
+	uint32_t	lo_block = BLOCK_LSN(low);
+
+	if (hi_cycle == lo_cycle)
+		return BBTOB(hi_block - lo_block);
+	ASSERT((hi_cycle == lo_cycle + 1) || xlog_is_shutdown(log));
+	return (uint64_t)log->l_logsize - BBTOB(lo_block - hi_block);
+}
+
 /*
  * The LSN is valid so long as it is behind the current LSN. If it isn't, this
  * means that the next log record that includes this metadata could have a
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index a6b6fca1d13852..26d4d9b3e35789 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -398,51 +398,69 @@ xfsaild_push_item(
 /*
  * Compute the LSN that we'd need to push the log tail towards in order to have
  * at least 25% of the log space free. If the log free space already meets this
- * threshold, this function returns NULLCOMMITLSN.
+ * threshold, this function returns the lowest LSN in the AIL to slowly keep
+ * writeback ticking over and the tail of the log moving forward.
 */
-xfs_lsn_t
-__xfs_ail_push_target(
+static xfs_lsn_t
+xfs_ail_calc_push_target(
 	struct xfs_ail		*ailp)
 {
-	struct xlog		*log = ailp->ail_log;
-	xfs_lsn_t		threshold_lsn = 0;
-	xfs_lsn_t		last_sync_lsn;
-	int			free_blocks;
-	int			free_bytes;
-	int			threshold_block;
-	int			threshold_cycle;
-	int			free_threshold;
-
-	free_bytes = xlog_space_left(log, &log->l_reserve_head.grant);
-	free_blocks = BTOBBT(free_bytes);
+	struct xlog		*log = ailp->ail_log;
+	struct xfs_log_item	*lip;
+	xfs_lsn_t		target_lsn;
+	xfs_lsn_t		max_lsn;
+	xfs_lsn_t		min_lsn;
+	int32_t			free_bytes;
+	uint32_t		target_block;
+	uint32_t		target_cycle;
+
+	lockdep_assert_held(&ailp->ail_lock);
+
+	lip = xfs_ail_max(ailp);
+	if (!lip)
+		return NULLCOMMITLSN;
+
+	max_lsn = lip->li_lsn;
+	min_lsn = __xfs_ail_min_lsn(ailp);
 
 	/*
-	 * The threshold for the minimum number of free blocks is one quarter of
-	 * the entire log space.
+	 * If we are supposed to push all the items in the AIL, we want to push
+	 * to the current head. We then clear the push flag so that we don't
+	 * keep pushing newly queued items beyond where the push all command was
+	 * run. If the push waiter wants to empty the ail, it should queue
+	 * itself on the ail_empty wait queue.
 	 */
-	free_threshold = log->l_logBBsize >> 2;
-	if (free_blocks >= free_threshold)
-		return NULLCOMMITLSN;
+	if (test_and_clear_bit(XFS_AIL_OPSTATE_PUSH_ALL, &ailp->ail_opstate))
+		return max_lsn;
+
+	/* If someone wants the AIL empty, keep pushing everything we have. */
+	if (waitqueue_active(&ailp->ail_empty))
+		return max_lsn;
 
-	xlog_crack_atomic_lsn(&log->l_tail_lsn, &threshold_cycle,
-						&threshold_block);
-	threshold_block += free_threshold;
-	if (threshold_block >= log->l_logBBsize) {
-		threshold_block -= log->l_logBBsize;
-		threshold_cycle += 1;
-	}
-	threshold_lsn = xlog_assign_lsn(threshold_cycle,
-					threshold_block);
 	/*
-	 * Don't pass in an lsn greater than the lsn of the last
-	 * log record known to be on disk. Use a snapshot of the last sync lsn
-	 * so that it doesn't change between the compare and the set.
+	 * Background pushing - attempt to keep 25% of the log free and if we
+	 * have that much free retain the existing target.
 	 */
-	last_sync_lsn = atomic64_read(&log->l_last_sync_lsn);
-	if (XFS_LSN_CMP(threshold_lsn, last_sync_lsn) > 0)
-		threshold_lsn = last_sync_lsn;
+	free_bytes = log->l_logsize - xlog_lsn_sub(log, max_lsn, min_lsn);
+	if (free_bytes >= log->l_logsize >> 2)
+		return ailp->ail_target;
+
+	target_cycle = CYCLE_LSN(min_lsn);
+	target_block = BLOCK_LSN(min_lsn) + (log->l_logBBsize >> 2);
+	if (target_block >= log->l_logBBsize) {
+		target_block -= log->l_logBBsize;
+		target_cycle += 1;
+	}
+	target_lsn = xlog_assign_lsn(target_cycle, target_block);
+
+	/* Cap the target to the highest LSN known to be in the AIL. */
+	if (XFS_LSN_CMP(target_lsn, max_lsn) > 0)
+		return max_lsn;
 
-	return threshold_lsn;
+	/* If the existing target is higher than the new target, keep it. */
+	if (XFS_LSN_CMP(ailp->ail_target, target_lsn) >= 0)
+		return ailp->ail_target;
+	return target_lsn;
 }
 
 static long
@@ -453,7 +471,6 @@ xfsaild_push(
 	struct xfs_ail_cursor	cur;
 	struct xfs_log_item	*lip;
 	xfs_lsn_t		lsn;
-	xfs_lsn_t		target = NULLCOMMITLSN;
 	long			tout;
 	int			stuck = 0;
 	int			flushing = 0;
@@ -478,25 +495,8 @@ xfsaild_push(
 	}
 
 	spin_lock(&ailp->ail_lock);
-
-	/*
-	 * If we have a sync push waiter, we always have to push till the AIL is
-	 * empty. Update the target to point to the end of the AIL so that
-	 * capture updates that occur after the sync push waiter has gone to
-	 * sleep.
-	 */
-	if (test_bit(XFS_AIL_OPSTATE_PUSH_ALL, &ailp->ail_opstate) ||
-	    waitqueue_active(&ailp->ail_empty)) {
-		lip = xfs_ail_max(ailp);
-		if (lip)
-			target = lip->li_lsn;
-		else
-			clear_bit(XFS_AIL_OPSTATE_PUSH_ALL, &ailp->ail_opstate);
-	} else {
-		target = __xfs_ail_push_target(ailp);
-	}
-
-	if (target == NULLCOMMITLSN)
+	WRITE_ONCE(ailp->ail_target, xfs_ail_calc_push_target(ailp));
+	if (ailp->ail_target == NULLCOMMITLSN)
 		goto out_done;
 
 	/* we're done if the AIL is empty or our push has reached the end */
@@ -506,10 +506,10 @@ xfsaild_push(
 
 	XFS_STATS_INC(mp, xs_push_ail);
 
-	ASSERT(target != NULLCOMMITLSN);
+	ASSERT(ailp->ail_target != NULLCOMMITLSN);
 
 	lsn = lip->li_lsn;
-	while ((XFS_LSN_CMP(lip->li_lsn, target) <= 0)) {
+	while ((XFS_LSN_CMP(lip->li_lsn, ailp->ail_target) <= 0)) {
 		int	lock_result;
 
 		/*
@@ -595,7 +595,7 @@ xfsaild_push(
 	if (xfs_buf_delwri_submit_nowait(&ailp->ail_buf_list))
 		ailp->ail_log_flush++;
 
-	if (!count || XFS_LSN_CMP(lsn, target) >= 0) {
+	if (!count || XFS_LSN_CMP(lsn, ailp->ail_target) >= 0) {
 		/*
 		 * We reached the target or the AIL is empty, so wait a bit
 		 * longer for I/O to complete and remove pushed items from the
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index 9a131e7fae9467..60b4707c3a6583 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -59,6 +59,7 @@ struct xfs_ail {
 	unsigned long		ail_opstate;
 	struct list_head	ail_buf_list;
 	wait_queue_head_t	ail_empty;
+	xfs_lsn_t		ail_target;
 };
 
 /* Push all items out of the AIL immediately. */
@@ -111,15 +112,9 @@ static inline void xfs_ail_push_all(struct xfs_ail *ailp)
 		xfs_ail_push(ailp);
 }
 
-xfs_lsn_t		__xfs_ail_push_target(struct xfs_ail *ailp);
-static inline xfs_lsn_t xfs_ail_push_target(struct xfs_ail *ailp)
+static inline xfs_lsn_t xfs_ail_get_push_target(struct xfs_ail *ailp)
 {
-	xfs_lsn_t	lsn;
-
-	spin_lock(&ailp->ail_lock);
-	lsn = __xfs_ail_push_target(ailp);
-	spin_unlock(&ailp->ail_lock);
-	return lsn;
+	return READ_ONCE(ailp->ail_target);
 }
 
 void			xfs_ail_push_all_sync(struct xfs_ail *ailp);
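
An aside on xlog_lsn_sub() added above: it assumes the two LSNs are at most one cycle apart, which holds for a correctly sized log. Here is a standalone userspace rendering of the same arithmetic, with invented names, that is handy for checking the wrap-around case. BBTOB() converts 512-byte basic blocks to bytes, matching the kernel macro of the same name.

/*
 * Standalone sketch of the xlog_lsn_sub() arithmetic: the byte distance
 * between two LSNs that are at most one log cycle apart.
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define BBTOB(bb)	((uint64_t)(bb) << 9)

static uint64_t lsn_sub(uint64_t log_size_bytes,
			uint32_t hi_cycle, uint32_t hi_block,
			uint32_t lo_cycle, uint32_t lo_block)
{
	if (hi_cycle == lo_cycle)
		return BBTOB(hi_block - lo_block);
	/* The high LSN has wrapped into the next cycle. */
	assert(hi_cycle == lo_cycle + 1);
	return log_size_bytes - BBTOB(lo_block - hi_block);
}

int main(void)
{
	/* 8192-block (4 MiB) log; head at 6:100, tail at 5:8000. */
	uint64_t dist = lsn_sub(BBTOB(8192), 6, 100, 5, 8000);

	/* 192 blocks to the end of the log plus 100 = 292 blocks. */
	printf("%llu bytes\n", (unsigned long long)dist);
	return 0;
}
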
From patchwork Thu Jun 20 07:21:22 2024
X-Patchwork-Submitter: Christoph Hellwig
X-Patchwork-Id: 13704899
From: Christoph Hellwig
To: Chandan Babu R
Cc: "Darrick J. Wong" , Dave Chinner , linux-xfs@vger.kernel.org, Dave Chinner
Subject: [PATCH 05/11] xfs: ensure log tail is always up to date
Date: Thu, 20 Jun 2024 09:21:22 +0200
Message-ID: <20240620072146.530267-6-hch@lst.de>
In-Reply-To: <20240620072146.530267-1-hch@lst.de>
References: <20240620072146.530267-1-hch@lst.de>

From: Dave Chinner

Whenever we write an iclog, we call xlog_assign_tail_lsn() to update the
current tail before we write it into the iclog header. This means we
have to take the AIL lock on every iclog write just to check if the tail
of the log has moved.

This doesn't avoid races with log tail updates - the log tail could move
immediately after we assign the tail to the iclog header and hence by
the time the iclog reaches stable storage the tail LSN has moved forward
in memory. Hence the log tail LSN in the iclog header is really just a
point-in-time snapshot of the current state of the AIL.

With this in mind, if we simply update the in memory log->l_tail_lsn
every time it changes in the AIL, there is no need to update the in
memory value when we are writing it into an iclog - it will already be
up-to-date in memory and checking the AIL again will not change this.
Hence xlog_state_release_iclog() does not need to check the AIL to
update the tail lsn and can just sample it directly without needing to
take the AIL lock.

Signed-off-by: Dave Chinner
Reviewed-by: Darrick J. Wong
Wong Reviewed-by: Christoph Hellwig --- fs/xfs/xfs_log.c | 5 ++--- fs/xfs/xfs_trans_ail.c | 17 +++++++++++++++-- 2 files changed, 17 insertions(+), 5 deletions(-) diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index 235fcf6dc4eeb5..ae22f361627fe4 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -530,7 +530,6 @@ xlog_state_release_iclog( struct xlog_in_core *iclog, struct xlog_ticket *ticket) { - xfs_lsn_t tail_lsn; bool last_ref; lockdep_assert_held(&log->l_icloglock); @@ -545,8 +544,8 @@ xlog_state_release_iclog( if ((iclog->ic_state == XLOG_STATE_WANT_SYNC || (iclog->ic_flags & XLOG_ICL_NEED_FUA)) && !iclog->ic_header.h_tail_lsn) { - tail_lsn = xlog_assign_tail_lsn(log->l_mp); - iclog->ic_header.h_tail_lsn = cpu_to_be64(tail_lsn); + iclog->ic_header.h_tail_lsn = + cpu_to_be64(atomic64_read(&log->l_tail_lsn)); } last_ref = atomic_dec_and_test(&iclog->ic_refcnt); diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c index 26d4d9b3e35789..7d6ccd21aae2e5 100644 --- a/fs/xfs/xfs_trans_ail.c +++ b/fs/xfs/xfs_trans_ail.c @@ -720,6 +720,13 @@ xfs_ail_push_all_sync( finish_wait(&ailp->ail_empty, &wait); } +/* + * Callers should pass the original tail lsn so that we can detect if the tail + * has moved as a result of the operation that was performed. If the caller + * needs to force a tail LSN update, it should pass NULLCOMMITLSN to bypass the + * "did the tail LSN change?" checks. If the caller wants to avoid a tail update + * (e.g. it knows the tail did not change) it should pass an @old_lsn of 0. + */ void xfs_ail_update_finish( struct xfs_ail *ailp, @@ -804,10 +811,16 @@ xfs_trans_ail_update_bulk( /* * If this is the first insert, wake up the push daemon so it can - * actively scan for items to push. + * actively scan for items to push. We also need to do a log tail + * LSN update to ensure that it is correctly tracked by the log, so + * set the tail_lsn to NULLCOMMITLSN so that xfs_ail_update_finish() + * will see that the tail lsn has changed and will update the tail + * appropriately. 
*/ - if (!mlip) + if (!mlip) { wake_up_process(ailp->ail_task); + tail_lsn = NULLCOMMITLSN; + } xfs_ail_update_finish(ailp, tail_lsn); } From patchwork Thu Jun 20 07:21:23 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 13704900 Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5F465639 for ; Thu, 20 Jun 2024 07:22:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.137.202.133 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718868129; cv=none; b=sSqjYEs6LorcB4LdmDQFNEPMGa+t9mZc0F8rJOy2XvmzaowT6yPrp2ow1S1oGvWCz349tYNVZBR3gVeudzwiAr99WbbeOUO5ieV7w/NmWdGe5gao6LDwn3IuqNz2X2kODOZFqxw6x/go+7rKbdpTqCSHADXGOnrZwTPuPmv71Lg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718868129; c=relaxed/simple; bh=fjibyn2uBA+R3HKCZB5Mp/FkR1/al0glPpsY+Kmg9bY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=C8w8IftxlhZxuf/Dwza7ZoDtDNr51m7Qj/wBdaSGVRdUkwylF5ac/Gd1dGJP4/aW7IUlAAbukKx/r78UlTlbn+TgO3zo5NDr9NMFOxAX+DoyytKFO4NhKpnc5lSzaJX2c9pvx/wVk1pWKomwZvB1Zzx2xCHpW99EShvqFkB4tcw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=lst.de; spf=none smtp.mailfrom=bombadil.srs.infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=Jo1o163D; arc=none smtp.client-ip=198.137.202.133 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=lst.de Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=bombadil.srs.infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="Jo1o163D" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=4P2ZymEoth/7ohyWbdedzvzbHpTQ4kceVGUgO5UO6rw=; b=Jo1o163Dk4zGuwpxz8i9bWy9m/ KFyJegFSUU7bI1RYSHSnrghU/hoZ4ld6e7KsfQhABAMBEyCtr/UnJ72G3bzzd6893tuPbchsCZdzd xZNZX9ZEqcuwZwCjukQ8DgegvZs6DK+LlaedM91qVASKo9Qth1XKafSg3y+YQuKm8Vka3oeORl7U4 nW6zQGTEgAFGFS2UfVrxO0eSnAY0nUgKz4co7yxMXdBfQuMZFLzjPHlj8QAIJf4mAaqI5iyPgzh2z iSlWBwUikBa8BQqZXfmY9b3GzGGKC6AQxGQn/gFLH1xNRNrgSpvm2cy5prZgts+2NWD1ZOHqmri7G SYMbygSw==; Received: from 2a02-8389-2341-5b80-3a9c-dc0d-9615-f1ed.cable.dynamic.v6.surfer.at ([2a02:8389:2341:5b80:3a9c:dc0d:9615:f1ed] helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.97.1 #2 (Red Hat Linux)) id 1sKC7R-00000003xaY-3vMs; Thu, 20 Jun 2024 07:22:06 +0000 From: Christoph Hellwig To: Chandan Babu R Cc: "Darrick J. 
Wong" , Dave Chinner , linux-xfs@vger.kernel.org, Dave Chinner Subject: [PATCH 06/11] xfs: l_last_sync_lsn is really AIL state Date: Thu, 20 Jun 2024 09:21:23 +0200 Message-ID: <20240620072146.530267-7-hch@lst.de> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240620072146.530267-1-hch@lst.de> References: <20240620072146.530267-1-hch@lst.de> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html From: Dave Chinner The current implementation of xlog_assign_tail_lsn() assumes that when the AIL is empty, the log tail matches the LSN of the last written commit record. This is recorded in xlog_state_set_callback() as log->l_last_sync_lsn when the iclog state changes to XLOG_STATE_CALLBACK. This change is then immediately followed by running the callbacks on the iclog which then insert the log items into the AIL at the "commit lsn" of that checkpoint. The AIL tracks log items via the start record LSN of the checkpoint, not the commit record LSN. This is because we can pipeline multiple checkpoints, and so the start record of checkpoint N+1 can be written before the commit record of checkpoint N. i.e: start N commit N +-------------+------------+----------------+ start N+1 commit N+1 The tail of the log cannot be moved to the LSN of commit N when all the items of that checkpoint are written back, because then the start record for N+1 is no longer in the active portion of the log and recovery will fail/corrupt the filesystem. Hence when all the log items in checkpoint N are written back, the tail of the log most now only move as far forwards as the start LSN of checkpoint N+1. Hence we cannot use the maximum start record LSN the AIL sees as a replacement the pointer to the current head of the on-disk log records. However, we currently only use the l_last_sync_lsn when the AIL is empty - when there is no start LSN remaining, the tail of the log moves to the LSN of the last commit record as this is where recovery needs to start searching for recoverable records. THe next checkpoint will have a start record LSN that is higher than l_last_sync_lsn, and so everything still works correctly when new checkpoints are written to an otherwise empty log. l_last_sync_lsn is an atomic variable because it is currently updated when an iclog with callbacks attached moves to the CALLBACK state. While we hold the icloglock at this point, we don't hold the AIL lock. When we assign the log tail, we hold the AIL lock, not the icloglock because we have to look up the AIL. Hence it is an atomic variable so it's not bound to a specific lock context. However, the iclog callbacks are only used for CIL checkpoints. We don't use callbacks with unmount record writes, so the l_last_sync_lsn variable only gets updated when we are processing CIL checkpoint callbacks. And those callbacks run under AIL lock contexts, not icloglock context. The CIL checkpoint already knows what the LSN of the iclog the commit record was written to (obtained when written into the iclog before submission) and so we can update the l_last_sync_lsn under the AIL lock in this callback. No other iclog callbacks will run until the currently executing one completes, and hence we can update the l_last_sync_lsn under the AIL lock safely. This means l_last_sync_lsn can move to the AIL as the "ail_head_lsn" and it can be used to replace the atomic l_last_sync_lsn in the iclog code. 
This makes tracking the log tail belong entirely to the AIL, rather than being smeared across log, iclog and AIL state and locking. Signed-off-by: Dave Chinner Reviewed-by: Christoph Hellwig Reviewed-by: Darrick J. Wong --- fs/xfs/xfs_log.c | 81 +++++----------------------------------- fs/xfs/xfs_log_cil.c | 54 ++++++++++++++++++++------- fs/xfs/xfs_log_priv.h | 9 ++--- fs/xfs/xfs_log_recover.c | 19 +++++----- fs/xfs/xfs_trace.c | 1 + fs/xfs/xfs_trace.h | 8 ++-- fs/xfs/xfs_trans_ail.c | 26 +++++++++++-- fs/xfs/xfs_trans_priv.h | 13 +++++++ 8 files changed, 102 insertions(+), 109 deletions(-) diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index ae22f361627fe4..1977afecd385d5 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -1230,47 +1230,6 @@ xfs_log_cover( return error; } -/* - * We may be holding the log iclog lock upon entering this routine. - */ -xfs_lsn_t -xlog_assign_tail_lsn_locked( - struct xfs_mount *mp) -{ - struct xlog *log = mp->m_log; - struct xfs_log_item *lip; - xfs_lsn_t tail_lsn; - - assert_spin_locked(&mp->m_ail->ail_lock); - - /* - * To make sure we always have a valid LSN for the log tail we keep - * track of the last LSN which was committed in log->l_last_sync_lsn, - * and use that when the AIL was empty. - */ - lip = xfs_ail_min(mp->m_ail); - if (lip) - tail_lsn = lip->li_lsn; - else - tail_lsn = atomic64_read(&log->l_last_sync_lsn); - trace_xfs_log_assign_tail_lsn(log, tail_lsn); - atomic64_set(&log->l_tail_lsn, tail_lsn); - return tail_lsn; -} - -xfs_lsn_t -xlog_assign_tail_lsn( - struct xfs_mount *mp) -{ - xfs_lsn_t tail_lsn; - - spin_lock(&mp->m_ail->ail_lock); - tail_lsn = xlog_assign_tail_lsn_locked(mp); - spin_unlock(&mp->m_ail->ail_lock); - - return tail_lsn; -} - /* * Return the space in the log between the tail and the head. The head * is passed in the cycle/bytes formal parms. In the special case where @@ -1501,7 +1460,6 @@ xlog_alloc_log( log->l_prev_block = -1; /* log->l_tail_lsn = 0x100000000LL; cycle = 1; current block = 0 */ xlog_assign_atomic_lsn(&log->l_tail_lsn, 1, 0); - xlog_assign_atomic_lsn(&log->l_last_sync_lsn, 1, 0); log->l_curr_cycle = 1; /* 0 is bad since this is initial value */ if (xfs_has_logv2(mp) && mp->m_sb.sb_logsunit > 1) @@ -2549,44 +2507,23 @@ xlog_get_lowest_lsn( return lowest_lsn; } -/* - * Completion of a iclog IO does not imply that a transaction has completed, as - * transactions can be large enough to span many iclogs. We cannot change the - * tail of the log half way through a transaction as this may be the only - * transaction in the log and moving the tail to point to the middle of it - * will prevent recovery from finding the start of the transaction. Hence we - * should only update the last_sync_lsn if this iclog contains transaction - * completion callbacks on it. - * - * We have to do this before we drop the icloglock to ensure we are the only one - * that can update it. - * - * If we are moving the last_sync_lsn forwards, we also need to ensure we kick - * the reservation grant head pushing. This is due to the fact that the push - * target is bound by the current last_sync_lsn value. Hence if we have a large - * amount of log space bound up in this committing transaction then the - * last_sync_lsn value may be the limiting factor preventing tail pushing from - * freeing space in the log. Hence once we've updated the last_sync_lsn we - * should push the AIL to ensure the push target (and hence the grant head) is - * no longer bound by the old log head location and can move forwards and make - * progress again. 
- */ static void xlog_state_set_callback( struct xlog *log, struct xlog_in_core *iclog, xfs_lsn_t header_lsn) { + /* + * If there are no callbacks on this iclog, we can mark it clean + * immediately and return. Otherwise we need to run the + * callbacks. + */ + if (list_empty(&iclog->ic_callbacks)) { + xlog_state_clean_iclog(log, iclog); + return; + } trace_xlog_iclog_callback(iclog, _RET_IP_); iclog->ic_state = XLOG_STATE_CALLBACK; - - ASSERT(XFS_LSN_CMP(atomic64_read(&log->l_last_sync_lsn), - header_lsn) <= 0); - - if (list_empty_careful(&iclog->ic_callbacks)) - return; - - atomic64_set(&log->l_last_sync_lsn, header_lsn); } /* diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c index 141bde08bd6e3c..482955f1fa1f9f 100644 --- a/fs/xfs/xfs_log_cil.c +++ b/fs/xfs/xfs_log_cil.c @@ -721,6 +721,24 @@ xlog_cil_ail_insert_batch( * items into the AIL. This uses bulk insertion techniques to minimise AIL lock * traffic. * + * The AIL tracks log items via the start record LSN of the checkpoint, + * not the commit record LSN. This is because we can pipeline multiple + * checkpoints, and so the start record of checkpoint N+1 can be + * written before the commit record of checkpoint N. i.e: + * + * start N commit N + * +-------------+------------+----------------+ + * start N+1 commit N+1 + * + * The tail of the log cannot be moved to the LSN of commit N when all + * the items of that checkpoint are written back, because then the + * start record for N+1 is no longer in the active portion of the log + * and recovery will fail/corrupt the filesystem. + * + * Hence when all the log items in checkpoint N are written back, the + * tail of the log must now only move as far forwards as the start LSN + * of checkpoint N+1. + * * If we are called with the aborted flag set, it is because a log write during * a CIL checkpoint commit has failed. In this case, all the items in the * checkpoint have already gone through iop_committed and iop_committing, which @@ -738,24 +756,33 @@ xlog_cil_ail_insert_batch( */ static void xlog_cil_ail_insert( - struct xlog *log, - struct list_head *lv_chain, - xfs_lsn_t commit_lsn, + struct xfs_cil_ctx *ctx, bool aborted) { #define LOG_ITEM_BATCH_SIZE 32 - struct xfs_ail *ailp = log->l_ailp; + struct xfs_ail *ailp = ctx->cil->xc_log->l_ailp; struct xfs_log_item *log_items[LOG_ITEM_BATCH_SIZE]; struct xfs_log_vec *lv; struct xfs_ail_cursor cur; int i = 0; + /* + * Update the AIL head LSN with the commit record LSN of this + * checkpoint. As iclogs are always completed in order, this should + * always be the same (as iclogs can contain multiple commit records) or + * higher LSN than the current head. We do this before insertion of the + * items so that log space checks during insertion will reflect the + * space that this checkpoint has already consumed.
+ */ + ASSERT(XFS_LSN_CMP(ctx->commit_lsn, ailp->ail_head_lsn) >= 0 || + aborted); spin_lock(&ailp->ail_lock); - xfs_trans_ail_cursor_last(ailp, &cur, commit_lsn); + ailp->ail_head_lsn = ctx->commit_lsn; + xfs_trans_ail_cursor_last(ailp, &cur, ctx->start_lsn); spin_unlock(&ailp->ail_lock); /* unpin all the log items */ - list_for_each_entry(lv, lv_chain, lv_list) { + list_for_each_entry(lv, &ctx->lv_chain, lv_list) { struct xfs_log_item *lip = lv->lv_item; xfs_lsn_t item_lsn; @@ -768,9 +795,10 @@ xlog_cil_ail_insert( } if (lip->li_ops->iop_committed) - item_lsn = lip->li_ops->iop_committed(lip, commit_lsn); + item_lsn = lip->li_ops->iop_committed(lip, + ctx->start_lsn); else - item_lsn = commit_lsn; + item_lsn = ctx->start_lsn; /* item_lsn of -1 means the item needs no further processing */ if (XFS_LSN_CMP(item_lsn, (xfs_lsn_t)-1) == 0) @@ -787,7 +815,7 @@ xlog_cil_ail_insert( continue; } - if (item_lsn != commit_lsn) { + if (item_lsn != ctx->start_lsn) { /* * Not a bulk update option due to unusual item_lsn. @@ -810,14 +838,15 @@ xlog_cil_ail_insert( log_items[i++] = lv->lv_item; if (i >= LOG_ITEM_BATCH_SIZE) { xlog_cil_ail_insert_batch(ailp, &cur, log_items, - LOG_ITEM_BATCH_SIZE, commit_lsn); + LOG_ITEM_BATCH_SIZE, ctx->start_lsn); i = 0; } } /* make sure we insert the remainder! */ if (i) - xlog_cil_ail_insert_batch(ailp, &cur, log_items, i, - commit_lsn); + xlog_cil_ail_insert_batch(ailp, &cur, log_items, i, + ctx->start_lsn); spin_lock(&ailp->ail_lock); xfs_trans_ail_cursor_done(&cur); @@ -863,8 +892,7 @@ xlog_cil_committed( spin_unlock(&ctx->cil->xc_push_lock); } - xlog_cil_ail_insert(ctx->cil->xc_log, &ctx->lv_chain, - ctx->start_lsn, abort); + xlog_cil_ail_insert(ctx, abort); xfs_extent_busy_sort(&ctx->busy_extents.extent_list); xfs_extent_busy_clear(mp, &ctx->busy_extents.extent_list, diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h index 971871b84d8436..4b8ef926044599 100644 --- a/fs/xfs/xfs_log_priv.h +++ b/fs/xfs/xfs_log_priv.h @@ -431,13 +431,10 @@ struct xlog { int l_prev_block; /* previous logical log block */ /* - * l_last_sync_lsn and l_tail_lsn are atomics so they can be set and - * read without needing to hold specific locks. To avoid operations - * contending with other hot objects, place each of them on a separate - * cacheline. + * l_tail_lsn is atomic so it can be set and read without needing to + * hold specific locks. To avoid operations contending with other hot + * objects, place it on a separate cacheline.
*/ - /* lsn of last LR on disk */ - atomic64_t l_last_sync_lsn ____cacheline_aligned_in_smp; /* lsn of 1st LR with unflushed * buffers */ atomic64_t l_tail_lsn ____cacheline_aligned_in_smp; diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c index 4fe627991e8653..63f667f92c322e 100644 --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -1177,8 +1177,8 @@ xlog_check_unmount_rec( */ xlog_assign_atomic_lsn(&log->l_tail_lsn, log->l_curr_cycle, after_umount_blk); - xlog_assign_atomic_lsn(&log->l_last_sync_lsn, - log->l_curr_cycle, after_umount_blk); + log->l_ailp->ail_head_lsn = + atomic64_read(&log->l_tail_lsn); *tail_blk = after_umount_blk; *clean = true; @@ -1212,7 +1212,7 @@ xlog_set_state( if (bump_cycle) log->l_curr_cycle++; atomic64_set(&log->l_tail_lsn, be64_to_cpu(rhead->h_tail_lsn)); - atomic64_set(&log->l_last_sync_lsn, be64_to_cpu(rhead->h_lsn)); + log->l_ailp->ail_head_lsn = be64_to_cpu(rhead->h_lsn); xlog_assign_grant_head(&log->l_reserve_head.grant, log->l_curr_cycle, BBTOB(log->l_curr_block)); xlog_assign_grant_head(&log->l_write_head.grant, log->l_curr_cycle, @@ -3363,14 +3363,13 @@ xlog_do_recover( /* * We now update the tail_lsn since much of the recovery has completed - * and there may be space available to use. If there were no extent - * or iunlinks, we can free up the entire log and set the tail_lsn to - * be the last_sync_lsn. This was set in xlog_find_tail to be the - * lsn of the last known good LR on disk. If there are extent frees - * or iunlinks they will have some entries in the AIL; so we look at - * the AIL to determine how to set the tail_lsn. + * and there may be space available to use. If there were no extent or + * iunlinks, we can free up the entire log. This was set in + * xlog_find_tail to be the lsn of the last known good LR on disk. If + * there are extent frees or iunlinks they will have some entries in the + * AIL; so we look at the AIL to determine how to set the tail_lsn. 
*/ - xlog_assign_tail_lsn(mp); + xfs_ail_assign_tail_lsn(log->l_ailp); /* * Now that we've finished replaying all buffer and inode updates, diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c index 9c7fbaae2717dd..1aa013fdc36fcf 100644 --- a/fs/xfs/xfs_trace.c +++ b/fs/xfs/xfs_trace.c @@ -22,6 +22,7 @@ #include "xfs_trans.h" #include "xfs_log.h" #include "xfs_log_priv.h" +#include "xfs_trans_priv.h" #include "xfs_buf_item.h" #include "xfs_quota.h" #include "xfs_dquot_item.h" diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 25ff6fe1eb6c8a..13f6e6cab572ae 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -1404,19 +1404,19 @@ TRACE_EVENT(xfs_log_assign_tail_lsn, __field(dev_t, dev) __field(xfs_lsn_t, new_lsn) __field(xfs_lsn_t, old_lsn) - __field(xfs_lsn_t, last_sync_lsn) + __field(xfs_lsn_t, head_lsn) ), TP_fast_assign( __entry->dev = log->l_mp->m_super->s_dev; __entry->new_lsn = new_lsn; __entry->old_lsn = atomic64_read(&log->l_tail_lsn); - __entry->last_sync_lsn = atomic64_read(&log->l_last_sync_lsn); + __entry->head_lsn = log->l_ailp->ail_head_lsn; ), - TP_printk("dev %d:%d new tail lsn %d/%d, old lsn %d/%d, last sync %d/%d", + TP_printk("dev %d:%d new tail lsn %d/%d, old lsn %d/%d, head lsn %d/%d", MAJOR(__entry->dev), MINOR(__entry->dev), CYCLE_LSN(__entry->new_lsn), BLOCK_LSN(__entry->new_lsn), CYCLE_LSN(__entry->old_lsn), BLOCK_LSN(__entry->old_lsn), - CYCLE_LSN(__entry->last_sync_lsn), BLOCK_LSN(__entry->last_sync_lsn)) + CYCLE_LSN(__entry->head_lsn), BLOCK_LSN(__entry->head_lsn)) ) DECLARE_EVENT_CLASS(xfs_file_class, diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c index 7d6ccd21aae2e5..5f03f82c46838e 100644 --- a/fs/xfs/xfs_trans_ail.c +++ b/fs/xfs/xfs_trans_ail.c @@ -720,6 +720,26 @@ xfs_ail_push_all_sync( finish_wait(&ailp->ail_empty, &wait); } +void +__xfs_ail_assign_tail_lsn( + struct xfs_ail *ailp) +{ + struct xlog *log = ailp->ail_log; + xfs_lsn_t tail_lsn; + + assert_spin_locked(&ailp->ail_lock); + + if (xlog_is_shutdown(log)) + return; + + tail_lsn = __xfs_ail_min_lsn(ailp); + if (!tail_lsn) + tail_lsn = ailp->ail_head_lsn; + + trace_xfs_log_assign_tail_lsn(log, tail_lsn); + atomic64_set(&log->l_tail_lsn, tail_lsn); +} + /* * Callers should pass the original tail lsn so that we can detect if the tail * has moved as a result of the operation that was performed. If the caller @@ -734,15 +754,13 @@ xfs_ail_update_finish( { struct xlog *log = ailp->ail_log; - /* if the tail lsn hasn't changed, don't do updates or wakeups. */ + /* If the tail lsn hasn't changed, don't do updates or wakeups. 
*/ if (!old_lsn || old_lsn == __xfs_ail_min_lsn(ailp)) { spin_unlock(&ailp->ail_lock); return; } - if (!xlog_is_shutdown(log)) - xlog_assign_tail_lsn_locked(log->l_mp); - + __xfs_ail_assign_tail_lsn(ailp); if (list_empty(&ailp->ail_head)) wake_up_all(&ailp->ail_empty); spin_unlock(&ailp->ail_lock); diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h index 60b4707c3a6583..bd841df93021ff 100644 --- a/fs/xfs/xfs_trans_priv.h +++ b/fs/xfs/xfs_trans_priv.h @@ -55,6 +55,7 @@ struct xfs_ail { struct list_head ail_cursors; spinlock_t ail_lock; xfs_lsn_t ail_last_pushed_lsn; + xfs_lsn_t ail_head_lsn; int ail_log_flush; unsigned long ail_opstate; struct list_head ail_buf_list; @@ -130,6 +131,18 @@ struct xfs_log_item * xfs_trans_ail_cursor_next(struct xfs_ail *ailp, struct xfs_ail_cursor *cur); void xfs_trans_ail_cursor_done(struct xfs_ail_cursor *cur); +void __xfs_ail_assign_tail_lsn(struct xfs_ail *ailp); + +static inline void +xfs_ail_assign_tail_lsn( + struct xfs_ail *ailp) +{ + + spin_lock(&ailp->ail_lock); + __xfs_ail_assign_tail_lsn(ailp); + spin_unlock(&ailp->ail_lock); +} + #if BITS_PER_LONG != 64 static inline void xfs_trans_ail_copy_lsn( From patchwork Thu Jun 20 07:21:24 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 13704901 Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6EC9542058 for ; Thu, 20 Jun 2024 07:22:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.137.202.133 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718868130; cv=none; b=D3T5cDWFGP9q+HGzLxd40v1MjxkOlPLgSwOTqKhf4iRUYlTqsHVabBsgUV1cVuoZGfAvvQr6Vd3aIMi2+I4N3C/8hG/fQ82taO4uiNBi8/2jw016yHIY4byLcsNgiiI40yy1S7W8ja8No9GG7Cb139E7yyepKL+G0J6CnrZskr4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718868130; c=relaxed/simple; bh=B1eOuAZkx9Pu9OGsV4CKE/l4I3y/4bFwVjWBPXTyl4U=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=el5guxAsY4Rc0CdMEE0qrAtlcmRpIPvGeF1Fpywx326JVQYgpd7PLXdHVDLi7Yujn/oC/kj4n+0rFiFcZXtUPhGV2jgpsWT5J4nvcCWdHnEe/zQdzUVXW5E5HY80Ua9vup8daYMUZA/qdjQBKrzg0ygMtMSY2Q8JwB8S/4cRHmI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=lst.de; spf=none smtp.mailfrom=bombadil.srs.infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=rHSX/2pq; arc=none smtp.client-ip=198.137.202.133 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=lst.de Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=bombadil.srs.infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="rHSX/2pq" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=z8AW968Fz99jMlvhAw0+6auj7U/QMMjpl/YT0DsCG10=; b=rHSX/2pq5w6Z5yu7EWWdSNQcqP evpNdaCG5bdSwIkDyBM+dPZ7j7RKierXtJfn27eQrQ2CIx9PrhUWKR3PpCnX7Li7d5Ca6lKNFeUsY 
5c8p/C89E9mo20bHc+fG5Yn2u7w9AbW7kOmUpCkuNf+3Dn42zTuGQNbW77nhz53GYiyj10W099t3m WnrBphD6XIaxXLnQnRIwW5fmwBsWY9q59bVzd2hRVYR35gzS/3JKgNvOYzYvaYc7VTz8p22CSDSdG F5pN1vUmdqOr/MZfAC4d3kzR7KIllffmz/UF7nf+uQgiUg070+d7nwvbSKr1a0teFYdycxNqBVUUb 7K5mQwAw==; Received: from 2a02-8389-2341-5b80-3a9c-dc0d-9615-f1ed.cable.dynamic.v6.surfer.at ([2a02:8389:2341:5b80:3a9c:dc0d:9615:f1ed] helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.97.1 #2 (Red Hat Linux)) id 1sKC7U-00000003xbb-32op; Thu, 20 Jun 2024 07:22:09 +0000 From: Christoph Hellwig To: Chandan Babu R Cc: "Darrick J. Wong" , Dave Chinner , linux-xfs@vger.kernel.org, Dave Chinner Subject: [PATCH 07/11] xfs: collapse xlog_state_set_callback in caller Date: Thu, 20 Jun 2024 09:21:24 +0200 Message-ID: <20240620072146.530267-8-hch@lst.de> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240620072146.530267-1-hch@lst.de> References: <20240620072146.530267-1-hch@lst.de> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html From: Dave Chinner The function is called from a single place, and it isn't just setting the iclog state to XLOG_STATE_CALLBACK - it can mark iclogs clean, which moves them to states after CALLBACK. Hence the function is now badly named, and should just be folded into the caller where the iclog completion logic makes a whole lot more sense. Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- fs/xfs/xfs_log.c | 31 +++++++++++-------------------- 1 file changed, 11 insertions(+), 20 deletions(-) diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index 1977afecd385d5..381d6143a78777 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -2507,25 +2507,6 @@ xlog_get_lowest_lsn( return lowest_lsn; } -static void -xlog_state_set_callback( - struct xlog *log, - struct xlog_in_core *iclog, - xfs_lsn_t header_lsn) -{ - /* - * If there are no callbacks on this iclog, we can mark it clean - * immediately and return. Otherwise we need to run the - * callbacks. - */ - if (list_empty(&iclog->ic_callbacks)) { - xlog_state_clean_iclog(log, iclog); - return; - } - trace_xlog_iclog_callback(iclog, _RET_IP_); - iclog->ic_state = XLOG_STATE_CALLBACK; -} - /* * Return true if we need to stop processing, false to continue to the next * iclog. The caller will need to run callbacks if the iclog is returned in the @@ -2557,7 +2538,17 @@ xlog_state_iodone_process_iclog( lowest_lsn = xlog_get_lowest_lsn(log); if (lowest_lsn && XFS_LSN_CMP(lowest_lsn, header_lsn) < 0) return false; - xlog_state_set_callback(log, iclog, header_lsn); + /* + * If there are no callbacks on this iclog, we can mark it clean + * immediately and return. Otherwise we need to run the + * callbacks. 
+ */ + if (list_empty(&iclog->ic_callbacks)) { + xlog_state_clean_iclog(log, iclog); + return false; + } + trace_xlog_iclog_callback(iclog, _RET_IP_); + iclog->ic_state = XLOG_STATE_CALLBACK; return false; default: /* From patchwork Thu Jun 20 07:21:25 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 13704902 Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EE16639AEB for ; Thu, 20 Jun 2024 07:22:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.137.202.133 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718868133; cv=none; b=qWKdp/z+TiefZOAFPmJ4UUf1KxmdWqf+v6I0flB4Nstn0WFh5li7mIvT6U5X87k7fDi0WJx8aX28N0vnbhhhERhFELsG7w+lJpfrQMJ1VMC5EjkUoEfq0ZoM+E/1X1gDLfr/TNqJLppqqS2wsEbprCYAhoxHquBJdE3PZXJMu+k= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718868133; c=relaxed/simple; bh=DgxgOIKI+nipdHXTaxRxLSvpLJoo8Brp7+SZZku+nqA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=MaJQE/Xc+crivu5OFrM4EXOJHBscQPWwvqFfsogmnix+apZY+gR3W4mlTIPp5uXu2DKRZpMMThStZ0CwEh8vQFKWtx476/DVdNuMeWJT68dJh31JKySbpd/SPhSpIisqfnL1Xwc+5ItbUffLIGon0ZR4bpQeUbzCGvItfXZkuCo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=lst.de; spf=none smtp.mailfrom=bombadil.srs.infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=z6dFJ+H9; arc=none smtp.client-ip=198.137.202.133 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=lst.de Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=bombadil.srs.infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="z6dFJ+H9" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=B6o/gMsIQz/QF9dF7ECx00qQqqshzCqi5DwEnSr5XTk=; b=z6dFJ+H9nXIbPvIMe9SZDDDmC8 PLXZN7zIbjEK4cxWml2WumWPW43AwC2yVJsCmJU3sFTY3okj51ZGwSywL2isN8vVDyqk5GCxTjJhb WErX1BDwyT2tI9SEQLFJt60EJFK+DKiMss68fx42XV63krBNRewSod2yVJObVC2A7NMOyR+8LwB8N kaHbn40wTu9E1HaJXM7MApNMdaCewhCa0gYDDz5AHuVlUcYOx/u1HDojA7FcjWEprcwcgh8JvDnn5 +rUlAZeM1nv305042hvHYgyOl1SMVESZiZbxUg/qy1EYCmyFXD083FTWSB6jMZlvnP045Z9NmuIU3 9OPt/qxQ==; Received: from 2a02-8389-2341-5b80-3a9c-dc0d-9615-f1ed.cable.dynamic.v6.surfer.at ([2a02:8389:2341:5b80:3a9c:dc0d:9615:f1ed] helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.97.1 #2 (Red Hat Linux)) id 1sKC7X-00000003xcX-0hWY; Thu, 20 Jun 2024 07:22:11 +0000 From: Christoph Hellwig To: Chandan Babu R Cc: "Darrick J. 
Wong" , Dave Chinner , linux-xfs@vger.kernel.org, Dave Chinner Subject: [PATCH 08/11] xfs: track log space pinned by the AIL Date: Thu, 20 Jun 2024 09:21:25 +0200 Message-ID: <20240620072146.530267-9-hch@lst.de> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240620072146.530267-1-hch@lst.de> References: <20240620072146.530267-1-hch@lst.de> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html From: Dave Chinner Currently we track space used in the log by grant heads. These store the reserved space as a physical log location and combine both space reserved for future use with space already used in the log in a single variable. The amount of space consumed in the log is then calculated as the distance between the log tail and the grant head. The problem with tracking the grant head as a physical location comes from the fact that it tracks both log cycle count and offset into the log in bytes in a single 64 bit variable. because the cycle count on disk is a 32 bit number, this also limits the offset into the log to 32 bits. ANd because that is in bytes, we are limited to being able to track only 2GB of log space in the grant head. Hence to support larger physical logs, we need to track used space differently in the grant head. We no longer use the grant head for guiding AIL pushing, so the only thing it is now used for is determining if we've run out of reservation space via the calculation in xlog_space_left(). What we really need to do is move the grant heads away from tracking physical space in the log. The issue here is that space consumed in the log is not directly tracked by the current mechanism - the space consumed in the log by grant head reservations gets returned to the free pool by the tail of the log moving forward. i.e. the space isn't directly tracked or calculated, but the used grant space gets "freed" as the physical limits of the log are updated without actually needing to update the grant heads. Hence to move away from implicit, zero-update log space tracking we need to explicitly track the amount of physical space the log actually consumes separately to the in-memory reservations for operations that will be committed to the journal. Luckily, we already track the information we need to calculate this in the AIL itself. That is, the space currently consumed by the journal is the maximum LSN that the AIL has seen minus the current log tail. As we update both of these items dynamically as the head and tail of the log moves, we always know exactly how much space the journal consumes. This means that we also know exactly how much space the currently active reservations require, and exactly how much free space we have remaining for new reservations to be made. Most importantly, we know what these spaces are indepedently of the physical locations of the head and tail of the log. Hence by separating out the physical space consumed by the journal, we can now track reservations in the grant heads purely as a byte count, and the log can be considered full when the tail space + reservation space exceeds the size of the log. This means we can use the full 64 bits of grant head space for reservation space, completely removing the 32 bit byte count limitation on log size that they impose. 
Hence the first step in this conversion is to track and update the "log tail space" every time the AIL tail or maximum seen LSN changes. Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- fs/xfs/xfs_log_cil.c | 9 ++++++--- fs/xfs/xfs_log_priv.h | 1 + fs/xfs/xfs_trans_ail.c | 9 ++++++--- 3 files changed, 13 insertions(+), 6 deletions(-) diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c index 482955f1fa1f9f..92ccac7f905448 100644 --- a/fs/xfs/xfs_log_cil.c +++ b/fs/xfs/xfs_log_cil.c @@ -772,14 +772,17 @@ xlog_cil_ail_insert( * always be the same (as iclogs can contain multiple commit records) or * higher LSN than the current head. We do this before insertion of the * items so that log space checks during insertion will reflect the - * space that this checkpoint has already consumed. + * space that this checkpoint has already consumed. We call + * xfs_ail_update_finish() so that tail space and space-based wakeups + * will be recalculated appropriately. */ ASSERT(XFS_LSN_CMP(ctx->commit_lsn, ailp->ail_head_lsn) >= 0 || aborted); spin_lock(&ailp->ail_lock); - ailp->ail_head_lsn = ctx->commit_lsn; xfs_trans_ail_cursor_last(ailp, &cur, ctx->start_lsn); - spin_unlock(&ailp->ail_lock); + ailp->ail_head_lsn = ctx->commit_lsn; + /* xfs_ail_update_finish() drops the ail_lock */ + xfs_ail_update_finish(ailp, NULLCOMMITLSN); /* unpin all the log items */ list_for_each_entry(lv, &ctx->lv_chain, lv_list) { diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h index 4b8ef926044599..2896745989795d 100644 --- a/fs/xfs/xfs_log_priv.h +++ b/fs/xfs/xfs_log_priv.h @@ -440,6 +440,7 @@ struct xlog { struct xlog_grant_head l_reserve_head; struct xlog_grant_head l_write_head; + uint64_t l_tail_space; struct xfs_kobj l_kobj; diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c index 5f03f82c46838e..6a106a05fae017 100644 --- a/fs/xfs/xfs_trans_ail.c +++ b/fs/xfs/xfs_trans_ail.c @@ -736,6 +736,8 @@ __xfs_ail_assign_tail_lsn( if (!tail_lsn) tail_lsn = ailp->ail_head_lsn; + WRITE_ONCE(log->l_tail_space, + xlog_lsn_sub(log, ailp->ail_head_lsn, tail_lsn)); trace_xfs_log_assign_tail_lsn(log, tail_lsn); atomic64_set(&log->l_tail_lsn, tail_lsn); } @@ -743,9 +745,10 @@ __xfs_ail_assign_tail_lsn( /* * Callers should pass the original tail lsn so that we can detect if the tail * has moved as a result of the operation that was performed. If the caller - * needs to force a tail LSN update, it should pass NULLCOMMITLSN to bypass the - * "did the tail LSN change?" checks. If the caller wants to avoid a tail update - * (e.g. it knows the tail did not change) it should pass an @old_lsn of 0. + * needs to force a tail space update, it should pass NULLCOMMITLSN to bypass + * the "did the tail LSN change?" checks. If the caller wants to avoid a tail + * update (e.g. it knows the tail did not change) it should pass an @old_lsn of + * 0. 
*/ void xfs_ail_update_finish( From patchwork Thu Jun 20 07:21:26 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 13704903 Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C566E39AEB for ; Thu, 20 Jun 2024 07:22:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.137.202.133 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718868136; cv=none; b=BAEtiGYXCm8SCS3qVPJFBQdntLUw81RRSHVD3LBwXZUjNscelMTi9qjz4TLslK6k1kFxk+VIe+kLCT/tKK2WxxnSS4AjdG7xP6bj50wZlg8MUfurrpUhcIrcGO4VA+W8eHzQO8DnQr8QWm8iUwGDv1TupcQZ4XBLobMy0ALTWhs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718868136; c=relaxed/simple; bh=73OsiYO+fLOxGGwf3O+zXJekSzHT0+uyOCWw0+WlE1g=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Da0EMyzofqI3DCZbylmxfIB2/vMq7Z5P8eN3rxTG7NYXfnlCabqEhL4X1VVPWhTdFbVKRMeIIyb61Z6CZGFeKWuZLT30nYO1h1ClUkPOjJXdv9wunKxPbTpjJAEUszSi3VnIfKBwCsUi68ozYFgx1lan2ZuY8Xl/yWKe4tOLv2I= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=lst.de; spf=none smtp.mailfrom=bombadil.srs.infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=JQ0Wl5NX; arc=none smtp.client-ip=198.137.202.133 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=lst.de Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=bombadil.srs.infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="JQ0Wl5NX" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=kG9xvd3EqrdqfzDRMXOf0gtiUfZyDhFbTGgbbI3p3FE=; b=JQ0Wl5NXqxIGc/YNySkNHy4DxT cNHmN1sg//gwlPhhG3GDDv8vNbObm6JNYMixaMc3H4UJTfcDF6diT7TVF68kD3/fYINFfMf8wFijJ /5shEga5993khFEdQZujjgtffNoRv3rzw0qQDiu+8Rp7GUHrb3s+KDOk2VmTXKZsPrVjMpZd+L63s rhL1RJ30M1LQEG0zVZ8kQmHJ3280EZ1IJSmMr+O9f/Ajl335vg9iiPmTle0FZRD94DdMr+F5hvSPx nbdvQLquFNdGC4wvt2lMirkySZb0V65hEgge0nApobWc4C/mHEV+puFv2qWuf7KKLGTZJ1TxntQb2 aByXOsLA==; Received: from 2a02-8389-2341-5b80-3a9c-dc0d-9615-f1ed.cable.dynamic.v6.surfer.at ([2a02:8389:2341:5b80:3a9c:dc0d:9615:f1ed] helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.97.1 #2 (Red Hat Linux)) id 1sKC7Z-00000003xdU-3T4p; Thu, 20 Jun 2024 07:22:14 +0000 From: Christoph Hellwig To: Chandan Babu R Cc: "Darrick J. Wong" , Dave Chinner , linux-xfs@vger.kernel.org, Dave Chinner Subject: [PATCH 09/11] xfs: pass the full grant head to accounting functions Date: Thu, 20 Jun 2024 09:21:26 +0200 Message-ID: <20240620072146.530267-10-hch@lst.de> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240620072146.530267-1-hch@lst.de> References: <20240620072146.530267-1-hch@lst.de> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. 
See http://www.infradead.org/rpr.html From: Dave Chinner Because we are going to need them soon. API change only, no logic changes. Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- fs/xfs/xfs_log.c | 157 +++++++++++++++++++++--------------------- fs/xfs/xfs_log_priv.h | 2 - 2 files changed, 77 insertions(+), 82 deletions(-) diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index 381d6143a78777..0e50b370f0e4c7 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -136,10 +136,10 @@ xlog_prepare_iovec( static void xlog_grant_sub_space( struct xlog *log, - atomic64_t *head, + struct xlog_grant_head *head, int bytes) { - int64_t head_val = atomic64_read(head); + int64_t head_val = atomic64_read(&head->grant); int64_t new, old; do { @@ -155,17 +155,17 @@ xlog_grant_sub_space( old = head_val; new = xlog_assign_grant_head_val(cycle, space); - head_val = atomic64_cmpxchg(head, old, new); + head_val = atomic64_cmpxchg(&head->grant, old, new); } while (head_val != old); } static void xlog_grant_add_space( struct xlog *log, - atomic64_t *head, + struct xlog_grant_head *head, int bytes) { - int64_t head_val = atomic64_read(head); + int64_t head_val = atomic64_read(&head->grant); int64_t new, old; do { @@ -184,7 +184,7 @@ xlog_grant_add_space( old = head_val; new = xlog_assign_grant_head_val(cycle, space); - head_val = atomic64_cmpxchg(head, old, new); + head_val = atomic64_cmpxchg(&head->grant, old, new); } while (head_val != old); } @@ -197,6 +197,63 @@ xlog_grant_head_init( spin_lock_init(&head->lock); } +/* + * Return the space in the log between the tail and the head. The head + * is passed in the cycle/bytes formal parms. In the special case where + * the reserve head has wrapped passed the tail, this calculation is no + * longer valid. In this case, just return 0 which means there is no space + * in the log. This works for all places where this function is called + * with the reserve head. Of course, if the write head were to ever + * wrap the tail, we should blow up. Rather than catch this case here, + * we depend on other ASSERTions in other parts of the code. XXXmiken + * + * If reservation head is behind the tail, we have a problem. Warn about it, + * but then treat it as if the log is empty. + * + * If the log is shut down, the head and tail may be invalid or out of whack, so + * shortcut invalidity asserts in this case so that we don't trigger them + * falsely. + */ +static int +xlog_grant_space_left( + struct xlog *log, + struct xlog_grant_head *head) +{ + int tail_bytes; + int tail_cycle; + int head_cycle; + int head_bytes; + + xlog_crack_grant_head(&head->grant, &head_cycle, &head_bytes); + xlog_crack_atomic_lsn(&log->l_tail_lsn, &tail_cycle, &tail_bytes); + tail_bytes = BBTOB(tail_bytes); + if (tail_cycle == head_cycle && head_bytes >= tail_bytes) + return log->l_logsize - (head_bytes - tail_bytes); + if (tail_cycle + 1 < head_cycle) + return 0; + + /* Ignore potential inconsistency when shutdown. */ + if (xlog_is_shutdown(log)) + return log->l_logsize; + + if (tail_cycle < head_cycle) { + ASSERT(tail_cycle == (head_cycle - 1)); + return tail_bytes - head_bytes; + } + + /* + * The reservation head is behind the tail. In this case we just want to + * return the size of the log as the amount of space left. 
+ */ + xfs_alert(log->l_mp, "xlog_grant_space_left: head behind tail"); + xfs_alert(log->l_mp, " tail_cycle = %d, tail_bytes = %d", + tail_cycle, tail_bytes); + xfs_alert(log->l_mp, " GH cycle = %d, GH bytes = %d", + head_cycle, head_bytes); + ASSERT(0); + return log->l_logsize; +} + STATIC void xlog_grant_head_wake_all( struct xlog_grant_head *head) @@ -277,7 +334,7 @@ xlog_grant_head_wait( spin_lock(&head->lock); if (xlog_is_shutdown(log)) goto shutdown; - } while (xlog_space_left(log, &head->grant) < need_bytes); + } while (xlog_grant_space_left(log, head) < need_bytes); list_del_init(&tic->t_queue); return 0; @@ -322,7 +379,7 @@ xlog_grant_head_check( * otherwise try to get some space for this transaction. */ *need_bytes = xlog_ticket_reservation(log, head, tic); - free_bytes = xlog_space_left(log, &head->grant); + free_bytes = xlog_grant_space_left(log, head); if (!list_empty_careful(&head->waiters)) { spin_lock(&head->lock); if (!xlog_grant_head_wake(log, head, &free_bytes) || @@ -396,7 +453,7 @@ xfs_log_regrant( if (error) goto out_error; - xlog_grant_add_space(log, &log->l_write_head.grant, need_bytes); + xlog_grant_add_space(log, &log->l_write_head, need_bytes); trace_xfs_log_regrant_exit(log, tic); xlog_verify_grant_tail(log); return 0; @@ -447,8 +504,8 @@ xfs_log_reserve( if (error) goto out_error; - xlog_grant_add_space(log, &log->l_reserve_head.grant, need_bytes); - xlog_grant_add_space(log, &log->l_write_head.grant, need_bytes); + xlog_grant_add_space(log, &log->l_reserve_head, need_bytes); + xlog_grant_add_space(log, &log->l_write_head, need_bytes); trace_xfs_log_reserve_exit(log, tic); xlog_verify_grant_tail(log); return 0; @@ -1107,7 +1164,7 @@ xfs_log_space_wake( ASSERT(!xlog_in_recovery(log)); spin_lock(&log->l_write_head.lock); - free_bytes = xlog_space_left(log, &log->l_write_head.grant); + free_bytes = xlog_grant_space_left(log, &log->l_write_head); xlog_grant_head_wake(log, &log->l_write_head, &free_bytes); spin_unlock(&log->l_write_head.lock); } @@ -1116,7 +1173,7 @@ xfs_log_space_wake( ASSERT(!xlog_in_recovery(log)); spin_lock(&log->l_reserve_head.lock); - free_bytes = xlog_space_left(log, &log->l_reserve_head.grant); + free_bytes = xlog_grant_space_left(log, &log->l_reserve_head); xlog_grant_head_wake(log, &log->l_reserve_head, &free_bytes); spin_unlock(&log->l_reserve_head.lock); } @@ -1230,64 +1287,6 @@ xfs_log_cover( return error; } -/* - * Return the space in the log between the tail and the head. The head - * is passed in the cycle/bytes formal parms. In the special case where - * the reserve head has wrapped passed the tail, this calculation is no - * longer valid. In this case, just return 0 which means there is no space - * in the log. This works for all places where this function is called - * with the reserve head. Of course, if the write head were to ever - * wrap the tail, we should blow up. Rather than catch this case here, - * we depend on other ASSERTions in other parts of the code. XXXmiken - * - * If reservation head is behind the tail, we have a problem. Warn about it, - * but then treat it as if the log is empty. - * - * If the log is shut down, the head and tail may be invalid or out of whack, so - * shortcut invalidity asserts in this case so that we don't trigger them - * falsely. 
- */ -int -xlog_space_left( - struct xlog *log, - atomic64_t *head) -{ - int tail_bytes; - int tail_cycle; - int head_cycle; - int head_bytes; - - xlog_crack_grant_head(head, &head_cycle, &head_bytes); - xlog_crack_atomic_lsn(&log->l_tail_lsn, &tail_cycle, &tail_bytes); - tail_bytes = BBTOB(tail_bytes); - if (tail_cycle == head_cycle && head_bytes >= tail_bytes) - return log->l_logsize - (head_bytes - tail_bytes); - if (tail_cycle + 1 < head_cycle) - return 0; - - /* Ignore potential inconsistency when shutdown. */ - if (xlog_is_shutdown(log)) - return log->l_logsize; - - if (tail_cycle < head_cycle) { - ASSERT(tail_cycle == (head_cycle - 1)); - return tail_bytes - head_bytes; - } - - /* - * The reservation head is behind the tail. In this case we just want to - * return the size of the log as the amount of space left. - */ - xfs_alert(log->l_mp, "xlog_space_left: head behind tail"); - xfs_alert(log->l_mp, " tail_cycle = %d, tail_bytes = %d", - tail_cycle, tail_bytes); - xfs_alert(log->l_mp, " GH cycle = %d, GH bytes = %d", - head_cycle, head_bytes); - ASSERT(0); - return log->l_logsize; -} - - static void xlog_ioend_work( struct work_struct *work) @@ -1881,8 +1880,8 @@ xlog_sync( if (ticket) { ticket->t_curr_res -= roundoff; } else { - xlog_grant_add_space(log, &log->l_reserve_head.grant, roundoff); - xlog_grant_add_space(log, &log->l_write_head.grant, roundoff); + xlog_grant_add_space(log, &log->l_reserve_head, roundoff); + xlog_grant_add_space(log, &log->l_write_head, roundoff); } /* put cycle number in every block */ @@ -2802,17 +2801,15 @@ xfs_log_ticket_regrant( if (ticket->t_cnt > 0) ticket->t_cnt--; - xlog_grant_sub_space(log, &log->l_reserve_head.grant, - ticket->t_curr_res); - xlog_grant_sub_space(log, &log->l_write_head.grant, - ticket->t_curr_res); + xlog_grant_sub_space(log, &log->l_reserve_head, ticket->t_curr_res); + xlog_grant_sub_space(log, &log->l_write_head, ticket->t_curr_res); ticket->t_curr_res = ticket->t_unit_res; trace_xfs_log_ticket_regrant_sub(log, ticket); /* just return if we still have some of the pre-reserved space */ if (!ticket->t_cnt) { - xlog_grant_add_space(log, &log->l_reserve_head.grant, + xlog_grant_add_space(log, &log->l_reserve_head, ticket->t_unit_res); trace_xfs_log_ticket_regrant_exit(log, ticket); @@ -2860,8 +2857,8 @@ xfs_log_ticket_ungrant( bytes += ticket->t_unit_res*ticket->t_cnt; } - xlog_grant_sub_space(log, &log->l_reserve_head.grant, bytes); - xlog_grant_sub_space(log, &log->l_write_head.grant, bytes); + xlog_grant_sub_space(log, &log->l_reserve_head, bytes); + xlog_grant_sub_space(log, &log->l_write_head, bytes); trace_xfs_log_ticket_ungrant_exit(log, ticket); diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h index 2896745989795d..0838c57ca8ac22 100644 --- a/fs/xfs/xfs_log_priv.h +++ b/fs/xfs/xfs_log_priv.h @@ -573,8 +573,6 @@ xlog_assign_grant_head(atomic64_t *head, int cycle, int space) atomic64_set(head, xlog_assign_grant_head_val(cycle, space)); } -int xlog_space_left(struct xlog *log, atomic64_t *head); - /* * Committed Item List interfaces */ From patchwork Thu Jun 20 07:21:27 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 13704904 Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C2E0C39AEB for ; Thu, 20 Jun 2024 07:22:17 +0000 
(UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.137.202.133 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718868139; cv=none; b=NRL37EyuORSKh6eyI+aewKXEgdXfTlUYL9svNeWApk2T0HFsxlUDgnSon0y3khlLcSH3UHG2zCeCnkS5CRakBygLbwno6DySIiNImP8Yws4irnRHTh/gpIM2cW9ZOQn/nx5AHctJj2nlM5UvRjs2elBmo/H9kDbNA7hVD2MeA9Q= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718868139; c=relaxed/simple; bh=6f/ClMyIGiTyHAEn6J4ZMziitQJ0037pC+gtFwG2KUA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=oeDakTCn7mBa3ERx5cIYQlg+Qs7MNtySwuepynMfw/bf6jCRMjDBhijCoKpr6epIfwJO768S1714I5Icq3DthRCLYC1gdjm5OyNw/ru7jVcDBe0P+8GyiC2Kx4FBjDzRxV3lVSCRXUwvPMhyXc3YAUcntJf1mg54cLR5e+XkZCQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=lst.de; spf=none smtp.mailfrom=bombadil.srs.infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=Y3ez5AHO; arc=none smtp.client-ip=198.137.202.133 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=lst.de Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=bombadil.srs.infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="Y3ez5AHO" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=hmPXPnYafwXrkDqhRPaztWm8Yxt0HLkTtQl33Gt2XfU=; b=Y3ez5AHODMkOwyGnhz8EF/l1so sY7qHrnMrTM0yEYsJ27CYFbnZEg8YU3spXMISiRBG2Ha1XawB0L8mYzDR8OD1XY6mFqO+DGmCj2Yh NxSgz+6/iKKzxMjvKQeAgG/leaDZFKZIHV+egkmrkRz1mlE2CBrRV3hKi95bJY1xO7PFlndfWa5DH BSo6esxc3+O3YUurxjHy+M0QhDD326wLO/KXHJ9UBCOP7q1gGoWvLMA0Klx4lzuniz2M6vaeBB9QS 4FElK2uDs/xg/TCT8rv0LcD0QTCwWjOS0Sk3fJASozU/O0y2VcwJB2lF8vRNn1JPOowD9FyqgLTLC eNSNr1Ng==; Received: from 2a02-8389-2341-5b80-3a9c-dc0d-9615-f1ed.cable.dynamic.v6.surfer.at ([2a02:8389:2341:5b80:3a9c:dc0d:9615:f1ed] helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.97.1 #2 (Red Hat Linux)) id 1sKC7c-00000003xeO-3MUt; Thu, 20 Jun 2024 07:22:17 +0000 From: Christoph Hellwig To: Chandan Babu R Cc: "Darrick J. Wong" , Dave Chinner , linux-xfs@vger.kernel.org, Dave Chinner Subject: [PATCH 10/11] xfs: grant heads track byte counts, not LSNs Date: Thu, 20 Jun 2024 09:21:27 +0200 Message-ID: <20240620072146.530267-11-hch@lst.de> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240620072146.530267-1-hch@lst.de> References: <20240620072146.530267-1-hch@lst.de> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html From: Dave Chinner The grant heads in the log track the space reserved in the log for running transactions. They do this by tracking how far ahead of the tail the reservation has reached, and the units for doing this are {cycle,bytes} for the reserve head rather than {cycle,blocks} which are normally used by LSNs. This is annoyingly complex because we have to split, crack and combine these tuples for any calculation we do to determine log space and targets (see the sketch that follows).
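To show the cost being removed, a hedged before/after sketch in userspace C: the old crack/modify/recombine cmpxchg loop on a packed {cycle, space} value (modelled on the xlog_grant_add_space() this patch deletes) versus the plain byte counter it is replaced with. Function names and the main() driver are illustrative only.

#include <stdatomic.h>
#include <stdint.h>

/* Old scheme: {cycle, space} packed in one 64-bit value; every update
 * must crack it, handle wrap past the end of the log, and recombine. */
static void grant_add_old(_Atomic int64_t *head, int bytes, int logsize)
{
	int64_t old = atomic_load(head);
	int64_t new;

	do {
		int cycle = (int)(old >> 32);
		int space = (int)old;
		int left = logsize - space;

		if (left > bytes) {
			space += bytes;
		} else {
			space = bytes - left;	/* wrapped: bump the cycle */
			cycle++;
		}
		new = ((int64_t)cycle << 32) | (uint32_t)space;
	} while (!atomic_compare_exchange_weak(head, &old, new));
}

/* New scheme: the grant head is just "bytes currently reserved". */
static void grant_add_new(_Atomic int64_t *head, int64_t bytes)
{
	atomic_fetch_add(head, bytes);
}

/* "Space left" becomes a subtraction instead of LSN cracking. */
static int64_t grant_space_left(int64_t logsize, int64_t tail_space,
				_Atomic int64_t *head)
{
	int64_t free_bytes = logsize - tail_space - atomic_load(head);

	return free_bytes > 0 ? free_bytes : 0;
}

int main(void)
{
	_Atomic int64_t old_head = 1LL << 32;	/* cycle 1, space 0 */
	_Atomic int64_t new_head = 0;

	grant_add_old(&old_head, 4096, 64 << 20);
	grant_add_new(&new_head, 4096);
	return grant_space_left(64 << 20, 8192, &new_head) > 0 ? 0 : 1;
}

With the old form, both the update loop and xlog_space_left() had to reason about cycle wrap; with a byte count, the log is simply full when tail space plus reservations reaches the log size.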
Cracking and recombining these tuples is computationally expensive, difficult to do atomically and locklessly, and limits the size of the log to 2^32 bytes. Really, though, all the grant heads are tracking is how much space is currently available for use in the log. We can track this as a simple byte count - we just don't care where in the log the head and tail physically are, just how much space we have remaining before the head and tail overlap. So, convert the grant heads to track the byte reservations that are active rather than the current (cycle, offset) tuples. This means an empty log has zero bytes consumed, and a full log is when the reservations reach the size of the log minus the space consumed by the AIL. This greatly simplifies the accounting and checks for whether there is space available. We no longer need to crack or combine LSNs to determine how much space the log has left, nor do we need to look at the head or tail of the log to determine how close to full we are. There is, however, a complexity that needs to be handled. We know how much space is being tracked in the AIL now via log->l_tail_space and the log tickets track active reservations and return the unused portions to the grant heads when ungranted. Unfortunately, we don't track the used portion of the grant, so when we transfer log items from the CIL to the AIL, the space accounted to the grant heads is transferred to the log tail space. Hence when we move the AIL head forwards on item insert, we have to remove that space from the grant heads. We also remove the xlog_verify_grant_tail() debug function as it is no longer useful. The check it performs has been racy since delayed logging was introduced, but now it is clearly only detecting false positives so remove it. The result of this substantially simpler accounting algorithm is an increase in sustained transaction rate from ~1.3 million transactions/s to ~1.9 million transactions/s with no increase in CPU usage. We also remove the 32 bit space limitation on the grant heads, which will allow us to increase the journal size beyond 2GB in future. Note that this renames the sysfs files exposing the log grant space now that the values are exported in bytes. This allows xfstests to auto-detect the old or new ABI. Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong [hch: move xlog_grant_sub_space out of line, update the xlog_grant_{add,sub}_space prototypes, rename the sysfs files to allow auto-detection in xfstests] Signed-off-by: Christoph Hellwig --- Documentation/ABI/testing/sysfs-fs-xfs | 18 +- fs/xfs/xfs_log.c | 246 +++++++++---------------- fs/xfs/xfs_log_cil.c | 12 ++ fs/xfs/xfs_log_priv.h | 33 +--- fs/xfs/xfs_log_recover.c | 4 - fs/xfs/xfs_sysfs.c | 29 +-- fs/xfs/xfs_trace.h | 34 ++-- 7 files changed, 138 insertions(+), 238 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-fs-xfs b/Documentation/ABI/testing/sysfs-fs-xfs index 82d8e2f79834b5..7da4de948b46e7 100644 --- a/Documentation/ABI/testing/sysfs-fs-xfs +++ b/Documentation/ABI/testing/sysfs-fs-xfs @@ -15,25 +15,23 @@ Description: The log sequence number (LSN) of the current tail of the log. The LSN is exported in "cycle:basic block" format. -What: /sys/fs/xfs//log/reserve_grant_head -Date: July 2014 -KernelVersion: 3.17 +What: /sys/fs/xfs//log/reserve_grant_head_bytes +Date: June 2024 +KernelVersion: 6.11 Contact: linux-xfs@vger.kernel.org Description: The current state of the log reserve grant head. It represents the total log reservation of all currently - outstanding transactions.
The grant head is exported in - "cycle:bytes" format. + outstanding transactions in bytes. Users: xfstests -What: /sys/fs/xfs//log/write_grant_head -Date: July 2014 -KernelVersion: 3.17 +What: /sys/fs/xfs//log/write_grant_head_bytes +Date: June 2024 +KernelVersion: 6.11 Contact: linux-xfs@vger.kernel.org Description: The current state of the log write grant head. It represents the total log reservation of all currently outstanding transactions, including regrants due to - rolling transactions. The grant head is exported in - "cycle:bytes" format. + rolling transactions in bytes. Users: xfstests diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index 0e50b370f0e4c7..817ea7e0a8ab54 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -53,9 +53,6 @@ xlog_sync( struct xlog_ticket *ticket); #if defined(DEBUG) STATIC void -xlog_verify_grant_tail( - struct xlog *log); -STATIC void xlog_verify_iclog( struct xlog *log, struct xlog_in_core *iclog, @@ -65,7 +62,6 @@ xlog_verify_tail_lsn( struct xlog *log, struct xlog_in_core *iclog); #else -#define xlog_verify_grant_tail(a) #define xlog_verify_iclog(a,b,c) #define xlog_verify_tail_lsn(a,b) #endif @@ -133,125 +129,64 @@ xlog_prepare_iovec( return buf; } -static void +static inline void xlog_grant_sub_space( - struct xlog *log, struct xlog_grant_head *head, - int bytes) + int64_t bytes) { - int64_t head_val = atomic64_read(&head->grant); - int64_t new, old; - - do { - int cycle, space; - - xlog_crack_grant_head_val(head_val, &cycle, &space); - - space -= bytes; - if (space < 0) { - space += log->l_logsize; - cycle--; - } - - old = head_val; - new = xlog_assign_grant_head_val(cycle, space); - head_val = atomic64_cmpxchg(&head->grant, old, new); - } while (head_val != old); + atomic64_sub(bytes, &head->grant); } -static void +static inline void xlog_grant_add_space( - struct xlog *log, struct xlog_grant_head *head, - int bytes) + int64_t bytes) { - int64_t head_val = atomic64_read(&head->grant); - int64_t new, old; - - do { - int tmp; - int cycle, space; - - xlog_crack_grant_head_val(head_val, &cycle, &space); - - tmp = log->l_logsize - space; - if (tmp > bytes) - space += bytes; - else { - space = bytes - tmp; - cycle++; - } - - old = head_val; - new = xlog_assign_grant_head_val(cycle, space); - head_val = atomic64_cmpxchg(&head->grant, old, new); - } while (head_val != old); + atomic64_add(bytes, &head->grant); } -STATIC void +static void xlog_grant_head_init( struct xlog_grant_head *head) { - xlog_assign_grant_head(&head->grant, 1, 0); + atomic64_set(&head->grant, 0); INIT_LIST_HEAD(&head->waiters); spin_lock_init(&head->lock); } +void +xlog_grant_return_space( + struct xlog *log, + xfs_lsn_t old_head, + xfs_lsn_t new_head) +{ + int64_t diff = xlog_lsn_sub(log, new_head, old_head); + + xlog_grant_sub_space(&log->l_reserve_head, diff); + xlog_grant_sub_space(&log->l_write_head, diff); +} + /* - * Return the space in the log between the tail and the head. The head - * is passed in the cycle/bytes formal parms. In the special case where - * the reserve head has wrapped passed the tail, this calculation is no - * longer valid. In this case, just return 0 which means there is no space - * in the log. This works for all places where this function is called - * with the reserve head. Of course, if the write head were to ever - * wrap the tail, we should blow up. Rather than catch this case here, - * we depend on other ASSERTions in other parts of the code. XXXmiken - * - * If reservation head is behind the tail, we have a problem. 
Warn about it, - * but then treat it as if the log is empty. - * - * If the log is shut down, the head and tail may be invalid or out of whack, so - * shortcut invalidity asserts in this case so that we don't trigger them - * falsely. + * Return the space in the log between the tail and the head. In the case where + * we have overrun available reservation space, return 0. The memory barrier + * pairs with the smp_wmb() in xlog_cil_ail_insert() to ensure that grant head + * vs tail space updates are seen in the correct order and hence avoid + * transients as space is transferred from the grant heads to the AIL on commit + * completion. */ -static int +static uint64_t xlog_grant_space_left( struct xlog *log, struct xlog_grant_head *head) { - int tail_bytes; - int tail_cycle; - int head_cycle; - int head_bytes; - - xlog_crack_grant_head(&head->grant, &head_cycle, &head_bytes); - xlog_crack_atomic_lsn(&log->l_tail_lsn, &tail_cycle, &tail_bytes); - tail_bytes = BBTOB(tail_bytes); - if (tail_cycle == head_cycle && head_bytes >= tail_bytes) - return log->l_logsize - (head_bytes - tail_bytes); - if (tail_cycle + 1 < head_cycle) - return 0; - - /* Ignore potential inconsistency when shutdown. */ - if (xlog_is_shutdown(log)) - return log->l_logsize; - - if (tail_cycle < head_cycle) { - ASSERT(tail_cycle == (head_cycle - 1)); - return tail_bytes - head_bytes; - } + int64_t free_bytes; - /* - * The reservation head is behind the tail. In this case we just want to - * return the size of the log as the amount of space left. - */ - xfs_alert(log->l_mp, "xlog_grant_space_left: head behind tail"); - xfs_alert(log->l_mp, " tail_cycle = %d, tail_bytes = %d", - tail_cycle, tail_bytes); - xfs_alert(log->l_mp, " GH cycle = %d, GH bytes = %d", - head_cycle, head_bytes); - ASSERT(0); - return log->l_logsize; + smp_rmb(); /* paired with smp_wmb in xlog_cil_ail_insert() */ + free_bytes = log->l_logsize - READ_ONCE(log->l_tail_space) - + atomic64_read(&head->grant); + if (free_bytes > 0) + return free_bytes; + return 0; } STATIC void @@ -453,9 +388,8 @@ xfs_log_regrant( if (error) goto out_error; - xlog_grant_add_space(log, &log->l_write_head, need_bytes); + xlog_grant_add_space(&log->l_write_head, need_bytes); trace_xfs_log_regrant_exit(log, tic); - xlog_verify_grant_tail(log); return 0; out_error: @@ -504,10 +438,9 @@ xfs_log_reserve( if (error) goto out_error; - xlog_grant_add_space(log, &log->l_reserve_head, need_bytes); - xlog_grant_add_space(log, &log->l_write_head, need_bytes); + xlog_grant_add_space(&log->l_reserve_head, need_bytes); + xlog_grant_add_space(&log->l_write_head, need_bytes); trace_xfs_log_reserve_exit(log, tic); - xlog_verify_grant_tail(log); return 0; out_error: @@ -1880,8 +1813,8 @@ xlog_sync( if (ticket) { ticket->t_curr_res -= roundoff; } else { - xlog_grant_add_space(log, &log->l_reserve_head, roundoff); - xlog_grant_add_space(log, &log->l_write_head, roundoff); + xlog_grant_add_space(&log->l_reserve_head, roundoff); + xlog_grant_add_space(&log->l_write_head, roundoff); } /* put cycle number in every block */ @@ -2801,16 +2734,15 @@ xfs_log_ticket_regrant( if (ticket->t_cnt > 0) ticket->t_cnt--; - xlog_grant_sub_space(log, &log->l_reserve_head, ticket->t_curr_res); - xlog_grant_sub_space(log, &log->l_write_head, ticket->t_curr_res); + xlog_grant_sub_space(&log->l_reserve_head, ticket->t_curr_res); + xlog_grant_sub_space(&log->l_write_head, ticket->t_curr_res); ticket->t_curr_res = ticket->t_unit_res; trace_xfs_log_ticket_regrant_sub(log, ticket); /* just return if we still have some of the 
pre-reserved space */ if (!ticket->t_cnt) { - xlog_grant_add_space(log, &log->l_reserve_head, - ticket->t_unit_res); + xlog_grant_add_space(&log->l_reserve_head, ticket->t_unit_res); trace_xfs_log_ticket_regrant_exit(log, ticket); ticket->t_curr_res = ticket->t_unit_res; @@ -2857,8 +2789,8 @@ xfs_log_ticket_ungrant( bytes += ticket->t_unit_res*ticket->t_cnt; } - xlog_grant_sub_space(log, &log->l_reserve_head, bytes); - xlog_grant_sub_space(log, &log->l_write_head, bytes); + xlog_grant_sub_space(&log->l_reserve_head, bytes); + xlog_grant_sub_space(&log->l_write_head, bytes); trace_xfs_log_ticket_ungrant_exit(log, ticket); @@ -3331,42 +3263,27 @@ xlog_ticket_alloc( } #if defined(DEBUG) -/* - * Check to make sure the grant write head didn't just over lap the tail. If - * the cycles are the same, we can't be overlapping. Otherwise, make sure that - * the cycles differ by exactly one and check the byte count. - * - * This check is run unlocked, so can give false positives. Rather than assert - * on failures, use a warn-once flag and a panic tag to allow the admin to - * determine if they want to panic the machine when such an error occurs. For - * debug kernels this will have the same effect as using an assert but, unlinke - * an assert, it can be turned off at runtime. - */ -STATIC void -xlog_verify_grant_tail( - struct xlog *log) +static void +xlog_verify_dump_tail( + struct xlog *log, + struct xlog_in_core *iclog) { - int tail_cycle, tail_blocks; - int cycle, space; - - xlog_crack_grant_head(&log->l_write_head.grant, &cycle, &space); - xlog_crack_atomic_lsn(&log->l_tail_lsn, &tail_cycle, &tail_blocks); - if (tail_cycle != cycle) { - if (cycle - 1 != tail_cycle && - !test_and_set_bit(XLOG_TAIL_WARN, &log->l_opstate)) { - xfs_alert_tag(log->l_mp, XFS_PTAG_LOGRES, - "%s: cycle - 1 != tail_cycle", __func__); - } - - if (space > BBTOB(tail_blocks) && - !test_and_set_bit(XLOG_TAIL_WARN, &log->l_opstate)) { - xfs_alert_tag(log->l_mp, XFS_PTAG_LOGRES, - "%s: space > BBTOB(tail_blocks)", __func__); - } - } -} - -/* check if it will fit */ + xfs_alert(log->l_mp, +"ran out of log space tail 0x%llx/0x%llx, head lsn 0x%llx, head 0x%x/0x%x, prev head 0x%x/0x%x", + iclog ? be64_to_cpu(iclog->ic_header.h_tail_lsn) : -1, + atomic64_read(&log->l_tail_lsn), + log->l_ailp->ail_head_lsn, + log->l_curr_cycle, log->l_curr_block, + log->l_prev_cycle, log->l_prev_block); + xfs_alert(log->l_mp, +"write grant 0x%llx, reserve grant 0x%llx, tail_space 0x%llx, size 0x%x, iclog flags 0x%x", + atomic64_read(&log->l_write_head.grant), + atomic64_read(&log->l_reserve_head.grant), + log->l_tail_space, log->l_logsize, + iclog ? iclog->ic_flags : -1); +} + +/* Check if the new iclog will fit in the log. 
*/ STATIC void xlog_verify_tail_lsn( struct xlog *log, @@ -3375,21 +3292,34 @@ xlog_verify_tail_lsn( xfs_lsn_t tail_lsn = be64_to_cpu(iclog->ic_header.h_tail_lsn); int blocks; - if (CYCLE_LSN(tail_lsn) == log->l_prev_cycle) { - blocks = - log->l_logBBsize - (log->l_prev_block - BLOCK_LSN(tail_lsn)); - if (blocks < BTOBB(iclog->ic_offset)+BTOBB(log->l_iclog_hsize)) - xfs_emerg(log->l_mp, "%s: ran out of log space", __func__); - } else { - ASSERT(CYCLE_LSN(tail_lsn)+1 == log->l_prev_cycle); + if (CYCLE_LSN(tail_lsn) == log->l_prev_cycle) { + blocks = log->l_logBBsize - + (log->l_prev_block - BLOCK_LSN(tail_lsn)); + if (blocks < BTOBB(iclog->ic_offset) + + BTOBB(log->l_iclog_hsize)) { + xfs_emerg(log->l_mp, + "%s: ran out of log space", __func__); + xlog_verify_dump_tail(log, iclog); + } + return; + } - if (BLOCK_LSN(tail_lsn) == log->l_prev_block) + if (CYCLE_LSN(tail_lsn) + 1 != log->l_prev_cycle) { + xfs_emerg(log->l_mp, "%s: head has wrapped tail.", __func__); + xlog_verify_dump_tail(log, iclog); + return; + } + if (BLOCK_LSN(tail_lsn) == log->l_prev_block) { xfs_emerg(log->l_mp, "%s: tail wrapped", __func__); + xlog_verify_dump_tail(log, iclog); + return; + } blocks = BLOCK_LSN(tail_lsn) - log->l_prev_block; - if (blocks < BTOBB(iclog->ic_offset) + 1) - xfs_emerg(log->l_mp, "%s: ran out of log space", __func__); - } + if (blocks < BTOBB(iclog->ic_offset) + 1) { + xfs_emerg(log->l_mp, "%s: ran out of iclog space", __func__); + xlog_verify_dump_tail(log, iclog); + } } /* diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c index 92ccac7f905448..391a938d690c59 100644 --- a/fs/xfs/xfs_log_cil.c +++ b/fs/xfs/xfs_log_cil.c @@ -764,6 +764,7 @@ xlog_cil_ail_insert( struct xfs_log_item *log_items[LOG_ITEM_BATCH_SIZE]; struct xfs_log_vec *lv; struct xfs_ail_cursor cur; + xfs_lsn_t old_head; int i = 0; /* @@ -780,10 +781,21 @@ xlog_cil_ail_insert( aborted); spin_lock(&ailp->ail_lock); xfs_trans_ail_cursor_last(ailp, &cur, ctx->start_lsn); + old_head = ailp->ail_head_lsn; ailp->ail_head_lsn = ctx->commit_lsn; /* xfs_ail_update_finish() drops the ail_lock */ xfs_ail_update_finish(ailp, NULLCOMMITLSN); + /* + * We move the AIL head forwards to account for the space used in the + * log before we remove that space from the grant heads. This prevents a + * transient condition where reservation space appears to become + * available on return, only for it to disappear again immediately as + * the AIL head update accounts in the log tail space. + */ + smp_wmb(); /* paired with smp_rmb in xlog_grant_space_left */ + xlog_grant_return_space(ailp->ail_log, old_head, ailp->ail_head_lsn); + /* unpin all the log items */ list_for_each_entry(lv, &ctx->lv_chain, lv_list) { struct xfs_log_item *lip = lv->lv_item; diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h index 0838c57ca8ac22..b8778a4fd6b64e 100644 --- a/fs/xfs/xfs_log_priv.h +++ b/fs/xfs/xfs_log_priv.h @@ -543,36 +543,6 @@ xlog_assign_atomic_lsn(atomic64_t *lsn, uint cycle, uint block) atomic64_set(lsn, xlog_assign_lsn(cycle, block)); } -/* - * When we crack the grant head, we sample it first so that the value will not - * change while we are cracking it into the component values. This means we - * will always get consistent component values to work from. 
- */ -static inline void -xlog_crack_grant_head_val(int64_t val, int *cycle, int *space) -{ - *cycle = val >> 32; - *space = val & 0xffffffff; -} - -static inline void -xlog_crack_grant_head(atomic64_t *head, int *cycle, int *space) -{ - xlog_crack_grant_head_val(atomic64_read(head), cycle, space); -} - -static inline int64_t -xlog_assign_grant_head_val(int cycle, int space) -{ - return ((int64_t)cycle << 32) | space; -} - -static inline void -xlog_assign_grant_head(atomic64_t *head, int cycle, int space) -{ - atomic64_set(head, xlog_assign_grant_head_val(cycle, space)); -} - /* * Committed Item List interfaces */ @@ -639,6 +609,9 @@ xlog_lsn_sub( return (uint64_t)log->l_logsize - BBTOB(lo_block - hi_block); } +void xlog_grant_return_space(struct xlog *log, xfs_lsn_t old_head, + xfs_lsn_t new_head); + /* * The LSN is valid so long as it is behind the current LSN. If it isn't, this * means that the next log record that includes this metadata could have a diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c index 63f667f92c322e..32c6d7070871dc 100644 --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -1213,10 +1213,6 @@ xlog_set_state( log->l_curr_cycle++; atomic64_set(&log->l_tail_lsn, be64_to_cpu(rhead->h_tail_lsn)); log->l_ailp->ail_head_lsn = be64_to_cpu(rhead->h_lsn); - xlog_assign_grant_head(&log->l_reserve_head.grant, log->l_curr_cycle, - BBTOB(log->l_curr_block)); - xlog_assign_grant_head(&log->l_write_head.grant, log->l_curr_cycle, - BBTOB(log->l_curr_block)); } /* diff --git a/fs/xfs/xfs_sysfs.c b/fs/xfs/xfs_sysfs.c index d2391eec37fe9d..60cb5318fdae3c 100644 --- a/fs/xfs/xfs_sysfs.c +++ b/fs/xfs/xfs_sysfs.c @@ -432,39 +432,30 @@ log_tail_lsn_show( XFS_SYSFS_ATTR_RO(log_tail_lsn); STATIC ssize_t -reserve_grant_head_show( +reserve_grant_head_bytes_show( struct kobject *kobject, char *buf) - { - int cycle; - int bytes; - struct xlog *log = to_xlog(kobject); - - xlog_crack_grant_head(&log->l_reserve_head.grant, &cycle, &bytes); - return sysfs_emit(buf, "%d:%d\n", cycle, bytes); + return sysfs_emit(buf, "%lld\n", + atomic64_read(&to_xlog(kobject)->l_reserve_head.grant)); } -XFS_SYSFS_ATTR_RO(reserve_grant_head); +XFS_SYSFS_ATTR_RO(reserve_grant_head_bytes); STATIC ssize_t -write_grant_head_show( +write_grant_head_bytes_show( struct kobject *kobject, char *buf) { - int cycle; - int bytes; - struct xlog *log = to_xlog(kobject); - - xlog_crack_grant_head(&log->l_write_head.grant, &cycle, &bytes); - return sysfs_emit(buf, "%d:%d\n", cycle, bytes); + return sysfs_emit(buf, "%lld\n", + atomic64_read(&to_xlog(kobject)->l_write_head.grant)); } -XFS_SYSFS_ATTR_RO(write_grant_head); +XFS_SYSFS_ATTR_RO(write_grant_head_bytes); static struct attribute *xfs_log_attrs[] = { ATTR_LIST(log_head_lsn), ATTR_LIST(log_tail_lsn), - ATTR_LIST(reserve_grant_head), - ATTR_LIST(write_grant_head), + ATTR_LIST(reserve_grant_head_bytes), + ATTR_LIST(write_grant_head_bytes), NULL, }; ATTRIBUTE_GROUPS(xfs_log); diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 13f6e6cab572ae..a7ff0c7f6800a0 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -1227,6 +1227,7 @@ DECLARE_EVENT_CLASS(xfs_loggrant_class, TP_ARGS(log, tic), TP_STRUCT__entry( __field(dev_t, dev) + __field(unsigned long, tic) __field(char, ocnt) __field(char, cnt) __field(int, curr_res) @@ -1234,16 +1235,16 @@ DECLARE_EVENT_CLASS(xfs_loggrant_class, __field(unsigned int, flags) __field(int, reserveq) __field(int, writeq) - __field(int, grant_reserve_cycle) - __field(int, grant_reserve_bytes) - __field(int, 
grant_write_cycle) - __field(int, grant_write_bytes) + __field(uint64_t, grant_reserve_bytes) + __field(uint64_t, grant_write_bytes) + __field(uint64_t, tail_space) __field(int, curr_cycle) __field(int, curr_block) __field(xfs_lsn_t, tail_lsn) ), TP_fast_assign( __entry->dev = log->l_mp->m_super->s_dev; + __entry->tic = (unsigned long)tic; __entry->ocnt = tic->t_ocnt; __entry->cnt = tic->t_cnt; __entry->curr_res = tic->t_curr_res; @@ -1251,23 +1252,22 @@ DECLARE_EVENT_CLASS(xfs_loggrant_class, __entry->flags = tic->t_flags; __entry->reserveq = list_empty(&log->l_reserve_head.waiters); __entry->writeq = list_empty(&log->l_write_head.waiters); - xlog_crack_grant_head(&log->l_reserve_head.grant, - &__entry->grant_reserve_cycle, - &__entry->grant_reserve_bytes); - xlog_crack_grant_head(&log->l_write_head.grant, - &__entry->grant_write_cycle, - &__entry->grant_write_bytes); + __entry->tail_space = READ_ONCE(log->l_tail_space); + __entry->grant_reserve_bytes = __entry->tail_space + + atomic64_read(&log->l_reserve_head.grant); + __entry->grant_write_bytes = __entry->tail_space + + atomic64_read(&log->l_write_head.grant); __entry->curr_cycle = log->l_curr_cycle; __entry->curr_block = log->l_curr_block; __entry->tail_lsn = atomic64_read(&log->l_tail_lsn); ), - TP_printk("dev %d:%d t_ocnt %u t_cnt %u t_curr_res %u " - "t_unit_res %u t_flags %s reserveq %s " - "writeq %s grant_reserve_cycle %d " - "grant_reserve_bytes %d grant_write_cycle %d " - "grant_write_bytes %d curr_cycle %d curr_block %d " + TP_printk("dev %d:%d tic 0x%lx t_ocnt %u t_cnt %u t_curr_res %u " + "t_unit_res %u t_flags %s reserveq %s writeq %s " + "tail space %llu grant_reserve_bytes %llu " + "grant_write_bytes %llu curr_cycle %d curr_block %d " "tail_cycle %d tail_block %d", MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->tic, __entry->ocnt, __entry->cnt, __entry->curr_res, @@ -1275,9 +1275,8 @@ DECLARE_EVENT_CLASS(xfs_loggrant_class, __print_flags(__entry->flags, "|", XLOG_TIC_FLAGS), __entry->reserveq ? "empty" : "active", __entry->writeq ? 
"empty" : "active", - __entry->grant_reserve_cycle, + __entry->tail_space, __entry->grant_reserve_bytes, - __entry->grant_write_cycle, __entry->grant_write_bytes, __entry->curr_cycle, __entry->curr_block, @@ -1305,6 +1304,7 @@ DEFINE_LOGGRANT_EVENT(xfs_log_ticket_ungrant); DEFINE_LOGGRANT_EVENT(xfs_log_ticket_ungrant_sub); DEFINE_LOGGRANT_EVENT(xfs_log_ticket_ungrant_exit); DEFINE_LOGGRANT_EVENT(xfs_log_cil_wait); +DEFINE_LOGGRANT_EVENT(xfs_log_cil_return); DECLARE_EVENT_CLASS(xfs_log_item_class, TP_PROTO(struct xfs_log_item *lip), From patchwork Thu Jun 20 07:21:28 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 13704905 Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3960043AD2 for ; Thu, 20 Jun 2024 07:22:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.137.202.133 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718868141; cv=none; b=qlvu1AgYd25SBZak7am6OXUNTrQOtpXmrveqiqAs2M/HH6HvMtzICpTb3DwdVgVQdl9ZeTEzofwIj6VuZWOPKrkhKvqpnzuUlfu4OLXO36KJJbfdEEjCAzO8bKzDcI4RXAQ9H2c4TGhpcY/IpvySdyw1qd+gCdfy5uOLUc3KnWs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718868141; c=relaxed/simple; bh=0r3RxqTYTzuHm3waHTRw0kicMZQc75UcyWX2oeh/l9Q=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=SJeO/G6QZIcj/NZ8pnSXiiQr3IM+ylY64i5AK3NLWLugGT7SJupfKVAkxro1HdS8lPwFhE/F6qDbfF23GeLgDAY71ELvLgYYq0IWdRYY3ATgZAHyeZuLRZmByJRawO8soBft2sEAr7mqeS1u96WaE3F3my9qvA2lRrHqomTX1Yw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=lst.de; spf=none smtp.mailfrom=bombadil.srs.infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=xk7UZis0; arc=none smtp.client-ip=198.137.202.133 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=lst.de Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=bombadil.srs.infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="xk7UZis0" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=8RQKYZCp3JefLBZB47X/DDU7VahAsWlVHFU3qQrkDeE=; b=xk7UZis0PgEW7jCjGj3dB0a1y9 ml0PoQiJpCLmNqtkxusb80AtJ25TSv/sGjftCgkpvUMwrAHzlF+VwriC83ItpdQRXz4X1ONAihT3s e+RWsN0pRb8V3P/YPVkvNjDdWUAye9EgewY5liYk7NuS1j1myzT3qkn7rVtpFruSCMzc50/ViG5Jt Bd8WgZRy98ayC9l4jowXnbB7qBl+43i53eg/VkgxcfmM+HUuXT8JQ+gIj5UG1mm7xdDtNqyPlQw/i xI1+IQrFkX1QkCju9OlDLAzzQrynrji8C9Omb7xjzfERNGxYZ1QGSEtdwQ7JzDYC14L3KxQpg9B7k bEzfc83g==; Received: from 2a02-8389-2341-5b80-3a9c-dc0d-9615-f1ed.cable.dynamic.v6.surfer.at ([2a02:8389:2341:5b80:3a9c:dc0d:9615:f1ed] helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.97.1 #2 (Red Hat Linux)) id 1sKC7f-00000003xet-2KKC; Thu, 20 Jun 2024 07:22:19 +0000 From: Christoph Hellwig To: Chandan Babu R Cc: "Darrick J. 
Wong" , Dave Chinner , linux-xfs@vger.kernel.org, Dave Chinner Subject: [PATCH 11/11] xfs: skip flushing log items during push Date: Thu, 20 Jun 2024 09:21:28 +0200 Message-ID: <20240620072146.530267-12-hch@lst.de> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240620072146.530267-1-hch@lst.de> References: <20240620072146.530267-1-hch@lst.de> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html From: Dave Chinner The AIL pushing code spends a huge amount of time skipping over items that are already marked as flushing. It is not uncommon to see hundreds of thousands of items skipped every second due to inode clustering marking all the inodes in a cluster as flushing when the first one is flushed. However, to discover an item is already flushing and should be skipped we have to call the iop_push() method for it to try to flush the item. For inodes (where this matters most), we have to first check that inode is flushable first. We can optimise this overhead away by tracking whether the log item is flushing internally. This allows xfsaild_push() to check the log item directly for flushing state and immediately skip the log item. Whilst this doesn't remove the CPU cache misses for loading the log item, it does avoid the overhead of an indirect function call and the cache misses involved in accessing inode and backing cluster buffer structures to determine flushing state. When trying to flush hundreds of thousands of inodes each second, this CPU overhead saving adds up quickly. It's so noticeable that the biggest issue with pushing on the AIL on fast storage becomes the 10ms back-off wait when we hit enough pinned buffers to break out of the push loop but not enough for the AIL pushing to be considered stuck. This limits the xfsaild to about 70% total CPU usage, and on fast storage this isn't enough to keep the storage 100% busy. The xfsaild will block on IO submission on slow storage and so is self throttling - it does not need a backoff in the case where we are really just breaking out of the walk to submit the IO we have gathered. Further with no backoff we don't need to gather huge delwri lists to mitigate the impact of backoffs, so we can submit IO more frequently and reduce the time log items spend in flushing state by breaking out of the item push loop once we've gathered enough IO to batch submission effectively. 
Signed-off-by: Dave Chinner --- fs/xfs/xfs_inode.c | 1 + fs/xfs/xfs_inode_item.c | 6 +++++- fs/xfs/xfs_trans.h | 4 +++- fs/xfs/xfs_trans_ail.c | 8 +++++++- 4 files changed, 16 insertions(+), 3 deletions(-) diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 58fb7a5062e1e6..da611ec56f1be0 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -3713,6 +3713,7 @@ xfs_iflush( iip->ili_last_fields = iip->ili_fields; iip->ili_fields = 0; iip->ili_fsync_fields = 0; + set_bit(XFS_LI_FLUSHING, &iip->ili_item.li_flags); spin_unlock(&iip->ili_lock); /* diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c index f28d653300d124..ba49e56820f0a7 100644 --- a/fs/xfs/xfs_inode_item.c +++ b/fs/xfs/xfs_inode_item.c @@ -933,6 +933,7 @@ xfs_iflush_finish( } iip->ili_last_fields = 0; iip->ili_flush_lsn = 0; + clear_bit(XFS_LI_FLUSHING, &lip->li_flags); spin_unlock(&iip->ili_lock); xfs_iflags_clear(iip->ili_inode, XFS_IFLUSHING); if (drop_buffer) @@ -991,8 +992,10 @@ xfs_buf_inode_io_fail( { struct xfs_log_item *lip; - list_for_each_entry(lip, &bp->b_li_list, li_bio_list) + list_for_each_entry(lip, &bp->b_li_list, li_bio_list) { set_bit(XFS_LI_FAILED, &lip->li_flags); + clear_bit(XFS_LI_FLUSHING, &lip->li_flags); + } } /* @@ -1011,6 +1014,7 @@ xfs_iflush_abort_clean( iip->ili_flush_lsn = 0; iip->ili_item.li_buf = NULL; list_del_init(&iip->ili_item.li_bio_list); + clear_bit(XFS_LI_FLUSHING, &iip->ili_item.li_flags); } /* diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h index 1636663707dc04..20eb6ea7ebaa04 100644 --- a/fs/xfs/xfs_trans.h +++ b/fs/xfs/xfs_trans.h @@ -58,13 +58,15 @@ struct xfs_log_item { #define XFS_LI_FAILED 2 #define XFS_LI_DIRTY 3 #define XFS_LI_WHITEOUT 4 +#define XFS_LI_FLUSHING 5 #define XFS_LI_FLAGS \ { (1u << XFS_LI_IN_AIL), "IN_AIL" }, \ { (1u << XFS_LI_ABORTED), "ABORTED" }, \ { (1u << XFS_LI_FAILED), "FAILED" }, \ { (1u << XFS_LI_DIRTY), "DIRTY" }, \ - { (1u << XFS_LI_WHITEOUT), "WHITEOUT" } + { (1u << XFS_LI_WHITEOUT), "WHITEOUT" }, \ + { (1u << XFS_LI_FLUSHING), "FLUSHING" } struct xfs_item_ops { unsigned flags; diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c index 6a106a05fae017..0fafcc9f3dbe44 100644 --- a/fs/xfs/xfs_trans_ail.c +++ b/fs/xfs/xfs_trans_ail.c @@ -512,6 +512,9 @@ xfsaild_push( while ((XFS_LSN_CMP(lip->li_lsn, ailp->ail_target) <= 0)) { int lock_result; + if (test_bit(XFS_LI_FLUSHING, &lip->li_flags)) + goto next_item; + /* * Note that iop_push may unlock and reacquire the AIL lock. We * rely on the AIL cursor implementation to be able to deal with @@ -581,9 +584,12 @@ xfsaild_push( if (stuck > 100) break; +next_item: lip = xfs_trans_ail_cursor_next(ailp, &cur); if (lip == NULL) break; + if (lip->li_lsn != lsn && count > 1000) + break; lsn = lip->li_lsn; } @@ -620,7 +626,7 @@ xfsaild_push( /* * Assume we have more work to do in a short while. */ - tout = 10; + tout = 0; } return tout;
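Looking back across the series, the grant accounting that patch 10 leaves behind reduces to a few atomic byte counters. A rough userspace rendering of that model, with a single grant head for brevity and the kernel's smp_rmb()/smp_wmb() ordering against AIL insertion elided (the field names follow the patches; everything else is illustrative):

#include <stdatomic.h>
#include <stdint.h>

struct sketch_log {
	int64_t logsize;		/* journal size in bytes */
	_Atomic int64_t tail_space;	/* bytes accounted to the AIL */
	_Atomic int64_t grant;		/* bytes held by reservations */
};

/* cf. xlog_grant_space_left(): free space is now a single subtraction. */
static int64_t space_left(struct sketch_log *log)
{
	int64_t free_bytes = log->logsize -
			atomic_load(&log->tail_space) -
			atomic_load(&log->grant);

	return free_bytes > 0 ? free_bytes : 0;
}

/* cf. xlog_grant_add_space()/xlog_grant_sub_space(): plain atomic add/sub. */
static void grant_add(struct sketch_log *log, int64_t bytes)
{
	atomic_fetch_add(&log->grant, bytes);
}

/*
 * Models the transfer described in patch 10: when the AIL head moves
 * forward on item insert, the bytes it now accounts for leave the grant
 * head (xlog_grant_return_space() in the patch) and show up as log tail
 * space instead, so the sum seen by space_left() stays consistent.
 */
static void grant_transfer_to_tail(struct sketch_log *log, int64_t moved)
{
	atomic_fetch_add(&log->tail_space, moved);
	atomic_fetch_sub(&log->grant, moved);
}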