From patchwork Thu Sep 21 01:48:36 2023
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 13393591
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Subject: [PATCH 1/9] xfs: move and rename xfs_trans_committed_bulk
Date: Thu, 21 Sep 2023 11:48:36 +1000
Message-Id: <20230921014844.582667-2-david@fromorbit.com>
In-Reply-To: <20230921014844.582667-1-david@fromorbit.com>
References: <20230921014844.582667-1-david@fromorbit.com>

Ever since the CIL and delayed logging were introduced, xfs_trans_committed_bulk() has been a purely CIL checkpoint completion function and not a transaction commit completion function. Now that we are adding log-specific updates to this function, it really does not have anything to do with the transaction subsystem - it is log and log item level functionality.

This should be part of the CIL code, as it is the callback that moves log items from the CIL checkpoint to the AIL. Move it and rename it to xlog_cil_ail_insert().

Signed-off-by: Dave Chinner Reviewed-by: Christoph Hellwig Reviewed-by: Darrick J. Wong --- fs/xfs/xfs_log_cil.c | 132 +++++++++++++++++++++++++++++++++++++++- fs/xfs/xfs_trans.c | 129 --------------------------------------- fs/xfs/xfs_trans_priv.h | 3 - 3 files changed, 131 insertions(+), 133 deletions(-) diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c index ebc70aaa299c..c1fee14be5c2 100644 --- a/fs/xfs/xfs_log_cil.c +++ b/fs/xfs/xfs_log_cil.c @@ -695,6 +695,136 @@ xlog_cil_insert_items( } } +static inline void +xlog_cil_ail_insert_batch( + struct xfs_ail *ailp, + struct xfs_ail_cursor *cur, + struct xfs_log_item **log_items, + int nr_items, + xfs_lsn_t commit_lsn) +{ + int i; + + spin_lock(&ailp->ail_lock); + /* xfs_trans_ail_update_bulk drops ailp->ail_lock */ + xfs_trans_ail_update_bulk(ailp, cur, log_items, nr_items, commit_lsn); + + for (i = 0; i < nr_items; i++) { + struct xfs_log_item *lip = log_items[i]; + + if (lip->li_ops->iop_unpin) + lip->li_ops->iop_unpin(lip, 0); + } +} + +/* + * Take the checkpoint's log vector chain of items and insert the attached log + * items into the AIL. This uses bulk insertion techniques to minimise AIL + * lock traffic. + * + * If we are called with the aborted flag set, it is because a log write during + * a CIL checkpoint commit has failed. In this case, all the items in the + * checkpoint have already gone through iop_committed and iop_committing, which + * means that checkpoint commit abort handling is treated exactly the same as an + * iclog write error even though we haven't started any IO yet. Hence in this + * case all we need to do is iop_committed processing, followed by an + * iop_unpin(aborted) call. + * + * The AIL cursor is used to optimise the insert process. If commit_lsn is not + * at the end of the AIL, the insert cursor avoids the need to walk the AIL to + * find the insertion point on every xfs_log_item_batch_insert() call.
This + * saves a lot of needless list walking and is a net win, even though it + * slightly increases the amount of AIL lock traffic to set it up and tear it + * down. + */ +static void +xlog_cil_ail_insert( + struct xlog *log, + struct list_head *lv_chain, + xfs_lsn_t commit_lsn, + bool aborted) +{ +#define LOG_ITEM_BATCH_SIZE 32 + struct xfs_ail *ailp = log->l_ailp; + struct xfs_log_item *log_items[LOG_ITEM_BATCH_SIZE]; + struct xfs_log_vec *lv; + struct xfs_ail_cursor cur; + int i = 0; + + spin_lock(&ailp->ail_lock); + xfs_trans_ail_cursor_last(ailp, &cur, commit_lsn); + spin_unlock(&ailp->ail_lock); + + /* unpin all the log items */ + list_for_each_entry(lv, lv_chain, lv_list) { + struct xfs_log_item *lip = lv->lv_item; + xfs_lsn_t item_lsn; + + if (aborted) + set_bit(XFS_LI_ABORTED, &lip->li_flags); + + if (lip->li_ops->flags & XFS_ITEM_RELEASE_WHEN_COMMITTED) { + lip->li_ops->iop_release(lip); + continue; + } + + if (lip->li_ops->iop_committed) + item_lsn = lip->li_ops->iop_committed(lip, commit_lsn); + else + item_lsn = commit_lsn; + + /* item_lsn of -1 means the item needs no further processing */ + if (XFS_LSN_CMP(item_lsn, (xfs_lsn_t)-1) == 0) + continue; + + /* + * if we are aborting the operation, no point in inserting the + * object into the AIL as we are in a shutdown situation. + */ + if (aborted) { + ASSERT(xlog_is_shutdown(ailp->ail_log)); + if (lip->li_ops->iop_unpin) + lip->li_ops->iop_unpin(lip, 1); + continue; + } + + if (item_lsn != commit_lsn) { + + /* + * Not a bulk update option due to unusual item_lsn. + * Push into AIL immediately, rechecking the lsn once + * we have the ail lock. Then unpin the item. This does + * not affect the AIL cursor the bulk insert path is + * using. + */ + spin_lock(&ailp->ail_lock); + if (XFS_LSN_CMP(item_lsn, lip->li_lsn) > 0) + xfs_trans_ail_update(ailp, lip, item_lsn); + else + spin_unlock(&ailp->ail_lock); + if (lip->li_ops->iop_unpin) + lip->li_ops->iop_unpin(lip, 0); + continue; + } + + /* Item is a candidate for bulk AIL insert. */ + log_items[i++] = lv->lv_item; + if (i >= LOG_ITEM_BATCH_SIZE) { + xlog_cil_ail_insert_batch(ailp, &cur, log_items, + LOG_ITEM_BATCH_SIZE, commit_lsn); + i = 0; + } + } + + /* make sure we insert the remainder!
*/ + if (i) + xlog_cil_ail_insert_batch(ailp, &cur, log_items, i, commit_lsn); + + spin_lock(&ailp->ail_lock); + xfs_trans_ail_cursor_done(&cur); + spin_unlock(&ailp->ail_lock); +} + static void xlog_cil_free_logvec( struct list_head *lv_chain) @@ -804,7 +934,7 @@ xlog_cil_committed( spin_unlock(&ctx->cil->xc_push_lock); } - xfs_trans_committed_bulk(ctx->cil->xc_log->l_ailp, &ctx->lv_chain, + xlog_cil_ail_insert(ctx->cil->xc_log, &ctx->lv_chain, ctx->start_lsn, abort); xfs_extent_busy_sort(&ctx->busy_extents); diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index 8c0bfc9a33b1..4ebef316c128 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -717,135 +717,6 @@ xfs_trans_free_items( } } -static inline void -xfs_log_item_batch_insert( - struct xfs_ail *ailp, - struct xfs_ail_cursor *cur, - struct xfs_log_item **log_items, - int nr_items, - xfs_lsn_t commit_lsn) -{ - int i; - - spin_lock(&ailp->ail_lock); - /* xfs_trans_ail_update_bulk drops ailp->ail_lock */ - xfs_trans_ail_update_bulk(ailp, cur, log_items, nr_items, commit_lsn); - - for (i = 0; i < nr_items; i++) { - struct xfs_log_item *lip = log_items[i]; - - if (lip->li_ops->iop_unpin) - lip->li_ops->iop_unpin(lip, 0); - } -} - -/* - * Bulk operation version of xfs_trans_committed that takes a log vector of - * items to insert into the AIL. This uses bulk AIL insertion techniques to - * minimise lock traffic. - * - * If we are called with the aborted flag set, it is because a log write during - * a CIL checkpoint commit has failed. In this case, all the items in the - * checkpoint have already gone through iop_committed and iop_committing, which - * means that checkpoint commit abort handling is treated exactly the same - * as an iclog write error even though we haven't started any IO yet. Hence in - * this case all we need to do is iop_committed processing, followed by an - * iop_unpin(aborted) call. - * - * The AIL cursor is used to optimise the insert process. If commit_lsn is not - * at the end of the AIL, the insert cursor avoids the need to walk - * the AIL to find the insertion point on every xfs_log_item_batch_insert() - * call. This saves a lot of needless list walking and is a net win, even - * though it slightly increases that amount of AIL lock traffic to set it up - * and tear it down. - */ -void -xfs_trans_committed_bulk( - struct xfs_ail *ailp, - struct list_head *lv_chain, - xfs_lsn_t commit_lsn, - bool aborted) -{ -#define LOG_ITEM_BATCH_SIZE 32 - struct xfs_log_item *log_items[LOG_ITEM_BATCH_SIZE]; - struct xfs_log_vec *lv; - struct xfs_ail_cursor cur; - int i = 0; - - spin_lock(&ailp->ail_lock); - xfs_trans_ail_cursor_last(ailp, &cur, commit_lsn); - spin_unlock(&ailp->ail_lock); - - /* unpin all the log items */ - list_for_each_entry(lv, lv_chain, lv_list) { - struct xfs_log_item *lip = lv->lv_item; - xfs_lsn_t item_lsn; - - if (aborted) - set_bit(XFS_LI_ABORTED, &lip->li_flags); - - if (lip->li_ops->flags & XFS_ITEM_RELEASE_WHEN_COMMITTED) { - lip->li_ops->iop_release(lip); - continue; - } - - if (lip->li_ops->iop_committed) - item_lsn = lip->li_ops->iop_committed(lip, commit_lsn); - else - item_lsn = commit_lsn; - - /* item_lsn of -1 means the item needs no further processing */ - if (XFS_LSN_CMP(item_lsn, (xfs_lsn_t)-1) == 0) - continue; - - /* - * if we are aborting the operation, no point in inserting the - * object into the AIL as we are in a shutdown situation. 
- */ - if (aborted) { - ASSERT(xlog_is_shutdown(ailp->ail_log)); - if (lip->li_ops->iop_unpin) - lip->li_ops->iop_unpin(lip, 1); - continue; - } - - if (item_lsn != commit_lsn) { - - /* - * Not a bulk update option due to unusual item_lsn. - * Push into AIL immediately, rechecking the lsn once - * we have the ail lock. Then unpin the item. This does - * not affect the AIL cursor the bulk insert path is - * using. - */ - spin_lock(&ailp->ail_lock); - if (XFS_LSN_CMP(item_lsn, lip->li_lsn) > 0) - xfs_trans_ail_update(ailp, lip, item_lsn); - else - spin_unlock(&ailp->ail_lock); - if (lip->li_ops->iop_unpin) - lip->li_ops->iop_unpin(lip, 0); - continue; - } - - /* Item is a candidate for bulk AIL insert. */ - log_items[i++] = lv->lv_item; - if (i >= LOG_ITEM_BATCH_SIZE) { - xfs_log_item_batch_insert(ailp, &cur, log_items, - LOG_ITEM_BATCH_SIZE, commit_lsn); - i = 0; - } - } - - /* make sure we insert the remainder! */ - if (i) - xfs_log_item_batch_insert(ailp, &cur, log_items, i, commit_lsn); - - spin_lock(&ailp->ail_lock); - xfs_trans_ail_cursor_done(&cur); - spin_unlock(&ailp->ail_lock); -} - /* * Sort transaction items prior to running precommit operations. This will * attempt to order the items such that they will always be locked in the same diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h index d5400150358e..52a45f0a5ef1 100644 --- a/fs/xfs/xfs_trans_priv.h +++ b/fs/xfs/xfs_trans_priv.h @@ -19,9 +19,6 @@ void xfs_trans_add_item(struct xfs_trans *, struct xfs_log_item *); void xfs_trans_del_item(struct xfs_log_item *); void xfs_trans_unreserve_and_mod_sb(struct xfs_trans *tp); -void xfs_trans_committed_bulk(struct xfs_ail *ailp, - struct list_head *lv_chain, - xfs_lsn_t commit_lsn, bool aborted); /* * AIL traversal cursor. *
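
[The batching pattern this patch moves is worth seeing in isolation. Below is a minimal, self-contained userspace sketch of the same idea - take the list lock once per batch of up to 32 items rather than once per item, and never forget the partial final batch. All names here (item, insert_locked, and so on) are hypothetical stand-ins, not the XFS APIs.]

#include <pthread.h>
#include <stddef.h>

#define BATCH_SIZE	32	/* mirrors LOG_ITEM_BATCH_SIZE */

struct item {
	long lsn;
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for xfs_trans_ail_update_bulk(): splice items in under the lock. */
static void insert_locked(struct item **items, int nr)
{
	(void)items;
	(void)nr;
}

/* One lock round trip covers the whole batch, not each individual item. */
static void insert_batch(struct item **items, int nr)
{
	pthread_mutex_lock(&list_lock);
	insert_locked(items, nr);
	pthread_mutex_unlock(&list_lock);
}

static void insert_all(struct item **items, size_t count)
{
	struct item *batch[BATCH_SIZE];
	int i = 0;

	for (size_t n = 0; n < count; n++) {
		batch[i++] = items[n];
		if (i == BATCH_SIZE) {
			insert_batch(batch, i);
			i = 0;
		}
	}
	/* insert the remainder, exactly as the moved code does */
	if (i)
		insert_batch(batch, i);
}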
From patchwork Thu Sep 21 01:48:37 2023
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 13393592
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Subject: [PATCH 2/9] xfs: AIL doesn't need manual pushing
Date: Thu, 21 Sep 2023 11:48:37 +1000
Message-Id: <20230921014844.582667-3-david@fromorbit.com>
In-Reply-To: <20230921014844.582667-1-david@fromorbit.com>
References: <20230921014844.582667-1-david@fromorbit.com>

We have a mechanism that checks the amount of log space remaining every time we make a transaction reservation. If the amount of space is below a threshold (25% free) we push on the AIL to tell it to do more work. To do this, we end up calculating the LSN that the AIL needs to push to on every reservation and updating the push target for the AIL with that new target LSN.

This is silly and expensive. The AIL is perfectly capable of calculating the push target itself, and it will always be running when the AIL contains objects.

Modify the AIL to calculate its 25% push target before it starts a push, using the same reserve grant head based calculation as is currently used, and remove all the places where we ask the AIL to push to a new 25% free target.

This does still require a manual push in certain circumstances. These circumstances arise when the AIL is not full, but the reservation grants consume all of the free space in the log. In this case, we still need to push on the AIL to free up space, so when we hit this condition (i.e. reservation going to sleep to wait on log space) we do a single push to tell the AIL it should empty itself.
This will keep the AIL moving as new reservations come in and want more space, rather than keep queuing them and having to push the AIL repeatedly. Signed-off-by: Dave Chinner --- fs/xfs/libxfs/xfs_defer.c | 4 +- fs/xfs/xfs_log.c | 135 ++----------------------------- fs/xfs/xfs_log.h | 1 - fs/xfs/xfs_log_priv.h | 2 + fs/xfs/xfs_trans_ail.c | 162 +++++++++++++++++--------------------- fs/xfs/xfs_trans_priv.h | 33 ++++++-- 6 files changed, 108 insertions(+), 229 deletions(-) diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c index bcfb6a4203cd..05ee0b66772d 100644 --- a/fs/xfs/libxfs/xfs_defer.c +++ b/fs/xfs/libxfs/xfs_defer.c @@ -12,12 +12,14 @@ #include "xfs_mount.h" #include "xfs_defer.h" #include "xfs_trans.h" +#include "xfs_trans_priv.h" #include "xfs_buf_item.h" #include "xfs_inode.h" #include "xfs_inode_item.h" #include "xfs_trace.h" #include "xfs_icache.h" #include "xfs_log.h" +#include "xfs_log_priv.h" #include "xfs_rmap.h" #include "xfs_refcount.h" #include "xfs_bmap.h" @@ -440,7 +442,7 @@ xfs_defer_relog( * the log threshold once per call. */ if (threshold_lsn == NULLCOMMITLSN) { - threshold_lsn = xlog_grant_push_threshold(log, 0); + threshold_lsn = xfs_ail_push_target(log->l_ailp); if (threshold_lsn == NULLCOMMITLSN) break; } diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index 51c100c86177..bd08d1af59cb 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -30,10 +30,6 @@ xlog_alloc_log( struct xfs_buftarg *log_target, xfs_daddr_t blk_offset, int num_bblks); -STATIC int -xlog_space_left( - struct xlog *log, - atomic64_t *head); STATIC void xlog_dealloc_log( struct xlog *log); @@ -51,10 +47,6 @@ xlog_state_get_iclog_space( struct xlog_ticket *ticket, int *logoffsetp); STATIC void -xlog_grant_push_ail( - struct xlog *log, - int need_bytes); -STATIC void xlog_sync( struct xlog *log, struct xlog_in_core *iclog, @@ -242,42 +234,15 @@ xlog_grant_head_wake( { struct xlog_ticket *tic; int need_bytes; - bool woken_task = false; list_for_each_entry(tic, &head->waiters, t_queue) { - - /* - * There is a chance that the size of the CIL checkpoints in - * progress at the last AIL push target calculation resulted in - * limiting the target to the log head (l_last_sync_lsn) at the - * time. This may not reflect where the log head is now as the - * CIL checkpoints may have completed. - * - * Hence when we are woken here, it may be that the head of the - * log that has moved rather than the tail. As the tail didn't - * move, there still won't be space available for the - * reservation we require. However, if the AIL has already - * pushed to the target defined by the old log head location, we - * will hang here waiting for something else to update the AIL - * push target. - * - * Therefore, if there isn't space to wake the first waiter on - * the grant head, we need to push the AIL again to ensure the - * target reflects both the current log tail and log head - * position before we wait for the tail to move again. 
- */ - need_bytes = xlog_ticket_reservation(log, head, tic); - if (*free_bytes < need_bytes) { - if (!woken_task) - xlog_grant_push_ail(log, need_bytes); + if (*free_bytes < need_bytes) return false; - } *free_bytes -= need_bytes; trace_xfs_log_grant_wake_up(log, tic); wake_up_process(tic->t_task); - woken_task = true; } return true; @@ -296,13 +261,15 @@ xlog_grant_head_wait( do { if (xlog_is_shutdown(log)) goto shutdown; - xlog_grant_push_ail(log, need_bytes); __set_current_state(TASK_UNINTERRUPTIBLE); spin_unlock(&head->lock); XFS_STATS_INC(log->l_mp, xs_sleep_logspace); + /* Push on the AIL to free up all the log space. */ + xfs_ail_push_all(log->l_ailp); + trace_xfs_log_grant_sleep(log, tic); schedule(); trace_xfs_log_grant_wake(log, tic); @@ -418,9 +385,6 @@ xfs_log_regrant( * of rolling transactions in the log easily. */ tic->t_tid++; - - xlog_grant_push_ail(log, tic->t_unit_res); - tic->t_curr_res = tic->t_unit_res; if (tic->t_cnt > 0) return 0; @@ -477,12 +441,7 @@ xfs_log_reserve( ASSERT(*ticp == NULL); tic = xlog_ticket_alloc(log, unit_bytes, cnt, permanent); *ticp = tic; - - xlog_grant_push_ail(log, tic->t_cnt ? tic->t_unit_res * tic->t_cnt - : tic->t_unit_res); - trace_xfs_log_reserve(log, tic); - error = xlog_grant_head_check(log, &log->l_reserve_head, tic, &need_bytes); if (error) @@ -1330,7 +1289,7 @@ xlog_assign_tail_lsn( * shortcut invalidity asserts in this case so that we don't trigger them * falsely. */ -STATIC int +int xlog_space_left( struct xlog *log, atomic64_t *head) @@ -1671,89 +1630,6 @@ xlog_alloc_log( return ERR_PTR(error); } /* xlog_alloc_log */ -/* - * Compute the LSN that we'd need to push the log tail towards in order to have - * (a) enough on-disk log space to log the number of bytes specified, (b) at - * least 25% of the log space free, and (c) at least 256 blocks free. If the - * log free space already meets all three thresholds, this function returns - * NULLCOMMITLSN. - */ -xfs_lsn_t -xlog_grant_push_threshold( - struct xlog *log, - int need_bytes) -{ - xfs_lsn_t threshold_lsn = 0; - xfs_lsn_t last_sync_lsn; - int free_blocks; - int free_bytes; - int threshold_block; - int threshold_cycle; - int free_threshold; - - ASSERT(BTOBB(need_bytes) < log->l_logBBsize); - - free_bytes = xlog_space_left(log, &log->l_reserve_head.grant); - free_blocks = BTOBBT(free_bytes); - - /* - * Set the threshold for the minimum number of free blocks in the - * log to the maximum of what the caller needs, one quarter of the - * log, and 256 blocks. - */ - free_threshold = BTOBB(need_bytes); - free_threshold = max(free_threshold, (log->l_logBBsize >> 2)); - free_threshold = max(free_threshold, 256); - if (free_blocks >= free_threshold) - return NULLCOMMITLSN; - - xlog_crack_atomic_lsn(&log->l_tail_lsn, &threshold_cycle, - &threshold_block); - threshold_block += free_threshold; - if (threshold_block >= log->l_logBBsize) { - threshold_block -= log->l_logBBsize; - threshold_cycle += 1; - } - threshold_lsn = xlog_assign_lsn(threshold_cycle, - threshold_block); - /* - * Don't pass in an lsn greater than the lsn of the last - * log record known to be on disk. Use a snapshot of the last sync lsn - * so that it doesn't change between the compare and the set. - */ - last_sync_lsn = atomic64_read(&log->l_last_sync_lsn); - if (XFS_LSN_CMP(threshold_lsn, last_sync_lsn) > 0) - threshold_lsn = last_sync_lsn; - - return threshold_lsn; -} - -/* - * Push the tail of the log if we need to do so to maintain the free log space - * thresholds set out by xlog_grant_push_threshold. 
We may need to adopt a - * policy which pushes on an lsn which is further along in the log once we - * reach the high water mark. In this manner, we would be creating a low water - * mark. - */ -STATIC void -xlog_grant_push_ail( - struct xlog *log, - int need_bytes) -{ - xfs_lsn_t threshold_lsn; - - threshold_lsn = xlog_grant_push_threshold(log, need_bytes); - if (threshold_lsn == NULLCOMMITLSN || xlog_is_shutdown(log)) - return; - - /* - * Get the transaction layer to kick the dirty buffers out to - * disk asynchronously. No point in trying to do this if - * the filesystem is shutting down. - */ - xfs_ail_push(log->l_ailp, threshold_lsn); -} - /* * Stamp cycle number in every block */ @@ -2715,7 +2591,6 @@ xlog_state_set_callback( return; atomic64_set(&log->l_last_sync_lsn, header_lsn); - xlog_grant_push_ail(log, 0); } /* diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h index 2728886c2963..6b6ee35b3885 100644 --- a/fs/xfs/xfs_log.h +++ b/fs/xfs/xfs_log.h @@ -156,7 +156,6 @@ int xfs_log_quiesce(struct xfs_mount *mp); void xfs_log_clean(struct xfs_mount *mp); bool xfs_log_check_lsn(struct xfs_mount *, xfs_lsn_t); -xfs_lsn_t xlog_grant_push_threshold(struct xlog *log, int need_bytes); bool xlog_force_shutdown(struct xlog *log, uint32_t shutdown_flags); void xlog_use_incompat_feat(struct xlog *log); diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h index af87648331d5..d4124ef9d97f 100644 --- a/fs/xfs/xfs_log_priv.h +++ b/fs/xfs/xfs_log_priv.h @@ -576,6 +576,8 @@ xlog_assign_grant_head(atomic64_t *head, int cycle, int space) atomic64_set(head, xlog_assign_grant_head_val(cycle, space)); } +int xlog_space_left(struct xlog *log, atomic64_t *head); + /* * Committed Item List interfaces */ diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c index 1098452e7f95..31a4e5e5d899 100644 --- a/fs/xfs/xfs_trans_ail.c +++ b/fs/xfs/xfs_trans_ail.c @@ -134,25 +134,6 @@ xfs_ail_min_lsn( return lsn; } -/* - * Return the maximum lsn held in the AIL, or zero if the AIL is empty. - */ -static xfs_lsn_t -xfs_ail_max_lsn( - struct xfs_ail *ailp) -{ - xfs_lsn_t lsn = 0; - struct xfs_log_item *lip; - - spin_lock(&ailp->ail_lock); - lip = xfs_ail_max(ailp); - if (lip) - lsn = lip->li_lsn; - spin_unlock(&ailp->ail_lock); - - return lsn; -} - /* * The cursor keeps track of where our current traversal is up to by tracking * the next item in the list for us. However, for this to be safe, removing an @@ -414,6 +395,56 @@ xfsaild_push_item( return lip->li_ops->iop_push(lip, &ailp->ail_buf_list); } +/* + * Compute the LSN that we'd need to push the log tail towards in order to have + * at least 25% of the log space free. If the log free space already meets this + * threshold, this function returns NULLCOMMITLSN. + */ +xfs_lsn_t +__xfs_ail_push_target( + struct xfs_ail *ailp) +{ + struct xlog *log = ailp->ail_log; + xfs_lsn_t threshold_lsn = 0; + xfs_lsn_t last_sync_lsn; + int free_blocks; + int free_bytes; + int threshold_block; + int threshold_cycle; + int free_threshold; + + free_bytes = xlog_space_left(log, &log->l_reserve_head.grant); + free_blocks = BTOBBT(free_bytes); + + /* + * The threshold for the minimum number of free blocks is one quarter of + * the entire log space. 
+ */ + free_threshold = log->l_logBBsize >> 2; + if (free_blocks >= free_threshold) + return NULLCOMMITLSN; + + xlog_crack_atomic_lsn(&log->l_tail_lsn, &threshold_cycle, + &threshold_block); + threshold_block += free_threshold; + if (threshold_block >= log->l_logBBsize) { + threshold_block -= log->l_logBBsize; + threshold_cycle += 1; + } + threshold_lsn = xlog_assign_lsn(threshold_cycle, + threshold_block); + /* + * Don't pass in an lsn greater than the lsn of the last + * log record known to be on disk. Use a snapshot of the last sync lsn + * so that it doesn't change between the compare and the set. + */ + last_sync_lsn = atomic64_read(&log->l_last_sync_lsn); + if (XFS_LSN_CMP(threshold_lsn, last_sync_lsn) > 0) + threshold_lsn = last_sync_lsn; + + return threshold_lsn; +} + static long xfsaild_push( struct xfs_ail *ailp) @@ -454,21 +485,24 @@ xfsaild_push( * capture updates that occur after the sync push waiter has gone to * sleep. */ - if (waitqueue_active(&ailp->ail_empty)) { + if (test_bit(XFS_AIL_OPSTATE_PUSH_ALL, &ailp->ail_opstate) || + waitqueue_active(&ailp->ail_empty)) { lip = xfs_ail_max(ailp); if (lip) target = lip->li_lsn; + else + clear_bit(XFS_AIL_OPSTATE_PUSH_ALL, &ailp->ail_opstate); } else { - /* barrier matches the ail_target update in xfs_ail_push() */ - smp_rmb(); - target = ailp->ail_target; - ailp->ail_target_prev = target; + target = __xfs_ail_push_target(ailp); } + if (target == NULLCOMMITLSN) + goto out_done; + /* we're done if the AIL is empty or our push has reached the end */ lip = xfs_trans_ail_cursor_first(ailp, &cur, ailp->ail_last_pushed_lsn); if (!lip) - goto out_done; + goto out_done_cursor; XFS_STATS_INC(mp, xs_push_ail); @@ -553,8 +587,9 @@ xfsaild_push( lsn = lip->li_lsn; } -out_done: +out_done_cursor: xfs_trans_ail_cursor_done(&cur); +out_done: spin_unlock(&ailp->ail_lock); if (xfs_buf_delwri_submit_nowait(&ailp->ail_buf_list)) @@ -603,7 +638,7 @@ xfsaild( set_freezable(); while (1) { - if (tout && tout <= 20) + if (tout) set_current_state(TASK_KILLABLE|TASK_FREEZABLE); else set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE); @@ -639,21 +674,9 @@ xfsaild( break; } + /* Idle if the AIL is empty. */ spin_lock(&ailp->ail_lock); - - /* - * Idle if the AIL is empty and we are not racing with a target - * update. We check the AIL after we set the task to a sleep - * state to guarantee that we either catch an ail_target update - * or that a wake_up resets the state to TASK_RUNNING. - * Otherwise, we run the risk of sleeping indefinitely. - * - * The barrier matches the ail_target update in xfs_ail_push(). - */ - smp_rmb(); - if (!xfs_ail_min(ailp) && - ailp->ail_target == ailp->ail_target_prev && - list_empty(&ailp->ail_buf_list)) { + if (!xfs_ail_min(ailp) && list_empty(&ailp->ail_buf_list)) { spin_unlock(&ailp->ail_lock); schedule(); tout = 0; @@ -675,56 +698,6 @@ xfsaild( return 0; } -/* - * This routine is called to move the tail of the AIL forward. It does this by - * trying to flush items in the AIL whose lsns are below the given - * threshold_lsn. - * - * The push is run asynchronously in a workqueue, which means the caller needs - * to handle waiting on the async flush for space to become available. - * We don't want to interrupt any push that is in progress, hence we only queue - * work if we set the pushing bit appropriately. - * - * We do this unlocked - we only need to know whether there is anything in the - * AIL at the time we are called. We don't need to access the contents of - * any of the objects, so the lock is not needed. 
- */ -void -xfs_ail_push( - struct xfs_ail *ailp, - xfs_lsn_t threshold_lsn) -{ - struct xfs_log_item *lip; - - lip = xfs_ail_min(ailp); - if (!lip || xlog_is_shutdown(ailp->ail_log) || - XFS_LSN_CMP(threshold_lsn, ailp->ail_target) <= 0) - return; - - /* - * Ensure that the new target is noticed in push code before it clears - * the XFS_AIL_PUSHING_BIT. - */ - smp_wmb(); - xfs_trans_ail_copy_lsn(ailp, &ailp->ail_target, &threshold_lsn); - smp_wmb(); - - wake_up_process(ailp->ail_task); -} - -/* - * Push out all items in the AIL immediately - */ -void -xfs_ail_push_all( - struct xfs_ail *ailp) -{ - xfs_lsn_t threshold_lsn = xfs_ail_max_lsn(ailp); - - if (threshold_lsn) - xfs_ail_push(ailp, threshold_lsn); -} - /* * Push out all items in the AIL immediately and wait until the AIL is empty. */ @@ -829,6 +802,13 @@ xfs_trans_ail_update_bulk( if (!list_empty(&tmp)) xfs_ail_splice(ailp, cur, &tmp, lsn); + /* + * If this is the first insert, wake up the push daemon so it can + * actively scan for items to push. + */ + if (!mlip) + wake_up_process(ailp->ail_task); + xfs_ail_update_finish(ailp, tail_lsn); } diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h index 52a45f0a5ef1..9a131e7fae94 100644 --- a/fs/xfs/xfs_trans_priv.h +++ b/fs/xfs/xfs_trans_priv.h @@ -52,16 +52,18 @@ struct xfs_ail { struct xlog *ail_log; struct task_struct *ail_task; struct list_head ail_head; - xfs_lsn_t ail_target; - xfs_lsn_t ail_target_prev; struct list_head ail_cursors; spinlock_t ail_lock; xfs_lsn_t ail_last_pushed_lsn; int ail_log_flush; + unsigned long ail_opstate; struct list_head ail_buf_list; wait_queue_head_t ail_empty; }; +/* Push all items out of the AIL immediately. */ +#define XFS_AIL_OPSTATE_PUSH_ALL 0u + /* * From xfs_trans_ail.c */ @@ -98,10 +100,29 @@ void xfs_ail_update_finish(struct xfs_ail *ailp, xfs_lsn_t old_lsn) __releases(ailp->ail_lock); void xfs_trans_ail_delete(struct xfs_log_item *lip, int shutdown_type); -void xfs_ail_push(struct xfs_ail *, xfs_lsn_t); -void xfs_ail_push_all(struct xfs_ail *); -void xfs_ail_push_all_sync(struct xfs_ail *); -struct xfs_log_item *xfs_ail_min(struct xfs_ail *ailp); +static inline void xfs_ail_push(struct xfs_ail *ailp) +{ + wake_up_process(ailp->ail_task); +} + +static inline void xfs_ail_push_all(struct xfs_ail *ailp) +{ + if (!test_and_set_bit(XFS_AIL_OPSTATE_PUSH_ALL, &ailp->ail_opstate)) + xfs_ail_push(ailp); +} + +xfs_lsn_t __xfs_ail_push_target(struct xfs_ail *ailp); +static inline xfs_lsn_t xfs_ail_push_target(struct xfs_ail *ailp) +{ + xfs_lsn_t lsn; + + spin_lock(&ailp->ail_lock); + lsn = __xfs_ail_push_target(ailp); + spin_unlock(&ailp->ail_lock); + return lsn; +} + +void xfs_ail_push_all_sync(struct xfs_ail *ailp); xfs_lsn_t xfs_ail_min_lsn(struct xfs_ail *ailp); struct xfs_log_item * xfs_trans_ail_cursor_first(struct xfs_ail *ailp,
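
[The xfs_ail_push_all() rework above hinges on test_and_set_bit(): of many racing callers, only the one that transitions the PUSH_ALL bit from clear to set issues a wake, and the push loop clears the bit once the AIL drains. A sketch of that protocol in portable C11 atomics - names are illustrative, not the kernel interfaces.]

#include <stdatomic.h>
#include <stdbool.h>

#define PUSH_ALL	(1u << 0)

static atomic_uint opstate;

static void wake_worker(void)
{
	/* stand-in for wake_up_process(ailp->ail_task) */
}

/* Like xfs_ail_push_all(): only the caller that sets the flag wakes the worker. */
static void push_all(void)
{
	if (!(atomic_fetch_or(&opstate, PUSH_ALL) & PUSH_ALL))
		wake_worker();
}

/* The worker pushes to the maximum LSN while the flag is set... */
static bool push_all_requested(void)
{
	return atomic_load(&opstate) & PUSH_ALL;
}

/* ...and clears it once there is nothing left to push. */
static void ail_drained(void)
{
	atomic_fetch_and(&opstate, ~PUSH_ALL);
}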
From patchwork Thu Sep 21 01:48:38 2023
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 13393588
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Subject: [PATCH 3/9] xfs: background AIL push targets physical space, not grant space
Date: Thu, 21 Sep 2023 11:48:38 +1000
Message-Id: <20230921014844.582667-4-david@fromorbit.com>
In-Reply-To: <20230921014844.582667-1-david@fromorbit.com>
References: <20230921014844.582667-1-david@fromorbit.com>

Currently the AIL attempts to keep 25% of the "log space" free, where the current used space is tracked by the reserve grant head.
That is, it tracks both physical space used plus the amount reserved by transactions in progress. When we start tail pushing, we are trying to make space for new reservations by writing back older metadata and the log is generally physically full of dirty metadata, and reservations for modifications in flight take up whatever space the AIL can physically free up. Hence we don't really need to take into account the reservation space that has been used - we just need to keep the log tail moving as fast as we can to free up space for more reservations to be made. We know exactly how much physical space the journal is consuming in the AIL (i.e. max LSN - min LSN) so we can base push thresholds directly on this state rather than have to look at grant head reservations to determine how much to physically push out of the log. Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- fs/xfs/xfs_log_priv.h | 18 ++++++++++++ fs/xfs/xfs_trans_ail.c | 63 +++++++++++++++++++----------------------- 2 files changed, 47 insertions(+), 34 deletions(-) diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h index d4124ef9d97f..01c333f712ae 100644 --- a/fs/xfs/xfs_log_priv.h +++ b/fs/xfs/xfs_log_priv.h @@ -625,6 +625,24 @@ xlog_wait( int xlog_wait_on_iclog(struct xlog_in_core *iclog); +/* Calculate the distance between two LSNs in bytes */ +static inline uint64_t +xlog_lsn_sub( + struct xlog *log, + xfs_lsn_t high, + xfs_lsn_t low) +{ + uint32_t hi_cycle = CYCLE_LSN(high); + uint32_t hi_block = BLOCK_LSN(high); + uint32_t lo_cycle = CYCLE_LSN(low); + uint32_t lo_block = BLOCK_LSN(low); + + if (hi_cycle == lo_cycle) + return BBTOB(hi_block - lo_block); + ASSERT((hi_cycle == lo_cycle + 1) || xlog_is_shutdown(log)); + return (uint64_t)log->l_logsize - BBTOB(lo_block - hi_block); +} + /* * The LSN is valid so long as it is behind the current LSN. If it isn't, this * means that the next log record that includes this metadata could have a diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c index 31a4e5e5d899..3103e16d6965 100644 --- a/fs/xfs/xfs_trans_ail.c +++ b/fs/xfs/xfs_trans_ail.c @@ -398,51 +398,46 @@ xfsaild_push_item( /* * Compute the LSN that we'd need to push the log tail towards in order to have * at least 25% of the log space free. If the log free space already meets this - * threshold, this function returns NULLCOMMITLSN. + * threshold, this function returns the lowest LSN in the AIL to slowly keep + * writeback ticking over and the tail of the log moving forward. */ xfs_lsn_t __xfs_ail_push_target( struct xfs_ail *ailp) { - struct xlog *log = ailp->ail_log; - xfs_lsn_t threshold_lsn = 0; - xfs_lsn_t last_sync_lsn; - int free_blocks; - int free_bytes; - int threshold_block; - int threshold_cycle; - int free_threshold; + struct xlog *log = ailp->ail_log; + struct xfs_log_item *lip; + xfs_lsn_t target_lsn = 0; + xfs_lsn_t max_lsn; + xfs_lsn_t min_lsn; + int32_t free_bytes; + uint32_t target_block; + uint32_t target_cycle; - free_bytes = xlog_space_left(log, &log->l_reserve_head.grant); - free_blocks = BTOBBT(free_bytes); + lockdep_assert_held(&ailp->ail_lock); - /* - * The threshold for the minimum number of free blocks is one quarter of - * the entire log space. 
- */ - free_threshold = log->l_logBBsize >> 2; - if (free_blocks >= free_threshold) + lip = xfs_ail_max(ailp); + if (!lip) return NULLCOMMITLSN; + max_lsn = lip->li_lsn; + min_lsn = __xfs_ail_min_lsn(ailp); - xlog_crack_atomic_lsn(&log->l_tail_lsn, &threshold_cycle, - &threshold_block); - threshold_block += free_threshold; - if (threshold_block >= log->l_logBBsize) { - threshold_block -= log->l_logBBsize; - threshold_cycle += 1; + free_bytes = log->l_logsize - xlog_lsn_sub(log, max_lsn, min_lsn); + if (free_bytes >= log->l_logsize >> 2) + return NULLCOMMITLSN; + + target_cycle = CYCLE_LSN(min_lsn); + target_block = BLOCK_LSN(min_lsn) + (log->l_logBBsize >> 2); + if (target_block >= log->l_logBBsize) { + target_block -= log->l_logBBsize; + target_cycle += 1; } - threshold_lsn = xlog_assign_lsn(threshold_cycle, - threshold_block); - /* - * Don't pass in an lsn greater than the lsn of the last - * log record known to be on disk. Use a snapshot of the last sync lsn - * so that it doesn't change between the compare and the set. - */ - last_sync_lsn = atomic64_read(&log->l_last_sync_lsn); - if (XFS_LSN_CMP(threshold_lsn, last_sync_lsn) > 0) - threshold_lsn = last_sync_lsn; + target_lsn = xlog_assign_lsn(target_cycle, target_block); - return threshold_lsn; + /* Cap the target to the highest LSN known to be in the AIL. */ + if (XFS_LSN_CMP(target_lsn, max_lsn) > 0) + return max_lsn; + return target_lsn; } static long
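
[The cycle/block arithmetic above is easier to check in a standalone model. The sketch below mirrors the xlog_lsn_sub() helper this patch relies on, under the same constraint it asserts (the two LSNs are at most one cycle apart), with made-up log geometry rather than the real on-disk values.]

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define BBSHIFT		9		/* 512-byte basic blocks */
#define LOG_BBSIZE	2048u		/* example log size in basic blocks */
#define LOG_BYTES	((uint64_t)LOG_BBSIZE << BBSHIFT)

#define CYCLE(lsn)	((uint32_t)((lsn) >> 32))
#define BLOCK(lsn)	((uint32_t)(lsn))
#define MKLSN(c, b)	(((uint64_t)(c) << 32) | (b))

/* Byte distance from low up to high around the circular log. */
static uint64_t lsn_sub(uint64_t high, uint64_t low)
{
	if (CYCLE(high) == CYCLE(low))
		return (uint64_t)(BLOCK(high) - BLOCK(low)) << BBSHIFT;
	assert(CYCLE(high) == CYCLE(low) + 1);
	return LOG_BYTES - ((uint64_t)(BLOCK(low) - BLOCK(high)) << BBSHIFT);
}

int main(void)
{
	/* same cycle: 1000 blocks apart = 512000 bytes in use */
	printf("%llu\n", (unsigned long long)lsn_sub(MKLSN(7, 1500), MKLSN(7, 500)));
	/* head wrapped into the next cycle: 2048 - 1800 = 248 blocks in use */
	printf("%llu\n", (unsigned long long)lsn_sub(MKLSN(8, 100), MKLSN(7, 1900)));
	return 0;
}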
From patchwork Thu Sep 21 01:48:39 2023
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 13393593
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Subject: [PATCH 4/9] xfs: ensure log tail is always up to date
Date: Thu, 21 Sep 2023 11:48:39 +1000
Message-Id: <20230921014844.582667-5-david@fromorbit.com>
In-Reply-To: <20230921014844.582667-1-david@fromorbit.com>
References: <20230921014844.582667-1-david@fromorbit.com>

Whenever we write an iclog, we call xlog_assign_tail_lsn() to update the current tail before we write it into the iclog header. This means we have to take the AIL lock on every iclog write just to check if the tail of the log has moved.

This doesn't avoid races with log tail updates - the log tail could move immediately after we assign the tail to the iclog header, and hence by the time the iclog reaches stable storage the tail LSN has moved forward in memory. Hence the log tail LSN in the iclog header is really just a point-in-time snapshot of the current state of the AIL.

With this in mind, if we simply update the in-memory log->l_tail_lsn every time it changes in the AIL, there is no need to update the in-memory value when we are writing it into an iclog - it will already be up-to-date in memory and checking the AIL again will not change this. Hence xlog_state_release_iclog() does not need to check the AIL to update the tail lsn and can just sample it directly without needing to take the AIL lock.

Signed-off-by: Dave Chinner Reviewed-by: Darrick J.
Wong Reviewed-by: Christoph Hellwig --- fs/xfs/xfs_log.c | 5 ++--- fs/xfs/xfs_trans_ail.c | 17 +++++++++++++++-- 2 files changed, 17 insertions(+), 5 deletions(-) diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index bd08d1af59cb..b6db74de02e1 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -530,7 +530,6 @@ xlog_state_release_iclog( struct xlog_in_core *iclog, struct xlog_ticket *ticket) { - xfs_lsn_t tail_lsn; bool last_ref; lockdep_assert_held(&log->l_icloglock); @@ -545,8 +544,8 @@ xlog_state_release_iclog( if ((iclog->ic_state == XLOG_STATE_WANT_SYNC || (iclog->ic_flags & XLOG_ICL_NEED_FUA)) && !iclog->ic_header.h_tail_lsn) { - tail_lsn = xlog_assign_tail_lsn(log->l_mp); - iclog->ic_header.h_tail_lsn = cpu_to_be64(tail_lsn); + iclog->ic_header.h_tail_lsn = + cpu_to_be64(atomic64_read(&log->l_tail_lsn)); } last_ref = atomic_dec_and_test(&iclog->ic_refcnt); diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c index 3103e16d6965..e31bfeb01375 100644 --- a/fs/xfs/xfs_trans_ail.c +++ b/fs/xfs/xfs_trans_ail.c @@ -715,6 +715,13 @@ xfs_ail_push_all_sync( finish_wait(&ailp->ail_empty, &wait); } +/* + * Callers should pass the original tail lsn so that we can detect if the + * tail has moved as a result of the operation that was performed. If the caller + * needs to force a tail LSN update, it should pass NULLCOMMITLSN to bypass the + * "did the tail LSN change?" checks. If the caller wants to avoid a tail update + * (e.g. it knows the tail did not change) it should pass an @old_lsn of 0. + */ void xfs_ail_update_finish( struct xfs_ail *ailp, @@ -799,10 +806,16 @@ xfs_trans_ail_update_bulk( /* * If this is the first insert, wake up the push daemon so it can - * actively scan for items to push. + * actively scan for items to push. We also need to do a log tail + * LSN update to ensure that it is correctly tracked by the log, so + * set the tail_lsn to NULLCOMMITLSN so that xfs_ail_update_finish() + * will see that the tail lsn has changed and will update the tail + * appropriately.
*/ - if (!mlip) + if (!mlip) { wake_up_process(ailp->ail_task); + tail_lsn = NULLCOMMITLSN; + } xfs_ail_update_finish(ailp, tail_lsn); }
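
[The effect of this patch is to turn the iclog-side tail update from a locked AIL walk into a plain atomic load. A tiny C11 model of the before/after split, with illustrative names only.]

#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint64_t tail_lsn;	/* in-memory tail, kept current by the AIL */
static pthread_mutex_t ail_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * AIL side (the only writer): publish a new tail whenever the minimum
 * item in the AIL changes, under the AIL lock as before.
 */
static void publish_tail(uint64_t new_tail)
{
	pthread_mutex_lock(&ail_lock);
	atomic_store(&tail_lsn, new_tail);
	pthread_mutex_unlock(&ail_lock);
}

/*
 * iclog side (the reader): previously it had to take ail_lock and walk
 * the AIL to recompute the tail; now a lock-free snapshot suffices,
 * because the stored value is only ever a point-in-time lower bound -
 * the tail may move forward after the sample, never backwards.
 */
static uint64_t sample_tail_for_iclog_header(void)
{
	return atomic_load(&tail_lsn);
}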
From patchwork Thu Sep 21 01:48:40 2023
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 13393594
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Subject: [PATCH 5/9] xfs: l_last_sync_lsn is really AIL state
Date: Thu, 21 Sep 2023 11:48:40 +1000
Message-Id: <20230921014844.582667-6-david@fromorbit.com>
In-Reply-To: <20230921014844.582667-1-david@fromorbit.com>
References: <20230921014844.582667-1-david@fromorbit.com>

The current implementation of xlog_assign_tail_lsn() assumes that when the AIL is empty, the log tail matches the LSN of the last written commit record. This is recorded in xlog_state_set_callback() as log->l_last_sync_lsn when the iclog state changes to XLOG_STATE_CALLBACK. This change is then immediately followed by running the callbacks on the iclog, which then insert the log items into the AIL at the "commit lsn" of that checkpoint.

The AIL tracks log items via the start record LSN of the checkpoint, not the commit record LSN. This is because we can pipeline multiple checkpoints, and so the start record of checkpoint N+1 can be written before the commit record of checkpoint N. i.e:

    start N                    commit N
      +-------------+------------+----------------+
                  start N+1                 commit N+1

The tail of the log cannot be moved to the LSN of commit N when all the items of that checkpoint are written back, because then the start record for N+1 is no longer in the active portion of the log and recovery will fail/corrupt the filesystem. Hence when all the log items in checkpoint N are written back, the tail of the log must now only move as far forwards as the start LSN of checkpoint N+1.

Hence we cannot use the maximum start record LSN the AIL sees as a replacement for the pointer to the current head of the on-disk log records. However, we currently only use the l_last_sync_lsn when the AIL is empty - when there is no start LSN remaining, the tail of the log moves to the LSN of the last commit record as this is where recovery needs to start searching for recoverable records. The next checkpoint will have a start record LSN that is higher than l_last_sync_lsn, and so everything still works correctly when new checkpoints are written to an otherwise empty log.

l_last_sync_lsn is an atomic variable because it is currently updated when an iclog with callbacks attached moves to the CALLBACK state. While we hold the icloglock at this point, we don't hold the AIL lock. When we assign the log tail, we hold the AIL lock, not the icloglock, because we have to look up the AIL. Hence it is an atomic variable so it's not bound to a specific lock context.

However, the iclog callbacks are only used for CIL checkpoints. We don't use callbacks with unmount record writes, so the l_last_sync_lsn variable only gets updated when we are processing CIL checkpoint callbacks. And those callbacks run under AIL lock contexts, not icloglock context. The CIL checkpoint already knows the LSN of the iclog that the commit record was written to (obtained when written into the iclog before submission), and so we can update the l_last_sync_lsn under the AIL lock in this callback. No other iclog callbacks will run until the currently executing one completes, and hence we can update the l_last_sync_lsn under the AIL lock safely.

This means l_last_sync_lsn can move to the AIL as the "ail_head_lsn" and it can be used to replace the atomic l_last_sync_lsn in the iclog code. This makes tracking the log tail belong entirely to the AIL, rather than being smeared across log, iclog and AIL state and locking.
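
[To make the diagram concrete, suppose - hypothetical LSN values, same cycle - start N = 100, start N+1 = 180 and commit N = 200. The snippet below is an editorial sketch of the tail-advance rule, not code from the patch.]

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t start_n   = 100;	/* start record, checkpoint N */
	uint64_t start_np1 = 180;	/* start record, checkpoint N+1 */
	uint64_t commit_n  = 200;	/* commit record, checkpoint N */

	/*
	 * All of checkpoint N's items are written back. Moving the tail
	 * to commit_n (200) would leave start_np1 (180) behind the tail,
	 * outside the active log, and recovery could no longer find the
	 * start of checkpoint N+1. The AIL therefore tracks items by
	 * their START record LSN, and the tail only advances that far.
	 */
	uint64_t new_tail = start_np1;

	printf("tail moves %llu -> %llu (not %llu)\n",
	       (unsigned long long)start_n,
	       (unsigned long long)new_tail,
	       (unsigned long long)commit_n);
	return 0;
}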
The CIL checkpoint already knows the LSN of the iclog that the commit record was written to (it is obtained when the commit record is written into the iclog before submission), and so we can update the l_last_sync_lsn under the AIL lock in this callback. No other iclog callbacks will run until the currently executing one completes, and hence we can update the l_last_sync_lsn under the AIL lock safely.

This means l_last_sync_lsn can move to the AIL as the "ail_head_lsn" and it can be used to replace the atomic l_last_sync_lsn in the iclog code. This makes tracking the log tail belong entirely to the AIL, rather than being smeared across log, iclog and AIL state and locking.

Signed-off-by: Dave Chinner Reviewed-by: Christoph Hellwig Reviewed-by: Darrick J. Wong --- fs/xfs/xfs_log.c | 81 +++++----------------------------- fs/xfs/xfs_log_cil.c | 54 ++++++++++++++++++++------- fs/xfs/xfs_log_priv.h | 9 ++--- fs/xfs/xfs_log_recover.c | 19 +++++----- fs/xfs/xfs_trace.c | 1 + fs/xfs/xfs_trace.h | 8 ++-- fs/xfs/xfs_trans_ail.c | 26 +++++++++++-- fs/xfs/xfs_trans_priv.h | 13 +++++++ 8 files changed, 102 insertions(+), 109 deletions(-) diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index b6db74de02e1..2ceadee276e2 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -1230,47 +1230,6 @@ xfs_log_cover( return error; } -/* - * We may be holding the log iclog lock upon entering this routine. - */ -xfs_lsn_t -xlog_assign_tail_lsn_locked( - struct xfs_mount *mp) -{ - struct xlog *log = mp->m_log; - struct xfs_log_item *lip; - xfs_lsn_t tail_lsn; - - assert_spin_locked(&mp->m_ail->ail_lock); - - /* - * To make sure we always have a valid LSN for the log tail we keep - * track of the last LSN which was committed in log->l_last_sync_lsn, - * and use that when the AIL was empty. - */ - lip = xfs_ail_min(mp->m_ail); - if (lip) - tail_lsn = lip->li_lsn; - else - tail_lsn = atomic64_read(&log->l_last_sync_lsn); - trace_xfs_log_assign_tail_lsn(log, tail_lsn); - atomic64_set(&log->l_tail_lsn, tail_lsn); - return tail_lsn; -} - -xfs_lsn_t -xlog_assign_tail_lsn( - struct xfs_mount *mp) -{ - xfs_lsn_t tail_lsn; - - spin_lock(&mp->m_ail->ail_lock); - tail_lsn = xlog_assign_tail_lsn_locked(mp); - spin_unlock(&mp->m_ail->ail_lock); - - return tail_lsn; -} - /* * Return the space in the log between the tail and the head. The head * is passed in the cycle/bytes formal parms. In the special case where @@ -1504,7 +1463,6 @@ xlog_alloc_log( log->l_prev_block = -1; /* log->l_tail_lsn = 0x100000000LL; cycle = 1; current block = 0 */ xlog_assign_atomic_lsn(&log->l_tail_lsn, 1, 0); - xlog_assign_atomic_lsn(&log->l_last_sync_lsn, 1, 0); log->l_curr_cycle = 1; /* 0 is bad since this is initial value */ if (xfs_has_logv2(mp) && mp->m_sb.sb_logsunit > 1) @@ -2552,44 +2510,23 @@ xlog_get_lowest_lsn( return lowest_lsn; } -/* - * Completion of a iclog IO does not imply that a transaction has completed, as - * transactions can be large enough to span many iclogs. We cannot change the - * tail of the log half way through a transaction as this may be the only - * transaction in the log and moving the tail to point to the middle of it - * will prevent recovery from finding the start of the transaction. Hence we - * should only update the last_sync_lsn if this iclog contains transaction - * completion callbacks on it. - * - * We have to do this before we drop the icloglock to ensure we are the only one - * that can update it. - * - * If we are moving the last_sync_lsn forwards, we also need to ensure we kick
- * the reservation grant head pushing. This is due to the fact that the push - * target is bound by the current last_sync_lsn value. Hence if we have a large - * amount of log space bound up in this committing transaction then the - * last_sync_lsn value may be the limiting factor preventing tail pushing from - * freeing space in the log. Hence once we've updated the last_sync_lsn we - * should push the AIL to ensure the push target (and hence the grant head) is - * no longer bound by the old log head location and can move forwards and make - * progress again. - */ static void xlog_state_set_callback( struct xlog *log, struct xlog_in_core *iclog, xfs_lsn_t header_lsn) { + /* + * If there are no callbacks on this iclog, we can mark it clean + * immediately and return. Otherwise we need to run the + * callbacks. + */ + if (list_empty(&iclog->ic_callbacks)) { + xlog_state_clean_iclog(log, iclog); + return; + } trace_xlog_iclog_callback(iclog, _RET_IP_); iclog->ic_state = XLOG_STATE_CALLBACK; - - ASSERT(XFS_LSN_CMP(atomic64_read(&log->l_last_sync_lsn), - header_lsn) <= 0); - - if (list_empty_careful(&iclog->ic_callbacks)) - return; - - atomic64_set(&log->l_last_sync_lsn, header_lsn); } /* diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c index c1fee14be5c2..1bedf03d863c 100644 --- a/fs/xfs/xfs_log_cil.c +++ b/fs/xfs/xfs_log_cil.c @@ -722,6 +722,24 @@ xlog_cil_ail_insert_batch( * items into the the AIL. This uses bulk insertion techniques to minimise AIL * lock traffic. * + * The AIL tracks log items via the start record LSN of the checkpoint, + * not the commit record LSN. This is because we can pipeline multiple + * checkpoints, and so the start record of checkpoint N+1 can be + * written before the commit record of checkpoint N. i.e: + * + * start N commit N + * +-------------+------------+----------------+ + * start N+1 commit N+1 + * + * The tail of the log cannot be moved to the LSN of commit N when all + * the items of that checkpoint are written back, because then the + * start record for N+1 is no longer in the active portion of the log + * and recovery will fail/corrupt the filesystem. + * + * Hence when all the log items in checkpoint N are written back, the + * tail of the log must now only move as far forwards as the start LSN + * of checkpoint N+1. + * * If we are called with the aborted flag set, it is because a log write during * a CIL checkpoint commit has failed. In this case, all the items in the * checkpoint have already gone through iop_committed and iop_committing, which @@ -739,24 +757,33 @@ xlog_cil_ail_insert_batch( */ static void xlog_cil_ail_insert( - struct xlog *log, - struct list_head *lv_chain, - xfs_lsn_t commit_lsn, + struct xfs_cil_ctx *ctx, bool aborted) { #define LOG_ITEM_BATCH_SIZE 32 - struct xfs_ail *ailp = log->l_ailp; + struct xfs_ail *ailp = ctx->cil->xc_log->l_ailp; struct xfs_log_item *log_items[LOG_ITEM_BATCH_SIZE]; struct xfs_log_vec *lv; struct xfs_ail_cursor cur; int i = 0; + /* + * Update the AIL head LSN with the commit record LSN of this + * checkpoint. As iclogs are always completed in order, this should + * always be the same (as iclogs can contain multiple commit records) or + * higher LSN than the current head. We do this before insertion of the + * items so that log space checks during insertion will reflect the + * space that this checkpoint has already consumed.
+ */ + ASSERT(XFS_LSN_CMP(ctx->commit_lsn, ailp->ail_head_lsn) >= 0 || + aborted); spin_lock(&ailp->ail_lock); - xfs_trans_ail_cursor_last(ailp, &cur, commit_lsn); + ailp->ail_head_lsn = ctx->commit_lsn; + xfs_trans_ail_cursor_last(ailp, &cur, ctx->start_lsn); spin_unlock(&ailp->ail_lock); /* unpin all the log items */ - list_for_each_entry(lv, lv_chain, lv_list) { + list_for_each_entry(lv, &ctx->lv_chain, lv_list) { struct xfs_log_item *lip = lv->lv_item; xfs_lsn_t item_lsn; @@ -769,9 +796,10 @@ xlog_cil_ail_insert( } if (lip->li_ops->iop_committed) - item_lsn = lip->li_ops->iop_committed(lip, commit_lsn); + item_lsn = lip->li_ops->iop_committed(lip, + ctx->start_lsn); else - item_lsn = commit_lsn; + item_lsn = ctx->start_lsn; /* item_lsn of -1 means the item needs no further processing */ if (XFS_LSN_CMP(item_lsn, (xfs_lsn_t)-1) == 0) @@ -788,7 +816,7 @@ xlog_cil_ail_insert( continue; } - if (item_lsn != commit_lsn) { + if (item_lsn != ctx->start_lsn) { /* * Not a bulk update option due to unusual item_lsn. @@ -811,14 +839,15 @@ xlog_cil_ail_insert( log_items[i++] = lv->lv_item; if (i >= LOG_ITEM_BATCH_SIZE) { xlog_cil_ail_insert_batch(ailp, &cur, log_items, - LOG_ITEM_BATCH_SIZE, commit_lsn); + LOG_ITEM_BATCH_SIZE, ctx->start_lsn); i = 0; } } /* make sure we insert the remainder! */ if (i) - xlog_cil_ail_insert_batch(ailp, &cur, log_items, i, commit_lsn); + xlog_cil_ail_insert_batch(ailp, &cur, log_items, i, + ctx->start_lsn); spin_lock(&ailp->ail_lock); xfs_trans_ail_cursor_done(&cur); @@ -934,8 +963,7 @@ xlog_cil_committed( spin_unlock(&ctx->cil->xc_push_lock); } - xlog_cil_ail_insert(ctx->cil->xc_log, &ctx->lv_chain, - ctx->start_lsn, abort); + xlog_cil_ail_insert(ctx, abort); xfs_extent_busy_sort(&ctx->busy_extents); xfs_extent_busy_clear(mp, &ctx->busy_extents, diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h index 01c333f712ae..106483cf1fb6 100644 --- a/fs/xfs/xfs_log_priv.h +++ b/fs/xfs/xfs_log_priv.h @@ -429,13 +429,10 @@ struct xlog { int l_prev_block; /* previous logical log block */ /* - * l_last_sync_lsn and l_tail_lsn are atomics so they can be set and - * read without needing to hold specific locks. To avoid operations - * contending with other hot objects, place each of them on a separate - * cacheline. + * l_tail_lsn is atomic so it can be set and read without needing to + * hold specific locks. To avoid operations contending with other hot + * objects, place it on a separate cacheline.
*/ - /* lsn of last LR on disk */ - atomic64_t l_last_sync_lsn ____cacheline_aligned_in_smp; /* lsn of 1st LR with unflushed * buffers */ atomic64_t l_tail_lsn ____cacheline_aligned_in_smp; diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c index 13b94d2e605b..8d963c3db4f3 100644 --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -1177,8 +1177,8 @@ xlog_check_unmount_rec( */ xlog_assign_atomic_lsn(&log->l_tail_lsn, log->l_curr_cycle, after_umount_blk); - xlog_assign_atomic_lsn(&log->l_last_sync_lsn, - log->l_curr_cycle, after_umount_blk); + log->l_ailp->ail_head_lsn = + atomic64_read(&log->l_tail_lsn); *tail_blk = after_umount_blk; *clean = true; @@ -1212,7 +1212,7 @@ xlog_set_state( if (bump_cycle) log->l_curr_cycle++; atomic64_set(&log->l_tail_lsn, be64_to_cpu(rhead->h_tail_lsn)); - atomic64_set(&log->l_last_sync_lsn, be64_to_cpu(rhead->h_lsn)); + log->l_ailp->ail_head_lsn = be64_to_cpu(rhead->h_lsn); xlog_assign_grant_head(&log->l_reserve_head.grant, log->l_curr_cycle, BBTOB(log->l_curr_block)); xlog_assign_grant_head(&log->l_write_head.grant, log->l_curr_cycle, @@ -3299,14 +3299,13 @@ xlog_do_recover( /* * We now update the tail_lsn since much of the recovery has completed - * and there may be space available to use. If there were no extent - * or iunlinks, we can free up the entire log and set the tail_lsn to - * be the last_sync_lsn. This was set in xlog_find_tail to be the - * lsn of the last known good LR on disk. If there are extent frees - * or iunlinks they will have some entries in the AIL; so we look at - * the AIL to determine how to set the tail_lsn. + * and there may be space available to use. If there were no extent or + * iunlinks, we can free up the entire log. This was set in + * xlog_find_tail to be the lsn of the last known good LR on disk. If + * there are extent frees or iunlinks they will have some entries in the + * AIL; so we look at the AIL to determine how to set the tail_lsn. 
*/ - xlog_assign_tail_lsn(mp); + xfs_ail_assign_tail_lsn(log->l_ailp); /* * Now that we've finished replaying all buffer and inode updates, diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c index 8a5dc1538aa8..2bf489b445ee 100644 --- a/fs/xfs/xfs_trace.c +++ b/fs/xfs/xfs_trace.c @@ -22,6 +22,7 @@ #include "xfs_trans.h" #include "xfs_log.h" #include "xfs_log_priv.h" +#include "xfs_trans_priv.h" #include "xfs_buf_item.h" #include "xfs_quota.h" #include "xfs_dquot_item.h" diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 3926cf7f2a6e..784c6d63daa9 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -1388,19 +1388,19 @@ TRACE_EVENT(xfs_log_assign_tail_lsn, __field(dev_t, dev) __field(xfs_lsn_t, new_lsn) __field(xfs_lsn_t, old_lsn) - __field(xfs_lsn_t, last_sync_lsn) + __field(xfs_lsn_t, head_lsn) ), TP_fast_assign( __entry->dev = log->l_mp->m_super->s_dev; __entry->new_lsn = new_lsn; __entry->old_lsn = atomic64_read(&log->l_tail_lsn); - __entry->last_sync_lsn = atomic64_read(&log->l_last_sync_lsn); + __entry->head_lsn = log->l_ailp->ail_head_lsn; ), - TP_printk("dev %d:%d new tail lsn %d/%d, old lsn %d/%d, last sync %d/%d", + TP_printk("dev %d:%d new tail lsn %d/%d, old lsn %d/%d, head lsn %d/%d", MAJOR(__entry->dev), MINOR(__entry->dev), CYCLE_LSN(__entry->new_lsn), BLOCK_LSN(__entry->new_lsn), CYCLE_LSN(__entry->old_lsn), BLOCK_LSN(__entry->old_lsn), - CYCLE_LSN(__entry->last_sync_lsn), BLOCK_LSN(__entry->last_sync_lsn)) + CYCLE_LSN(__entry->head_lsn), BLOCK_LSN(__entry->head_lsn)) ) DECLARE_EVENT_CLASS(xfs_file_class, diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c index e31bfeb01375..4fa7f2f77563 100644 --- a/fs/xfs/xfs_trans_ail.c +++ b/fs/xfs/xfs_trans_ail.c @@ -715,6 +715,26 @@ xfs_ail_push_all_sync( finish_wait(&ailp->ail_empty, &wait); } +void +__xfs_ail_assign_tail_lsn( + struct xfs_ail *ailp) +{ + struct xlog *log = ailp->ail_log; + xfs_lsn_t tail_lsn; + + assert_spin_locked(&ailp->ail_lock); + + if (xlog_is_shutdown(log)) + return; + + tail_lsn = __xfs_ail_min_lsn(ailp); + if (!tail_lsn) + tail_lsn = ailp->ail_head_lsn; + + trace_xfs_log_assign_tail_lsn(log, tail_lsn); + atomic64_set(&log->l_tail_lsn, tail_lsn); +} + /* * Callers should pass the the original tail lsn so that we can detect if the * tail has moved as a result of the operation that was performed. If the caller @@ -729,15 +749,13 @@ xfs_ail_update_finish( { struct xlog *log = ailp->ail_log; - /* if the tail lsn hasn't changed, don't do updates or wakeups. */ + /* If the tail lsn hasn't changed, don't do updates or wakeups. 
*/ if (!old_lsn || old_lsn == __xfs_ail_min_lsn(ailp)) { spin_unlock(&ailp->ail_lock); return; } - if (!xlog_is_shutdown(log)) - xlog_assign_tail_lsn_locked(log->l_mp); - + __xfs_ail_assign_tail_lsn(ailp); if (list_empty(&ailp->ail_head)) wake_up_all(&ailp->ail_empty); spin_unlock(&ailp->ail_lock); diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h index 9a131e7fae94..6541a6c3ea22 100644 --- a/fs/xfs/xfs_trans_priv.h +++ b/fs/xfs/xfs_trans_priv.h @@ -55,6 +55,7 @@ struct xfs_ail { struct list_head ail_cursors; spinlock_t ail_lock; xfs_lsn_t ail_last_pushed_lsn; + xfs_lsn_t ail_head_lsn; int ail_log_flush; unsigned long ail_opstate; struct list_head ail_buf_list; @@ -135,6 +136,18 @@ struct xfs_log_item * xfs_trans_ail_cursor_next(struct xfs_ail *ailp, struct xfs_ail_cursor *cur); void xfs_trans_ail_cursor_done(struct xfs_ail_cursor *cur); +void __xfs_ail_assign_tail_lsn(struct xfs_ail *ailp); + +static inline void +xfs_ail_assign_tail_lsn( struct xfs_ail *ailp) +{ + + spin_lock(&ailp->ail_lock); + __xfs_ail_assign_tail_lsn(ailp); + spin_unlock(&ailp->ail_lock); +} + #if BITS_PER_LONG != 64 static inline void xfs_trans_ail_copy_lsn(

From patchwork Thu Sep 21 01:48:41 2023
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 13393589
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Subject: [PATCH 6/9] xfs: collapse xlog_state_set_callback in caller
Date: Thu, 21 Sep 2023 11:48:41 +1000
Message-Id: <20230921014844.582667-7-david@fromorbit.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230921014844.582667-1-david@fromorbit.com>
References: <20230921014844.582667-1-david@fromorbit.com>
List-ID: X-Mailing-List: linux-xfs@vger.kernel.org

From: Dave Chinner

The function is called from a single place, and it isn't just setting the iclog state to XLOG_STATE_CALLBACK - it can mark iclogs clean, which moves them to states after CALLBACK. Hence the function is now badly named, and should just be folded into the caller where the iclog completion logic makes a whole lot more sense.

Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- fs/xfs/xfs_log.c | 31 +++++++++++-------------------- 1 file changed, 11 insertions(+), 20 deletions(-) diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index 2ceadee276e2..83a5eb992574 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -2510,25 +2510,6 @@ xlog_get_lowest_lsn( return lowest_lsn; } -static void -xlog_state_set_callback( - struct xlog *log, - struct xlog_in_core *iclog, - xfs_lsn_t header_lsn) -{ - /* - * If there are no callbacks on this iclog, we can mark it clean - * immediately and return. Otherwise we need to run the - * callbacks. - */ - if (list_empty(&iclog->ic_callbacks)) { - xlog_state_clean_iclog(log, iclog); - return; - } - trace_xlog_iclog_callback(iclog, _RET_IP_); - iclog->ic_state = XLOG_STATE_CALLBACK; -} - /* * Return true if we need to stop processing, false to continue to the next * iclog. The caller will need to run callbacks if the iclog is returned in the @@ -2560,7 +2541,17 @@ xlog_state_iodone_process_iclog( lowest_lsn = xlog_get_lowest_lsn(log); if (lowest_lsn && XFS_LSN_CMP(lowest_lsn, header_lsn) < 0) return false; - xlog_state_set_callback(log, iclog, header_lsn); + /* + * If there are no callbacks on this iclog, we can mark it clean + * immediately and return. Otherwise we need to run the + * callbacks.
+ */ + if (list_empty(&iclog->ic_callbacks)) { + xlog_state_clean_iclog(log, iclog); + return false; + } + trace_xlog_iclog_callback(iclog, _RET_IP_); + iclog->ic_state = XLOG_STATE_CALLBACK; return false; default: /*

From patchwork Thu Sep 21 01:48:42 2023
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 13393590
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Subject: [PATCH 7/9] xfs: track log space pinned by the AIL
Date: Thu, 21 Sep 2023 11:48:42 +1000
Message-Id: <20230921014844.582667-8-david@fromorbit.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230921014844.582667-1-david@fromorbit.com>
References: <20230921014844.582667-1-david@fromorbit.com>
List-ID: X-Mailing-List: linux-xfs@vger.kernel.org

From: Dave Chinner

Currently we track space used in the log by grant heads. These store the reserved space as a physical log location and combine both space reserved for future use with space already used in the log in a single variable. The amount of space consumed in the log is then calculated as the distance between the log tail and the grant head.

The problem with tracking the grant head as a physical location comes from the fact that it tracks both log cycle count and offset into the log in bytes in a single 64 bit variable. Because the cycle count on disk is a 32 bit number, this also limits the offset into the log to 32 bits. And because that offset is in bytes, we are limited to being able to track only 2GB of log space in the grant head.

Hence to support larger physical logs, we need to track used space differently in the grant head. We no longer use the grant head for guiding AIL pushing, so the only thing it is now used for is determining if we've run out of reservation space via the calculation in xlog_space_left().

What we really need to do is move the grant heads away from tracking physical space in the log. The issue here is that space consumed in the log is not directly tracked by the current mechanism - the space consumed in the log by grant head reservations gets returned to the free pool by the tail of the log moving forward. i.e. the space isn't directly tracked or calculated, but the used grant space gets "freed" as the physical limits of the log are updated without actually needing to update the grant heads.

Hence to move away from implicit, zero-update log space tracking we need to explicitly track the amount of physical space the log actually consumes separately to the in-memory reservations for operations that will be committed to the journal.

Luckily, we already track the information we need to calculate this in the AIL itself. That is, the space currently consumed by the journal is the maximum LSN that the AIL has seen minus the current log tail. As we update both of these items dynamically as the head and tail of the log move, we always know exactly how much space the journal consumes.

This means that we also know exactly how much space the currently active reservations require, and exactly how much free space we have remaining for new reservations to be made. Most importantly, we know what these spaces are independently of the physical locations of the head and tail of the log.
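As a rough sketch of that accounting (illustration only; lsn_bytes() is a hypothetical helper that flattens a {cycle, offset} LSN into a linear byte position, standing in for the xlog_lsn_sub() distance calculation used in the diff below):

#include <stdint.h>

/* Flatten a {cycle, offset} LSN into a linear byte position. */
static inline uint64_t lsn_bytes(uint64_t cycle, uint64_t offset,
				 uint64_t logsize)
{
	return cycle * logsize + offset;
}

/*
 * Sketch: the physical log space the journal consumes is the distance
 * from the log tail to the maximum LSN the AIL has seen (the AIL head).
 */
uint64_t log_tail_space(uint64_t head_cycle, uint64_t head_offset,
			uint64_t tail_cycle, uint64_t tail_offset,
			uint64_t logsize)
{
	return lsn_bytes(head_cycle, head_offset, logsize) -
	       lsn_bytes(tail_cycle, tail_offset, logsize);
}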
Hence by separating out the physical space consumed by the journal, we can now track reservations in the grant heads purely as a byte count, and the log can be considered full when the tail space + reservation space exceeds the size of the log. This means we can use the full 64 bits of grant head space for reservation space, completely removing the 32 bit byte count limitation on log size that they impose. Hence the first step in this conversion is to track and update the "log tail space" every time the AIL tail or maximum seen LSN changes. Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- fs/xfs/xfs_log_cil.c | 9 ++++++--- fs/xfs/xfs_log_priv.h | 1 + fs/xfs/xfs_trans_ail.c | 9 ++++++--- 3 files changed, 13 insertions(+), 6 deletions(-) diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c index 1bedf03d863c..6c05097ebfdf 100644 --- a/fs/xfs/xfs_log_cil.c +++ b/fs/xfs/xfs_log_cil.c @@ -773,14 +773,17 @@ xlog_cil_ail_insert( * always be the same (as iclogs can contain multiple commit records) or * higher LSN than the current head. We do this before insertion of the * items so that log space checks during insertion will reflect the - * space that this checkpoint has already consumed. + * space that this checkpoint has already consumed. We call + * xfs_ail_update_finish() so that tail space and space-based wakeups + * will be recalculated appropriately. */ ASSERT(XFS_LSN_CMP(ctx->commit_lsn, ailp->ail_head_lsn) >= 0 || aborted); spin_lock(&ailp->ail_lock); - ailp->ail_head_lsn = ctx->commit_lsn; xfs_trans_ail_cursor_last(ailp, &cur, ctx->start_lsn); - spin_unlock(&ailp->ail_lock); + ailp->ail_head_lsn = ctx->commit_lsn; + /* xfs_ail_update_finish() drops the ail_lock */ + xfs_ail_update_finish(ailp, NULLCOMMITLSN); /* unpin all the log items */ list_for_each_entry(lv, &ctx->lv_chain, lv_list) { diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h index 106483cf1fb6..1390251bf670 100644 --- a/fs/xfs/xfs_log_priv.h +++ b/fs/xfs/xfs_log_priv.h @@ -438,6 +438,7 @@ struct xlog { struct xlog_grant_head l_reserve_head; struct xlog_grant_head l_write_head; + uint64_t l_tail_space; struct xfs_kobj l_kobj; diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c index 4fa7f2f77563..38a631986ef0 100644 --- a/fs/xfs/xfs_trans_ail.c +++ b/fs/xfs/xfs_trans_ail.c @@ -731,6 +731,8 @@ __xfs_ail_assign_tail_lsn( if (!tail_lsn) tail_lsn = ailp->ail_head_lsn; + WRITE_ONCE(log->l_tail_space, + xlog_lsn_sub(log, ailp->ail_head_lsn, tail_lsn)); trace_xfs_log_assign_tail_lsn(log, tail_lsn); atomic64_set(&log->l_tail_lsn, tail_lsn); } @@ -738,9 +740,10 @@ __xfs_ail_assign_tail_lsn( /* * Callers should pass the the original tail lsn so that we can detect if the * tail has moved as a result of the operation that was performed. If the caller - * needs to force a tail LSN update, it should pass NULLCOMMITLSN to bypass the - * "did the tail LSN change?" checks. If the caller wants to avoid a tail update - * (e.g. it knows the tail did not change) it should pass an @old_lsn of 0. + * needs to force a tail space update, it should pass NULLCOMMITLSN to bypass + * the "did the tail LSN change?" checks. If the caller wants to avoid a tail + * update (e.g. it knows the tail did not change) it should pass an @old_lsn of + * 0. 
*/ void xfs_ail_update_finish(

From patchwork Thu Sep 21 01:48:43 2023
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 13393596
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Subject: [PATCH 8/9] xfs: pass the full grant head to accounting functions
Date: Thu, 21 Sep 2023 11:48:43 +1000
Message-Id: <20230921014844.582667-9-david@fromorbit.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230921014844.582667-1-david@fromorbit.com>
References: <20230921014844.582667-1-david@fromorbit.com>
List-ID: X-Mailing-List: linux-xfs@vger.kernel.org

From: Dave Chinner

Because we are going to need them soon. API change only, no logic changes.

Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- fs/xfs/xfs_log.c | 157 +++++++++++++++++++++--------------------- fs/xfs/xfs_log_priv.h | 2 - 2 files changed, 77 insertions(+), 82 deletions(-) diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index 83a5eb992574..ed1e1f1dc8e4 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -136,10 +136,10 @@ xlog_prepare_iovec( static void xlog_grant_sub_space( struct xlog *log, - atomic64_t *head, + struct xlog_grant_head *head, int bytes) { - int64_t head_val = atomic64_read(head); + int64_t head_val = atomic64_read(&head->grant); int64_t new, old; do { @@ -155,17 +155,17 @@ xlog_grant_sub_space( old = head_val; new = xlog_assign_grant_head_val(cycle, space); - head_val = atomic64_cmpxchg(head, old, new); + head_val = atomic64_cmpxchg(&head->grant, old, new); } while (head_val != old); } static void xlog_grant_add_space( struct xlog *log, - atomic64_t *head, + struct xlog_grant_head *head, int bytes) { - int64_t head_val = atomic64_read(head); + int64_t head_val = atomic64_read(&head->grant); int64_t new, old; do { @@ -184,7 +184,7 @@ xlog_grant_add_space( old = head_val; new = xlog_assign_grant_head_val(cycle, space); - head_val = atomic64_cmpxchg(head, old, new); + head_val = atomic64_cmpxchg(&head->grant, old, new); } while (head_val != old); } @@ -197,6 +197,63 @@ xlog_grant_head_init( spin_lock_init(&head->lock); } +/* + * Return the space in the log between the tail and the head. The head + * is passed in the cycle/bytes formal parms. In the special case where + * the reserve head has wrapped passed the tail, this calculation is no + * longer valid. In this case, just return 0 which means there is no space + * in the log. This works for all places where this function is called + * with the reserve head. Of course, if the write head were to ever + * wrap the tail, we should blow up. Rather than catch this case here, + * we depend on other ASSERTions in other parts of the code. XXXmiken + * + * If reservation head is behind the tail, we have a problem. Warn about it, + * but then treat it as if the log is empty. + * + * If the log is shut down, the head and tail may be invalid or out of whack, so + * shortcut invalidity asserts in this case so that we don't trigger them + * falsely.
+ */ +static int +xlog_grant_space_left( + struct xlog *log, + struct xlog_grant_head *head) +{ + int tail_bytes; + int tail_cycle; + int head_cycle; + int head_bytes; + + xlog_crack_grant_head(&head->grant, &head_cycle, &head_bytes); + xlog_crack_atomic_lsn(&log->l_tail_lsn, &tail_cycle, &tail_bytes); + tail_bytes = BBTOB(tail_bytes); + if (tail_cycle == head_cycle && head_bytes >= tail_bytes) + return log->l_logsize - (head_bytes - tail_bytes); + if (tail_cycle + 1 < head_cycle) + return 0; + + /* Ignore potential inconsistency when shutdown. */ + if (xlog_is_shutdown(log)) + return log->l_logsize; + + if (tail_cycle < head_cycle) { + ASSERT(tail_cycle == (head_cycle - 1)); + return tail_bytes - head_bytes; + } + + /* + * The reservation head is behind the tail. In this case we just want to + * return the size of the log as the amount of space left. + */ + xfs_alert(log->l_mp, "xlog_grant_space_left: head behind tail"); + xfs_alert(log->l_mp, " tail_cycle = %d, tail_bytes = %d", + tail_cycle, tail_bytes); + xfs_alert(log->l_mp, " GH cycle = %d, GH bytes = %d", + head_cycle, head_bytes); + ASSERT(0); + return log->l_logsize; +} + STATIC void xlog_grant_head_wake_all( struct xlog_grant_head *head) @@ -277,7 +334,7 @@ xlog_grant_head_wait( spin_lock(&head->lock); if (xlog_is_shutdown(log)) goto shutdown; - } while (xlog_space_left(log, &head->grant) < need_bytes); + } while (xlog_grant_space_left(log, head) < need_bytes); list_del_init(&tic->t_queue); return 0; @@ -322,7 +379,7 @@ xlog_grant_head_check( * otherwise try to get some space for this transaction. */ *need_bytes = xlog_ticket_reservation(log, head, tic); - free_bytes = xlog_space_left(log, &head->grant); + free_bytes = xlog_grant_space_left(log, head); if (!list_empty_careful(&head->waiters)) { spin_lock(&head->lock); if (!xlog_grant_head_wake(log, head, &free_bytes) || @@ -396,7 +453,7 @@ xfs_log_regrant( if (error) goto out_error; - xlog_grant_add_space(log, &log->l_write_head.grant, need_bytes); + xlog_grant_add_space(log, &log->l_write_head, need_bytes); trace_xfs_log_regrant_exit(log, tic); xlog_verify_grant_tail(log); return 0; @@ -447,8 +504,8 @@ xfs_log_reserve( if (error) goto out_error; - xlog_grant_add_space(log, &log->l_reserve_head.grant, need_bytes); - xlog_grant_add_space(log, &log->l_write_head.grant, need_bytes); + xlog_grant_add_space(log, &log->l_reserve_head, need_bytes); + xlog_grant_add_space(log, &log->l_write_head, need_bytes); trace_xfs_log_reserve_exit(log, tic); xlog_verify_grant_tail(log); return 0; @@ -1107,7 +1164,7 @@ xfs_log_space_wake( ASSERT(!xlog_in_recovery(log)); spin_lock(&log->l_write_head.lock); - free_bytes = xlog_space_left(log, &log->l_write_head.grant); + free_bytes = xlog_grant_space_left(log, &log->l_write_head); xlog_grant_head_wake(log, &log->l_write_head, &free_bytes); spin_unlock(&log->l_write_head.lock); } @@ -1116,7 +1173,7 @@ xfs_log_space_wake( ASSERT(!xlog_in_recovery(log)); spin_lock(&log->l_reserve_head.lock); - free_bytes = xlog_space_left(log, &log->l_reserve_head.grant); + free_bytes = xlog_grant_space_left(log, &log->l_reserve_head); xlog_grant_head_wake(log, &log->l_reserve_head, &free_bytes); spin_unlock(&log->l_reserve_head.lock); } @@ -1230,64 +1287,6 @@ xfs_log_cover( return error; } -/* - * Return the space in the log between the tail and the head. The head - * is passed in the cycle/bytes formal parms. In the special case where - * the reserve head has wrapped passed the tail, this calculation is no - * longer valid. 
In this case, just return 0 which means there is no space - * in the log. This works for all places where this function is called - * with the reserve head. Of course, if the write head were to ever - * wrap the tail, we should blow up. Rather than catch this case here, - * we depend on other ASSERTions in other parts of the code. XXXmiken - * - * If reservation head is behind the tail, we have a problem. Warn about it, - * but then treat it as if the log is empty. - * - * If the log is shut down, the head and tail may be invalid or out of whack, so - * shortcut invalidity asserts in this case so that we don't trigger them - * falsely. - */ -int -xlog_space_left( - struct xlog *log, - atomic64_t *head) -{ - int tail_bytes; - int tail_cycle; - int head_cycle; - int head_bytes; - - xlog_crack_grant_head(head, &head_cycle, &head_bytes); - xlog_crack_atomic_lsn(&log->l_tail_lsn, &tail_cycle, &tail_bytes); - tail_bytes = BBTOB(tail_bytes); - if (tail_cycle == head_cycle && head_bytes >= tail_bytes) - return log->l_logsize - (head_bytes - tail_bytes); - if (tail_cycle + 1 < head_cycle) - return 0; - - /* Ignore potential inconsistency when shutdown. */ - if (xlog_is_shutdown(log)) - return log->l_logsize; - - if (tail_cycle < head_cycle) { - ASSERT(tail_cycle == (head_cycle - 1)); - return tail_bytes - head_bytes; - } - - /* - * The reservation head is behind the tail. In this case we just want to - * return the size of the log as the amount of space left. - */ - xfs_alert(log->l_mp, "xlog_space_left: head behind tail"); - xfs_alert(log->l_mp, " tail_cycle = %d, tail_bytes = %d", - tail_cycle, tail_bytes); - xfs_alert(log->l_mp, " GH cycle = %d, GH bytes = %d", - head_cycle, head_bytes); - ASSERT(0); - return log->l_logsize; -} - - static void xlog_ioend_work( struct work_struct *work) @@ -1884,8 +1883,8 @@ xlog_sync( if (ticket) { ticket->t_curr_res -= roundoff; } else { - xlog_grant_add_space(log, &log->l_reserve_head.grant, roundoff); - xlog_grant_add_space(log, &log->l_write_head.grant, roundoff); + xlog_grant_add_space(log, &log->l_reserve_head, roundoff); + xlog_grant_add_space(log, &log->l_write_head, roundoff); } /* put cycle number in every block */ @@ -2805,17 +2804,15 @@ xfs_log_ticket_regrant( if (ticket->t_cnt > 0) ticket->t_cnt--; - xlog_grant_sub_space(log, &log->l_reserve_head.grant, - ticket->t_curr_res); - xlog_grant_sub_space(log, &log->l_write_head.grant, - ticket->t_curr_res); + xlog_grant_sub_space(log, &log->l_reserve_head, ticket->t_curr_res); + xlog_grant_sub_space(log, &log->l_write_head, ticket->t_curr_res); ticket->t_curr_res = ticket->t_unit_res; trace_xfs_log_ticket_regrant_sub(log, ticket); /* just return if we still have some of the pre-reserved space */ if (!ticket->t_cnt) { - xlog_grant_add_space(log, &log->l_reserve_head.grant, + xlog_grant_add_space(log, &log->l_reserve_head, ticket->t_unit_res); trace_xfs_log_ticket_regrant_exit(log, ticket); @@ -2863,8 +2860,8 @@ xfs_log_ticket_ungrant( bytes += ticket->t_unit_res*ticket->t_cnt; } - xlog_grant_sub_space(log, &log->l_reserve_head.grant, bytes); - xlog_grant_sub_space(log, &log->l_write_head.grant, bytes); + xlog_grant_sub_space(log, &log->l_reserve_head, bytes); + xlog_grant_sub_space(log, &log->l_write_head, bytes); trace_xfs_log_ticket_ungrant_exit(log, ticket); diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h index 1390251bf670..24dc788c02d3 100644 --- a/fs/xfs/xfs_log_priv.h +++ b/fs/xfs/xfs_log_priv.h @@ -574,8 +574,6 @@ xlog_assign_grant_head(atomic64_t *head, int cycle, int space) 
atomic64_set(head, xlog_assign_grant_head_val(cycle, space)); } -int xlog_space_left(struct xlog *log, atomic64_t *head); - /* * Committed Item List interfaces */

From patchwork Thu Sep 21 01:48:44 2023
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 13393597
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Subject: [PATCH 9/9] xfs: grant heads track byte counts, not LSNs
Date: Thu, 21 Sep 2023 11:48:44 +1000
Message-Id: <20230921014844.582667-10-david@fromorbit.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230921014844.582667-1-david@fromorbit.com>
References: <20230921014844.582667-1-david@fromorbit.com>
List-ID: X-Mailing-List: linux-xfs@vger.kernel.org

From: Dave Chinner

The grant heads in the log track the space reserved in the log for running transactions. They do this by tracking how far ahead of the tail the reservation has reached, and the units for doing this are {cycle,bytes} for the reserve head rather than the {cycle,blocks} normally used by LSNs.

This is annoyingly complex because we have to split, crack and combine these tuples for any calculation we do to determine log space and targets. It is computationally expensive, difficult to do atomically and locklessly, and it limits the size of the log to 2^32 bytes.

Really, though, all the grant heads are tracking is how much space is currently available for use in the log. We can track this as a simple byte count - we just don't care what the actual physical location in the log the head and tail are at, just how much space we have remaining before the head and tail overlap.

So, convert the grant heads to track the byte reservations that are active rather than the current (cycle, offset) tuples. This means an empty log has zero bytes consumed, and a full log is when the reservations reach the size of the log minus the space consumed by the AIL.

This greatly simplifies the accounting and checks for whether there is space available. We no longer need to crack or combine LSNs to determine how much space the log has left, nor do we need to look at the head or tail of the log to determine how close to full we are.

There is, however, a complexity that needs to be handled. We know how much space is being tracked in the AIL now via log->l_tail_space, and the log tickets track active reservations and return the unused portions to the grant heads when ungranted. Unfortunately, we don't track the used portion of the grant, so when we transfer log items from the CIL to the AIL, the space accounted to the grant heads is transferred to the log tail space. Hence when we move the AIL head forwards on item insert, we have to remove that space from the grant heads.

We also remove the xlog_verify_grant_tail() debug function as it is no longer useful. The check it performs has been racy since delayed logging was introduced, but now it is clearly only detecting false positives, so remove it.

The result of this substantially simpler accounting algorithm is an increase in sustained transaction rate from ~1.3 million transactions/s to ~1.9 million transactions/s with no increase in CPU usage.
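Under the byte-count model the free space check collapses to plain arithmetic. A sketch (illustration only - the kernel version in the diff below additionally pairs memory barriers with the AIL update to order tail space against grant head updates):

#include <stdint.h>

/*
 * Sketch: free space is the log size minus the space pinned by the AIL
 * (the tail space) minus the bytes currently granted to reservations,
 * clamped to zero when the log is full.
 */
int64_t grant_space_left(int64_t logsize, int64_t tail_space, int64_t grant)
{
	int64_t free_bytes = logsize - tail_space - grant;

	return free_bytes > 0 ? free_bytes : 0;
}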
We also remove the 32 bit space limitation on the grant heads, which will allow us to increase the journal size beyond 2GB in future. Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- fs/xfs/xfs_log.c | 203 ++++++++++++--------------------------- fs/xfs/xfs_log_cil.c | 12 +++ fs/xfs/xfs_log_priv.h | 45 +++------ fs/xfs/xfs_log_recover.c | 4 - fs/xfs/xfs_sysfs.c | 17 ++-- fs/xfs/xfs_trace.h | 33 ++++--- 6 files changed, 112 insertions(+), 202 deletions(-) diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index ed1e1f1dc8e4..09a2ae5e62f3 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -53,9 +53,6 @@ xlog_sync( struct xlog_ticket *ticket); #if defined(DEBUG) STATIC void -xlog_verify_grant_tail( - struct xlog *log); -STATIC void xlog_verify_iclog( struct xlog *log, struct xlog_in_core *iclog, @@ -65,7 +62,6 @@ xlog_verify_tail_lsn( struct xlog *log, struct xlog_in_core *iclog); #else -#define xlog_verify_grant_tail(a) #define xlog_verify_iclog(a,b,c) #define xlog_verify_tail_lsn(a,b) #endif @@ -133,30 +129,13 @@ xlog_prepare_iovec( return buf; } -static void +void xlog_grant_sub_space( struct xlog *log, struct xlog_grant_head *head, int bytes) { - int64_t head_val = atomic64_read(&head->grant); - int64_t new, old; - - do { - int cycle, space; - - xlog_crack_grant_head_val(head_val, &cycle, &space); - - space -= bytes; - if (space < 0) { - space += log->l_logsize; - cycle--; - } - - old = head_val; - new = xlog_assign_grant_head_val(cycle, space); - head_val = atomic64_cmpxchg(&head->grant, old, new); - } while (head_val != old); + atomic64_sub(bytes, &head->grant); } static void @@ -165,93 +144,39 @@ xlog_grant_add_space( struct xlog_grant_head *head, int bytes) { - int64_t head_val = atomic64_read(&head->grant); - int64_t new, old; - - do { - int tmp; - int cycle, space; - - xlog_crack_grant_head_val(head_val, &cycle, &space); - - tmp = log->l_logsize - space; - if (tmp > bytes) - space += bytes; - else { - space = bytes - tmp; - cycle++; - } - - old = head_val; - new = xlog_assign_grant_head_val(cycle, space); - head_val = atomic64_cmpxchg(&head->grant, old, new); - } while (head_val != old); + atomic64_add(bytes, &head->grant); } -STATIC void +static void xlog_grant_head_init( struct xlog_grant_head *head) { - xlog_assign_grant_head(&head->grant, 1, 0); + atomic64_set(&head->grant, 0); INIT_LIST_HEAD(&head->waiters); spin_lock_init(&head->lock); } /* - * Return the space in the log between the tail and the head. The head - * is passed in the cycle/bytes formal parms. In the special case where - * the reserve head has wrapped passed the tail, this calculation is no - * longer valid. In this case, just return 0 which means there is no space - * in the log. This works for all places where this function is called - * with the reserve head. Of course, if the write head were to ever - * wrap the tail, we should blow up. Rather than catch this case here, - * we depend on other ASSERTions in other parts of the code. XXXmiken - * - * If reservation head is behind the tail, we have a problem. Warn about it, - * but then treat it as if the log is empty. - * - * If the log is shut down, the head and tail may be invalid or out of whack, so - * shortcut invalidity asserts in this case so that we don't trigger them - * falsely. + * Return the space in the log between the tail and the head. In the case where + * we have overrun available reservation space, return 0. 
The memory barrier + * pairs with the smp_wmb() in xlog_cil_ail_insert() to ensure that grant head + * vs tail space updates are seen in the correct order and hence avoid + * transients as space is transferred from the grant heads to the AIL on commit + * completion. */ -static int +static uint64_t xlog_grant_space_left( struct xlog *log, struct xlog_grant_head *head) { - int tail_bytes; - int tail_cycle; - int head_cycle; - int head_bytes; - - xlog_crack_grant_head(&head->grant, &head_cycle, &head_bytes); - xlog_crack_atomic_lsn(&log->l_tail_lsn, &tail_cycle, &tail_bytes); - tail_bytes = BBTOB(tail_bytes); - if (tail_cycle == head_cycle && head_bytes >= tail_bytes) - return log->l_logsize - (head_bytes - tail_bytes); - if (tail_cycle + 1 < head_cycle) - return 0; - - /* Ignore potential inconsistency when shutdown. */ - if (xlog_is_shutdown(log)) - return log->l_logsize; - - if (tail_cycle < head_cycle) { - ASSERT(tail_cycle == (head_cycle - 1)); - return tail_bytes - head_bytes; - } - - /* - * The reservation head is behind the tail. In this case we just want to - * return the size of the log as the amount of space left. - */ - xfs_alert(log->l_mp, "xlog_grant_space_left: head behind tail"); - xfs_alert(log->l_mp, " tail_cycle = %d, tail_bytes = %d", - tail_cycle, tail_bytes); - xfs_alert(log->l_mp, " GH cycle = %d, GH bytes = %d", - head_cycle, head_bytes); - ASSERT(0); - return log->l_logsize; + int64_t free_bytes; + + smp_rmb(); // paired with smp_wmb in xlog_cil_ail_insert() + free_bytes = log->l_logsize - READ_ONCE(log->l_tail_space) - + atomic64_read(&head->grant); + if (free_bytes > 0) + return free_bytes; + return 0; } STATIC void @@ -455,7 +380,6 @@ xfs_log_regrant( xlog_grant_add_space(log, &log->l_write_head, need_bytes); trace_xfs_log_regrant_exit(log, tic); - xlog_verify_grant_tail(log); return 0; out_error: @@ -507,7 +431,6 @@ xfs_log_reserve( xlog_grant_add_space(log, &log->l_reserve_head, need_bytes); xlog_grant_add_space(log, &log->l_write_head, need_bytes); trace_xfs_log_reserve_exit(log, tic); - xlog_verify_grant_tail(log); return 0; out_error: @@ -3333,42 +3256,27 @@ xlog_ticket_alloc( } #if defined(DEBUG) -/* - * Check to make sure the grant write head didn't just over lap the tail. If - * the cycles are the same, we can't be overlapping. Otherwise, make sure that - * the cycles differ by exactly one and check the byte count. - * - * This check is run unlocked, so can give false positives. Rather than assert - * on failures, use a warn-once flag and a panic tag to allow the admin to - * determine if they want to panic the machine when such an error occurs. For - * debug kernels this will have the same effect as using an assert but, unlinke - * an assert, it can be turned off at runtime. 
- */ -STATIC void -xlog_verify_grant_tail( - struct xlog *log) +static void +xlog_verify_dump_tail( + struct xlog *log, + struct xlog_in_core *iclog) { - int tail_cycle, tail_blocks; - int cycle, space; - - xlog_crack_grant_head(&log->l_write_head.grant, &cycle, &space); - xlog_crack_atomic_lsn(&log->l_tail_lsn, &tail_cycle, &tail_blocks); - if (tail_cycle != cycle) { - if (cycle - 1 != tail_cycle && - !test_and_set_bit(XLOG_TAIL_WARN, &log->l_opstate)) { - xfs_alert_tag(log->l_mp, XFS_PTAG_LOGRES, - "%s: cycle - 1 != tail_cycle", __func__); - } - - if (space > BBTOB(tail_blocks) && - !test_and_set_bit(XLOG_TAIL_WARN, &log->l_opstate)) { - xfs_alert_tag(log->l_mp, XFS_PTAG_LOGRES, - "%s: space > BBTOB(tail_blocks)", __func__); - } - } + xfs_alert(log->l_mp, +"ran out of log space tail 0x%llx/0x%llx, head lsn 0x%llx, head 0x%x/0x%x, prev head 0x%x/0x%x", + iclog ? be64_to_cpu(iclog->ic_header.h_tail_lsn) : -1, + atomic64_read(&log->l_tail_lsn), + log->l_ailp->ail_head_lsn, + log->l_curr_cycle, log->l_curr_block, + log->l_prev_cycle, log->l_prev_block); + xfs_alert(log->l_mp, +"write grant 0x%llx, reserve grant 0x%llx, tail_space 0x%llx, size 0x%x, iclog flags 0x%x", + atomic64_read(&log->l_write_head.grant), + atomic64_read(&log->l_reserve_head.grant), + log->l_tail_space, log->l_logsize, + iclog ? iclog->ic_flags : -1); } -/* check if it will fit */ +/* Check if the new iclog will fit in the log. */ STATIC void xlog_verify_tail_lsn( struct xlog *log, @@ -3377,21 +3285,34 @@ xlog_verify_tail_lsn( xfs_lsn_t tail_lsn = be64_to_cpu(iclog->ic_header.h_tail_lsn); int blocks; - if (CYCLE_LSN(tail_lsn) == log->l_prev_cycle) { - blocks = - log->l_logBBsize - (log->l_prev_block - BLOCK_LSN(tail_lsn)); - if (blocks < BTOBB(iclog->ic_offset)+BTOBB(log->l_iclog_hsize)) - xfs_emerg(log->l_mp, "%s: ran out of log space", __func__); - } else { - ASSERT(CYCLE_LSN(tail_lsn)+1 == log->l_prev_cycle); + if (CYCLE_LSN(tail_lsn) == log->l_prev_cycle) { + blocks = log->l_logBBsize - + (log->l_prev_block - BLOCK_LSN(tail_lsn)); + if (blocks < BTOBB(iclog->ic_offset) + + BTOBB(log->l_iclog_hsize)) { + xfs_emerg(log->l_mp, + "%s: ran out of log space", __func__); + xlog_verify_dump_tail(log, iclog); + } + return; + } - if (BLOCK_LSN(tail_lsn) == log->l_prev_block) + if (CYCLE_LSN(tail_lsn) + 1 != log->l_prev_cycle) { + xfs_emerg(log->l_mp, "%s: head has wrapped tail.", __func__); + xlog_verify_dump_tail(log, iclog); + return; + } + if (BLOCK_LSN(tail_lsn) == log->l_prev_block) { xfs_emerg(log->l_mp, "%s: tail wrapped", __func__); + xlog_verify_dump_tail(log, iclog); + return; + } blocks = BLOCK_LSN(tail_lsn) - log->l_prev_block; - if (blocks < BTOBB(iclog->ic_offset) + 1) - xfs_emerg(log->l_mp, "%s: ran out of log space", __func__); - } + if (blocks < BTOBB(iclog->ic_offset) + 1) { + xfs_emerg(log->l_mp, "%s: ran out of iclog space", __func__); + xlog_verify_dump_tail(log, iclog); + } } /* diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c index 6c05097ebfdf..3aec5589d717 100644 --- a/fs/xfs/xfs_log_cil.c +++ b/fs/xfs/xfs_log_cil.c @@ -765,6 +765,7 @@ xlog_cil_ail_insert( struct xfs_log_item *log_items[LOG_ITEM_BATCH_SIZE]; struct xfs_log_vec *lv; struct xfs_ail_cursor cur; + xfs_lsn_t old_head; int i = 0; /* @@ -781,10 +782,21 @@ xlog_cil_ail_insert( aborted); spin_lock(&ailp->ail_lock); xfs_trans_ail_cursor_last(ailp, &cur, ctx->start_lsn); + old_head = ailp->ail_head_lsn; ailp->ail_head_lsn = ctx->commit_lsn; /* xfs_ail_update_finish() drops the ail_lock */ xfs_ail_update_finish(ailp, NULLCOMMITLSN); + /* + 
* We move the AIL head forwards to account for the space used in the + * log before we remove that space from the grant heads. This prevents a + * transient condition where reservation space appears to become + * available on return, only for it to disappear again immediately as + * the AIL head update accounts for it in the log tail space. + */ + smp_wmb(); // paired with smp_rmb in xlog_grant_space_left + xlog_grant_return_space(ailp->ail_log, old_head, ailp->ail_head_lsn); + /* unpin all the log items */ list_for_each_entry(lv, &ctx->lv_chain, lv_list) { struct xfs_log_item *lip = lv->lv_item; diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h index 24dc788c02d3..9e276514cfb5 100644 --- a/fs/xfs/xfs_log_priv.h +++ b/fs/xfs/xfs_log_priv.h @@ -544,36 +544,6 @@ xlog_assign_atomic_lsn(atomic64_t *lsn, uint cycle, uint block) atomic64_set(lsn, xlog_assign_lsn(cycle, block)); } -/* - * When we crack the grant head, we sample it first so that the value will not - * change while we are cracking it into the component values. This means we - * will always get consistent component values to work from. - */ -static inline void -xlog_crack_grant_head_val(int64_t val, int *cycle, int *space) -{ - *cycle = val >> 32; - *space = val & 0xffffffff; -} - -static inline void -xlog_crack_grant_head(atomic64_t *head, int *cycle, int *space) -{ - xlog_crack_grant_head_val(atomic64_read(head), cycle, space); -} - -static inline int64_t -xlog_assign_grant_head_val(int cycle, int space) -{ - return ((int64_t)cycle << 32) | space; -} - -static inline void -xlog_assign_grant_head(atomic64_t *head, int cycle, int space) -{ - atomic64_set(head, xlog_assign_grant_head_val(cycle, space)); -} - /* * Committed Item List interfaces */ @@ -639,6 +609,21 @@ xlog_lsn_sub( return (uint64_t)log->l_logsize - BBTOB(lo_block - hi_block); } +void xlog_grant_sub_space(struct xlog *log, struct xlog_grant_head *head, + int bytes); + +static inline void +xlog_grant_return_space( + struct xlog *log, + xfs_lsn_t old_head, + xfs_lsn_t new_head) +{ + int64_t diff = xlog_lsn_sub(log, new_head, old_head); + + xlog_grant_sub_space(log, &log->l_reserve_head, diff); + xlog_grant_sub_space(log, &log->l_write_head, diff); +} + /* * The LSN is valid so long as it is behind the current LSN.
If it isn't, this * means that the next log record that includes this metadata could have a diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c index 8d963c3db4f3..69466ad30cfa 100644 --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -1213,10 +1213,6 @@ xlog_set_state( log->l_curr_cycle++; atomic64_set(&log->l_tail_lsn, be64_to_cpu(rhead->h_tail_lsn)); log->l_ailp->ail_head_lsn = be64_to_cpu(rhead->h_lsn); - xlog_assign_grant_head(&log->l_reserve_head.grant, log->l_curr_cycle, - BBTOB(log->l_curr_block)); - xlog_assign_grant_head(&log->l_write_head.grant, log->l_curr_cycle, - BBTOB(log->l_curr_block)); } /* diff --git a/fs/xfs/xfs_sysfs.c b/fs/xfs/xfs_sysfs.c index a3c6b1548723..741704fea2b0 100644 --- a/fs/xfs/xfs_sysfs.c +++ b/fs/xfs/xfs_sysfs.c @@ -376,14 +376,11 @@ STATIC ssize_t reserve_grant_head_show( struct kobject *kobject, char *buf) - { - int cycle; - int bytes; - struct xlog *log = to_xlog(kobject); + struct xlog *log = to_xlog(kobject); + uint64_t bytes = atomic64_read(&log->l_reserve_head.grant); - xlog_crack_grant_head(&log->l_reserve_head.grant, &cycle, &bytes); - return sysfs_emit(buf, "%d:%d\n", cycle, bytes); + return sysfs_emit(buf, "%lld\n", bytes); } XFS_SYSFS_ATTR_RO(reserve_grant_head); @@ -392,12 +389,10 @@ write_grant_head_show( struct kobject *kobject, char *buf) { - int cycle; - int bytes; - struct xlog *log = to_xlog(kobject); + struct xlog *log = to_xlog(kobject); + uint64_t bytes = atomic64_read(&log->l_write_head.grant); - xlog_crack_grant_head(&log->l_write_head.grant, &cycle, &bytes); - return sysfs_emit(buf, "%d:%d\n", cycle, bytes); + return sysfs_emit(buf, "%lld\n", bytes); } XFS_SYSFS_ATTR_RO(write_grant_head); diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 784c6d63daa9..fb12d638b6dc 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -1211,6 +1211,7 @@ DECLARE_EVENT_CLASS(xfs_loggrant_class, TP_ARGS(log, tic), TP_STRUCT__entry( __field(dev_t, dev) + __field(unsigned long, tic) __field(char, ocnt) __field(char, cnt) __field(int, curr_res) @@ -1218,16 +1219,16 @@ DECLARE_EVENT_CLASS(xfs_loggrant_class, __field(unsigned int, flags) __field(int, reserveq) __field(int, writeq) - __field(int, grant_reserve_cycle) - __field(int, grant_reserve_bytes) - __field(int, grant_write_cycle) - __field(int, grant_write_bytes) + __field(uint64_t, grant_reserve_bytes) + __field(uint64_t, grant_write_bytes) + __field(uint64_t, tail_space) __field(int, curr_cycle) __field(int, curr_block) __field(xfs_lsn_t, tail_lsn) ), TP_fast_assign( __entry->dev = log->l_mp->m_super->s_dev; + __entry->tic = (unsigned long)tic; __entry->ocnt = tic->t_ocnt; __entry->cnt = tic->t_cnt; __entry->curr_res = tic->t_curr_res; @@ -1235,23 +1236,23 @@ DECLARE_EVENT_CLASS(xfs_loggrant_class, __entry->flags = tic->t_flags; __entry->reserveq = list_empty(&log->l_reserve_head.waiters); __entry->writeq = list_empty(&log->l_write_head.waiters); - xlog_crack_grant_head(&log->l_reserve_head.grant, - &__entry->grant_reserve_cycle, - &__entry->grant_reserve_bytes); - xlog_crack_grant_head(&log->l_write_head.grant, - &__entry->grant_write_cycle, - &__entry->grant_write_bytes); + __entry->tail_space = READ_ONCE(log->l_tail_space); + __entry->grant_reserve_bytes = __entry->tail_space + + atomic64_read(&log->l_reserve_head.grant); + __entry->grant_write_bytes = __entry->tail_space + + atomic64_read(&log->l_write_head.grant); __entry->curr_cycle = log->l_curr_cycle; __entry->curr_block = log->l_curr_block; __entry->tail_lsn = atomic64_read(&log->l_tail_lsn); ), - 
TP_printk("dev %d:%d t_ocnt %u t_cnt %u t_curr_res %u " + TP_printk("dev %d:%d tic 0x%lx t_ocnt %u t_cnt %u t_curr_res %u " "t_unit_res %u t_flags %s reserveq %s " - "writeq %s grant_reserve_cycle %d " - "grant_reserve_bytes %d grant_write_cycle %d " - "grant_write_bytes %d curr_cycle %d curr_block %d " + "writeq %s " + "tail space %llu grant_reserve_bytes %llu " + "grant_write_bytes %llu curr_cycle %d curr_block %d " "tail_cycle %d tail_block %d", MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->tic, __entry->ocnt, __entry->cnt, __entry->curr_res, @@ -1259,9 +1260,8 @@ DECLARE_EVENT_CLASS(xfs_loggrant_class, __print_flags(__entry->flags, "|", XLOG_TIC_FLAGS), __entry->reserveq ? "empty" : "active", __entry->writeq ? "empty" : "active", - __entry->grant_reserve_cycle, + __entry->tail_space, __entry->grant_reserve_bytes, - __entry->grant_write_cycle, __entry->grant_write_bytes, __entry->curr_cycle, __entry->curr_block, @@ -1289,6 +1289,7 @@ DEFINE_LOGGRANT_EVENT(xfs_log_ticket_ungrant); DEFINE_LOGGRANT_EVENT(xfs_log_ticket_ungrant_sub); DEFINE_LOGGRANT_EVENT(xfs_log_ticket_ungrant_exit); DEFINE_LOGGRANT_EVENT(xfs_log_cil_wait); +DEFINE_LOGGRANT_EVENT(xfs_log_cil_return); DECLARE_EVENT_CLASS(xfs_log_item_class, TP_PROTO(struct xfs_log_item *lip),