From patchwork Wed Jul 1 16:51:07 2020
X-Patchwork-Id: 11636907
From: Brian Foster
To: linux-xfs@vger.kernel.org
Subject: [PATCH 01/10] xfs: automatic relogging item management
Date: Wed, 1 Jul 2020 12:51:07 -0400
Message-Id: <20200701165116.47344-2-bfoster@redhat.com>
In-Reply-To: <20200701165116.47344-1-bfoster@redhat.com>
References: <20200701165116.47344-1-bfoster@redhat.com>

Add a log item flag to track relog state and a couple of helpers to set
and clear the flag. The flag will be set on any log item that is to be
automatically relogged in response to log tail pressure.

Signed-off-by: Brian Foster
Reviewed-by: Allison Collins
---
 fs/xfs/xfs_trace.h      |  2 ++
 fs/xfs/xfs_trans.c      | 20 ++++++++++++++++++++
 fs/xfs/xfs_trans.h      |  4 +++-
 fs/xfs/xfs_trans_priv.h |  2 ++
 4 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 460136628a79..f6fd598c3912 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1068,6 +1068,8 @@ DEFINE_LOG_ITEM_EVENT(xfs_ail_push);
 DEFINE_LOG_ITEM_EVENT(xfs_ail_pinned);
 DEFINE_LOG_ITEM_EVENT(xfs_ail_locked);
 DEFINE_LOG_ITEM_EVENT(xfs_ail_flushing);
+DEFINE_LOG_ITEM_EVENT(xfs_relog_item);
+DEFINE_LOG_ITEM_EVENT(xfs_relog_item_cancel);
 
 DECLARE_EVENT_CLASS(xfs_ail_class,
 	TP_PROTO(struct xfs_log_item *lip, xfs_lsn_t old_lsn, xfs_lsn_t new_lsn),
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 3c94e5ff4316..5190b792cc68 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -651,6 +651,26 @@ xfs_trans_del_item(
 	list_del_init(&lip->li_trans);
 }
 
+void
+xfs_trans_relog_item(
+	struct xfs_trans	*tp,
+	struct xfs_log_item	*lip)
+{
+	if (test_and_set_bit(XFS_LI_RELOG, &lip->li_flags))
+		return;
+	trace_xfs_relog_item(lip);
+}
+
+void
+xfs_trans_relog_item_cancel(
+	struct xfs_trans	*tp,
+	struct xfs_log_item	*lip)
+{
+	if (!test_and_clear_bit(XFS_LI_RELOG, &lip->li_flags))
+		return;
+	trace_xfs_relog_item_cancel(lip);
+}
+
 /* Detach and unlock all of the items in a transaction */
 static void
 xfs_trans_free_items(
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 8308bf6d7e40..6349e78af002 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -60,13 +60,15 @@ struct xfs_log_item {
 #define	XFS_LI_FAILED	2
 #define	XFS_LI_DIRTY	3	/* log item dirty in transaction */
 #define	XFS_LI_RECOVERED 4	/* log intent item has been recovered */
+#define	XFS_LI_RELOG	5	/* automatically relog item */
 
 #define XFS_LI_FLAGS \
 	{ (1 << XFS_LI_IN_AIL),		"IN_AIL" }, \
 	{ (1 << XFS_LI_ABORTED),	"ABORTED" }, \
 	{ (1 << XFS_LI_FAILED),		"FAILED" }, \
 	{ (1 << XFS_LI_DIRTY),		"DIRTY" }, \
-	{ (1 << XFS_LI_RECOVERED),	"RECOVERED" }
+	{ (1 << XFS_LI_RECOVERED),	"RECOVERED" }, \
+	{ (1 << XFS_LI_RELOG),		"RELOG" }
 
 struct xfs_item_ops {
 	unsigned flags;
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index 3004aeac9110..64965a861346 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -16,6 +16,8 @@ struct xfs_log_vec;
 void	xfs_trans_init(struct xfs_mount *);
 void	xfs_trans_add_item(struct xfs_trans *, struct xfs_log_item *);
 void	xfs_trans_del_item(struct xfs_log_item *);
+void	xfs_trans_relog_item(struct xfs_trans *, struct xfs_log_item *);
+void	xfs_trans_relog_item_cancel(struct xfs_trans *, struct xfs_log_item *);
 void	xfs_trans_unreserve_and_mod_sb(struct xfs_trans *tp);
 
 void	xfs_trans_committed_bulk(struct xfs_ail *ailp, struct xfs_log_vec *lv,

From patchwork Wed Jul 1 16:51:08 2020
X-Patchwork-Id: 11636905
From: Brian Foster
To: linux-xfs@vger.kernel.org
Subject: [PATCH 02/10] xfs: create helper for ticket-less log res ungrant
Date: Wed, 1 Jul 2020 12:51:08 -0400
Message-Id: <20200701165116.47344-3-bfoster@redhat.com>
In-Reply-To: <20200701165116.47344-1-bfoster@redhat.com>
References: <20200701165116.47344-1-bfoster@redhat.com>

Log reservation is currently acquired and released via log tickets.
The relog mechanism introduces behavior where relog reservation is
transferred between transaction log tickets and an external pool of
relog reservation for active relog items. Certain contexts will be able
to release outstanding relog reservation without the need for a log
ticket. Factor out a helper that ungrants log reservation at byte
granularity.

Signed-off-by: Brian Foster
Reviewed-by: Allison Collins
---
 fs/xfs/xfs_log.c | 20 ++++++++++++++++----
 fs/xfs/xfs_log.h |  1 +
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 00fda2e8e738..d6b63490a78b 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -2980,6 +2980,21 @@ xfs_log_ticket_regrant(
 	xfs_log_ticket_put(ticket);
 }
 
+/*
+ * Restore log reservation directly to the grant heads.
+ */
+void
+xfs_log_ungrant_bytes(
+	struct xfs_mount	*mp,
+	int			bytes)
+{
+	struct xlog		*log = mp->m_log;
+
+	xlog_grant_sub_space(log, &log->l_reserve_head.grant, bytes);
+	xlog_grant_sub_space(log, &log->l_write_head.grant, bytes);
+	xfs_log_space_wake(mp);
+}
+
 /*
  * Give back the space left from a reservation.
  *
@@ -3018,12 +3033,9 @@ xfs_log_ticket_ungrant(
 		bytes += ticket->t_unit_res*ticket->t_cnt;
 	}
 
-	xlog_grant_sub_space(log, &log->l_reserve_head.grant, bytes);
-	xlog_grant_sub_space(log, &log->l_write_head.grant, bytes);
-
+	xfs_log_ungrant_bytes(log->l_mp, bytes);
 	trace_xfs_log_ticket_ungrant_exit(log, ticket);
 
-	xfs_log_space_wake(log->l_mp);
 	xfs_log_ticket_put(ticket);
 }
 
diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h
index 1412d6993f1e..6d2f30f42245 100644
--- a/fs/xfs/xfs_log.h
+++ b/fs/xfs/xfs_log.h
@@ -125,6 +125,7 @@ int	  xfs_log_reserve(struct xfs_mount *mp,
 			  uint8_t		   clientid,
 			  bool		   permanent);
 int	  xfs_log_regrant(struct xfs_mount *mp, struct xlog_ticket *tic);
+void	  xfs_log_ungrant_bytes(struct xfs_mount *mp, int bytes);
 void	  xfs_log_unmount(struct xfs_mount *mp);
 int	  xfs_log_force_umount(struct xfs_mount *mp, int logerror);
 

From patchwork Wed Jul 1 16:51:09 2020
X-Patchwork-Id: 11636913
From: Brian Foster
To: linux-xfs@vger.kernel.org
Subject: [PATCH 03/10] xfs: extra runtime reservation overhead for relog transactions
Date: Wed, 1 Jul 2020 12:51:09 -0400
Message-Id: <20200701165116.47344-4-bfoster@redhat.com>
In-Reply-To: <20200701165116.47344-1-bfoster@redhat.com>
References: <20200701165116.47344-1-bfoster@redhat.com>

Every transaction reservation includes runtime overhead on top of the
reservation calculated in the struct xfs_trans_res. This overhead is
required for things like the CIL context ticket, log headers, etc.,
that are stolen from individual transactions.
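As a rough illustration of the overhead arithmetic, the sketch below models the adjustment this patch adds to xlog_ticket_alloc(): a relog-enabled ticket carries the runtime overhead twice, once for its own commit and once to be handed to the relog transaction. It is a user-space toy under stated assumptions: the real overhead computed by xfs_log_calc_unit_res() depends on log geometry and payload size, so the flat MODEL_OVERHEAD constant and the model_* helpers are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical flat runtime overhead (log headers, CIL context ticket
 * share, etc.). In XFS this comes from xfs_log_calc_unit_res() and is
 * not a constant; the value here is illustrative only.
 */
#define MODEL_OVERHEAD	4096

/* Toy stand-in for xfs_log_calc_unit_res(): payload plus overhead. */
static int model_calc_unit_res(int unit_bytes)
{
	return unit_bytes + MODEL_OVERHEAD;
}

/*
 * Same arithmetic as the hunk added to xlog_ticket_alloc(): for a
 * relog-enabled ticket, add the overhead portion (unit_res - unit_bytes)
 * a second time so it can later be donated to the relog transaction.
 */
static int model_ticket_unit_res(int unit_bytes, bool relog)
{
	int unit_res = model_calc_unit_res(unit_bytes);

	if (relog)
		unit_res += (unit_res - unit_bytes);
	return unit_res;
}
```

With these assumptions, a 1000-byte reservation yields 5096 bytes without relog and 9192 bytes with it: the payload plus two copies of the modeled overhead.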

Since reservation for the relog transaction is entirely contributed by
regular transactions, this runtime reservation overhead must be
contributed as well. This means that a transaction that relogs one or
more items must include overhead for the current transaction as well as
for the relog transaction.

Define a new transaction flag to indicate that a transaction is relog
enabled. Plumb this state down to the log ticket allocation and use it
to bump the worst case overhead included in the transaction. The
overhead will eventually be transferred to the relog system as needed
for individual log items.

Signed-off-by: Brian Foster
Reviewed-by: Allison Collins
---
 fs/xfs/libxfs/xfs_shared.h |  1 +
 fs/xfs/xfs_log.c           | 12 +++++++++---
 fs/xfs/xfs_log.h           |  3 ++-
 fs/xfs/xfs_log_cil.c       |  2 +-
 fs/xfs/xfs_log_priv.h      |  1 +
 fs/xfs/xfs_trans.c         |  3 ++-
 6 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h
index c45acbd3add9..1ede1e720a5c 100644
--- a/fs/xfs/libxfs/xfs_shared.h
+++ b/fs/xfs/libxfs/xfs_shared.h
@@ -65,6 +65,7 @@ void	xfs_log_get_max_trans_res(struct xfs_mount *mp,
 #define XFS_TRANS_DQ_DIRTY	0x10	/* at least one dquot in trx dirty */
 #define XFS_TRANS_RESERVE	0x20    /* OK to use reserved data blocks */
 #define XFS_TRANS_NO_WRITECOUNT 0x40	/* do not elevate SB writecount */
+#define XFS_TRANS_RELOG		0x80	/* requires extra relog overhead */
 /*
  * LOWMODE is used by the allocator to activate the lowspace algorithm - when
  * free space is running low the extent allocator may choose to allocate an
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index d6b63490a78b..b55abde6c142 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -418,7 +418,8 @@ xfs_log_reserve(
 	int		 	cnt,
 	struct xlog_ticket	**ticp,
 	uint8_t		 	client,
-	bool			permanent)
+	bool			permanent,
+	bool			relog)
 {
 	struct xlog		*log = mp->m_log;
 	struct xlog_ticket	*tic;
@@ -433,7 +434,8 @@ xfs_log_reserve(
 	XFS_STATS_INC(mp, xs_try_logspace);
 
 	ASSERT(*ticp == NULL);
-	tic = xlog_ticket_alloc(log, unit_bytes, cnt, client, permanent, 0);
+	tic = xlog_ticket_alloc(log, unit_bytes, cnt, client, permanent, relog,
+				0);
 	*ticp = tic;
 
 	xlog_grant_push_ail(log, tic->t_cnt ? tic->t_unit_res * tic->t_cnt
@@ -831,7 +833,7 @@ xlog_unmount_write(
 	uint			flags = XLOG_UNMOUNT_TRANS;
 	int			error;
 
-	error = xfs_log_reserve(mp, 600, 1, &tic, XFS_LOG, 0);
+	error = xfs_log_reserve(mp, 600, 1, &tic, XFS_LOG, false, false);
 	if (error)
 		goto out_err;
 
@@ -3421,6 +3423,7 @@ xlog_ticket_alloc(
 	int			cnt,
 	char			client,
 	bool			permanent,
+	bool			relog,
 	xfs_km_flags_t		alloc_flags)
 {
 	struct xlog_ticket	*tic;
@@ -3431,6 +3434,9 @@ xlog_ticket_alloc(
 		return NULL;
 
 	unit_res = xfs_log_calc_unit_res(log->l_mp, unit_bytes);
+	/* double the overhead for the relog transaction */
+	if (relog)
+		unit_res += (unit_res - unit_bytes);
 
 	atomic_set(&tic->t_ref, 1);
 	tic->t_task		= current;
diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h
index 6d2f30f42245..f1089a4b299c 100644
--- a/fs/xfs/xfs_log.h
+++ b/fs/xfs/xfs_log.h
@@ -123,7 +123,8 @@ int	  xfs_log_reserve(struct xfs_mount *mp,
 			  int		   count,
 			  struct xlog_ticket **ticket,
 			  uint8_t		   clientid,
-			  bool		   permanent);
+			  bool		   permanent,
+			  bool		   relog);
 int	  xfs_log_regrant(struct xfs_mount *mp, struct xlog_ticket *tic);
 void	  xfs_log_ungrant_bytes(struct xfs_mount *mp, int bytes);
 void	  xfs_log_unmount(struct xfs_mount *mp);
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 9ed90368ab31..dfa25370f8af 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -37,7 +37,7 @@ xlog_cil_ticket_alloc(
 {
 	struct xlog_ticket *tic;
 
-	tic = xlog_ticket_alloc(log, 0, 1, XFS_TRANSACTION, 0,
+	tic = xlog_ticket_alloc(log, 0, 1, XFS_TRANSACTION, false, false,
 				KM_NOFS);
 
 	/*
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index 75a62870b63a..bcc3d7a9c2c9 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -465,6 +465,7 @@ xlog_ticket_alloc(
 	int		count,
 	char		client,
 	bool		permanent,
+	bool		relog,
 	xfs_km_flags_t	alloc_flags);
 
 
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 5190b792cc68..cfa9915523e1 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -174,6 +174,7 @@ xfs_trans_reserve(
 	 */
 	if (resp->tr_logres > 0) {
 		bool	permanent = false;
+		bool	relog	  = (tp->t_flags & XFS_TRANS_RELOG);
 
 		ASSERT(tp->t_log_res == 0 ||
 		       tp->t_log_res == resp->tr_logres);
@@ -196,7 +197,7 @@ xfs_trans_reserve(
 					resp->tr_logres,
 					resp->tr_logcount,
 					&tp->t_ticket, XFS_TRANSACTION,
-					permanent);
+					permanent, relog);
 		}
 
 		if (error)

From patchwork Wed Jul 1 16:51:10 2020
X-Patchwork-Id: 11636921
From: Brian Foster
To: linux-xfs@vger.kernel.org
Subject: [PATCH 04/10] xfs: relog log reservation stealing and accounting
Date: Wed, 1 Jul 2020 12:51:10 -0400
Message-Id: <20200701165116.47344-5-bfoster@redhat.com>
In-Reply-To: <20200701165116.47344-1-bfoster@redhat.com>
References: <20200701165116.47344-1-bfoster@redhat.com>

The transaction that eventually commits relog-enabled log items requires
log reservation like any other transaction. It is not safe to acquire
reservation on demand because relogged items aren't processed until they
are likely at the tail of the log and require movement in order to free
up space in the log. As such, a relog transaction that blocks on log
reservation is a likely deadlock vector.

To address this problem, implement a model where relog reservation is
contributed by the transaction that enables relogging on a particular
item. Update the relog helper to transfer reservation from the
transaction to the relog pool.
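The transfer described above, together with the reverse transfer this patch performs on cancel, amounts to simple byte accounting. The following is a hedged user-space sketch rather than the kernel code: the model_* structures and functions are hypothetical, a plain bool stands in for the atomic XFS_LI_RELOG bit, and the caller passes the per-item size that the patch obtains from xfs_relog_calc_res().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the structures the patch touches. */
struct model_ticket { int curr_res; };			/* like t_curr_res */
struct model_item   { bool relog; int64_t relog_res; };	/* XFS_LI_RELOG, li_relog_res */
struct model_pool   { int64_t relog_res; int64_t ungranted; };	/* ail_relog_res, grant heads */

/*
 * Enable: move item_res bytes from the transaction's ticket into the relog
 * pool. Repeat calls are no-ops, like the test_and_set_bit() guard in
 * xfs_trans_relog_item().
 */
static void model_relog_enable(struct model_ticket *tic, struct model_item *ip,
			       struct model_pool *pool, int item_res)
{
	if (ip->relog)
		return;
	ip->relog = true;
	tic->curr_res -= item_res;
	ip->relog_res += item_res;
	pool->relog_res += item_res;
}

/*
 * Cancel: hand the bytes back to the transaction if one is provided,
 * otherwise release them to the (modeled) grant heads, which is the role
 * xfs_log_ungrant_bytes() plays in the patch.
 */
static void model_relog_cancel(struct model_ticket *tic, struct model_item *ip,
			       struct model_pool *pool, int item_res)
{
	if (!ip->relog)
		return;
	ip->relog = false;
	if (tic)
		tic->curr_res += item_res;
	else
		pool->ungranted += item_res;
	ip->relog_res -= item_res;
	pool->relog_res -= item_res;
}
```

The invariant worth noticing is that every byte leaving a ticket is either sitting in the pool or has been returned, which is what the ASSERT on ail_relog_res in xfs_trans_ail_destroy() checks at unmount.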

The relog pool holds outstanding reservation such that it can be used to
commit the item in an otherwise empty transaction. The upcoming relog
mechanism is responsible for replenishing the relog reservation as items
are relogged. When relog is cancelled on a log item, transfer the
outstanding relog reservation to the current transaction (if provided)
for eventual release, or otherwise release it directly to the grant
heads.

Note that this approach has several caveats:

- Log reservation calculations for transactions that relog items must
  be increased accordingly.

- The overhead reservation that is currently per-transaction (i.e. for
  things like the CIL ticket) must be included for each reloggable item
  because said items can be relogged in arbitrary combinations.

- Relog reservation must be based on the worst case requirement for a
  log item. This is not a concern for fixed size log items, such as
  most intents. Items with more granular logging capability, such as
  buffers, can have additional ranges dirtied after relogging has been
  enabled, and the relog subsystem must have enough reservation to
  accommodate them.
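The first two caveats follow from how the per-item relog reservation is sized. The sketch below mirrors the shape of xfs_relog_calc_res() from this patch (one iovec, one op header, then the unit-reservation overhead) and shows that enabling relog on n items reserves that overhead n times. Assumptions, labeled as such: the op header is taken to be 12 bytes, and the unit-reservation overhead is a hypothetical flat constant, whereas xfs_log_calc_unit_res() derives it from the log geometry.

```c
#include <assert.h>

/* Assumed size of an xlog_op_header_t. */
#define MODEL_OP_HDR_BYTES	12

/*
 * Hypothetical flat overhead standing in for what xfs_log_calc_unit_res()
 * adds (CIL ticket, iclog headers, split reservation).
 */
#define MODEL_UNIT_OVERHEAD	4096

static int model_calc_unit_res(int nbytes)
{
	return nbytes + MODEL_UNIT_OVERHEAD;
}

/*
 * Shape of xfs_relog_calc_res(): the formatted size of the item's single
 * iovec, one op header per iovec, then the unit-reservation overhead.
 */
static int model_relog_calc_res(int item_bytes)
{
	int niovecs = 1;
	int nbytes = item_bytes;

	nbytes += niovecs * MODEL_OP_HDR_BYTES;
	return model_calc_unit_res(nbytes);
}

/*
 * Relog reservation for n equally sized items. Each item carries its own
 * overhead because items may be relogged in arbitrary combinations.
 */
static int model_relog_total(int n, int item_bytes)
{
	return n * model_relog_calc_res(item_bytes);
}
```

Under these assumptions, three 100-byte intents need 3 * (100 + 12 + 4096) = 12624 bytes, of which 12288 bytes is repeated overhead. For a buffer, item_bytes would have to be the worst-case dirty size rather than the range dirtied so far.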

Signed-off-by: Brian Foster
---
 fs/xfs/xfs_log.c        |  3 +++
 fs/xfs/xfs_trans.c      | 20 ++++++++++++++++++++
 fs/xfs/xfs_trans.h      | 31 +++++++++++++++++++++++++++++++
 fs/xfs/xfs_trans_ail.c  |  2 ++
 fs/xfs/xfs_trans_priv.h | 14 ++++++++++++++
 5 files changed, 70 insertions(+)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index b55abde6c142..940e5bb9786c 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -983,6 +983,9 @@ xfs_log_item_init(
 	item->li_type = type;
 	item->li_ops = ops;
 	item->li_lv = NULL;
+#ifdef DEBUG
+	atomic64_set(&item->li_relog_res, 0);
+#endif
 
 	INIT_LIST_HEAD(&item->li_ail);
 	INIT_LIST_HEAD(&item->li_cil);
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index cfa9915523e1..ba2540d8a6c9 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -20,6 +20,7 @@
 #include "xfs_trace.h"
 #include "xfs_error.h"
 #include "xfs_defer.h"
+#include "xfs_log_priv.h"
 
 kmem_zone_t	*xfs_trans_zone;
 
@@ -657,9 +658,19 @@ xfs_trans_relog_item(
 	struct xfs_trans	*tp,
 	struct xfs_log_item	*lip)
 {
+	int			nbytes;
+
+	ASSERT(tp->t_flags & XFS_TRANS_RELOG);
+
 	if (test_and_set_bit(XFS_LI_RELOG, &lip->li_flags))
 		return;
 	trace_xfs_relog_item(lip);
+
+	nbytes = xfs_relog_calc_res(lip);
+
+	tp->t_ticket->t_curr_res -= nbytes;
+	xfs_relog_res_account(lip, nbytes);
+	tp->t_flags |= XFS_TRANS_DIRTY;
 }
 
 void
@@ -667,9 +678,18 @@ xfs_trans_relog_item_cancel(
 	struct xfs_trans	*tp,
 	struct xfs_log_item	*lip)
 {
+	int			res;
+
 	if (!test_and_clear_bit(XFS_LI_RELOG, &lip->li_flags))
 		return;
 	trace_xfs_relog_item_cancel(lip);
+
+	res = xfs_relog_calc_res(lip);
+	if (tp)
+		tp->t_ticket->t_curr_res += res;
+	else
+		xfs_log_ungrant_bytes(lip->li_mountp, res);
+	xfs_relog_res_account(lip, -res);
 }
 
 /* Detach and unlock all of the items in a transaction */
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 6349e78af002..70373e2b8f6d 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -48,6 +48,9 @@ struct xfs_log_item {
 	struct xfs_log_vec		*li_lv;		/* active log vector */
 	struct xfs_log_vec		*li_lv_shadow;	/* standby vector */
 	xfs_lsn_t			li_seq;		/* CIL commit seq */
+#ifdef DEBUG
+	atomic64_t			li_relog_res;	/* automatic relog log res */
+#endif
 };
 
 /*
@@ -216,6 +219,34 @@ xfs_trans_read_buf(
 				      flags, bpp, ops);
 }
 
+/*
+ * Calculate the log reservation required to enable relogging of a log item.
+ */
+static inline int
+xfs_relog_calc_res(
+	struct xfs_log_item	*lip)
+{
+	int			niovecs = 0;
+	int			nbytes = 0;
+
+	/*
+	 * The reservation consumed by a transaction at commit time consists of
+	 * the total size of the formatted log vectors of the items dirtied by
+	 * the transaction, an op header for each iovec in the log vectors, the
+	 * unit reservation of the CIL context ticket, and extra iclog and op
+	 * headers if the CIL context spans multiple iclogs (i.e. split
+	 * reservation). The CIL ticket and split reservation are included by
+	 * xfs_log_calc_unit_res().
+	 */
+	lip->li_ops->iop_size(lip, &niovecs, &nbytes);
+	ASSERT(niovecs == 1);
+
+	nbytes += niovecs * sizeof(xlog_op_header_t);
+	nbytes = xfs_log_calc_unit_res(lip->li_mountp, nbytes);
+
+	return nbytes;
+}
+
 struct xfs_buf	*xfs_trans_getsb(xfs_trans_t *, struct xfs_mount *);
 
 void		xfs_trans_brelse(xfs_trans_t *, struct xfs_buf *);
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index ac5019361a13..5c862821171f 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -894,6 +894,7 @@ xfs_trans_ail_init(
 	spin_lock_init(&ailp->ail_lock);
 	INIT_LIST_HEAD(&ailp->ail_buf_list);
 	init_waitqueue_head(&ailp->ail_empty);
+	atomic64_set(&ailp->ail_relog_res, 0);
 
 	ailp->ail_task = kthread_run(xfsaild, ailp, "xfsaild/%s",
 			ailp->ail_mount->m_super->s_id);
@@ -914,6 +915,7 @@ xfs_trans_ail_destroy(
 {
 	struct xfs_ail	*ailp = mp->m_ail;
 
+	ASSERT(atomic64_read(&ailp->ail_relog_res) == 0);
 	kthread_stop(ailp->ail_task);
 	kmem_free(ailp);
 }
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index 64965a861346..d923e79676af 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -63,6 +63,7 @@ struct xfs_ail {
 	int			ail_log_flush;
 	struct list_head	ail_buf_list;
 	wait_queue_head_t	ail_empty;
+	atomic64_t		ail_relog_res;
 };
 
 /*
@@ -169,4 +170,17 @@ xfs_set_li_failed(
 	}
 }
 
+static inline int64_t
+xfs_relog_res_account(
+	struct xfs_log_item	*lip,
+	int64_t			bytes)
+{
+#ifdef DEBUG
+	int64_t			res;
+
+	res = atomic64_add_return(bytes, &lip->li_relog_res);
+	ASSERT(res == bytes || (bytes < 0 && res == 0));
+#endif
+	return atomic64_add_return(bytes, &lip->li_ailp->ail_relog_res);
+}
 #endif	/* __XFS_TRANS_PRIV_H__ */

From patchwork Wed Jul 1 16:51:11 2020
X-Patchwork-Id: 11636909
From: Brian Foster
To: linux-xfs@vger.kernel.org
Subject: [PATCH 05/10] xfs: automatic log item relog mechanism
Date: Wed, 1 Jul 2020 12:51:11 -0400
Message-Id: <20200701165116.47344-6-bfoster@redhat.com>
In-Reply-To: <20200701165116.47344-1-bfoster@redhat.com>
References: <20200701165116.47344-1-bfoster@redhat.com>

Now that relog reservation is available and relog state tracking is in
place, all that remains to automatically relog items is the relog
mechanism itself. An item with relogging enabled is basically pinned
from writeback until relog is disabled. Instead of being written back,
the item must be periodically committed in a new transaction to move it
forward in the physical log. The purpose of moving the item is to avoid
long-term tail pinning and thus avoid log deadlocks for long-running
operations.

The ideal time to relog an item is in response to tail pushing pressure.
This accommodates the current workload at any given time, as opposed to
a fixed time interval or log reservation heuristic, which risks
performance regression. This is essentially the same heuristic that
drives metadata writeback. XFS already implements various log tail
pushing heuristics that attempt to keep the log progressing on an active
filesystem under various workloads.

The act of relogging an item simply requires adding it to a transaction
and committing. This pushes the already dirty item into a subsequent log
checkpoint and frees up its previous location in the on-disk log.
Joining an item to a transaction of course requires locking the item
first, which means we have to be aware of type-specific locks and lock
ordering wherever the relog takes place.

Fundamentally, this points to xfsaild as the ideal location to process
relog-enabled items. xfsaild already processes log resident items, is
driven by log tail pushing pressure, processes arbitrary log item types
through callbacks, and is sensitive to type-specific locking rules by
design. The fact that automatic relogging essentially diverts items to
either writeback or relog also suggests xfsaild as an ideal location to
process items one way or the other.

Of course, we don't want xfsaild to process transactions as it is a
critical component of the log subsystem for driving metadata writeback
and freeing up log space. Therefore, similar to how xfsaild builds up a
writeback queue of dirty items and queues writes asynchronously, make
xfsaild responsible only for directing pending relog items into an
appropriate queue and create an async (workqueue) context for processing
the queue. The workqueue context utilizes the pre-reserved log
reservation to drain the queue by rolling a permanent transaction.

Update the AIL pushing infrastructure to support a new RELOG item state.
If a log item push returns the relog state, queue the item for relog
instead of writeback.
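A toy model of that dispatch is sketched below, assuming a simplified push loop: items whose push reports success are treated as queued for writeback, items that report the relog state go to a separate relog queue for the workqueue to drain later, and pinned, locked or flushing items are skipped until a later pass. The values of the PINNED through RELOG codes match the XFS_ITEM_* defines in this patch; the value 0 for success and all model_* names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Push results. PINNED..RELOG use the XFS_ITEM_* values from this patch;
 * SUCCESS == 0 is assumed.
 */
enum model_push_result {
	MODEL_ITEM_SUCCESS	= 0,
	MODEL_ITEM_PINNED	= 1,
	MODEL_ITEM_LOCKED	= 2,
	MODEL_ITEM_FLUSHING	= 3,
	MODEL_ITEM_RELOG	= 4,
};

#define MODEL_MAX_ITEMS	16

struct model_item {
	int			id;
	enum model_push_result	push;	/* what ->iop_push() would return */
};

struct model_queues {
	int	writeback[MODEL_MAX_ITEMS];
	size_t	nwriteback;
	int	relog[MODEL_MAX_ITEMS];
	size_t	nrelog;
};

/*
 * One push pass over AIL-ordered items. Nothing is written or committed
 * here: items are only sorted onto queues, in the same way xfsaild only
 * builds its delwri buffer list and leaves the I/O to a later step.
 */
static void model_push_cycle(const struct model_item *items, size_t n,
			     struct model_queues *q)
{
	for (size_t i = 0; i < n; i++) {
		switch (items[i].push) {
		case MODEL_ITEM_SUCCESS:
			assert(q->nwriteback < MODEL_MAX_ITEMS);
			q->writeback[q->nwriteback++] = items[i].id;
			break;
		case MODEL_ITEM_RELOG:
			assert(q->nrelog < MODEL_MAX_ITEMS);
			q->relog[q->nrelog++] = items[i].id;
			break;
		default:
			/* pinned, locked or flushing: retry on a later pass */
			break;
		}
	}
}
```

Keeping the pass free of transaction work is the point of the design: the AIL thread never blocks on a commit, and the relog queue is drained in a separate context using the pre-reserved log space.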
On completion of a push cycle, schedule the relog task at the same point metadata buffer I/O is submitted. This allows items to be relogged automatically under the same locking rules and pressure heuristics that govern metadata writeback. Signed-off-by: Brian Foster --- fs/xfs/xfs_trace.h | 2 + fs/xfs/xfs_trans.c | 13 ++++- fs/xfs/xfs_trans.h | 6 ++- fs/xfs/xfs_trans_ail.c | 117 +++++++++++++++++++++++++++++++++++++++- fs/xfs/xfs_trans_priv.h | 14 ++++- 5 files changed, 147 insertions(+), 5 deletions(-) diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index f6fd598c3912..1f81a47c7f6d 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -1068,6 +1068,8 @@ DEFINE_LOG_ITEM_EVENT(xfs_ail_push); DEFINE_LOG_ITEM_EVENT(xfs_ail_pinned); DEFINE_LOG_ITEM_EVENT(xfs_ail_locked); DEFINE_LOG_ITEM_EVENT(xfs_ail_flushing); +DEFINE_LOG_ITEM_EVENT(xfs_ail_relog); +DEFINE_LOG_ITEM_EVENT(xfs_ail_relog_queue); DEFINE_LOG_ITEM_EVENT(xfs_relog_item); DEFINE_LOG_ITEM_EVENT(xfs_relog_item_cancel); diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index ba2540d8a6c9..310beaccbc4c 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -676,7 +676,8 @@ xfs_trans_relog_item( void xfs_trans_relog_item_cancel( struct xfs_trans *tp, - struct xfs_log_item *lip) + struct xfs_log_item *lip, + bool wait) { int res; @@ -684,6 +685,15 @@ xfs_trans_relog_item_cancel( return; trace_xfs_relog_item_cancel(lip); + /* + * Must wait on active relog to complete before reclaiming reservation. + * Currently a big hammer because the QUEUED state isn't cleared until + * AIL (re)insertion. A separate state might be warranted. 
+ */ + while (wait && wait_on_bit_timeout(&lip->li_flags, XFS_LI_RELOG_QUEUED, + TASK_UNINTERRUPTIBLE, HZ)) + xfs_log_force(lip->li_mountp, XFS_LOG_SYNC); + res = xfs_relog_calc_res(lip); if (tp) tp->t_ticket->t_curr_res += res; @@ -777,6 +787,7 @@ xfs_trans_committed_bulk( if (aborted) set_bit(XFS_LI_ABORTED, &lip->li_flags); + clear_and_wake_up_bit(XFS_LI_RELOG_QUEUED, &lip->li_flags); if (lip->li_ops->flags & XFS_ITEM_RELEASE_WHEN_COMMITTED) { lip->li_ops->iop_release(lip); diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h index 70373e2b8f6d..7f409b0d456a 100644 --- a/fs/xfs/xfs_trans.h +++ b/fs/xfs/xfs_trans.h @@ -64,6 +64,7 @@ struct xfs_log_item { #define XFS_LI_DIRTY 3 /* log item dirty in transaction */ #define XFS_LI_RECOVERED 4 /* log intent item has been recovered */ #define XFS_LI_RELOG 5 /* automatically relog item */ +#define XFS_LI_RELOG_QUEUED 6 /* queued for relog */ #define XFS_LI_FLAGS \ { (1 << XFS_LI_IN_AIL), "IN_AIL" }, \ @@ -71,7 +72,8 @@ struct xfs_log_item { { (1 << XFS_LI_FAILED), "FAILED" }, \ { (1 << XFS_LI_DIRTY), "DIRTY" }, \ { (1 << XFS_LI_RECOVERED), "RECOVERED" }, \ - { (1 << XFS_LI_RELOG), "RELOG" } + { (1 << XFS_LI_RELOG), "RELOG" }, \ + { (1 << XFS_LI_RELOG_QUEUED), "RELOG_QUEUED" } struct xfs_item_ops { unsigned flags; @@ -86,6 +88,7 @@ struct xfs_item_ops { void (*iop_error)(struct xfs_log_item *, xfs_buf_t *); int (*iop_recover)(struct xfs_log_item *lip, struct xfs_trans *tp); bool (*iop_match)(struct xfs_log_item *item, uint64_t id); + void (*iop_relog)(struct xfs_log_item *, struct xfs_trans *); }; /* @@ -104,6 +107,7 @@ void xfs_log_item_init(struct xfs_mount *mp, struct xfs_log_item *item, #define XFS_ITEM_PINNED 1 #define XFS_ITEM_LOCKED 2 #define XFS_ITEM_FLUSHING 3 +#define XFS_ITEM_RELOG 4 /* * Deferred operation item relogging limits. 
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c index 5c862821171f..6c4d219801a6 100644 --- a/fs/xfs/xfs_trans_ail.c +++ b/fs/xfs/xfs_trans_ail.c @@ -17,6 +17,7 @@ #include "xfs_errortag.h" #include "xfs_error.h" #include "xfs_log.h" +#include "xfs_log_priv.h" #ifdef DEBUG /* @@ -152,6 +153,88 @@ xfs_ail_max_lsn( return lsn; } +/* + * Relog log items on the AIL relog queue. + * + * Note that relog is incompatible with filesystem freeze due to the + * multi-transaction nature of its users. The freeze sequence blocks all + * transactions and expects to drain the AIL. Allowing the relog transaction to + * proceed while freeze is in progress is not sufficient because it is not + * responsible for cancellation of relog state. The higher level operations must + * be guaranteed to progress to completion before the AIL can be drained of + * relog enabled items. This is currently accomplished by holding + * ->s_umount (quotaoff) or superblock write references (scrub) across the high + * level operations that depend on relog. + */ +static void +xfs_ail_relog( + struct work_struct *work) +{ + struct xfs_ail *ailp = container_of(work, struct xfs_ail, + ail_relog_work); + struct xfs_mount *mp = ailp->ail_mount; + struct xfs_trans_res tres = {}; + struct xfs_trans *tp; + struct xfs_log_item *lip, *lipp; + int error; + LIST_HEAD(relog_list); + + /* + * Open code allocation of an empty transaction and log ticket. The + * ticket requires no initial reservation because the all outstanding + * relog reservation is attached to log items. 
+ */ + error = xfs_trans_alloc(mp, &tres, 0, 0, 0, &tp); + if (error) + goto out; + ASSERT(tp && !tp->t_ticket); + tp->t_flags |= XFS_TRANS_PERM_LOG_RES; + tp->t_log_count = 1; + tp->t_ticket = xlog_ticket_alloc(mp->m_log, 0, 1, XFS_TRANSACTION, + true, false, 0); + /* reset to zero to undo res overhead calculation on ticket alloc */ + tp->t_ticket->t_curr_res = 0; + tp->t_ticket->t_unit_res = 0; + tp->t_log_res = 0; + + spin_lock(&ailp->ail_lock); + while (!list_empty(&ailp->ail_relog_list)) { + list_splice_init(&ailp->ail_relog_list, &relog_list); + spin_unlock(&ailp->ail_lock); + + list_for_each_entry_safe(lip, lipp, &relog_list, li_trans) { + list_del_init(&lip->li_trans); + + trace_xfs_ail_relog(lip); + ASSERT(lip->li_ops->iop_relog); + if (lip->li_ops->iop_relog) + lip->li_ops->iop_relog(lip, tp); + } + + error = xfs_trans_roll(&tp); + if (error) { + xfs_trans_cancel(tp); + goto out; + } + + /* + * Now that the transaction has rolled, reset the ticket to + * zero to reflect that the log reservation held by the + * attached items has been replenished. + */ + tp->t_ticket->t_curr_res = 0; + tp->t_ticket->t_unit_res = 0; + tp->t_log_res = 0; + + spin_lock(&ailp->ail_lock); + } + spin_unlock(&ailp->ail_lock); + xfs_trans_cancel(tp); + +out: + ASSERT(!error || XFS_FORCED_SHUTDOWN(mp)); +} + /* * The cursor keeps track of where our current traversal is up to by tracking * the next item in the list for us. However, for this to be safe, removing an @@ -413,7 +496,7 @@ static long xfsaild_push( struct xfs_ail *ailp) { - xfs_mount_t *mp = ailp->ail_mount; + struct xfs_mount *mp = ailp->ail_mount; struct xfs_ail_cursor cur; struct xfs_log_item *lip; xfs_lsn_t lsn; @@ -475,6 +558,23 @@ xfsaild_push( ailp->ail_last_pushed_lsn = lsn; break; + case XFS_ITEM_RELOG: + /* + * The item requires a relog. Add to the relog queue + * and set a bit to prevent further relog requests + * until AIL reinsertion. 
+ */ + if (test_and_set_bit(XFS_LI_RELOG_QUEUED, + &lip->li_flags)) { + ailp->ail_log_flush++; + break; + } + + trace_xfs_ail_relog_queue(lip); + ASSERT(list_empty(&lip->li_trans)); + list_add_tail(&lip->li_trans, &ailp->ail_relog_list); + break; + case XFS_ITEM_FLUSHING: /* * The item or its backing buffer is already being @@ -541,6 +641,9 @@ xfsaild_push( if (xfs_buf_delwri_submit_nowait(&ailp->ail_buf_list)) ailp->ail_log_flush++; + if (!list_empty(&ailp->ail_relog_list)) + queue_work(ailp->ail_relog_wq, &ailp->ail_relog_work); + if (!count || XFS_LSN_CMP(lsn, target) >= 0) { out_done: /* @@ -894,16 +997,25 @@ xfs_trans_ail_init( spin_lock_init(&ailp->ail_lock); INIT_LIST_HEAD(&ailp->ail_buf_list); init_waitqueue_head(&ailp->ail_empty); + INIT_LIST_HEAD(&ailp->ail_relog_list); + INIT_WORK(&ailp->ail_relog_work, xfs_ail_relog); atomic64_set(&ailp->ail_relog_res, 0); + ailp->ail_relog_wq = alloc_workqueue("xfs-relog/%s", WQ_FREEZABLE, 0, + mp->m_super->s_id); + if (!ailp->ail_relog_wq) + goto out_free_ailp; + ailp->ail_task = kthread_run(xfsaild, ailp, "xfsaild/%s", ailp->ail_mount->m_super->s_id); if (IS_ERR(ailp->ail_task)) - goto out_free_ailp; + goto out_destroy_wq; mp->m_ail = ailp; return 0; +out_destroy_wq: + destroy_workqueue(ailp->ail_relog_wq); out_free_ailp: kmem_free(ailp); return -ENOMEM; @@ -917,5 +1029,6 @@ xfs_trans_ail_destroy( ASSERT(atomic64_read(&ailp->ail_relog_res) == 0); kthread_stop(ailp->ail_task); + destroy_workqueue(ailp->ail_relog_wq); kmem_free(ailp); } diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h index d923e79676af..6c15a4f39b68 100644 --- a/fs/xfs/xfs_trans_priv.h +++ b/fs/xfs/xfs_trans_priv.h @@ -17,7 +17,8 @@ void xfs_trans_init(struct xfs_mount *); void xfs_trans_add_item(struct xfs_trans *, struct xfs_log_item *); void xfs_trans_del_item(struct xfs_log_item *); void xfs_trans_relog_item(struct xfs_trans *, struct xfs_log_item *); -void xfs_trans_relog_item_cancel(struct xfs_trans *, struct xfs_log_item *); +void 
xfs_trans_relog_item_cancel(struct xfs_trans *, struct xfs_log_item *, + bool wait); void xfs_trans_unreserve_and_mod_sb(struct xfs_trans *tp); void xfs_trans_committed_bulk(struct xfs_ail *ailp, struct xfs_log_vec *lv, @@ -64,6 +65,9 @@ struct xfs_ail { struct list_head ail_buf_list; wait_queue_head_t ail_empty; atomic64_t ail_relog_res; + struct work_struct ail_relog_work; + struct list_head ail_relog_list; + struct workqueue_struct *ail_relog_wq; }; /* @@ -85,6 +89,14 @@ xfs_ail_min( li_ail); } +static inline bool +xfs_item_needs_relog( + struct xfs_log_item *lip) +{ + return test_bit(XFS_LI_RELOG, &lip->li_flags) && + !test_bit(XFS_LI_RELOG_QUEUED, &lip->li_flags); +} + static inline void xfs_trans_ail_update( struct xfs_ail *ailp,
From patchwork Wed Jul 1 16:51:12 2020
From: Brian Foster
To: linux-xfs@vger.kernel.org
Subject: [PATCH 06/10] xfs: automatically relog the quotaoff start intent
Date: Wed, 1 Jul 2020 12:51:12 -0400
Message-Id: <20200701165116.47344-7-bfoster@redhat.com>

The quotaoff operation has a rare but longstanding deadlock vector in terms of how the operation is logged. A quotaoff start intent is logged (synchronously) at the onset to ensure recovery can handle the operation if interrupted before in-core changes are made. This quotaoff intent pins the log tail while the quotaoff sequence scans and purges dquots from all in-core inodes.
While this operation generally doesn't generate much log traffic on its own, it can be time consuming. If unrelated, concurrent filesystem activity consumes remaining log space before quotaoff is able to acquire log reservation for the quotaoff end intent, the filesystem locks up indefinitely. quotaoff cannot allocate the end intent before the scan because the latter can result in transaction allocation itself in certain indirect cases (releasing an inode, for example). Further, rolling the original transaction is difficult because the scanning work occurs multiple layers down where caller context is lost and not much information is available to determine how often to roll the transaction. To address this problem, enable automatic relogging of the quotaoff start intent. This automatically relogs the intent whenever AIL pushing finds the item at the tail of the log. When quotaoff completes, wait for relogging to complete as the end intent expects to be able to permanently remove the start intent from the log subsystem. This ensures that the log tail is kept moving during a particularly long quotaoff operation and avoids the log reservation deadlock. Note that the quotaoff reservation calculation does not need to be updated for relog as it already (incorrectly) accounts for two quotaoff intents. Signed-off-by: Brian Foster --- fs/xfs/xfs_dquot_item.c | 26 ++++++++++++++++++++++++-- fs/xfs/xfs_qm_syscalls.c | 12 +++++++++++- 2 files changed, 35 insertions(+), 3 deletions(-) diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c index 349c92d26570..86dcb6932aab 100644 --- a/fs/xfs/xfs_dquot_item.c +++ b/fs/xfs/xfs_dquot_item.c @@ -17,6 +17,7 @@ #include "xfs_trans_priv.h" #include "xfs_qm.h" #include "xfs_log.h" +#include "xfs_log_priv.h" static inline struct xfs_dq_logitem *DQUOT_ITEM(struct xfs_log_item *lip) { @@ -275,14 +276,17 @@ xfs_qm_qoff_logitem_format( } /* - * There isn't much you can do to push a quotaoff item. 
It is simply - * stuck waiting for the log to be flushed to disk. + * The quotaoff log item is stuck in the log until quotaoff completes. Either + * relog it to keep the tail moving or consider it locked. */ STATIC uint xfs_qm_qoff_logitem_push( struct xfs_log_item *lip, struct list_head *buffer_list) { + + if (xfs_item_needs_relog(lip)) + return XFS_ITEM_RELOG; return XFS_ITEM_LOCKED; } @@ -314,6 +318,23 @@ xfs_qm_qoff_logitem_release( } } +STATIC void +xfs_qm_qoff_logitem_relog( + struct xfs_log_item *lip, + struct xfs_trans *tp) +{ + int res; + + res = xfs_relog_calc_res(lip); + + xfs_trans_add_item(tp, lip); + tp->t_ticket->t_curr_res += res; + tp->t_ticket->t_unit_res += res; + tp->t_log_res += res; + tp->t_flags |= XFS_TRANS_DIRTY; + set_bit(XFS_LI_DIRTY, &lip->li_flags); +} + static const struct xfs_item_ops xfs_qm_qoffend_logitem_ops = { .iop_size = xfs_qm_qoff_logitem_size, .iop_format = xfs_qm_qoff_logitem_format, @@ -327,6 +348,7 @@ static const struct xfs_item_ops xfs_qm_qoff_logitem_ops = { .iop_format = xfs_qm_qoff_logitem_format, .iop_push = xfs_qm_qoff_logitem_push, .iop_release = xfs_qm_qoff_logitem_release, + .iop_relog = xfs_qm_qoff_logitem_relog, }; /* diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c index 7effd7a28136..5602ed2b7e8d 100644 --- a/fs/xfs/xfs_qm_syscalls.c +++ b/fs/xfs/xfs_qm_syscalls.c @@ -18,6 +18,7 @@ #include "xfs_quota.h" #include "xfs_qm.h" #include "xfs_icache.h" +#include "xfs_trans_priv.h" STATIC int xfs_qm_log_quotaoff( @@ -29,12 +30,14 @@ xfs_qm_log_quotaoff( int error; struct xfs_qoff_logitem *qoffi; - error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_quotaoff, 0, 0, 0, &tp); + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_quotaoff, 0, 0, + XFS_TRANS_RELOG, &tp); if (error) goto out; qoffi = xfs_trans_get_qoff_item(tp, NULL, flags & XFS_ALL_QUOTA_ACCT); xfs_trans_log_quotaoff_item(tp, qoffi); + xfs_trans_relog_item(tp, &qoffi->qql_item); spin_lock(&mp->m_sb_lock); mp->m_sb.sb_qflags = (mp->m_qflags & 
~(flags)) & XFS_MOUNT_QUOTA_ALL; @@ -71,6 +74,13 @@ xfs_qm_log_quotaoff_end( if (error) return error; + /* + * startqoff must be in the AIL and not the CIL when the end intent + * commits to ensure it is not readded to the AIL out of order. Wait on + * relog activity to drain to isolate startqoff to the AIL. + */ + xfs_trans_relog_item_cancel(tp, &(*startqoff)->qql_item, true); + qoffi = xfs_trans_get_qoff_item(tp, *startqoff, flags & XFS_ALL_QUOTA_ACCT); xfs_trans_log_quotaoff_item(tp, qoffi);
From patchwork Wed Jul 1 16:51:13 2020
From: Brian Foster
To: linux-xfs@vger.kernel.org
Subject: [PATCH 07/10] xfs: prevent fs freeze with outstanding relog items
Date: Wed, 1 Jul 2020 12:51:13 -0400
Message-Id: <20200701165116.47344-8-bfoster@redhat.com>

The automatic relog mechanism is currently incompatible with filesystem freeze in a generic sense. Freeze protection is currently implemented via locks that cannot be held on return to userspace, which means we can't hold a superblock write reference for as long as relogging is enabled on an item. It's too late to block when the freeze sequence calls into the filesystem because the transaction subsystem has already begun to be frozen. Not only can this block the relog transaction, but blocking any unrelated transaction essentially prevents a particular operation from progressing to the point where it can disable relogging on an item.
Therefore marking the relog transaction as "nowrite" does not solve the problem.

This is not a problem in practice because the two primary use cases already exclude freeze via other means. quotaoff holds ->s_umount read locked across the operation and scrub explicitly takes a superblock write reference, both of which block freeze of the transaction subsystem for the duration of relog enabled items.

As a fallback for future use cases and the upcoming random buffer relogging test code, fail fs freeze attempts when the global relog reservation counter is elevated to prevent deadlock. This is a partial punt of the problem as compared to teaching freeze to wait on relogged items because the only current dependency is test code. In other words, this patch prevents deadlock if a user happens to issue a freeze in conjunction with random buffer relog injection.

Signed-off-by: Brian Foster Reviewed-by: Allison Collins --- fs/xfs/xfs_super.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index 379cbff438bc..f77af5298a80 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -35,6 +35,7 @@ #include "xfs_refcount_item.h" #include "xfs_bmap_item.h" #include "xfs_reflink.h" +#include "xfs_trans_priv.h" #include #include @@ -914,6 +915,9 @@ xfs_fs_freeze( { struct xfs_mount *mp = XFS_M(sb); + if (WARN_ON_ONCE(atomic64_read(&mp->m_ail->ail_relog_res))) + return -EAGAIN; + xfs_stop_block_reaping(mp); xfs_save_resvblks(mp); xfs_quiesce_attr(mp);
From patchwork Wed Jul 1 16:51:14 2020
From: Brian Foster
To: linux-xfs@vger.kernel.org
Subject: [PATCH RFC 08/10] xfs: buffer relogging support prototype
Date: Wed, 1 Jul 2020 12:51:14 -0400
Message-Id: <20200701165116.47344-9-bfoster@redhat.com>

Implement buffer relogging support. There is currently no use case for buffer relogging. This is for testing and experimental purposes and serves as an example to demonstrate the ability to relog arbitrary items in the future, if necessary. Add helpers to manage relogged buffers, update the buffer log item push handler to support relogged BLIs and add a log item relog callback to properly join buffers to the relog transaction. Note that buffers associated with higher level log items (i.e., inodes and dquots) are skipped.

Signed-off-by: Brian Foster --- fs/xfs/xfs_buf.c | 4 +++ fs/xfs/xfs_buf_item.c | 60 ++++++++++++++++++++++++++++++++++---- fs/xfs/xfs_trans.h | 5 +++- fs/xfs/xfs_trans_buf.c | 66 ++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 128 insertions(+), 7 deletions(-) diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c index 20b748f7e186..eec482204336 100644 --- a/fs/xfs/xfs_buf.c +++ b/fs/xfs/xfs_buf.c @@ -16,6 +16,8 @@ #include "xfs_log.h" #include "xfs_errortag.h" #include "xfs_error.h" +#include "xfs_trans.h" +#include "xfs_buf_item.h" static kmem_zone_t *xfs_buf_zone; @@ -1500,6 +1502,8 @@ __xfs_buf_submit( trace_xfs_buf_submit(bp, _RET_IP_); ASSERT(!(bp->b_flags & _XBF_DELWRI_Q)); + ASSERT(!bp->b_log_item || + !test_bit(XFS_LI_RELOG, &bp->b_log_item->bli_item.li_flags)); /* on shutdown we stale and complete the buffer immediately */ if (XFS_FORCED_SHUTDOWN(bp->b_mount)) { diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c index 9e75e8d6042e..eb827a31b47f 100644 --- a/fs/xfs/xfs_buf_item.c +++ b/fs/xfs/xfs_buf_item.c @@ -16,7 +16,7 @@ #include "xfs_trans_priv.h" #include "xfs_trace.h" #include "xfs_log.h" -
+#include "xfs_log_priv.h" kmem_zone_t *xfs_buf_item_zone; @@ -141,7 +141,6 @@ xfs_buf_item_size( struct xfs_buf_log_item *bip = BUF_ITEM(lip); int i; - ASSERT(atomic_read(&bip->bli_refcount) > 0); if (bip->bli_flags & XFS_BLI_STALE) { /* * The buffer is stale, so all we need to log @@ -157,7 +156,7 @@ xfs_buf_item_size( return; } - ASSERT(bip->bli_flags & XFS_BLI_LOGGED); + ASSERT(bip->bli_flags & XFS_BLI_DIRTY); if (bip->bli_flags & XFS_BLI_ORDERED) { /* @@ -418,6 +417,10 @@ xfs_buf_item_unpin( trace_xfs_buf_item_unpin(bip); + /* cancel relogging on abort before we drop the bli reference */ + if (remove) + xfs_trans_relog_buf_cancel(NULL, bp); + freed = atomic_dec_and_test(&bip->bli_refcount); if (atomic_dec_and_test(&bp->b_pin_count)) @@ -462,6 +465,13 @@ xfs_buf_item_unpin( list_del_init(&bp->b_li_list); bp->b_iodone = NULL; } else { + /* racy */ + ASSERT(!test_bit(XFS_LI_RELOG_QUEUED, &lip->li_flags)); + if (test_bit(XFS_LI_RELOG, &lip->li_flags)) { + atomic_dec(&bp->b_pin_count); + xfs_trans_relog_item_cancel(NULL, lip, true); + } + xfs_trans_ail_delete(lip, SHUTDOWN_LOG_IO_ERROR); xfs_buf_item_relse(bp); ASSERT(bp->b_log_item == NULL); @@ -488,8 +498,6 @@ xfs_buf_item_push( struct xfs_buf *bp = bip->bli_buf; uint rval = XFS_ITEM_SUCCESS; - if (xfs_buf_ispinned(bp)) - return XFS_ITEM_PINNED; if (!xfs_buf_trylock(bp)) { /* * If we have just raced with a buffer being pinned and it has @@ -503,6 +511,15 @@ xfs_buf_item_push( return XFS_ITEM_LOCKED; } + /* relog bufs are pinned so check relog state first */ + if (xfs_item_needs_relog(lip)) + return XFS_ITEM_RELOG; + + if (xfs_buf_ispinned(bp)) { + xfs_buf_unlock(bp); + return XFS_ITEM_PINNED; + } + ASSERT(!(bip->bli_flags & XFS_BLI_STALE)); trace_xfs_buf_item_push(bip); @@ -532,6 +549,7 @@ xfs_buf_item_put( struct xfs_buf_log_item *bip) { struct xfs_log_item *lip = &bip->bli_item; + struct xfs_buf *bp = bip->bli_buf; bool aborted; bool dirty; @@ -557,8 +575,10 @@ xfs_buf_item_put( * transaction that invalidated a 
dirty bli and cleared the dirty * state. */ - if (aborted) + if (aborted) { + xfs_trans_relog_buf_cancel(NULL, bp); xfs_trans_ail_delete(lip, 0); + } xfs_buf_item_relse(bip->bli_buf); return true; } @@ -668,6 +688,28 @@ xfs_buf_item_committed( return lsn; } +STATIC void +xfs_buf_item_relog( + struct xfs_log_item *lip, + struct xfs_trans *tp) +{ + struct xfs_buf_log_item *bip = BUF_ITEM(lip); + int res; + + /* + * Grab a reference to the buffer for the transaction before we join + * and dirty it. + */ + xfs_buf_hold(bip->bli_buf); + xfs_trans_bjoin(tp, bip->bli_buf); + xfs_trans_dirty_buf(tp, bip->bli_buf); + + res = xfs_relog_calc_res(lip); + tp->t_ticket->t_curr_res += res; + tp->t_ticket->t_unit_res += res; + tp->t_log_res += res; +} + static const struct xfs_item_ops xfs_buf_item_ops = { .iop_size = xfs_buf_item_size, .iop_format = xfs_buf_item_format, @@ -677,6 +719,7 @@ static const struct xfs_item_ops xfs_buf_item_ops = { .iop_committing = xfs_buf_item_committing, .iop_committed = xfs_buf_item_committed, .iop_push = xfs_buf_item_push, + .iop_relog = xfs_buf_item_relog, }; STATIC void @@ -930,6 +973,11 @@ STATIC void xfs_buf_item_free( struct xfs_buf_log_item *bip) { + ASSERT(!test_bit(XFS_LI_RELOG, &bip->bli_item.li_flags)); +#ifdef DEBUG + ASSERT(!atomic64_read(&bip->bli_item.li_relog_res)); +#endif + xfs_buf_item_free_format(bip); kmem_free(bip->bli_item.li_lv_shadow); kmem_cache_free(xfs_buf_item_zone, bip); diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h index 7f409b0d456a..0262a883969f 100644 --- a/fs/xfs/xfs_trans.h +++ b/fs/xfs/xfs_trans.h @@ -243,7 +243,7 @@ xfs_relog_calc_res( * xfs_log_calc_unit_res(). 
 	 */
 	lip->li_ops->iop_size(lip, &niovecs, &nbytes);
-	ASSERT(niovecs == 1);
+	ASSERT(niovecs == 1 || lip->li_type == XFS_LI_BUF);
 	nbytes += niovecs * sizeof(xlog_op_header_t);
 	nbytes = xfs_log_calc_unit_res(lip->li_mountp, nbytes);
 
@@ -262,6 +262,9 @@ void xfs_trans_inode_buf(xfs_trans_t *, struct xfs_buf *);
 void xfs_trans_stale_inode_buf(xfs_trans_t *, struct xfs_buf *);
 bool xfs_trans_ordered_buf(xfs_trans_t *, struct xfs_buf *);
 void xfs_trans_dquot_buf(xfs_trans_t *, struct xfs_buf *, uint);
+bool xfs_trans_relog_buf(struct xfs_trans *, struct xfs_buf *);
+void xfs_trans_relog_buf_cancel(struct xfs_trans *,
+		struct xfs_buf *);
 void xfs_trans_inode_alloc_buf(xfs_trans_t *, struct xfs_buf *);
 void xfs_trans_ichgtime(struct xfs_trans *, struct xfs_inode *, int);
 void xfs_trans_ijoin(struct xfs_trans *, struct xfs_inode *, uint);
diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
index 08174ffa2118..b5b552a4bcfb 100644
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -588,6 +588,8 @@ xfs_trans_binval(
 		return;
 	}
 
+	/* return relog res before we reset dirty state */
+	xfs_trans_relog_buf_cancel(tp, bp);
 	xfs_buf_stale(bp);
 
 	bip->bli_flags |= XFS_BLI_STALE;
@@ -787,3 +789,67 @@ xfs_trans_dquot_buf(
 
 	xfs_trans_buf_set_type(tp, bp, type);
 }
+
+/*
+ * Enable automatic relogging on a buffer. This essentially pins a dirty buffer
+ * in-core until relogging is disabled.
+ */
+bool
+xfs_trans_relog_buf(
+	struct xfs_trans *tp,
+	struct xfs_buf *bp)
+{
+	struct xfs_buf_log_item *bip = bp->b_log_item;
+	enum xfs_blft blft;
+
+	ASSERT(xfs_buf_islocked(bp));
+
+	if (bip->bli_flags & (XFS_BLI_ORDERED|XFS_BLI_STALE))
+		return false;
+	/*
+	 * Don't bother with queued buffers since we're about to pin it for an
+	 * indeterminate amount of time and we don't want the responsibility of
+	 * failing it if an abort happens to remove it from the AIL.
+	 */
+	if (bp->b_flags & _XBF_DELWRI_Q)
+		return false;
+
+	/*
+	 * Skip buffers with higher level log items. Those items must be
+	 * relogged directly to move in the log.
+	 */
+	blft = xfs_blft_from_flags(&bip->__bli_format);
+	switch (blft) {
+	case XFS_BLFT_DINO_BUF:
+	case XFS_BLFT_UDQUOT_BUF:
+	case XFS_BLFT_PDQUOT_BUF:
+	case XFS_BLFT_GDQUOT_BUF:
+		return false;
+	default:
+		break;
+	}
+
+	/*
+	 * Relog expects a worst case reservation from ->iop_size. Hack that in
+	 * here by logging the entire buffer in this transaction. Also grab a
+	 * buffer pin to prevent it from being written out.
+	 */
+	xfs_buf_item_log(bip, 0, BBTOB(bp->b_length) - 1);
+	atomic_inc(&bp->b_pin_count);
+	xfs_trans_relog_item(tp, &bip->bli_item);
+	return true;
+}
+
+void
+xfs_trans_relog_buf_cancel(
+	struct xfs_trans *tp,
+	struct xfs_buf *bp)
+{
+	struct xfs_buf_log_item *bip = bp->b_log_item;
+
+	if (!test_bit(XFS_LI_RELOG, &bip->bli_item.li_flags))
+		return;
+
+	atomic_dec(&bp->b_pin_count);
+	xfs_trans_relog_item_cancel(tp, &bip->bli_item, false);
+}

From patchwork Wed Jul 1 16:51:15 2020
X-Patchwork-Id: 11636919
From: Brian Foster
To: linux-xfs@vger.kernel.org
Subject: [PATCH RFC 09/10] xfs: create an error tag for random relog reservation
Date: Wed, 1 Jul 2020 12:51:15 -0400
Message-Id: <20200701165116.47344-10-bfoster@redhat.com>
In-Reply-To: <20200701165116.47344-1-bfoster@redhat.com>
References: <20200701165116.47344-1-bfoster@redhat.com>

Create an errortag to enable relogging on random transactions.
Since relogging requires extra transaction reservation, artificially bump
the reservation on selected transactions and tag them with the relog flag
such that the requisite reservation overhead is added by the ticket
allocation code. This allows subsequent random buffer relog events to
target transactions where reservation is included. This is necessary to
avoid transaction reservation overruns on non-relog transactions.

Note that this does not yet enable relogging of any particular items. The
tag will be reused in a subsequent patch to enable random buffer
relogging.

Signed-off-by: Brian Foster
---
 fs/xfs/libxfs/xfs_errortag.h |  4 +++-
 fs/xfs/xfs_error.c           |  3 +++
 fs/xfs/xfs_trans.c           | 21 ++++++++++++++++-----
 3 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_errortag.h b/fs/xfs/libxfs/xfs_errortag.h
index 53b305dea381..8f360cfc666c 100644
--- a/fs/xfs/libxfs/xfs_errortag.h
+++ b/fs/xfs/libxfs/xfs_errortag.h
@@ -56,7 +56,8 @@
 #define XFS_ERRTAG_FORCE_SUMMARY_RECALC 33
 #define XFS_ERRTAG_IUNLINK_FALLBACK 34
 #define XFS_ERRTAG_BUF_IOERROR 35
-#define XFS_ERRTAG_MAX 36
+#define XFS_ERRTAG_RELOG 36
+#define XFS_ERRTAG_MAX 37
 
 /*
  * Random factors for above tags, 1 means always, 2 means 1/2 time, etc.
@@ -97,5 +98,6 @@
 #define XFS_RANDOM_FORCE_SUMMARY_RECALC 1
 #define XFS_RANDOM_IUNLINK_FALLBACK (XFS_RANDOM_DEFAULT/10)
 #define XFS_RANDOM_BUF_IOERROR XFS_RANDOM_DEFAULT
+#define XFS_RANDOM_RELOG XFS_RANDOM_DEFAULT
 
 #endif /* __XFS_ERRORTAG_H_ */
diff --git a/fs/xfs/xfs_error.c b/fs/xfs/xfs_error.c
index 7f6e20899473..562e00f7dcf5 100644
--- a/fs/xfs/xfs_error.c
+++ b/fs/xfs/xfs_error.c
@@ -54,6 +54,7 @@ static unsigned int xfs_errortag_random_default[] = {
 	XFS_RANDOM_FORCE_SUMMARY_RECALC,
 	XFS_RANDOM_IUNLINK_FALLBACK,
 	XFS_RANDOM_BUF_IOERROR,
+	XFS_RANDOM_RELOG,
 };
 
 struct xfs_errortag_attr {
@@ -164,6 +165,7 @@ XFS_ERRORTAG_ATTR_RW(force_repair, XFS_ERRTAG_FORCE_SCRUB_REPAIR);
 XFS_ERRORTAG_ATTR_RW(bad_summary, XFS_ERRTAG_FORCE_SUMMARY_RECALC);
 XFS_ERRORTAG_ATTR_RW(iunlink_fallback, XFS_ERRTAG_IUNLINK_FALLBACK);
 XFS_ERRORTAG_ATTR_RW(buf_ioerror, XFS_ERRTAG_BUF_IOERROR);
+XFS_ERRORTAG_ATTR_RW(relog, XFS_ERRTAG_RELOG);
 
 static struct attribute *xfs_errortag_attrs[] = {
 	XFS_ERRORTAG_ATTR_LIST(noerror),
@@ -202,6 +204,7 @@ static struct attribute *xfs_errortag_attrs[] = {
 	XFS_ERRORTAG_ATTR_LIST(bad_summary),
 	XFS_ERRORTAG_ATTR_LIST(iunlink_fallback),
 	XFS_ERRORTAG_ATTR_LIST(buf_ioerror),
+	XFS_ERRORTAG_ATTR_LIST(relog),
 	NULL,
 };
 
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 310beaccbc4c..df94ca45c7c8 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -21,6 +21,7 @@
 #include "xfs_error.h"
 #include "xfs_defer.h"
 #include "xfs_log_priv.h"
+#include "xfs_errortag.h"
 
 kmem_zone_t *xfs_trans_zone;
 
@@ -176,9 +177,10 @@ xfs_trans_reserve(
 	if (resp->tr_logres > 0) {
 		bool permanent = false;
 		bool relog = (tp->t_flags & XFS_TRANS_RELOG);
+		int logres = resp->tr_logres;
 
 		ASSERT(tp->t_log_res == 0 ||
-		       tp->t_log_res == resp->tr_logres);
+		       tp->t_log_res == logres);
 		ASSERT(tp->t_log_count == 0 ||
 		       tp->t_log_count == resp->tr_logcount);
 
@@ -194,9 +196,18 @@ xfs_trans_reserve(
 			ASSERT(resp->tr_logflags & XFS_TRANS_PERM_LOG_RES);
 			error = xfs_log_regrant(mp, tp->t_ticket);
 		} else {
-			error = xfs_log_reserve(mp,
-						resp->tr_logres,
-						resp->tr_logcount,
+			/*
+			 * Enable relog overhead on random transactions to support
+			 * random item relogging.
+			 */
+			if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_RELOG) &&
+			    !relog) {
+				tp->t_flags |= XFS_TRANS_RELOG;
+				relog = true;
+				logres <<= 1;
+			}
+
+			error = xfs_log_reserve(mp, logres, resp->tr_logcount,
 						&tp->t_ticket, XFS_TRANSACTION,
 						permanent, relog);
 		}
@@ -204,7 +215,7 @@ xfs_trans_reserve(
 		if (error)
 			goto undo_blocks;
 
-		tp->t_log_res = resp->tr_logres;
+		tp->t_log_res = logres;
 		tp->t_log_count = resp->tr_logcount;
 	}
 

From patchwork Wed Jul 1 16:51:16 2020
X-Patchwork-Id: 11636923
From: Brian Foster
To: linux-xfs@vger.kernel.org
Subject: [PATCH RFC 10/10] xfs: relog random buffers based on errortag
Date: Wed, 1 Jul 2020 12:51:16 -0400
Message-Id: <20200701165116.47344-11-bfoster@redhat.com>
In-Reply-To: <20200701165116.47344-1-bfoster@redhat.com>
References: <20200701165116.47344-1-bfoster@redhat.com>

Since there is currently no specific use case for buffer relogging, add
some hacky and experimental code to relog random buffers when the
associated errortag is enabled. Use fixed termination logic regardless
of the user-specified error rate to help ensure that the relog queue
doesn't grow indefinitely.

Note that this patch was useful in causing log reservation deadlocks on
an fsstress workload if the relog mechanism code is modified to acquire
its own log reservation rather than rely on the pre-reservation
mechanism.
In other words, this helps prove that the relog reservation management
code effectively avoids log reservation deadlocks.

Signed-off-by: Brian Foster
---
 fs/xfs/xfs_buf_item.c  |  1 +
 fs/xfs/xfs_trans.h     |  4 +++-
 fs/xfs/xfs_trans_ail.c | 33 +++++++++++++++++++++++++++++++++
 fs/xfs/xfs_trans_buf.c | 14 ++++++++++++++
 4 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index eb827a31b47f..fb277187a2cf 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -469,6 +469,7 @@ xfs_buf_item_unpin(
 		ASSERT(!test_bit(XFS_LI_RELOG_QUEUED, &lip->li_flags));
 		if (test_bit(XFS_LI_RELOG, &lip->li_flags)) {
 			atomic_dec(&bp->b_pin_count);
+			clear_bit(XFS_LI_RELOG_RAND, &bip->bli_item.li_flags);
 			xfs_trans_relog_item_cancel(NULL, lip, true);
 		}
 
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 0262a883969f..18714e6af476 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -65,6 +65,7 @@ struct xfs_log_item {
 #define XFS_LI_RECOVERED 4 /* log intent item has been recovered */
 #define XFS_LI_RELOG 5 /* automatically relog item */
 #define XFS_LI_RELOG_QUEUED 6 /* queued for relog */
+#define XFS_LI_RELOG_RAND 7
 
 #define XFS_LI_FLAGS \
 	{ (1 << XFS_LI_IN_AIL), "IN_AIL" }, \
@@ -73,7 +74,8 @@ struct xfs_log_item {
 	{ (1 << XFS_LI_DIRTY), "DIRTY" }, \
 	{ (1 << XFS_LI_RECOVERED), "RECOVERED" }, \
 	{ (1 << XFS_LI_RELOG), "RELOG" }, \
-	{ (1 << XFS_LI_RELOG_QUEUED), "RELOG_QUEUED" }
+	{ (1 << XFS_LI_RELOG_QUEUED), "RELOG_QUEUED" }, \
+	{ (1 << XFS_LI_RELOG_RAND), "RELOG_RAND" }
 
 struct xfs_item_ops {
 	unsigned flags;
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 6c4d219801a6..3a8a1abc6c4c 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -18,6 +18,7 @@
 #include "xfs_error.h"
 #include "xfs_log.h"
 #include "xfs_log_priv.h"
+#include "xfs_buf_item.h"
 
 #ifdef DEBUG
 /*
@@ -176,6 +177,7 @@ xfs_ail_relog(
 	struct xfs_trans_res tres = {};
 	struct xfs_trans *tp;
 	struct xfs_log_item *lip, *lipp;
+	int cancelres;
 	int error;
 	LIST_HEAD(relog_list);
 
@@ -209,6 +211,37 @@ xfs_ail_relog(
 			ASSERT(lip->li_ops->iop_relog);
 			if (lip->li_ops->iop_relog)
 				lip->li_ops->iop_relog(lip, tp);
+
+			/*
+			 * Cancel random buffer relogs at a fixed rate to
+			 * prevent too much buildup.
+			 */
+			if (test_bit(XFS_LI_RELOG_RAND, &lip->li_flags) &&
+			    ((prandom_u32() & 1) ||
+			     (mp->m_flags & XFS_MOUNT_UNMOUNTING))) {
+				struct xfs_buf_log_item *bli;
+				bli = container_of(lip, struct xfs_buf_log_item,
+						   bli_item);
+				xfs_trans_relog_buf_cancel(tp, bli->bli_buf);
+			}
 		}
+
+		/*
+		 * Cancelling relog reservation in the same transaction as
+		 * consuming it means the current transaction over releases
+		 * reservation on commit and the next transaction reservation
+		 * restores the grant heads to even. To avoid this behavior,
+		 * remove surplus reservation (->t_curr_res) from the committing
+		 * transaction and replace it with a reduction in the
+		 * reservation requirement (->t_unit_res) for the next. This has
+		 * no net effect on reservation accounting, but ensures we don't
+		 * cause problems elsewhere with odd reservation behavior.
+		 */
+		cancelres = tp->t_ticket->t_curr_res - tp->t_ticket->t_unit_res;
+		if (cancelres) {
+			tp->t_ticket->t_curr_res -= cancelres;
+			tp->t_ticket->t_unit_res -= cancelres;
+			tp->t_log_res -= cancelres;
+		}
 
 		error = xfs_trans_roll(&tp);
diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
index b5b552a4bcfb..565386912e4d 100644
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -14,6 +14,8 @@
 #include "xfs_buf_item.h"
 #include "xfs_trans_priv.h"
 #include "xfs_trace.h"
+#include "xfs_error.h"
+#include "xfs_errortag.h"
 
 /*
  * Check to see if a buffer matching the given parameters is already
@@ -527,6 +529,17 @@ xfs_trans_log_buf(
 
 	trace_xfs_trans_log_buf(bip);
 	xfs_buf_item_log(bip, first, last);
+
+	/*
+	 * Relog random buffers so long as the transaction is relog enabled and
+	 * the buffer wasn't already relogged explicitly.
+	 */
+	if (XFS_TEST_ERROR(false, tp->t_mountp, XFS_ERRTAG_RELOG) &&
+	    (tp->t_flags & XFS_TRANS_RELOG) &&
+	    !test_bit(XFS_LI_RELOG, &bip->bli_item.li_flags)) {
+		if (xfs_trans_relog_buf(tp, bp))
+			set_bit(XFS_LI_RELOG_RAND, &bip->bli_item.li_flags);
+	}
 }
 
 
@@ -852,4 +865,5 @@ xfs_trans_relog_buf_cancel(
 
 	atomic_dec(&bp->b_pin_count);
 	xfs_trans_relog_item_cancel(tp, &bip->bli_item, false);
+	clear_bit(XFS_LI_RELOG_RAND, &bip->bli_item.li_flags);
 }