From patchwork Mon Apr 6 12:36:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Brian Foster X-Patchwork-Id: 11475489 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D46AF1744 for ; Mon, 6 Apr 2020 12:36:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id B2B9C21D82 for ; Mon, 6 Apr 2020 12:36:39 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="NPNgIPxG" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727988AbgDFMgj (ORCPT ); Mon, 6 Apr 2020 08:36:39 -0400 Received: from us-smtp-2.mimecast.com ([205.139.110.61]:34961 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727652AbgDFMgi (ORCPT ); Mon, 6 Apr 2020 08:36:38 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1586176597; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=hFaHcyKUQI57qhzsYp+uXg4rgeMkWYVn5dmwWJjFxrU=; b=NPNgIPxGqRvH1IabCfvB4B+Z61QbS0R3NUcoZ+zoL8vmPcN3R13I0AFkSGZltC+Qi9QZmf B/b6mmUWJ5EDI2se1JCmkSbUHhT6ynsDXyAyKKMAc0eVNe7UCZb3QSrku41ayv/bIaFyEZ KkLwRnCeb+RwqWdRK9wzABSKyhXJhko= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-462-JBPlytlJO8Gi12pD3h7x_g-1; Mon, 06 Apr 2020 08:36:35 -0400 X-MC-Unique: JBPlytlJO8Gi12pD3h7x_g-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 1AEB9DBB2 for ; Mon, 6 Apr 2020 12:36:34 +0000 (UTC) Received: from bfoster.bos.redhat.com (dhcp-41-2.bos.redhat.com [10.18.41.2]) by smtp.corp.redhat.com (Postfix) with ESMTP id C37FF60BFB for ; Mon, 6 Apr 2020 12:36:33 +0000 (UTC) From: Brian Foster To: linux-xfs@vger.kernel.org Subject: [RFC v6 PATCH 01/10] xfs: automatic relogging item management Date: Mon, 6 Apr 2020 08:36:23 -0400 Message-Id: <20200406123632.20873-2-bfoster@redhat.com> In-Reply-To: <20200406123632.20873-1-bfoster@redhat.com> References: <20200406123632.20873-1-bfoster@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org Add a log item flag to track relog state and a couple helpers to set and clear the flag. The flag will be set on any log item that is to be automatically relogged by log tail pressure. Signed-off-by: Brian Foster Reviewed-by: Allison Collins --- fs/xfs/xfs_trace.h | 2 ++ fs/xfs/xfs_trans.c | 20 ++++++++++++++++++++ fs/xfs/xfs_trans.h | 4 +++- fs/xfs/xfs_trans_priv.h | 2 ++ 4 files changed, 27 insertions(+), 1 deletion(-) diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index a4323a63438d..b683c94556a3 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -1068,6 +1068,8 @@ DEFINE_LOG_ITEM_EVENT(xfs_ail_push); DEFINE_LOG_ITEM_EVENT(xfs_ail_pinned); DEFINE_LOG_ITEM_EVENT(xfs_ail_locked); DEFINE_LOG_ITEM_EVENT(xfs_ail_flushing); +DEFINE_LOG_ITEM_EVENT(xfs_relog_item); +DEFINE_LOG_ITEM_EVENT(xfs_relog_item_cancel); DECLARE_EVENT_CLASS(xfs_ail_class, TP_PROTO(struct xfs_log_item *lip, xfs_lsn_t old_lsn, xfs_lsn_t new_lsn), diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index 28b983ff8b11..4fbe11485bbb 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -770,6 +770,26 @@ xfs_trans_del_item( list_del_init(&lip->li_trans); } +void +xfs_trans_relog_item( + struct xfs_trans *tp, + struct xfs_log_item *lip) +{ + if (test_and_set_bit(XFS_LI_RELOG, &lip->li_flags)) + return; + trace_xfs_relog_item(lip); +} + +void +xfs_trans_relog_item_cancel( + struct xfs_trans *tp, + struct xfs_log_item *lip) +{ + if (!test_and_clear_bit(XFS_LI_RELOG, &lip->li_flags)) + return; + trace_xfs_relog_item_cancel(lip); +} + /* Detach and unlock all of the items in a transaction */ static void xfs_trans_free_items( diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h index 752c7fef9de7..9ea0f415e5ca 100644 --- a/fs/xfs/xfs_trans.h +++ b/fs/xfs/xfs_trans.h @@ -59,12 +59,14 @@ struct xfs_log_item { #define XFS_LI_ABORTED 1 #define XFS_LI_FAILED 2 #define XFS_LI_DIRTY 3 /* log item dirty in transaction */ +#define XFS_LI_RELOG 4 /* automatically relog item */ #define XFS_LI_FLAGS \ { (1 << XFS_LI_IN_AIL), "IN_AIL" }, \ { (1 << XFS_LI_ABORTED), "ABORTED" }, \ { (1 << XFS_LI_FAILED), "FAILED" }, \ - { (1 << XFS_LI_DIRTY), "DIRTY" } + { (1 << XFS_LI_DIRTY), "DIRTY" }, \ + { (1 << XFS_LI_RELOG), "RELOG" } struct xfs_item_ops { unsigned flags; diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h index 35655eac01a6..c890794314a6 100644 --- a/fs/xfs/xfs_trans_priv.h +++ b/fs/xfs/xfs_trans_priv.h @@ -16,6 +16,8 @@ struct xfs_log_vec; void xfs_trans_init(struct xfs_mount *); void xfs_trans_add_item(struct xfs_trans *, struct xfs_log_item *); void xfs_trans_del_item(struct xfs_log_item *); +void xfs_trans_relog_item(struct xfs_trans *, struct xfs_log_item *); +void xfs_trans_relog_item_cancel(struct xfs_trans *, struct xfs_log_item *); void xfs_trans_unreserve_and_mod_sb(struct xfs_trans *tp); void xfs_trans_committed_bulk(struct xfs_ail *ailp, struct xfs_log_vec *lv, From patchwork Mon Apr 6 12:36:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Brian Foster X-Patchwork-Id: 11475487 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A987514B4 for ; Mon, 6 Apr 2020 12:36:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 81E8221D82 for ; Mon, 6 Apr 2020 12:36:39 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="BIa0nzsY" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727984AbgDFMgi (ORCPT ); Mon, 6 Apr 2020 08:36:38 -0400 Received: from us-smtp-1.mimecast.com ([207.211.31.81]:42372 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727951AbgDFMgi (ORCPT ); Mon, 6 Apr 2020 08:36:38 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1586176597; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=MtfDitbHqj7UDT/T3jF2Q6LlVQtM8OAZ9G+ILpaaOkU=; b=BIa0nzsY8xpFf7ir2Dnsyzv0LceOYEqwfvNXs2HQM/2WaUUaz5xH2qJN3UAK/MMhDdR6SN DRPCbTmGZLBNvrfpF07aTpKMox/riC5KvE7eLBiIwbbkhM5NtBcZyMg3rUGg2QmG3of89T XNM5zH8ETHr2eVOpkCflFa9nWF3DUYE= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-481-4q-oBF28PXGFXHInYRduGw-1; Mon, 06 Apr 2020 08:36:35 -0400 X-MC-Unique: 4q-oBF28PXGFXHInYRduGw-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 8779D477 for ; Mon, 6 Apr 2020 12:36:34 +0000 (UTC) Received: from bfoster.bos.redhat.com (dhcp-41-2.bos.redhat.com [10.18.41.2]) by smtp.corp.redhat.com (Postfix) with ESMTP id 3E61460BFB for ; Mon, 6 Apr 2020 12:36:34 +0000 (UTC) From: Brian Foster To: linux-xfs@vger.kernel.org Subject: [RFC v6 PATCH 02/10] xfs: create helper for ticket-less log res ungrant Date: Mon, 6 Apr 2020 08:36:24 -0400 Message-Id: <20200406123632.20873-3-bfoster@redhat.com> In-Reply-To: <20200406123632.20873-1-bfoster@redhat.com> References: <20200406123632.20873-1-bfoster@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org Log reservation is currently acquired and released via log tickets. The relog mechanism introduces behavior where relog reservation is transferred between transaction log tickets and an external pool of relog reservation for active relog items. Certain contexts will be able to release outstanding relog reservation without the need for a log ticket. Factor out a helper to allow byte granularity log reservation ungrant. Signed-off-by: Brian Foster Reviewed-by: Allison Collins --- fs/xfs/xfs_log.c | 20 ++++++++++++++++---- fs/xfs/xfs_log.h | 1 + 2 files changed, 17 insertions(+), 4 deletions(-) diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index 00fda2e8e738..d6b63490a78b 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -2980,6 +2980,21 @@ xfs_log_ticket_regrant( xfs_log_ticket_put(ticket); } +/* + * Restore log reservation directly to the grant heads. + */ +void +xfs_log_ungrant_bytes( + struct xfs_mount *mp, + int bytes) +{ + struct xlog *log = mp->m_log; + + xlog_grant_sub_space(log, &log->l_reserve_head.grant, bytes); + xlog_grant_sub_space(log, &log->l_write_head.grant, bytes); + xfs_log_space_wake(mp); +} + /* * Give back the space left from a reservation. * @@ -3018,12 +3033,9 @@ xfs_log_ticket_ungrant( bytes += ticket->t_unit_res*ticket->t_cnt; } - xlog_grant_sub_space(log, &log->l_reserve_head.grant, bytes); - xlog_grant_sub_space(log, &log->l_write_head.grant, bytes); - + xfs_log_ungrant_bytes(log->l_mp, bytes); trace_xfs_log_ticket_ungrant_exit(log, ticket); - xfs_log_space_wake(log->l_mp); xfs_log_ticket_put(ticket); } diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h index 1412d6993f1e..6d2f30f42245 100644 --- a/fs/xfs/xfs_log.h +++ b/fs/xfs/xfs_log.h @@ -125,6 +125,7 @@ int xfs_log_reserve(struct xfs_mount *mp, uint8_t clientid, bool permanent); int xfs_log_regrant(struct xfs_mount *mp, struct xlog_ticket *tic); +void xfs_log_ungrant_bytes(struct xfs_mount *mp, int bytes); void xfs_log_unmount(struct xfs_mount *mp); int xfs_log_force_umount(struct xfs_mount *mp, int logerror); From patchwork Mon Apr 6 12:36:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Brian Foster X-Patchwork-Id: 11475491 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0C04E174A for ; Mon, 6 Apr 2020 12:36:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id DE520221E9 for ; Mon, 6 Apr 2020 12:36:39 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="dkRlXfzm" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727652AbgDFMgj (ORCPT ); Mon, 6 Apr 2020 08:36:39 -0400 Received: from us-smtp-2.mimecast.com ([207.211.31.81]:27692 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727952AbgDFMgi (ORCPT ); Mon, 6 Apr 2020 08:36:38 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1586176597; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=xh8JNhOR7YwcG0UOCYS8BiaHJWd/yQxs0ZRI8IX11Kw=; b=dkRlXfzmLBPwPdWRBvUPBzOjJVPa761EdiqeM7IxyvZb4rD5km0lgp5Ilv2FW5/KqYphrB gpACQHWb9bQlthxiapFqTVKwL5vLLiwyQBkq8aKPw4zG0KcbY0lLpnlZxFvw9GMA3Z1s0q mhLREc53sUlszfS2I5z3bYRCohdzOPE= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-121-jkdAlEHNPfKLY9Re1Tcqcg-1; Mon, 06 Apr 2020 08:36:35 -0400 X-MC-Unique: jkdAlEHNPfKLY9Re1Tcqcg-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id F17F78017F3 for ; Mon, 6 Apr 2020 12:36:34 +0000 (UTC) Received: from bfoster.bos.redhat.com (dhcp-41-2.bos.redhat.com [10.18.41.2]) by smtp.corp.redhat.com (Postfix) with ESMTP id AC0F960BFB for ; Mon, 6 Apr 2020 12:36:34 +0000 (UTC) From: Brian Foster To: linux-xfs@vger.kernel.org Subject: [RFC v6 PATCH 03/10] xfs: extra runtime reservation overhead for relog transactions Date: Mon, 6 Apr 2020 08:36:25 -0400 Message-Id: <20200406123632.20873-4-bfoster@redhat.com> In-Reply-To: <20200406123632.20873-1-bfoster@redhat.com> References: <20200406123632.20873-1-bfoster@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org Every transaction reservation includes runtime overhead on top of the reservation calculated in the struct xfs_trans_res. This overhead is required for things like the CIL context ticket, log headers, etc., that are stolen from individual transactions. Since reservation for the relog transaction is entirely contributed by regular transactions, this runtime reservation overhead must be contributed as well. This means that a transaction that relogs one or more items must include overhead for the current transaction as well as for the relog transaction. Define a new transaction flag to indicate that a transaction is relog enabled. Plumb this state down to the log ticket allocation and use it to bump the worst case overhead included in the transaction. The overhead will eventually be transferred to the relog system as needed for individual log items. Signed-off-by: Brian Foster Reviewed-by: Allison Collins --- fs/xfs/libxfs/xfs_shared.h | 1 + fs/xfs/xfs_log.c | 12 +++++++++--- fs/xfs/xfs_log.h | 3 ++- fs/xfs/xfs_log_cil.c | 2 +- fs/xfs/xfs_log_priv.h | 1 + fs/xfs/xfs_trans.c | 3 ++- 6 files changed, 16 insertions(+), 6 deletions(-) diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h index c45acbd3add9..1ede1e720a5c 100644 --- a/fs/xfs/libxfs/xfs_shared.h +++ b/fs/xfs/libxfs/xfs_shared.h @@ -65,6 +65,7 @@ void xfs_log_get_max_trans_res(struct xfs_mount *mp, #define XFS_TRANS_DQ_DIRTY 0x10 /* at least one dquot in trx dirty */ #define XFS_TRANS_RESERVE 0x20 /* OK to use reserved data blocks */ #define XFS_TRANS_NO_WRITECOUNT 0x40 /* do not elevate SB writecount */ +#define XFS_TRANS_RELOG 0x80 /* requires extra relog overhead */ /* * LOWMODE is used by the allocator to activate the lowspace algorithm - when * free space is running low the extent allocator may choose to allocate an diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index d6b63490a78b..b55abde6c142 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -418,7 +418,8 @@ xfs_log_reserve( int cnt, struct xlog_ticket **ticp, uint8_t client, - bool permanent) + bool permanent, + bool relog) { struct xlog *log = mp->m_log; struct xlog_ticket *tic; @@ -433,7 +434,8 @@ xfs_log_reserve( XFS_STATS_INC(mp, xs_try_logspace); ASSERT(*ticp == NULL); - tic = xlog_ticket_alloc(log, unit_bytes, cnt, client, permanent, 0); + tic = xlog_ticket_alloc(log, unit_bytes, cnt, client, permanent, relog, + 0); *ticp = tic; xlog_grant_push_ail(log, tic->t_cnt ? tic->t_unit_res * tic->t_cnt @@ -831,7 +833,7 @@ xlog_unmount_write( uint flags = XLOG_UNMOUNT_TRANS; int error; - error = xfs_log_reserve(mp, 600, 1, &tic, XFS_LOG, 0); + error = xfs_log_reserve(mp, 600, 1, &tic, XFS_LOG, false, false); if (error) goto out_err; @@ -3421,6 +3423,7 @@ xlog_ticket_alloc( int cnt, char client, bool permanent, + bool relog, xfs_km_flags_t alloc_flags) { struct xlog_ticket *tic; @@ -3431,6 +3434,9 @@ xlog_ticket_alloc( return NULL; unit_res = xfs_log_calc_unit_res(log->l_mp, unit_bytes); + /* double the overhead for the relog transaction */ + if (relog) + unit_res += (unit_res - unit_bytes); atomic_set(&tic->t_ref, 1); tic->t_task = current; diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h index 6d2f30f42245..f1089a4b299c 100644 --- a/fs/xfs/xfs_log.h +++ b/fs/xfs/xfs_log.h @@ -123,7 +123,8 @@ int xfs_log_reserve(struct xfs_mount *mp, int count, struct xlog_ticket **ticket, uint8_t clientid, - bool permanent); + bool permanent, + bool relog); int xfs_log_regrant(struct xfs_mount *mp, struct xlog_ticket *tic); void xfs_log_ungrant_bytes(struct xfs_mount *mp, int bytes); void xfs_log_unmount(struct xfs_mount *mp); diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c index b43f0e8f43f2..1c48e95402aa 100644 --- a/fs/xfs/xfs_log_cil.c +++ b/fs/xfs/xfs_log_cil.c @@ -37,7 +37,7 @@ xlog_cil_ticket_alloc( { struct xlog_ticket *tic; - tic = xlog_ticket_alloc(log, 0, 1, XFS_TRANSACTION, 0, + tic = xlog_ticket_alloc(log, 0, 1, XFS_TRANSACTION, false, false, KM_NOFS); /* diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h index ec22c7a3867f..08d8ff9bce1a 100644 --- a/fs/xfs/xfs_log_priv.h +++ b/fs/xfs/xfs_log_priv.h @@ -465,6 +465,7 @@ xlog_ticket_alloc( int count, char client, bool permanent, + bool relog, xfs_km_flags_t alloc_flags); diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index 4fbe11485bbb..1b25980315bd 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -177,6 +177,7 @@ xfs_trans_reserve( */ if (resp->tr_logres > 0) { bool permanent = false; + bool relog = (tp->t_flags & XFS_TRANS_RELOG); ASSERT(tp->t_log_res == 0 || tp->t_log_res == resp->tr_logres); @@ -199,7 +200,7 @@ xfs_trans_reserve( resp->tr_logres, resp->tr_logcount, &tp->t_ticket, XFS_TRANSACTION, - permanent); + permanent, relog); } if (error) From patchwork Mon Apr 6 12:36:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Brian Foster X-Patchwork-Id: 11475497 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1E8B8174A for ; Mon, 6 Apr 2020 12:36:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id F104821D7F for ; Mon, 6 Apr 2020 12:36:40 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="a1OV/sEq" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727952AbgDFMgk (ORCPT ); Mon, 6 Apr 2020 08:36:40 -0400 Received: from us-smtp-delivery-1.mimecast.com ([205.139.110.120]:26064 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727982AbgDFMgj (ORCPT ); Mon, 6 Apr 2020 08:36:39 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1586176597; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=xqoa2BOZfE5IyDOoGdc3Y5li6iMDuFdv/q7/FB8ATb4=; b=a1OV/sEqGD9Q2ENa2FvuXIndFIYlP/PW7jM//HrKPxhTh1pWgqMN/p+aK/cPY0yebWVOoe +5T+/b/sQiKbKwbCzM2C4dZB2eTBqdAXcRGIzwFaSjVKOe80zfAM8Kx50WiEHTHMHeKbQH eRVjWp5VXnxjt0YYWbua9k85gI5wuRE= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-300-Fh35_43lOCeOtQmGwO6pIw-1; Mon, 06 Apr 2020 08:36:36 -0400 X-MC-Unique: Fh35_43lOCeOtQmGwO6pIw-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 70AE1107ACC9 for ; Mon, 6 Apr 2020 12:36:35 +0000 (UTC) Received: from bfoster.bos.redhat.com (dhcp-41-2.bos.redhat.com [10.18.41.2]) by smtp.corp.redhat.com (Postfix) with ESMTP id 24A7860BFB for ; Mon, 6 Apr 2020 12:36:35 +0000 (UTC) From: Brian Foster To: linux-xfs@vger.kernel.org Subject: [RFC v6 PATCH 04/10] xfs: relog log reservation stealing and accounting Date: Mon, 6 Apr 2020 08:36:26 -0400 Message-Id: <20200406123632.20873-5-bfoster@redhat.com> In-Reply-To: <20200406123632.20873-1-bfoster@redhat.com> References: <20200406123632.20873-1-bfoster@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org The transaction that eventually commits relog enabled log items requires log reservation like any other transaction. It is not safe to acquire reservation on-demand because relogged items aren't processed until they are likely at the tail of the log and require movement in order to free up space in the log. As such, a relog transaction that blocks on log reservation is a likely deadlock vector. To address this problem, implement a model where relog reservation is contributed by the transaction that enables relogging on a particular item. Update the relog helper to transfer reservation from the transaction to the relog pool. The relog pool holds outstanding reservation such that it can be used to commit the item in an otherwise empty transaction. The upcoming relog mechanism is responsible to replenish the relog reservation as items are relogged. When relog is cancelled on a log item, transfer the outstanding relog reservation to the current transaction (if provided) for eventual release or otherwise release it directly to the grant heads. Note that this approach has several caveats: - Log reservation calculations for transactions that relog items must be increased accordingly. - Each current transaction reservation includes extra space for various overheads, such as for the CIL ticket, etc. Since items can be relogged in arbitrary combinations, this reservation overhead must be included for each relogged item. - Relog reservation must be based on the worst case log requirement for the lifetime of an item. This is not a concern for fixed size log items, such as most intents. More granular log items like buffers can have additional ranges dirtied after relogging has been enabled for the item and the relog subsystem must have enough reservation to accommodate. Signed-off-by: Brian Foster --- fs/xfs/xfs_log.c | 3 +++ fs/xfs/xfs_trans.c | 20 ++++++++++++++++++++ fs/xfs/xfs_trans.h | 17 +++++++++++++++++ fs/xfs/xfs_trans_ail.c | 2 ++ fs/xfs/xfs_trans_priv.h | 14 ++++++++++++++ 5 files changed, 56 insertions(+) diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index b55abde6c142..940e5bb9786c 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -983,6 +983,9 @@ xfs_log_item_init( item->li_type = type; item->li_ops = ops; item->li_lv = NULL; +#ifdef DEBUG + atomic64_set(&item->li_relog_res, 0); +#endif INIT_LIST_HEAD(&item->li_ail); INIT_LIST_HEAD(&item->li_cil); diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index 1b25980315bd..6fc81c3b8500 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -20,6 +20,7 @@ #include "xfs_trace.h" #include "xfs_error.h" #include "xfs_defer.h" +#include "xfs_log_priv.h" kmem_zone_t *xfs_trans_zone; @@ -776,9 +777,19 @@ xfs_trans_relog_item( struct xfs_trans *tp, struct xfs_log_item *lip) { + int nbytes; + + ASSERT(tp->t_flags & XFS_TRANS_RELOG); + if (test_and_set_bit(XFS_LI_RELOG, &lip->li_flags)) return; trace_xfs_relog_item(lip); + + nbytes = xfs_relog_calc_res(lip); + + tp->t_ticket->t_curr_res -= nbytes; + xfs_relog_res_account(lip, nbytes); + tp->t_flags |= XFS_TRANS_DIRTY; } void @@ -786,9 +797,18 @@ xfs_trans_relog_item_cancel( struct xfs_trans *tp, struct xfs_log_item *lip) { + int res; + if (!test_and_clear_bit(XFS_LI_RELOG, &lip->li_flags)) return; trace_xfs_relog_item_cancel(lip); + + res = xfs_relog_calc_res(lip); + if (tp) + tp->t_ticket->t_curr_res += res; + else + xfs_log_ungrant_bytes(lip->li_mountp, res); + xfs_relog_res_account(lip, -res); } /* Detach and unlock all of the items in a transaction */ diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h index 9ea0f415e5ca..b4bb2a6c5251 100644 --- a/fs/xfs/xfs_trans.h +++ b/fs/xfs/xfs_trans.h @@ -48,6 +48,9 @@ struct xfs_log_item { struct xfs_log_vec *li_lv; /* active log vector */ struct xfs_log_vec *li_lv_shadow; /* standby vector */ xfs_lsn_t li_seq; /* CIL commit seq */ +#ifdef DEBUG + atomic64_t li_relog_res; /* automatic relog log res */ +#endif }; /* @@ -212,6 +215,20 @@ xfs_trans_read_buf( flags, bpp, ops); } +static inline int +xfs_relog_calc_res( + struct xfs_log_item *lip) +{ + int niovecs = 0; + int nbytes = 0; + + lip->li_ops->iop_size(lip, &niovecs, &nbytes); + ASSERT(niovecs == 1); + nbytes = xfs_log_calc_unit_res(lip->li_mountp, nbytes); + + return nbytes; +} + struct xfs_buf *xfs_trans_getsb(xfs_trans_t *, struct xfs_mount *); void xfs_trans_brelse(xfs_trans_t *, struct xfs_buf *); diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c index 564253550b75..0b4b1c3cc4de 100644 --- a/fs/xfs/xfs_trans_ail.c +++ b/fs/xfs/xfs_trans_ail.c @@ -861,6 +861,7 @@ xfs_trans_ail_init( spin_lock_init(&ailp->ail_lock); INIT_LIST_HEAD(&ailp->ail_buf_list); init_waitqueue_head(&ailp->ail_empty); + atomic64_set(&ailp->ail_relog_res, 0); ailp->ail_task = kthread_run(xfsaild, ailp, "xfsaild/%s", ailp->ail_mount->m_super->s_id); @@ -881,6 +882,7 @@ xfs_trans_ail_destroy( { struct xfs_ail *ailp = mp->m_ail; + ASSERT(atomic64_read(&ailp->ail_relog_res) == 0); kthread_stop(ailp->ail_task); kmem_free(ailp); } diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h index c890794314a6..926bafdf6cd7 100644 --- a/fs/xfs/xfs_trans_priv.h +++ b/fs/xfs/xfs_trans_priv.h @@ -63,6 +63,7 @@ struct xfs_ail { int ail_log_flush; struct list_head ail_buf_list; wait_queue_head_t ail_empty; + atomic64_t ail_relog_res; }; /* @@ -182,4 +183,17 @@ xfs_set_li_failed( } } +static inline int64_t +xfs_relog_res_account( + struct xfs_log_item *lip, + int64_t bytes) +{ +#ifdef DEBUG + int64_t res; + + res = atomic64_add_return(bytes, &lip->li_relog_res); + ASSERT(res == bytes || (bytes < 0 && res == 0)); +#endif + return atomic64_add_return(bytes, &lip->li_ailp->ail_relog_res); +} #endif /* __XFS_TRANS_PRIV_H__ */ From patchwork Mon Apr 6 12:36:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Brian Foster X-Patchwork-Id: 11475493 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BD38A14B4 for ; Mon, 6 Apr 2020 12:36:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 912A221D7F for ; Mon, 6 Apr 2020 12:36:40 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="ORJtNmSt" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727992AbgDFMgj (ORCPT ); Mon, 6 Apr 2020 08:36:39 -0400 Received: from us-smtp-2.mimecast.com ([205.139.110.61]:39435 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727951AbgDFMgj (ORCPT ); Mon, 6 Apr 2020 08:36:39 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1586176598; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=HHAkIguXmx7WzA2UEoYnDj+TgV9RGxsss+6pfJt/AOM=; b=ORJtNmStx6ERnQREtk3b0h13lPU0IZSBcD330/UhzwoMCq3EnnHU+SKuvjqAokE9wBI8x+ NPPPDcqjTKYO7jWISRTGxUj9r+u/Qcy/oluEuZP/Z7fFYO5n8Jwz44QdYjanFnQbFkcFii X+/uWsfn/vjAl8/uuSSy3cdEQqvjIHc= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-22-7TailhAyNHGOOx6wnj0e8g-1; Mon, 06 Apr 2020 08:36:36 -0400 X-MC-Unique: 7TailhAyNHGOOx6wnj0e8g-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id D8BFFDBA8 for ; Mon, 6 Apr 2020 12:36:35 +0000 (UTC) Received: from bfoster.bos.redhat.com (dhcp-41-2.bos.redhat.com [10.18.41.2]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9176360BFB for ; Mon, 6 Apr 2020 12:36:35 +0000 (UTC) From: Brian Foster To: linux-xfs@vger.kernel.org Subject: [RFC v6 PATCH 05/10] xfs: automatic log item relog mechanism Date: Mon, 6 Apr 2020 08:36:27 -0400 Message-Id: <20200406123632.20873-6-bfoster@redhat.com> In-Reply-To: <20200406123632.20873-1-bfoster@redhat.com> References: <20200406123632.20873-1-bfoster@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org Now that relog reservation is available and relog state tracking is in place, all that remains to automatically relog items is the relog mechanism itself. An item with relogging enabled is basically pinned from writeback until relog is disabled. Instead of being written back, the item must instead be periodically committed in a new transaction to move it forward in the physical log. The purpose of moving the item is to avoid long term tail pinning and thus avoid log deadlocks for long running operations. The ideal time to relog an item is in response to tail pushing pressure. This accommodates the current workload at any given time as opposed to a fixed time interval or log reservation heuristic, which risks performance regression. This is essentially the same heuristic that drives metadata writeback. XFS already implements various log tail pushing heuristics that attempt to keep the log progressing on an active fileystem under various workloads. The act of relogging an item simply requires to add it to a transaction and commit. This pushes the already dirty item into a subsequent log checkpoint and frees up its previous location in the on-disk log. Joining an item to a transaction of course requires locking the item first, which means we have to be aware of type-specific locks and lock ordering wherever the relog takes place. Fundamentally, this points to xfsaild as the ideal location to process relog enabled items. xfsaild already processes log resident items, is driven by log tail pushing pressure, processes arbitrary log item types through callbacks, and is sensitive to type-specific locking rules by design. The fact that automatic relogging essentially diverts items between writeback or relog also suggests xfsaild as an ideal location to process items one way or the other. Of course, we don't want xfsaild to process transactions as it is a critical component of the log subsystem for driving metadata writeback and freeing up log space. Therefore, similar to how xfsaild builds up a writeback queue of dirty items and queues writes asynchronously, make xfsaild responsible only for directing pending relog items into an appropriate queue and create an async (workqueue) context for processing the queue. The workqueue context utilizes the pre-reserved log reservation to drain the queue by rolling a permanent transaction. Update the AIL pushing infrastructure to support a new RELOG item state. If a log item push returns the relog state, queue the item for relog instead of writeback. On completion of a push cycle, schedule the relog task at the same point metadata buffer I/O is submitted. This allows items to be relogged automatically under the same locking rules and pressure heuristics that govern metadata writeback. Signed-off-by: Brian Foster --- fs/xfs/xfs_trace.h | 2 + fs/xfs/xfs_trans.c | 13 ++++- fs/xfs/xfs_trans.h | 8 ++- fs/xfs/xfs_trans_ail.c | 111 +++++++++++++++++++++++++++++++++++++++- fs/xfs/xfs_trans_priv.h | 6 ++- 5 files changed, 134 insertions(+), 6 deletions(-) diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index b683c94556a3..6edb40e28a9a 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -1068,6 +1068,8 @@ DEFINE_LOG_ITEM_EVENT(xfs_ail_push); DEFINE_LOG_ITEM_EVENT(xfs_ail_pinned); DEFINE_LOG_ITEM_EVENT(xfs_ail_locked); DEFINE_LOG_ITEM_EVENT(xfs_ail_flushing); +DEFINE_LOG_ITEM_EVENT(xfs_ail_relog); +DEFINE_LOG_ITEM_EVENT(xfs_ail_relog_queue); DEFINE_LOG_ITEM_EVENT(xfs_relog_item); DEFINE_LOG_ITEM_EVENT(xfs_relog_item_cancel); diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index 6fc81c3b8500..31ef5f671341 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -795,7 +795,8 @@ xfs_trans_relog_item( void xfs_trans_relog_item_cancel( struct xfs_trans *tp, - struct xfs_log_item *lip) + struct xfs_log_item *lip, + bool wait) { int res; @@ -803,6 +804,15 @@ xfs_trans_relog_item_cancel( return; trace_xfs_relog_item_cancel(lip); + /* + * Must wait on active relog to complete before reclaiming reservation. + * Currently a big hammer because the QUEUED state isn't cleared until + * AIL (re)insertion. A separate state might be warranted. + */ + while (wait && wait_on_bit_timeout(&lip->li_flags, XFS_LI_RELOG_QUEUED, + TASK_UNINTERRUPTIBLE, HZ)) + xfs_log_force(lip->li_mountp, XFS_LOG_SYNC); + res = xfs_relog_calc_res(lip); if (tp) tp->t_ticket->t_curr_res += res; @@ -896,6 +906,7 @@ xfs_trans_committed_bulk( if (aborted) set_bit(XFS_LI_ABORTED, &lip->li_flags); + clear_and_wake_up_bit(XFS_LI_RELOG_QUEUED, &lip->li_flags); if (lip->li_ops->flags & XFS_ITEM_RELEASE_WHEN_COMMITTED) { lip->li_ops->iop_release(lip); diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h index b4bb2a6c5251..51f7c92a4ffb 100644 --- a/fs/xfs/xfs_trans.h +++ b/fs/xfs/xfs_trans.h @@ -63,13 +63,15 @@ struct xfs_log_item { #define XFS_LI_FAILED 2 #define XFS_LI_DIRTY 3 /* log item dirty in transaction */ #define XFS_LI_RELOG 4 /* automatically relog item */ +#define XFS_LI_RELOG_QUEUED 5 /* queued for relog */ #define XFS_LI_FLAGS \ { (1 << XFS_LI_IN_AIL), "IN_AIL" }, \ { (1 << XFS_LI_ABORTED), "ABORTED" }, \ { (1 << XFS_LI_FAILED), "FAILED" }, \ { (1 << XFS_LI_DIRTY), "DIRTY" }, \ - { (1 << XFS_LI_RELOG), "RELOG" } + { (1 << XFS_LI_RELOG), "RELOG" }, \ + { (1 << XFS_LI_RELOG_QUEUED), "RELOG_QUEUED" } struct xfs_item_ops { unsigned flags; @@ -81,7 +83,8 @@ struct xfs_item_ops { void (*iop_committing)(struct xfs_log_item *, xfs_lsn_t commit_lsn); void (*iop_release)(struct xfs_log_item *); xfs_lsn_t (*iop_committed)(struct xfs_log_item *, xfs_lsn_t); - void (*iop_error)(struct xfs_log_item *, xfs_buf_t *); + void (*iop_error)(struct xfs_log_item *, struct xfs_buf *); + void (*iop_relog)(struct xfs_log_item *, struct xfs_trans *); }; /* @@ -100,6 +103,7 @@ void xfs_log_item_init(struct xfs_mount *mp, struct xfs_log_item *item, #define XFS_ITEM_PINNED 1 #define XFS_ITEM_LOCKED 2 #define XFS_ITEM_FLUSHING 3 +#define XFS_ITEM_RELOG 4 /* * Deferred operation item relogging limits. diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c index 0b4b1c3cc4de..b78d026d6564 100644 --- a/fs/xfs/xfs_trans_ail.c +++ b/fs/xfs/xfs_trans_ail.c @@ -17,6 +17,7 @@ #include "xfs_errortag.h" #include "xfs_error.h" #include "xfs_log.h" +#include "xfs_log_priv.h" #ifdef DEBUG /* @@ -152,6 +153,88 @@ xfs_ail_max_lsn( return lsn; } +/* + * Relog log items on the AIL relog queue. + * + * Note that relog is incompatible with filesystem freeze due to the + * multi-transaction nature of its users. The freeze sequence blocks all + * transactions and expects to drain the AIL. Allowing the relog transaction to + * proceed while freeze is in progress is not sufficient because it is not + * responsible for cancellation of relog state. The higher level operations must + * be guaranteed to progress to completion before the AIL can be drained of + * relog enabled items. This is currently accomplished by holding + * ->s_umount (quotaoff) or superblock write references (scrub) across the high + * level operations that depend on relog. + */ +static void +xfs_ail_relog( + struct work_struct *work) +{ + struct xfs_ail *ailp = container_of(work, struct xfs_ail, + ail_relog_work); + struct xfs_mount *mp = ailp->ail_mount; + struct xfs_trans_res tres = {}; + struct xfs_trans *tp; + struct xfs_log_item *lip, *lipp; + int error; + LIST_HEAD(relog_list); + + /* + * Open code allocation of an empty transaction and log ticket. The + * ticket requires no initial reservation because the all outstanding + * relog reservation is attached to log items. + */ + error = xfs_trans_alloc(mp, &tres, 0, 0, 0, &tp); + if (error) + goto out; + ASSERT(tp && !tp->t_ticket); + tp->t_flags |= XFS_TRANS_PERM_LOG_RES; + tp->t_log_count = 1; + tp->t_ticket = xlog_ticket_alloc(mp->m_log, 0, 1, XFS_TRANSACTION, + true, false, 0); + /* reset to zero to undo res overhead calculation on ticket alloc */ + tp->t_ticket->t_curr_res = 0; + tp->t_ticket->t_unit_res = 0; + tp->t_log_res = 0; + + spin_lock(&ailp->ail_lock); + while (!list_empty(&ailp->ail_relog_list)) { + list_splice_init(&ailp->ail_relog_list, &relog_list); + spin_unlock(&ailp->ail_lock); + + list_for_each_entry_safe(lip, lipp, &relog_list, li_trans) { + list_del_init(&lip->li_trans); + + trace_xfs_ail_relog(lip); + ASSERT(lip->li_ops->iop_relog); + if (lip->li_ops->iop_relog) + lip->li_ops->iop_relog(lip, tp); + } + + error = xfs_trans_roll(&tp); + if (error) { + xfs_trans_cancel(tp); + goto out; + } + + /* + * Now that the transaction has rolled, reset the ticket to + * zero to reflect that the log reservation held by the + * attached items has been replenished. + */ + tp->t_ticket->t_curr_res = 0; + tp->t_ticket->t_unit_res = 0; + tp->t_log_res = 0; + + spin_lock(&ailp->ail_lock); + } + spin_unlock(&ailp->ail_lock); + xfs_trans_cancel(tp); + +out: + ASSERT(!error || XFS_FORCED_SHUTDOWN(mp)); +} + /* * The cursor keeps track of where our current traversal is up to by tracking * the next item in the list for us. However, for this to be safe, removing an @@ -372,7 +455,7 @@ static long xfsaild_push( struct xfs_ail *ailp) { - xfs_mount_t *mp = ailp->ail_mount; + struct xfs_mount *mp = ailp->ail_mount; struct xfs_ail_cursor cur; struct xfs_log_item *lip; xfs_lsn_t lsn; @@ -434,6 +517,17 @@ xfsaild_push( ailp->ail_last_pushed_lsn = lsn; break; + case XFS_ITEM_RELOG: + /* + * The item requires a relog. Add to the relog queue + * and set a bit to prevent further relog requests. + */ + trace_xfs_ail_relog_queue(lip); + ASSERT(list_empty(&lip->li_trans)); + set_bit(XFS_LI_RELOG_QUEUED, &lip->li_flags); + list_add_tail(&lip->li_trans, &ailp->ail_relog_list); + break; + case XFS_ITEM_FLUSHING: /* * The item or its backing buffer is already being @@ -500,6 +594,9 @@ xfsaild_push( if (xfs_buf_delwri_submit_nowait(&ailp->ail_buf_list)) ailp->ail_log_flush++; + if (!list_empty(&ailp->ail_relog_list)) + queue_work(ailp->ail_relog_wq, &ailp->ail_relog_work); + if (!count || XFS_LSN_CMP(lsn, target) >= 0) { out_done: /* @@ -861,16 +958,25 @@ xfs_trans_ail_init( spin_lock_init(&ailp->ail_lock); INIT_LIST_HEAD(&ailp->ail_buf_list); init_waitqueue_head(&ailp->ail_empty); + INIT_LIST_HEAD(&ailp->ail_relog_list); + INIT_WORK(&ailp->ail_relog_work, xfs_ail_relog); atomic64_set(&ailp->ail_relog_res, 0); + ailp->ail_relog_wq = alloc_workqueue("xfs-relog/%s", WQ_FREEZABLE, 0, + mp->m_super->s_id); + if (!ailp->ail_relog_wq) + goto out_free_ailp; + ailp->ail_task = kthread_run(xfsaild, ailp, "xfsaild/%s", ailp->ail_mount->m_super->s_id); if (IS_ERR(ailp->ail_task)) - goto out_free_ailp; + goto out_destroy_wq; mp->m_ail = ailp; return 0; +out_destroy_wq: + destroy_workqueue(ailp->ail_relog_wq); out_free_ailp: kmem_free(ailp); return -ENOMEM; @@ -884,5 +990,6 @@ xfs_trans_ail_destroy( ASSERT(atomic64_read(&ailp->ail_relog_res) == 0); kthread_stop(ailp->ail_task); + destroy_workqueue(ailp->ail_relog_wq); kmem_free(ailp); } diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h index 926bafdf6cd7..9301ea62d0ab 100644 --- a/fs/xfs/xfs_trans_priv.h +++ b/fs/xfs/xfs_trans_priv.h @@ -17,7 +17,8 @@ void xfs_trans_init(struct xfs_mount *); void xfs_trans_add_item(struct xfs_trans *, struct xfs_log_item *); void xfs_trans_del_item(struct xfs_log_item *); void xfs_trans_relog_item(struct xfs_trans *, struct xfs_log_item *); -void xfs_trans_relog_item_cancel(struct xfs_trans *, struct xfs_log_item *); +void xfs_trans_relog_item_cancel(struct xfs_trans *, struct xfs_log_item *, + bool wait); void xfs_trans_unreserve_and_mod_sb(struct xfs_trans *tp); void xfs_trans_committed_bulk(struct xfs_ail *ailp, struct xfs_log_vec *lv, @@ -64,6 +65,9 @@ struct xfs_ail { struct list_head ail_buf_list; wait_queue_head_t ail_empty; atomic64_t ail_relog_res; + struct work_struct ail_relog_work; + struct list_head ail_relog_list; + struct workqueue_struct *ail_relog_wq; }; /* From patchwork Mon Apr 6 12:36:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Brian Foster X-Patchwork-Id: 11475499 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4915C14DD for ; Mon, 6 Apr 2020 12:36:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 27D8921D93 for ; Mon, 6 Apr 2020 12:36:41 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="ibtJhXJv" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727982AbgDFMgk (ORCPT ); Mon, 6 Apr 2020 08:36:40 -0400 Received: from us-smtp-2.mimecast.com ([205.139.110.61]:35776 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727989AbgDFMgj (ORCPT ); Mon, 6 Apr 2020 08:36:39 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1586176598; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=xgEwxMau1Fn+2+3dl4Neb/rSwczg8z3UEnhB1hb5RRs=; b=ibtJhXJvK9gn3hqVq9SXIkOZz5VaTgmYHyuKIxFH3FwsW57BJBcPlg4IidahciGqXsNDD0 gfXsT566ccxQx0KTh+yKqD+PksiT0+5WdUOKl+pUfMauvthNTJUytGdgPmqr7sb5o5qpJa z6uiexrcfmxCPftj+eaNFu4v9/HhZCA= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-330-UGcdGyxsME2rlx1yHR6QuA-1; Mon, 06 Apr 2020 08:36:37 -0400 X-MC-Unique: UGcdGyxsME2rlx1yHR6QuA-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 530BC8017FF for ; Mon, 6 Apr 2020 12:36:36 +0000 (UTC) Received: from bfoster.bos.redhat.com (dhcp-41-2.bos.redhat.com [10.18.41.2]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0A05960C63 for ; Mon, 6 Apr 2020 12:36:35 +0000 (UTC) From: Brian Foster To: linux-xfs@vger.kernel.org Subject: [RFC v6 PATCH 06/10] xfs: automatically relog the quotaoff start intent Date: Mon, 6 Apr 2020 08:36:28 -0400 Message-Id: <20200406123632.20873-7-bfoster@redhat.com> In-Reply-To: <20200406123632.20873-1-bfoster@redhat.com> References: <20200406123632.20873-1-bfoster@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org The quotaoff operation has a rare but longstanding deadlock vector in terms of how the operation is logged. A quotaoff start intent is logged (synchronously) at the onset to ensure recovery can handle the operation if interrupted before in-core changes are made. This quotaoff intent pins the log tail while the quotaoff sequence scans and purges dquots from all in-core inodes. While this operation generally doesn't generate much log traffic on its own, it can be time consuming. If unrelated, concurrent filesystem activity consumes remaining log space before quotaoff is able to acquire log reservation for the quotaoff end intent, the filesystem locks up indefinitely. quotaoff cannot allocate the end intent before the scan because the latter can result in transaction allocation itself in certain indirect cases (releasing an inode, for example). Further, rolling the original transaction is difficult because the scanning work occurs multiple layers down where caller context is lost and not much information is available to determine how often to roll the transaction. To address this problem, enable automatic relogging of the quotaoff start intent. This automatically relogs the intent whenever AIL pushing finds the item at the tail of the log. When quotaoff completes, wait for relogging to complete as the end intent expects to be able to permanently remove the start intent from the log subsystem. This ensures that the log tail is kept moving during a particularly long quotaoff operation and avoids the log reservation deadlock. Note that the quotaoff reservation calculation does not need to be updated for relog as it already (incorrectly) accounts for two quotaoff intents. Signed-off-by: Brian Foster --- fs/xfs/xfs_dquot_item.c | 26 ++++++++++++++++++++++++++ fs/xfs/xfs_qm_syscalls.c | 12 +++++++++++- 2 files changed, 37 insertions(+), 1 deletion(-) diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c index baad1748d0d1..22f8b0750afc 100644 --- a/fs/xfs/xfs_dquot_item.c +++ b/fs/xfs/xfs_dquot_item.c @@ -17,6 +17,7 @@ #include "xfs_trans_priv.h" #include "xfs_qm.h" #include "xfs_log.h" +#include "xfs_log_priv.h" static inline struct xfs_dq_logitem *DQUOT_ITEM(struct xfs_log_item *lip) { @@ -298,6 +299,13 @@ xfs_qm_qoff_logitem_push( struct xfs_log_item *lip, struct list_head *buffer_list) { + struct xfs_log_item *mlip = xfs_ail_min(lip->li_ailp); + + if (test_bit(XFS_LI_RELOG, &lip->li_flags) && + !test_bit(XFS_LI_RELOG_QUEUED, &lip->li_flags) && + !XFS_LSN_CMP(lip->li_lsn, mlip->li_lsn)) + return XFS_ITEM_RELOG; + return XFS_ITEM_LOCKED; } @@ -329,6 +337,23 @@ xfs_qm_qoff_logitem_release( } } +STATIC void +xfs_qm_qoff_logitem_relog( + struct xfs_log_item *lip, + struct xfs_trans *tp) +{ + int res; + + res = xfs_relog_calc_res(lip); + + xfs_trans_add_item(tp, lip); + tp->t_ticket->t_curr_res += res; + tp->t_ticket->t_unit_res += res; + tp->t_log_res += res; + tp->t_flags |= XFS_TRANS_DIRTY; + set_bit(XFS_LI_DIRTY, &lip->li_flags); +} + static const struct xfs_item_ops xfs_qm_qoffend_logitem_ops = { .iop_size = xfs_qm_qoff_logitem_size, .iop_format = xfs_qm_qoff_logitem_format, @@ -342,6 +367,7 @@ static const struct xfs_item_ops xfs_qm_qoff_logitem_ops = { .iop_format = xfs_qm_qoff_logitem_format, .iop_push = xfs_qm_qoff_logitem_push, .iop_release = xfs_qm_qoff_logitem_release, + .iop_relog = xfs_qm_qoff_logitem_relog, }; /* diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c index 5d5ac65aa1cc..dc154051ec7b 100644 --- a/fs/xfs/xfs_qm_syscalls.c +++ b/fs/xfs/xfs_qm_syscalls.c @@ -18,6 +18,7 @@ #include "xfs_quota.h" #include "xfs_qm.h" #include "xfs_icache.h" +#include "xfs_trans_priv.h" STATIC int xfs_qm_log_quotaoff( @@ -29,12 +30,14 @@ xfs_qm_log_quotaoff( int error; struct xfs_qoff_logitem *qoffi; - error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_quotaoff, 0, 0, 0, &tp); + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_quotaoff, 0, 0, + XFS_TRANS_RELOG, &tp); if (error) goto out; qoffi = xfs_trans_get_qoff_item(tp, NULL, flags & XFS_ALL_QUOTA_ACCT); xfs_trans_log_quotaoff_item(tp, qoffi); + xfs_trans_relog_item(tp, &qoffi->qql_item); spin_lock(&mp->m_sb_lock); mp->m_sb.sb_qflags = (mp->m_qflags & ~(flags)) & XFS_MOUNT_QUOTA_ALL; @@ -71,6 +74,13 @@ xfs_qm_log_quotaoff_end( if (error) return error; + /* + * startqoff must be in the AIL and not the CIL when the end intent + * commits to ensure it is not readded to the AIL out of order. Wait on + * relog activity to drain to isolate startqoff to the AIL. + */ + xfs_trans_relog_item_cancel(tp, &(*startqoff)->qql_item, true); + qoffi = xfs_trans_get_qoff_item(tp, *startqoff, flags & XFS_ALL_QUOTA_ACCT); xfs_trans_log_quotaoff_item(tp, qoffi); From patchwork Mon Apr 6 12:36:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Brian Foster X-Patchwork-Id: 11475495 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E7C011744 for ; Mon, 6 Apr 2020 12:36:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id C61A021D7F for ; Mon, 6 Apr 2020 12:36:40 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="IUk8FbPP" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727996AbgDFMgk (ORCPT ); Mon, 6 Apr 2020 08:36:40 -0400 Received: from us-smtp-2.mimecast.com ([205.139.110.61]:24169 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727952AbgDFMgj (ORCPT ); Mon, 6 Apr 2020 08:36:39 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1586176599; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=zEpjhkqKfDhENSK0q05xHaVYnJhXM6Cq/MGKDW/x7qQ=; b=IUk8FbPPnC5XUWhoOD+HbOeAiLf9kT3GqHRG6DZFCBkGc/5iFhkvctaY2xYuu2cKH0vGU9 lz5bEsHqmuksy/fns81LAmwVONBlG15kSrgAhlWOx5HqCKadcjr0Ph/L3u1rBG4HUAxlLT 6r7oxwOnG7OLYwn5nzpCdAgkfO0rUWs= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-398-waGEQcnoPmSoOcu2UaUQCQ-1; Mon, 06 Apr 2020 08:36:37 -0400 X-MC-Unique: waGEQcnoPmSoOcu2UaUQCQ-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id BE5BD107ACC7 for ; Mon, 6 Apr 2020 12:36:36 +0000 (UTC) Received: from bfoster.bos.redhat.com (dhcp-41-2.bos.redhat.com [10.18.41.2]) by smtp.corp.redhat.com (Postfix) with ESMTP id 7654E60BFB for ; Mon, 6 Apr 2020 12:36:36 +0000 (UTC) From: Brian Foster To: linux-xfs@vger.kernel.org Subject: [RFC v6 PATCH 07/10] xfs: prevent fs freeze with outstanding relog items Date: Mon, 6 Apr 2020 08:36:29 -0400 Message-Id: <20200406123632.20873-8-bfoster@redhat.com> In-Reply-To: <20200406123632.20873-1-bfoster@redhat.com> References: <20200406123632.20873-1-bfoster@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org The automatic relog mechanism is currently incompatible with filesystem freeze in a generic sense. Freeze protection is currently implemented via locks that cannot be held on return to userspace, which means we can't hold a superblock write reference for the duration relogging is enabled on an item. It's too late to block when the freeze sequence calls into the filesystem because the transaction subsystem has already begun to be frozen. Not only can this block the relog transaction, but blocking any unrelated transaction essentially prevents a particular operation from progressing to the point where it can disable relogging on an item. Therefore marking the relog transaction as "nowrite" does not solve the problem. This is not a problem in practice because the two primary use cases already exclude freeze via other means. quotaoff holds ->s_umount read locked across the operation and scrub explicitly takes a superblock write reference, both of which block freeze of the transaction subsystem for the duration of relog enabled items. As a fallback for future use cases and the upcoming random buffer relogging test code, fail fs freeze attempts when the global relog reservation counter is elevated to prevent deadlock. This is a partial punt of the problem as compared to teaching freeze to wait on relogged items because the only current dependency is test code. In other words, this patch prevents deadlock if a user happens to issue a freeze in conjunction with random buffer relog injection. Signed-off-by: Brian Foster Reviewed-by: Allison Collins --- fs/xfs/xfs_super.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index abf06bf9c3f3..0efa9dc70d71 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -35,6 +35,7 @@ #include "xfs_refcount_item.h" #include "xfs_bmap_item.h" #include "xfs_reflink.h" +#include "xfs_trans_priv.h" #include #include @@ -870,6 +871,9 @@ xfs_fs_freeze( { struct xfs_mount *mp = XFS_M(sb); + if (WARN_ON_ONCE(atomic64_read(&mp->m_ail->ail_relog_res))) + return -EAGAIN; + xfs_stop_block_reaping(mp); xfs_save_resvblks(mp); xfs_quiesce_attr(mp); From patchwork Mon Apr 6 12:36:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Brian Foster X-Patchwork-Id: 11475503 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7C66A14B4 for ; Mon, 6 Apr 2020 12:36:42 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 5038421D7F for ; Mon, 6 Apr 2020 12:36:42 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="GvZtS2yy" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727951AbgDFMgl (ORCPT ); Mon, 6 Apr 2020 08:36:41 -0400 Received: from us-smtp-delivery-1.mimecast.com ([207.211.31.120]:57150 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727999AbgDFMgk (ORCPT ); Mon, 6 Apr 2020 08:36:40 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1586176600; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=VhtMJtn8hD7IWJPnjN7ZZO5JZphTWKxXL3urFidfSZ4=; b=GvZtS2yy+XaT8zc0sRVBYPFESKSUoNzNqZFkRldsgLJhBQElGdkZHJpdqVr4KKo9ond+Hq shLc6nFsKJzb1gLwzzVh0HlKcImbVZIcSGi9Wm9FsaIUpB+MlkCu+Lt3sDd28V9ijVzm7Q fnJbxagys8pBWgqn6xAOOpJ9ADWkm8w= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-225-7xwF0aj4OKOEFKZwYkpfvQ-1; Mon, 06 Apr 2020 08:36:38 -0400 X-MC-Unique: 7xwF0aj4OKOEFKZwYkpfvQ-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 35D7B8010F0 for ; Mon, 6 Apr 2020 12:36:37 +0000 (UTC) Received: from bfoster.bos.redhat.com (dhcp-41-2.bos.redhat.com [10.18.41.2]) by smtp.corp.redhat.com (Postfix) with ESMTP id E2BA060C63 for ; Mon, 6 Apr 2020 12:36:36 +0000 (UTC) From: Brian Foster To: linux-xfs@vger.kernel.org Subject: [RFC v6 PATCH 08/10] xfs: buffer relogging support prototype Date: Mon, 6 Apr 2020 08:36:30 -0400 Message-Id: <20200406123632.20873-9-bfoster@redhat.com> In-Reply-To: <20200406123632.20873-1-bfoster@redhat.com> References: <20200406123632.20873-1-bfoster@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org Add a quick and dirty implementation of buffer relogging support. There is currently no use case for buffer relogging. This is for experimental use only and serves as an example to demonstrate the ability to relog arbitrary items in the future, if necessary. Add helpers to manage relogged buffers, update the buffer log item push handler to support relogged BLIs and add a log item relog callback to properly join buffers to the relog transaction. Note that buffers associated with higher level log items (i.e., inodes and dquots) are skipped. Signed-off-by: Brian Foster --- fs/xfs/xfs_buf.c | 4 +++ fs/xfs/xfs_buf_item.c | 51 +++++++++++++++++++++++++++++++++--- fs/xfs/xfs_trans.h | 9 ++++++- fs/xfs/xfs_trans_buf.c | 59 ++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 118 insertions(+), 5 deletions(-) diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c index 9ec3eaf1c618..2d16c7d6dee0 100644 --- a/fs/xfs/xfs_buf.c +++ b/fs/xfs/xfs_buf.c @@ -16,6 +16,8 @@ #include "xfs_log.h" #include "xfs_errortag.h" #include "xfs_error.h" +#include "xfs_trans.h" +#include "xfs_buf_item.h" static kmem_zone_t *xfs_buf_zone; @@ -1477,6 +1479,8 @@ __xfs_buf_submit( trace_xfs_buf_submit(bp, _RET_IP_); ASSERT(!(bp->b_flags & _XBF_DELWRI_Q)); + ASSERT(!bp->b_log_item || + !test_bit(XFS_LI_RELOG, &bp->b_log_item->bli_item.li_flags)); /* on shutdown we stale and complete the buffer immediately */ if (XFS_FORCED_SHUTDOWN(bp->b_mount)) { diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c index 1545657c3ca0..762359e6ab65 100644 --- a/fs/xfs/xfs_buf_item.c +++ b/fs/xfs/xfs_buf_item.c @@ -16,7 +16,7 @@ #include "xfs_trans_priv.h" #include "xfs_trace.h" #include "xfs_log.h" - +#include "xfs_log_priv.h" kmem_zone_t *xfs_buf_item_zone; @@ -157,7 +157,7 @@ xfs_buf_item_size( return; } - ASSERT(bip->bli_flags & XFS_BLI_LOGGED); + ASSERT(bip->bli_flags & XFS_BLI_DIRTY); if (bip->bli_flags & XFS_BLI_ORDERED) { /* @@ -463,6 +463,13 @@ xfs_buf_item_unpin( list_del_init(&bp->b_li_list); bp->b_iodone = NULL; } else { + /* racy */ + ASSERT(!test_bit(XFS_LI_RELOG_QUEUED, &lip->li_flags)); + if (test_bit(XFS_LI_RELOG, &lip->li_flags)) { + atomic_dec(&bp->b_pin_count); + xfs_trans_relog_item_cancel(NULL, lip, true); + } + spin_lock(&ailp->ail_lock); xfs_trans_ail_delete(ailp, lip, SHUTDOWN_LOG_IO_ERROR); xfs_buf_item_relse(bp); @@ -513,8 +520,6 @@ xfs_buf_item_push( struct xfs_buf *bp = bip->bli_buf; uint rval = XFS_ITEM_SUCCESS; - if (xfs_buf_ispinned(bp)) - return XFS_ITEM_PINNED; if (!xfs_buf_trylock(bp)) { /* * If we have just raced with a buffer being pinned and it has @@ -528,6 +533,16 @@ xfs_buf_item_push( return XFS_ITEM_LOCKED; } + /* relog bufs are pinned so check relog state first */ + if (test_bit(XFS_LI_RELOG, &lip->li_flags) && + !test_bit(XFS_LI_RELOG_QUEUED, &lip->li_flags)) + return XFS_ITEM_RELOG; + + if (xfs_buf_ispinned(bp)) { + xfs_buf_unlock(bp); + return XFS_ITEM_PINNED; + } + ASSERT(!(bip->bli_flags & XFS_BLI_STALE)); trace_xfs_buf_item_push(bip); @@ -694,6 +709,28 @@ xfs_buf_item_committed( return lsn; } +STATIC void +xfs_buf_item_relog( + struct xfs_log_item *lip, + struct xfs_trans *tp) +{ + struct xfs_buf_log_item *bip = BUF_ITEM(lip); + int res; + + /* + * Grab a reference to the buffer for the transaction before we join + * and dirty it. + */ + xfs_buf_hold(bip->bli_buf); + xfs_trans_bjoin(tp, bip->bli_buf); + xfs_trans_dirty_buf(tp, bip->bli_buf); + + res = xfs_relog_calc_res(lip); + tp->t_ticket->t_curr_res += res; + tp->t_ticket->t_unit_res += res; + tp->t_log_res += res; +} + static const struct xfs_item_ops xfs_buf_item_ops = { .iop_size = xfs_buf_item_size, .iop_format = xfs_buf_item_format, @@ -703,6 +740,7 @@ static const struct xfs_item_ops xfs_buf_item_ops = { .iop_committing = xfs_buf_item_committing, .iop_committed = xfs_buf_item_committed, .iop_push = xfs_buf_item_push, + .iop_relog = xfs_buf_item_relog, }; STATIC void @@ -956,6 +994,11 @@ STATIC void xfs_buf_item_free( struct xfs_buf_log_item *bip) { + ASSERT(!test_bit(XFS_LI_RELOG, &bip->bli_item.li_flags)); +#ifdef DEBUG + ASSERT(!atomic64_read(&bip->bli_item.li_relog_res)); +#endif + xfs_buf_item_free_format(bip); kmem_free(bip->bli_item.li_lv_shadow); kmem_cache_free(xfs_buf_item_zone, bip); diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h index 51f7c92a4ffb..a7a70430b8b2 100644 --- a/fs/xfs/xfs_trans.h +++ b/fs/xfs/xfs_trans.h @@ -227,7 +227,11 @@ xfs_relog_calc_res( int nbytes = 0; lip->li_ops->iop_size(lip, &niovecs, &nbytes); - ASSERT(niovecs == 1); + ASSERT(niovecs == 1 || lip->li_type == XFS_LI_BUF); + /* ideally this is somewhere buffer specific.. */ + if (lip->li_type == XFS_LI_BUF) + nbytes += round_up(sizeof(struct xlog_op_header) + + sizeof(struct xfs_buf_log_format), 128); nbytes = xfs_log_calc_unit_res(lip->li_mountp, nbytes); return nbytes; @@ -244,6 +248,9 @@ void xfs_trans_inode_buf(xfs_trans_t *, struct xfs_buf *); void xfs_trans_stale_inode_buf(xfs_trans_t *, struct xfs_buf *); bool xfs_trans_ordered_buf(xfs_trans_t *, struct xfs_buf *); void xfs_trans_dquot_buf(xfs_trans_t *, struct xfs_buf *, uint); +bool xfs_trans_relog_buf(struct xfs_trans *, struct xfs_buf *); +void xfs_trans_relog_buf_cancel(struct xfs_trans *, + struct xfs_buf *); void xfs_trans_inode_alloc_buf(xfs_trans_t *, struct xfs_buf *); void xfs_trans_ichgtime(struct xfs_trans *, struct xfs_inode *, int); void xfs_trans_ijoin(struct xfs_trans *, struct xfs_inode *, uint); diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c index 08174ffa2118..2bed5d615541 100644 --- a/fs/xfs/xfs_trans_buf.c +++ b/fs/xfs/xfs_trans_buf.c @@ -588,6 +588,8 @@ xfs_trans_binval( return; } + /* return relog res before we reset dirty state */ + xfs_trans_relog_buf_cancel(tp, bp); xfs_buf_stale(bp); bip->bli_flags |= XFS_BLI_STALE; @@ -787,3 +789,60 @@ xfs_trans_dquot_buf( xfs_trans_buf_set_type(tp, bp, type); } + +/* + * Enable automatic relogging on a buffer. This essentially pins a dirty buffer + * in-core until relogging is disabled. + */ +bool +xfs_trans_relog_buf( + struct xfs_trans *tp, + struct xfs_buf *bp) +{ + struct xfs_buf_log_item *bip = bp->b_log_item; + enum xfs_blft blft; + + ASSERT(xfs_buf_islocked(bp)); + + if (bip->bli_flags & (XFS_BLI_ORDERED|XFS_BLI_STALE)) + return false; + + /* + * Skip buffers with higher level log items. Those items must be + * relogged directly to move in the log. + */ + blft = xfs_blft_from_flags(&bip->__bli_format); + switch (blft) { + case XFS_BLFT_DINO_BUF: + case XFS_BLFT_UDQUOT_BUF: + case XFS_BLFT_PDQUOT_BUF: + case XFS_BLFT_GDQUOT_BUF: + return false; + default: + break; + } + + /* + * Relog expects a worst case reservation from ->iop_size. Hack that in + * here by logging the entire buffer in this transaction. Also grab a + * buffer pin to prevent it from being written out. + */ + xfs_buf_item_log(bip, 0, BBTOB(bp->b_length) - 1); + atomic_inc(&bp->b_pin_count); + xfs_trans_relog_item(tp, &bip->bli_item); + return true; +} + +void +xfs_trans_relog_buf_cancel( + struct xfs_trans *tp, + struct xfs_buf *bp) +{ + struct xfs_buf_log_item *bip = bp->b_log_item; + + if (!test_bit(XFS_LI_RELOG, &bip->bli_item.li_flags)) + return; + + atomic_dec(&bp->b_pin_count); + xfs_trans_relog_item_cancel(tp, &bip->bli_item, false); +} From patchwork Mon Apr 6 12:36:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Brian Foster X-Patchwork-Id: 11475505 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C57CE1744 for ; Mon, 6 Apr 2020 12:36:42 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id A454D21D90 for ; Mon, 6 Apr 2020 12:36:42 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="Jqh82WVG" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727999AbgDFMgl (ORCPT ); Mon, 6 Apr 2020 08:36:41 -0400 Received: from us-smtp-delivery-1.mimecast.com ([205.139.110.120]:31358 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1728000AbgDFMgl (ORCPT ); Mon, 6 Apr 2020 08:36:41 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1586176600; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=VvvC4xC5phYqySWw9mG5mHi26cYNkk3iyHHo8HBoJ+U=; b=Jqh82WVGydDXsmFz/UBNA1avlyS7ud2Z4BaOgt9RsffWIfa2iWlT9vxXzfBY3XbUzDZ9b2 U2Bz3ZyvX/a47SFz9Okhzs51ZWNgqmv3INEPPPWVt4HGT+5Y1ATLTSEHC7yTM9iaqMDCSz ltYbDz81ewqzxZ+/xcq00taifg463dY= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-36-fMDUAmoWN-KkaVejo1d0mw-1; Mon, 06 Apr 2020 08:36:38 -0400 X-MC-Unique: fMDUAmoWN-KkaVejo1d0mw-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 9F88C19251A4 for ; Mon, 6 Apr 2020 12:36:37 +0000 (UTC) Received: from bfoster.bos.redhat.com (dhcp-41-2.bos.redhat.com [10.18.41.2]) by smtp.corp.redhat.com (Postfix) with ESMTP id 5B30960BFB for ; Mon, 6 Apr 2020 12:36:37 +0000 (UTC) From: Brian Foster To: linux-xfs@vger.kernel.org Subject: [RFC v6 PATCH 09/10] xfs: create an error tag for random relog reservation Date: Mon, 6 Apr 2020 08:36:31 -0400 Message-Id: <20200406123632.20873-10-bfoster@redhat.com> In-Reply-To: <20200406123632.20873-1-bfoster@redhat.com> References: <20200406123632.20873-1-bfoster@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org Create an errortag to enable relogging on random transactions. Since relogging requires extra transaction reservation, artificially bump the reservation on selected transactions and tag them with the relog flag such that the requisite reservation overhead is added by the ticket allocation code. This allows subsequent random buffer relog events to target transactions where reservation is included. This is necessary to avoid transaction reservation overruns on non-relog transactions. Note that this does not yet enable relogging of any particular items. The tag will be reused in a subsequent patch to enable random buffer relogging. Signed-off-by: Brian Foster --- fs/xfs/libxfs/xfs_errortag.h | 4 +++- fs/xfs/xfs_error.c | 3 +++ fs/xfs/xfs_trans.c | 21 ++++++++++++++++----- 3 files changed, 22 insertions(+), 6 deletions(-) diff --git a/fs/xfs/libxfs/xfs_errortag.h b/fs/xfs/libxfs/xfs_errortag.h index 79e6c4fb1d8a..ca7bcadb9455 100644 --- a/fs/xfs/libxfs/xfs_errortag.h +++ b/fs/xfs/libxfs/xfs_errortag.h @@ -55,7 +55,8 @@ #define XFS_ERRTAG_FORCE_SCRUB_REPAIR 32 #define XFS_ERRTAG_FORCE_SUMMARY_RECALC 33 #define XFS_ERRTAG_IUNLINK_FALLBACK 34 -#define XFS_ERRTAG_MAX 35 +#define XFS_ERRTAG_RELOG 35 +#define XFS_ERRTAG_MAX 36 /* * Random factors for above tags, 1 means always, 2 means 1/2 time, etc. @@ -95,5 +96,6 @@ #define XFS_RANDOM_FORCE_SCRUB_REPAIR 1 #define XFS_RANDOM_FORCE_SUMMARY_RECALC 1 #define XFS_RANDOM_IUNLINK_FALLBACK (XFS_RANDOM_DEFAULT/10) +#define XFS_RANDOM_RELOG XFS_RANDOM_DEFAULT #endif /* __XFS_ERRORTAG_H_ */ diff --git a/fs/xfs/xfs_error.c b/fs/xfs/xfs_error.c index a21e9cc6516a..9d1f9b4c50ea 100644 --- a/fs/xfs/xfs_error.c +++ b/fs/xfs/xfs_error.c @@ -53,6 +53,7 @@ static unsigned int xfs_errortag_random_default[] = { XFS_RANDOM_FORCE_SCRUB_REPAIR, XFS_RANDOM_FORCE_SUMMARY_RECALC, XFS_RANDOM_IUNLINK_FALLBACK, + XFS_RANDOM_RELOG, }; struct xfs_errortag_attr { @@ -162,6 +163,7 @@ XFS_ERRORTAG_ATTR_RW(buf_lru_ref, XFS_ERRTAG_BUF_LRU_REF); XFS_ERRORTAG_ATTR_RW(force_repair, XFS_ERRTAG_FORCE_SCRUB_REPAIR); XFS_ERRORTAG_ATTR_RW(bad_summary, XFS_ERRTAG_FORCE_SUMMARY_RECALC); XFS_ERRORTAG_ATTR_RW(iunlink_fallback, XFS_ERRTAG_IUNLINK_FALLBACK); +XFS_ERRORTAG_ATTR_RW(relog, XFS_ERRTAG_RELOG); static struct attribute *xfs_errortag_attrs[] = { XFS_ERRORTAG_ATTR_LIST(noerror), @@ -199,6 +201,7 @@ static struct attribute *xfs_errortag_attrs[] = { XFS_ERRORTAG_ATTR_LIST(force_repair), XFS_ERRORTAG_ATTR_LIST(bad_summary), XFS_ERRORTAG_ATTR_LIST(iunlink_fallback), + XFS_ERRORTAG_ATTR_LIST(relog), NULL, }; diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index 31ef5f671341..d7ca70a35d33 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -21,6 +21,7 @@ #include "xfs_error.h" #include "xfs_defer.h" #include "xfs_log_priv.h" +#include "xfs_errortag.h" kmem_zone_t *xfs_trans_zone; @@ -179,9 +180,10 @@ xfs_trans_reserve( if (resp->tr_logres > 0) { bool permanent = false; bool relog = (tp->t_flags & XFS_TRANS_RELOG); + int logres = resp->tr_logres; ASSERT(tp->t_log_res == 0 || - tp->t_log_res == resp->tr_logres); + tp->t_log_res == logres); ASSERT(tp->t_log_count == 0 || tp->t_log_count == resp->tr_logcount); @@ -197,9 +199,18 @@ xfs_trans_reserve( ASSERT(resp->tr_logflags & XFS_TRANS_PERM_LOG_RES); error = xfs_log_regrant(mp, tp->t_ticket); } else { - error = xfs_log_reserve(mp, - resp->tr_logres, - resp->tr_logcount, + /* + * Enable relog overhead on random transactions to support + * random item relogging. + */ + if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_RELOG) && + !relog) { + tp->t_flags |= XFS_TRANS_RELOG; + relog = true; + logres <<= 1; + } + + error = xfs_log_reserve(mp, logres, resp->tr_logcount, &tp->t_ticket, XFS_TRANSACTION, permanent, relog); } @@ -207,7 +218,7 @@ xfs_trans_reserve( if (error) goto undo_blocks; - tp->t_log_res = resp->tr_logres; + tp->t_log_res = logres; tp->t_log_count = resp->tr_logcount; } From patchwork Mon Apr 6 12:36:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Brian Foster X-Patchwork-Id: 11475507 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2ACDF14B4 for ; Mon, 6 Apr 2020 12:36:43 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 0A05521D90 for ; Mon, 6 Apr 2020 12:36:43 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="LJOvlUD2" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728000AbgDFMgm (ORCPT ); Mon, 6 Apr 2020 08:36:42 -0400 Received: from us-smtp-delivery-1.mimecast.com ([205.139.110.120]:36315 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727989AbgDFMgm (ORCPT ); Mon, 6 Apr 2020 08:36:42 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1586176600; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=s9Z9InYqlywD36FBHxvBkfw04gVhFHh0TeKUFxyvwnQ=; b=LJOvlUD2CzFZpxiGzg50Wet3c/XOGbUiKBgz0iiGHlU8tJ9CAo7bN5Vfc/EsJZw0oO8iSS WOqc8IHeueoIWeHUW1AKySSIJZ6QAO9nj44GoCoXUer4XRGwuwMdRKpA26hI4ZUsPaks/c 3u220BL/YX7aVEqkMjkaRKhZaX6yaMU= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-478-yEFvHJmXNXuGt3m5UVthbQ-1; Mon, 06 Apr 2020 08:36:38 -0400 X-MC-Unique: yEFvHJmXNXuGt3m5UVthbQ-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 2888D800597 for ; Mon, 6 Apr 2020 12:36:38 +0000 (UTC) Received: from bfoster.bos.redhat.com (dhcp-41-2.bos.redhat.com [10.18.41.2]) by smtp.corp.redhat.com (Postfix) with ESMTP id C749660BFB for ; Mon, 6 Apr 2020 12:36:37 +0000 (UTC) From: Brian Foster To: linux-xfs@vger.kernel.org Subject: [RFC v6 PATCH 10/10] xfs: relog random buffers based on errortag Date: Mon, 6 Apr 2020 08:36:32 -0400 Message-Id: <20200406123632.20873-11-bfoster@redhat.com> In-Reply-To: <20200406123632.20873-1-bfoster@redhat.com> References: <20200406123632.20873-1-bfoster@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org Since there is currently no specific use case for buffer relogging, add some hacky and experimental code to relog random buffers when the associated errortag is enabled. Use fixed termination logic regardless of the user-specified error rate to help ensure that the relog queue doesn't grow indefinitely. Note that this patch was useful in causing log reservation deadlocks on an fsstress workload if the relog mechanism code is modified to acquire its own log reservation rather than rely on the pre-reservation mechanism. In other words, this helps prove that the relog reservation management code effectively avoids log reservation deadlocks. Signed-off-by: Brian Foster --- fs/xfs/xfs_buf_item.c | 1 + fs/xfs/xfs_trans.h | 4 +++- fs/xfs/xfs_trans_ail.c | 33 +++++++++++++++++++++++++++++++++ fs/xfs/xfs_trans_buf.c | 14 ++++++++++++++ 4 files changed, 51 insertions(+), 1 deletion(-) diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c index 762359e6ab65..ed91de33278b 100644 --- a/fs/xfs/xfs_buf_item.c +++ b/fs/xfs/xfs_buf_item.c @@ -467,6 +467,7 @@ xfs_buf_item_unpin( ASSERT(!test_bit(XFS_LI_RELOG_QUEUED, &lip->li_flags)); if (test_bit(XFS_LI_RELOG, &lip->li_flags)) { atomic_dec(&bp->b_pin_count); + clear_bit(XFS_LI_RELOG_RAND, &bip->bli_item.li_flags); xfs_trans_relog_item_cancel(NULL, lip, true); } diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h index a7a70430b8b2..0f40ce784367 100644 --- a/fs/xfs/xfs_trans.h +++ b/fs/xfs/xfs_trans.h @@ -64,6 +64,7 @@ struct xfs_log_item { #define XFS_LI_DIRTY 3 /* log item dirty in transaction */ #define XFS_LI_RELOG 4 /* automatically relog item */ #define XFS_LI_RELOG_QUEUED 5 /* queued for relog */ +#define XFS_LI_RELOG_RAND 6 #define XFS_LI_FLAGS \ { (1 << XFS_LI_IN_AIL), "IN_AIL" }, \ @@ -71,7 +72,8 @@ struct xfs_log_item { { (1 << XFS_LI_FAILED), "FAILED" }, \ { (1 << XFS_LI_DIRTY), "DIRTY" }, \ { (1 << XFS_LI_RELOG), "RELOG" }, \ - { (1 << XFS_LI_RELOG_QUEUED), "RELOG_QUEUED" } + { (1 << XFS_LI_RELOG_QUEUED), "RELOG_QUEUED" }, \ + { (1 << XFS_LI_RELOG_RAND), "RELOG_RAND" } struct xfs_item_ops { unsigned flags; diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c index b78d026d6564..a34a5e73427e 100644 --- a/fs/xfs/xfs_trans_ail.c +++ b/fs/xfs/xfs_trans_ail.c @@ -18,6 +18,7 @@ #include "xfs_error.h" #include "xfs_log.h" #include "xfs_log_priv.h" +#include "xfs_buf_item.h" #ifdef DEBUG /* @@ -176,6 +177,7 @@ xfs_ail_relog( struct xfs_trans_res tres = {}; struct xfs_trans *tp; struct xfs_log_item *lip, *lipp; + int cancelres; int error; LIST_HEAD(relog_list); @@ -209,6 +211,37 @@ xfs_ail_relog( ASSERT(lip->li_ops->iop_relog); if (lip->li_ops->iop_relog) lip->li_ops->iop_relog(lip, tp); + + /* + * Cancel random buffer relogs at a fixed rate to + * prevent too much buildup. + */ + if (test_bit(XFS_LI_RELOG_RAND, &lip->li_flags) && + ((prandom_u32() & 1) || + (mp->m_flags & XFS_MOUNT_UNMOUNTING))) { + struct xfs_buf_log_item *bli; + bli = container_of(lip, struct xfs_buf_log_item, + bli_item); + xfs_trans_relog_buf_cancel(tp, bli->bli_buf); + } + } + + /* + * Cancelling relog reservation in the same transaction as + * consuming it means the current transaction over releases + * reservation on commit and the next transaction reservation + * restores the grant heads to even. To avoid this behavior, + * remove surplus reservation (->t_curr_res) from the committing + * transaction and replace it with a reduction in the + * reservation requirement (->t_unit_res) for the next. This has + * no net effect on reservation accounting, but ensures we don't + * cause problems elsewhere with odd reservation behavior. + */ + cancelres = tp->t_ticket->t_curr_res - tp->t_ticket->t_unit_res; + if (cancelres) { + tp->t_ticket->t_curr_res -= cancelres; + tp->t_ticket->t_unit_res -= cancelres; + tp->t_log_res -= cancelres; } error = xfs_trans_roll(&tp); diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c index 2bed5d615541..9dadf39a40bb 100644 --- a/fs/xfs/xfs_trans_buf.c +++ b/fs/xfs/xfs_trans_buf.c @@ -14,6 +14,8 @@ #include "xfs_buf_item.h" #include "xfs_trans_priv.h" #include "xfs_trace.h" +#include "xfs_error.h" +#include "xfs_errortag.h" /* * Check to see if a buffer matching the given parameters is already @@ -527,6 +529,17 @@ xfs_trans_log_buf( trace_xfs_trans_log_buf(bip); xfs_buf_item_log(bip, first, last); + + /* + * Relog random buffers so long as the transaction is relog enabled and + * the buffer wasn't already relogged explicitly. + */ + if (XFS_TEST_ERROR(false, tp->t_mountp, XFS_ERRTAG_RELOG) && + (tp->t_flags & XFS_TRANS_RELOG) && + !test_bit(XFS_LI_RELOG, &bip->bli_item.li_flags)) { + if (xfs_trans_relog_buf(tp, bp)) + set_bit(XFS_LI_RELOG_RAND, &bip->bli_item.li_flags); + } } @@ -845,4 +858,5 @@ xfs_trans_relog_buf_cancel( atomic_dec(&bp->b_pin_count); xfs_trans_relog_item_cancel(tp, &bip->bli_item, false); + clear_bit(XFS_LI_RELOG_RAND, &bip->bli_item.li_flags); }