From patchwork Thu Dec 5 17:50:36 2019
X-Patchwork-Submitter: Brian Foster
X-Patchwork-Id: 11275241
From: Brian Foster
To: linux-xfs@vger.kernel.org
Subject: [RFC v4 1/2] xfs: automatic log item relog mechanism
Date: Thu, 5 Dec 2019 12:50:36 -0500
Message-Id: <20191205175037.52529-2-bfoster@redhat.com>
In-Reply-To: <20191205175037.52529-1-bfoster@redhat.com>
References: <20191205175037.52529-1-bfoster@redhat.com>
X-Mailing-List: linux-xfs@vger.kernel.org

This is an AIL-based mechanism to enable automatic relogging of selected
log items. The use case is for operations that commit an item known to
pin the tail of the log for a potentially long period of time and that
otherwise cannot use a rolling transaction. While this does not provide
the deadlock avoidance guarantees of a rolling transaction, it ties the
relog transaction into AIL pushing pressure such that we should expect
the transaction to reserve the necessary log space long before deadlock
becomes a problem.

To enable relogging, a bit is set on the log item before it is first
committed to the log subsystem. Once the item commits to the on-disk log
and is inserted into the AIL, AIL pushing dictates when the item is ready
for a relog. When that occurs, the item is relogged in an independent
transaction to ensure the log tail keeps moving without intervention from
the original committer. To disable relogging, the original committer
clears the log item bit and optionally waits for relogging activity to
cease if it needs to reuse the item before the operation completes.

While the current use case for automatic relogging is limited, the
mechanism is AIL-based because it 1) provides existing callbacks into all
possible log item types for future support and 2) has the applicable
context to determine when to relog particular items (such as when an item
pins the log tail). This provides enough flexibility to support various
log item types and future workloads without introducing complexity up
front for currently unknown use cases. Further complexity, such as
preallocated or regranted relog transaction reservation or custom relog
handlers, can be considered as the need arises.
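To make the intended calling convention concrete, the sketch below shows
how a caller is expected to use the new helpers. This is a minimal,
hypothetical example: xfs_example_long_running_op() does not exist in the
tree and merely stands in for an operation (such as quotaoff in the next
patch) that commits a long-lived intent item; only the
xfs_trans_enable_relog()/xfs_trans_disable_relog() calls are part of this
patch.

/*
 * Hypothetical caller-side usage of the relog API added below.
 */
static int
xfs_example_long_running_op(
        struct xfs_mount        *mp,
        struct xfs_log_item     *lip)
{
        /*
         * Opt the item in to automatic relogging before it is first
         * committed. From this point on, xfsaild may relog the item
         * whenever it finds it pinning the tail of the log.
         */
        xfs_trans_enable_relog(lip);

        /* ... commit the intent and run the long-running work ... */

        /*
         * Opt back out and drain any in-flight relog so the item sits
         * only in the AIL before it is reused or permanently removed.
         */
        xfs_trans_disable_relog(lip, true);
        return 0;
}

The drain matters because xfsaild may have already queued the item for a
relog by the time the caller decides to disable relogging.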
Signed-off-by: Brian Foster
---
 fs/xfs/xfs_trace.h      |  1 +
 fs/xfs/xfs_trans.c      | 30 ++++++++++++++++++++++
 fs/xfs/xfs_trans.h      |  7 +++++-
 fs/xfs/xfs_trans_ail.c  | 56 +++++++++++++++++++++++++++++++++++++++--
 fs/xfs/xfs_trans_priv.h |  5 ++++
 5 files changed, 96 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index c13bb3655e48..6c2a9cdadd03 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1063,6 +1063,7 @@ DEFINE_LOG_ITEM_EVENT(xfs_ail_push);
 DEFINE_LOG_ITEM_EVENT(xfs_ail_pinned);
 DEFINE_LOG_ITEM_EVENT(xfs_ail_locked);
 DEFINE_LOG_ITEM_EVENT(xfs_ail_flushing);
+DEFINE_LOG_ITEM_EVENT(xfs_ail_relog);
 
 DECLARE_EVENT_CLASS(xfs_ail_class,
         TP_PROTO(struct xfs_log_item *lip, xfs_lsn_t old_lsn, xfs_lsn_t new_lsn),
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 3b208f9a865c..f2c06cdd1074 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -763,6 +763,35 @@ xfs_trans_del_item(
         list_del_init(&lip->li_trans);
 }
 
+void
+xfs_trans_enable_relog(
+        struct xfs_log_item     *lip)
+{
+        set_bit(XFS_LI_RELOG, &lip->li_flags);
+}
+
+void
+xfs_trans_disable_relog(
+        struct xfs_log_item     *lip,
+        bool                    drain)  /* wait for relogging to cease */
+{
+        struct xfs_mount        *mp = lip->li_mountp;
+
+        clear_bit(XFS_LI_RELOG, &lip->li_flags);
+
+        if (!drain)
+                return;
+
+        /*
+         * Some operations might require relog activity to cease before they
+         * can proceed. For example, an operation must wait before including a
+         * non-lockable log item (i.e. intent) in another transaction.
+         */
+        while (wait_on_bit_timeout(&lip->li_flags, XFS_LI_RELOGGED,
+                                   TASK_UNINTERRUPTIBLE, HZ))
+                xfs_log_force(mp, XFS_LOG_SYNC);
+}
+
 /* Detach and unlock all of the items in a transaction */
 static void
 xfs_trans_free_items(
@@ -848,6 +877,7 @@ xfs_trans_committed_bulk(
 
                 if (aborted)
                         set_bit(XFS_LI_ABORTED, &lip->li_flags);
+                clear_and_wake_up_bit(XFS_LI_RELOGGED, &lip->li_flags);
 
                 if (lip->li_ops->flags & XFS_ITEM_RELEASE_WHEN_COMMITTED) {
                         lip->li_ops->iop_release(lip);
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 64d7f171ebd3..6d4311d82c4c 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -59,12 +59,16 @@ struct xfs_log_item {
 #define XFS_LI_ABORTED          1
 #define XFS_LI_FAILED           2
 #define XFS_LI_DIRTY            3       /* log item dirty in transaction */
+#define XFS_LI_RELOG            4       /* automatic relogging */
+#define XFS_LI_RELOGGED         5       /* relogged by xfsaild */
 
 #define XFS_LI_FLAGS \
         { (1 << XFS_LI_IN_AIL),         "IN_AIL" }, \
         { (1 << XFS_LI_ABORTED),        "ABORTED" }, \
         { (1 << XFS_LI_FAILED),         "FAILED" }, \
-        { (1 << XFS_LI_DIRTY),          "DIRTY" }
+        { (1 << XFS_LI_DIRTY),          "DIRTY" }, \
+        { (1 << XFS_LI_RELOG),          "RELOG" }, \
+        { (1 << XFS_LI_RELOGGED),       "RELOGGED" }
 
 struct xfs_item_ops {
         unsigned flags;
@@ -95,6 +99,7 @@ void xfs_log_item_init(struct xfs_mount *mp, struct xfs_log_item *item,
 #define XFS_ITEM_PINNED         1
 #define XFS_ITEM_LOCKED         2
 #define XFS_ITEM_FLUSHING       3
+#define XFS_ITEM_RELOG          4
 
 /*
  * Deferred operation item relogging limits.
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 00cc5b8734be..bb54d00ae095 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -143,6 +143,38 @@ xfs_ail_max_lsn(
         return lsn;
 }
 
+/*
+ * Relog log items on the AIL relog queue.
+ */
+static void
+xfs_ail_relog(
+        struct work_struct      *work)
+{
+        struct xfs_ail          *ailp = container_of(work, struct xfs_ail,
+                                                     ail_relog_work);
+        struct xfs_mount        *mp = ailp->ail_mount;
+        struct xfs_trans        *tp;
+        struct xfs_log_item     *lip, *lipp;
+        int                     error;
+
+        /* XXX: define a ->tr_relog reservation */
+        error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_quotaoff, 0, 0, 0, &tp);
+        if (error)
+                return;
+
+        spin_lock(&ailp->ail_lock);
+        list_for_each_entry_safe(lip, lipp, &ailp->ail_relog_list, li_trans) {
+                list_del_init(&lip->li_trans);
+                xfs_trans_add_item(tp, lip);
+                set_bit(XFS_LI_DIRTY, &lip->li_flags);
+                tp->t_flags |= XFS_TRANS_DIRTY;
+        }
+        spin_unlock(&ailp->ail_lock);
+
+        error = xfs_trans_commit(tp);
+        ASSERT(!error);
+}
+
 /*
  * The cursor keeps track of where our current traversal is up to by tracking
  * the next item in the list for us. However, for this to be safe, removing an
@@ -363,7 +395,7 @@ static long
 xfsaild_push(
         struct xfs_ail          *ailp)
 {
-        xfs_mount_t             *mp = ailp->ail_mount;
+        struct xfs_mount        *mp = ailp->ail_mount;
         struct xfs_ail_cursor   cur;
         struct xfs_log_item     *lip;
         xfs_lsn_t               lsn;
@@ -425,6 +457,13 @@ xfsaild_push(
                         ailp->ail_last_pushed_lsn = lsn;
                         break;
 
+                case XFS_ITEM_RELOG:
+                        trace_xfs_ail_relog(lip);
+                        ASSERT(list_empty(&lip->li_trans));
+                        list_add_tail(&lip->li_trans, &ailp->ail_relog_list);
+                        set_bit(XFS_LI_RELOGGED, &lip->li_flags);
+                        break;
+
                 case XFS_ITEM_FLUSHING:
                         /*
                          * The item or its backing buffer is already being
@@ -491,6 +530,9 @@ xfsaild_push(
         if (xfs_buf_delwri_submit_nowait(&ailp->ail_buf_list))
                 ailp->ail_log_flush++;
 
+        if (!list_empty(&ailp->ail_relog_list))
+                queue_work(ailp->ail_relog_wq, &ailp->ail_relog_work);
+
         if (!count || XFS_LSN_CMP(lsn, target) >= 0) {
 out_done:
                 /*
@@ -834,15 +876,24 @@ xfs_trans_ail_init(
         spin_lock_init(&ailp->ail_lock);
         INIT_LIST_HEAD(&ailp->ail_buf_list);
         init_waitqueue_head(&ailp->ail_empty);
+        INIT_LIST_HEAD(&ailp->ail_relog_list);
+        INIT_WORK(&ailp->ail_relog_work, xfs_ail_relog);
+
+        ailp->ail_relog_wq = alloc_workqueue("xfs-relog/%s", WQ_FREEZABLE, 0,
+                                             mp->m_super->s_id);
+        if (!ailp->ail_relog_wq)
+                goto out_free_ailp;
 
         ailp->ail_task = kthread_run(xfsaild, ailp, "xfsaild/%s",
                         ailp->ail_mount->m_super->s_id);
         if (IS_ERR(ailp->ail_task))
-                goto out_free_ailp;
+                goto out_destroy_wq;
 
         mp->m_ail = ailp;
         return 0;
 
+out_destroy_wq:
+        destroy_workqueue(ailp->ail_relog_wq);
 out_free_ailp:
         kmem_free(ailp);
         return -ENOMEM;
@@ -855,5 +906,6 @@ xfs_trans_ail_destroy(
         struct xfs_ail  *ailp = mp->m_ail;
 
         kthread_stop(ailp->ail_task);
+        destroy_workqueue(ailp->ail_relog_wq);
         kmem_free(ailp);
 }
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index 2e073c1c4614..3cefc821350e 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -16,6 +16,8 @@ struct xfs_log_vec;
 void    xfs_trans_init(struct xfs_mount *);
 void    xfs_trans_add_item(struct xfs_trans *, struct xfs_log_item *);
 void    xfs_trans_del_item(struct xfs_log_item *);
+void    xfs_trans_enable_relog(struct xfs_log_item *);
+void    xfs_trans_disable_relog(struct xfs_log_item *, bool);
 void    xfs_trans_unreserve_and_mod_sb(struct xfs_trans *tp);
 void    xfs_trans_committed_bulk(struct xfs_ail *ailp,
                                 struct xfs_log_vec *lv,
@@ -61,6 +63,9 @@ struct xfs_ail {
         int                     ail_log_flush;
         struct list_head        ail_buf_list;
         wait_queue_head_t       ail_empty;
+        struct work_struct      ail_relog_work;
+        struct list_head        ail_relog_list;
+        struct workqueue_struct *ail_relog_wq;
 };
 
 /*
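Before the quotaoff patch below, a note on how a log item type opts in on
the push side: its ->iop_push handler returns the new XFS_ITEM_RELOG code
when relogging is enabled and the item pins the log tail. The following
is a hedged sketch; xfs_example_item_push() is hypothetical and the next
patch adds the real instance for the quotaoff start intent.

/*
 * Sketch of an ->iop_push handler that requests a relog: do so only if
 * relogging is enabled on the item, xfsaild has not already queued it,
 * and it is currently the minimum (tail) item in the AIL.
 */
STATIC uint
xfs_example_item_push(
        struct xfs_log_item     *lip,
        struct list_head        *buffer_list)
{
        struct xfs_log_item     *mlip = xfs_ail_min(lip->li_ailp);

        if (test_bit(XFS_LI_RELOG, &lip->li_flags) &&
            !test_bit(XFS_LI_RELOGGED, &lip->li_flags) &&
            XFS_LSN_CMP(lip->li_lsn, mlip->li_lsn) == 0)
                return XFS_ITEM_RELOG;

        return XFS_ITEM_LOCKED;
}

xfsaild then moves such items onto ail_relog_list, sets XFS_LI_RELOGGED
and kicks the relog workqueue, which commits them in an independent
transaction via xfs_ail_relog() above.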
From patchwork Thu Dec 5 17:50:37 2019
X-Patchwork-Submitter: Brian Foster
X-Patchwork-Id: 11275239
From: Brian Foster
To: linux-xfs@vger.kernel.org
Subject: [RFC v4 2/2] xfs: automatically relog the quotaoff start intent
Date: Thu, 5 Dec 2019 12:50:37 -0500
Message-Id: <20191205175037.52529-3-bfoster@redhat.com>
In-Reply-To: <20191205175037.52529-1-bfoster@redhat.com>
References: <20191205175037.52529-1-bfoster@redhat.com>
X-Mailing-List: linux-xfs@vger.kernel.org

The quotaoff operation has a rare but longstanding deadlock vector in
terms of how the operation is logged. A quotaoff start intent is logged
(synchronously) at the onset to ensure recovery can continue with the
operation before in-core changes are made. This quotaoff intent pins the
log tail while the quotaoff sequence scans and purges dquots from all
in-core inodes. While this operation generally doesn't generate much log
traffic on its own, it can be time consuming. If unrelated filesystem
activity consumes the remaining log space before quotaoff is able to
allocate the quotaoff end intent, the filesystem locks up indefinitely.

quotaoff cannot allocate the end intent before the scan because the
latter can itself result in transaction allocation in certain indirect
cases (releasing an inode, for example). Further, rolling the original
transaction is difficult because the scanning work occurs multiple layers
down, where caller context is lost and not much information is available
to determine how often to roll the transaction.

To address this problem, enable automatic relogging of the quotaoff start
intent. Trigger a relog whenever AIL pushing finds the item at the tail
of the log. Once the scan completes, wait for relogging activity to
cease, as the end intent expects to be able to permanently remove the
start intent from the log subsystem. This ensures that the log tail keeps
moving during a particularly long quotaoff operation and avoids deadlock
caused by unrelated fs activity.
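Put together, the quotaoff ordering with this patch applied looks roughly
like the condensed sketch below. The wrapper function is hypothetical and
the helper signatures are approximated, with error handling trimmed; the
real sequence lives in xfs_qm_scall_quotaoff() and the helpers changed in
the diff that follows.

/*
 * Condensed, hypothetical sketch of quotaoff with automatic relogging.
 */
static int
xfs_example_quotaoff_flow(
        struct xfs_mount        *mp,
        uint                    flags)
{
        struct xfs_qoff_logitem *qoffstart;
        int                     error;

        /* logs the start intent and now also enables relogging on it */
        error = xfs_qm_log_quotaoff(mp, &qoffstart, flags);
        if (error)
                return error;

        /*
         * The long-running dquot scan/purge happens here. If the start
         * intent ends up pinning the log tail, xfsaild relogs it in the
         * background so the tail keeps moving.
         */

        /* drains relog activity, then logs the end intent */
        return xfs_qm_log_quotaoff_end(mp, qoffstart, flags);
}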
Signed-off-by: Brian Foster
---
 fs/xfs/xfs_dquot_item.c  | 7 +++++++
 fs/xfs/xfs_qm_syscalls.c | 9 +++++++++
 2 files changed, 16 insertions(+)

diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c
index d60647d7197b..ea5123678466 100644
--- a/fs/xfs/xfs_dquot_item.c
+++ b/fs/xfs/xfs_dquot_item.c
@@ -297,6 +297,13 @@ xfs_qm_qoff_logitem_push(
         struct xfs_log_item     *lip,
         struct list_head        *buffer_list)
 {
+        struct xfs_log_item     *mlip = xfs_ail_min(lip->li_ailp);
+
+        if (test_bit(XFS_LI_RELOG, &lip->li_flags) &&
+            !test_bit(XFS_LI_RELOGGED, &lip->li_flags) &&
+            !XFS_LSN_CMP(lip->li_lsn, mlip->li_lsn))
+                return XFS_ITEM_RELOG;
+
         return XFS_ITEM_LOCKED;
 }
 
diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
index 1ea82764bf89..b68a08e87c30 100644
--- a/fs/xfs/xfs_qm_syscalls.c
+++ b/fs/xfs/xfs_qm_syscalls.c
@@ -18,6 +18,7 @@
 #include "xfs_quota.h"
 #include "xfs_qm.h"
 #include "xfs_icache.h"
+#include "xfs_trans_priv.h"
 
 STATIC int
 xfs_qm_log_quotaoff(
@@ -37,6 +38,7 @@ xfs_qm_log_quotaoff(
 
         qoffi = xfs_trans_get_qoff_item(tp, NULL, flags & XFS_ALL_QUOTA_ACCT);
         xfs_trans_log_quotaoff_item(tp, qoffi);
+        xfs_trans_enable_relog(&qoffi->qql_item);
 
         spin_lock(&mp->m_sb_lock);
         mp->m_sb.sb_qflags = (mp->m_qflags & ~(flags)) & XFS_MOUNT_QUOTA_ALL;
@@ -69,6 +71,13 @@ xfs_qm_log_quotaoff_end(
         int                     error;
         struct xfs_qoff_logitem *qoffi;
 
+        /*
+         * startqoff must be in the AIL and not the CIL when the end intent
+         * commits to ensure it is not readded to the AIL out of order. Wait on
+         * relog activity to drain to isolate startqoff to the AIL.
+         */
+        xfs_trans_disable_relog(&startqoff->qql_item, true);
+
         error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_equotaoff, 0, 0, 0, &tp);
         if (error)
                 return error;