From patchwork Sun Dec 6 06:40:46 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yafang Shao X-Patchwork-Id: 11953687 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F1C3FC433FE for ; Sun, 6 Dec 2020 06:41:57 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 92A2522DFB for ; Sun, 6 Dec 2020 06:41:57 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 92A2522DFB Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 229176B0068; Sun, 6 Dec 2020 01:41:57 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 1D2886B006C; Sun, 6 Dec 2020 01:41:57 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 073766B006E; Sun, 6 Dec 2020 01:41:57 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0042.hostedemail.com [216.40.44.42]) by kanga.kvack.org (Postfix) with ESMTP id DB4716B0068 for ; Sun, 6 Dec 2020 01:41:56 -0500 (EST) Received: from smtpin13.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 9D223180AD807 for ; Sun, 6 Dec 2020 06:41:56 +0000 (UTC) X-FDA: 77561912232.13.coast50_390e197273d3 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin13.hostedemail.com (Postfix) with ESMTP id 7F52918140B60 for ; Sun, 6 Dec 2020 06:41:56 +0000 (UTC) X-HE-Tag: coast50_390e197273d3 X-Filterd-Recvd-Size: 12411 Received: from mail-oi1-f194.google.com (mail-oi1-f194.google.com [209.85.167.194]) by imf37.hostedemail.com (Postfix) with ESMTP for ; Sun, 6 Dec 2020 06:41:55 +0000 (UTC) Received: by mail-oi1-f194.google.com with SMTP id o25so11625089oie.5 for ; Sat, 05 Dec 2020 22:41:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=Yf7zPQOJGOGST1kr7aRDb+hEVvmcjR4NTe2pf0HQRUE=; b=QePB48uCarK3ddowCiXI+o6c6RAi6TQftuyLc76APBu0NfSAmdrsEXIeFLp6tdrJdO pB+m9RjVOLxQ5eznWFcPJaKeZlWFvlU6VNkRADlR8IKnNFX7tEeGrF7DFl5U+8ojK+GS C1xSa2YZsYwi9hpNuVVorGcMNoogrQeJA7D/aYlJ8uwZ91tX431R4VpHcYs7ZqABFOTp tzazXtxOawiufJSNwe4QvFXCp6KQLFoDjE+lvB/6yw2rB8g8234wvFa1IIjn00Vh365q ipwnCMdlm1hHAKQ+dkb35RRq30y7DGJKjlugmAxf8iG13WDvRJ4oFXrG/dTBZ+KxTBhE Aqzw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Yf7zPQOJGOGST1kr7aRDb+hEVvmcjR4NTe2pf0HQRUE=; b=T+C7Jgv0WLjaoa1i7s5nEqQgQjwGGn9DjCkqyOHsUhncCZLRLzklzbqxvKQ0x3oCcW ECYLgCMlu/CleDqv2tqEmoIc324I0fuH+/pDwCqwGViSvfXJKejszE+lsYxfDfQjkxEv UGGN63lmFI+do5QD4BPPiHkuzOZOCECqYusmKbA0seemT35N5wyCyILAZLs2RWX2pTfe r4Ol0gxBPeIBBqhG6EK5w9H83qcojVzoEyKLvbq8eE4ruxia4sY1xQ+Cmquesxgkh3UT JD421ZlzX2/V8F5CEBnE16MSScP1YpsFZPTmhgdAHIG68focXpfYd886sVQq+N4r+hjP QIuA== X-Gm-Message-State: AOAM533kULKgR9WIQnadc/KMnyYTV5K60AM0J7RWw9vH+Sujz1M9Qxx9 A5jbHI4OJJlPRkiyfob+oZw= X-Google-Smtp-Source: ABdhPJwHxlLgoXfodNhX/Rsnt84ieUkqXh0UaPXjPCSJQav7RfU1+W5SjuDerhP+7UyY/90pR8R6zg== X-Received: by 2002:aca:eccb:: with SMTP id k194mr4957090oih.112.1607236915440; Sat, 05 Dec 2020 22:41:55 -0800 (PST) Received: from localhost.localdomain ([122.225.203.131]) by smtp.gmail.com with ESMTPSA id y18sm1817553ooj.20.2020.12.05.22.41.45 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Sat, 05 Dec 2020 22:41:54 -0800 (PST) From: Yafang Shao To: darrick.wong@oracle.com, willy@infradead.org, hch@infradead.org, david@fromorbit.com, mhocko@kernel.org, 000akpm@linux-foundation.org, dhowells@redhat.com, jlayton@redhat.com Cc: linux-fsdevel@vger.kernel.org, linux-cachefs@redhat.com, linux-xfs@vger.kernel.org, linux-mm@kvack.org, Yafang Shao , Christoph Hellwig Subject: [PATCH v9 2/2] xfs: avoid transaction reservation recursion Date: Sun, 6 Dec 2020 14:40:46 +0800 Message-Id: <20201206064046.2921-3-laoar.shao@gmail.com> X-Mailer: git-send-email 2.24.3 (Apple Git-128) In-Reply-To: <20201206064046.2921-1-laoar.shao@gmail.com> References: <20201206064046.2921-1-laoar.shao@gmail.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: PF_FSTRANS which is used to avoid transaction reservation recursion, is dropped since commit 9070733b4efa ("xfs: abstract PF_FSTRANS to PF_MEMALLOC_NOFS") and commit 7dea19f9ee63 ("mm: introduce memalloc_nofs_{save,restore} API") and replaced by PF_MEMALLOC_NOFS which means to avoid filesystem reclaim recursion. That change is subtle. Let's take the exmple of the check of WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS)) to explain why this abstraction from PF_FSTRANS to PF_MEMALLOC_NOFS is not proper. Below comment is quoted from Dave, > It wasn't for memory allocation recursion protection in XFS - it was for > transaction reservation recursion protection by something trying to flush > data pages while holding a transaction reservation. Doing > this could deadlock the journal because the existing reservation > could prevent the nested reservation for being able to reserve space > in the journal and that is a self-deadlock vector. > IOWs, this check is not protecting against memory reclaim recursion > bugs at all (that's the previous check [1]). This check is > protecting against the filesystem calling writepages directly from a > context where it can self-deadlock. > So what we are seeing here is that the PF_FSTRANS -> > PF_MEMALLOC_NOFS abstraction lost all the actual useful information > about what type of error this check was protecting against. As a result, we should reintroduce PF_FSTRANS. As current->journal_info isn't used in XFS, we can reuse it to indicate whehter the task is in fstrans or not, Per Willy. To achieve that, some new helpers are introduce in this patch, per Dave: - xfs_trans_context_set() Used in xfs_trans_alloc() - xfs_trans_context_clear() Used in xfs_trans_commit() and xfs_trans_cancel() - xfs_trans_context_active() To check whehter current is in fs transcation or not Darrick helped fix the error occurred in xfs/141.[2] No obvious error occurred when I run xfstests in my test machine. [1]. Below check is to avoid memory reclaim recursion. if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) == PF_MEMALLOC)) goto redirty; [2]. https://lore.kernel.org/linux-xfs/20201104001649.GN7123@magnolia/ Cc: Darrick J. Wong Cc: Matthew Wilcox (Oracle) Cc: Christoph Hellwig Cc: Dave Chinner Cc: Michal Hocko Cc: David Howells Cc: Jeff Layton Signed-off-by: Yafang Shao --- fs/iomap/buffered-io.c | 7 ------- fs/xfs/xfs_aops.c | 23 +++++++++++++++++++++-- fs/xfs/xfs_linux.h | 4 ---- fs/xfs/xfs_trans.c | 25 +++++++++++++------------ fs/xfs/xfs_trans.h | 23 +++++++++++++++++++++++ 5 files changed, 57 insertions(+), 25 deletions(-) diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 10cc7979ce38..3c53fa6ce64d 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -1458,13 +1458,6 @@ iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data) PF_MEMALLOC)) goto redirty; - /* - * Given that we do not allow direct reclaim to call us, we should - * never be called in a recursive filesystem reclaim context. - */ - if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS)) - goto redirty; - /* * Is this page beyond the end of the file? * diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c index 4304c6416fbb..1541ea5956fa 100644 --- a/fs/xfs/xfs_aops.c +++ b/fs/xfs/xfs_aops.c @@ -62,7 +62,8 @@ xfs_setfilesize_trans_alloc( * We hand off the transaction to the completion thread now, so * clear the flag here. */ - current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); + xfs_trans_context_clear(tp); + return 0; } @@ -125,7 +126,7 @@ xfs_setfilesize_ioend( * thus we need to mark ourselves as being in a transaction manually. * Similarly for freeze protection. */ - current_set_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); + xfs_trans_context_set(tp); __sb_writers_acquired(VFS_I(ip)->i_sb, SB_FREEZE_FS); /* we abort the update if there was an IO error */ @@ -568,6 +569,16 @@ xfs_vm_writepage( { struct xfs_writepage_ctx wpc = { }; + /* + * Given that we do not allow direct reclaim to call us, we should + * never be called while in a filesystem transaction. + */ + if (xfs_trans_context_active()) { + redirty_page_for_writepage(wbc, page); + unlock_page(page); + return 0; + } + return iomap_writepage(page, wbc, &wpc.ctx, &xfs_writeback_ops); } @@ -579,6 +590,14 @@ xfs_vm_writepages( struct xfs_writepage_ctx wpc = { }; xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED); + + /* + * Given that we do not allow direct reclaim to call us, we should + * never be called while in a filesystem transaction. + */ + if (xfs_trans_context_active()) + return 0; + return iomap_writepages(mapping, wbc, &wpc.ctx, &xfs_writeback_ops); } diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h index 5b7a1e201559..6ab0f8043c73 100644 --- a/fs/xfs/xfs_linux.h +++ b/fs/xfs/xfs_linux.h @@ -102,10 +102,6 @@ typedef __u32 xfs_nlink_t; #define xfs_cowb_secs xfs_params.cowb_timer.val #define current_cpu() (raw_smp_processor_id()) -#define current_set_flags_nested(sp, f) \ - (*(sp) = current->flags, current->flags |= (f)) -#define current_restore_flags_nested(sp, f) \ - (current->flags = ((current->flags & ~(f)) | (*(sp) & (f)))) #define NBBY 8 /* number of bits per byte */ diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index c94e71f741b6..09ae5c181299 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -67,6 +67,11 @@ xfs_trans_free( xfs_extent_busy_sort(&tp->t_busy); xfs_extent_busy_clear(tp->t_mountp, &tp->t_busy, false); + /* Detach the transaction from this thread. */ + ASSERT(current->journal_info != NULL); + if (current->journal_info == tp) + xfs_trans_context_clear(tp); + trace_xfs_trans_free(tp, _RET_IP_); if (!(tp->t_flags & XFS_TRANS_NO_WRITECOUNT)) sb_end_intwrite(tp->t_mountp->m_super); @@ -119,7 +124,11 @@ xfs_trans_dup( ntp->t_rtx_res = tp->t_rtx_res - tp->t_rtx_res_used; tp->t_rtx_res = tp->t_rtx_res_used; + + /* Associate the new transaction with this thread. */ + ASSERT(current->journal_info == tp); ntp->t_pflags = tp->t_pflags; + current->journal_info = ntp; /* move deferred ops over to the new tp */ xfs_defer_move(ntp, tp); @@ -153,8 +162,6 @@ xfs_trans_reserve( int error = 0; bool rsvd = (tp->t_flags & XFS_TRANS_RESERVE) != 0; - /* Mark this thread as being in a transaction */ - current_set_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); /* * Attempt to reserve the needed disk blocks by decrementing @@ -163,10 +170,8 @@ xfs_trans_reserve( */ if (blocks > 0) { error = xfs_mod_fdblocks(mp, -((int64_t)blocks), rsvd); - if (error != 0) { - current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); + if (error != 0) return -ENOSPC; - } tp->t_blk_res += blocks; } @@ -241,8 +246,6 @@ xfs_trans_reserve( tp->t_blk_res = 0; } - current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); - return error; } @@ -284,6 +287,8 @@ xfs_trans_alloc( INIT_LIST_HEAD(&tp->t_dfops); tp->t_firstblock = NULLFSBLOCK; + /* Mark this thread as being in a transaction */ + xfs_trans_context_set(tp); error = xfs_trans_reserve(tp, resp, blocks, rtextents); if (error) { xfs_trans_cancel(tp); @@ -878,7 +883,6 @@ __xfs_trans_commit( xfs_log_commit_cil(mp, tp, &commit_lsn, regrant); - current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); xfs_trans_free(tp); /* @@ -910,7 +914,7 @@ __xfs_trans_commit( xfs_log_ticket_ungrant(mp->m_log, tp->t_ticket); tp->t_ticket = NULL; } - current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); + xfs_trans_free_items(tp, !!error); xfs_trans_free(tp); @@ -970,9 +974,6 @@ xfs_trans_cancel( tp->t_ticket = NULL; } - /* mark this thread as no longer being in a transaction */ - current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); - xfs_trans_free_items(tp, dirty); xfs_trans_free(tp); } diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h index 084658946cc8..ceb530bf5c4b 100644 --- a/fs/xfs/xfs_trans.h +++ b/fs/xfs/xfs_trans.h @@ -268,4 +268,27 @@ xfs_trans_item_relog( return lip->li_ops->iop_relog(lip, tp); } +static inline void +xfs_trans_context_set(struct xfs_trans *tp) +{ + ASSERT(!current->journal_info); + current->journal_info = tp; + tp->t_pflags = memalloc_nofs_save(); +} + +static inline void +xfs_trans_context_clear(struct xfs_trans *tp) +{ + ASSERT(current->journal_info == tp); + current->journal_info = NULL; + memalloc_nofs_restore(tp->t_pflags); +} + +static inline bool +xfs_trans_context_active(void) +{ + /* Use journal_info to indicate current is in a transaction */ + return current->journal_info != NULL; +} + #endif /* __XFS_TRANS_H__ */