From patchwork Fri Aug 4 17:10:11 2023
From: "Theodore Ts'o"
To: linux-xfs@vger.kernel.org
Cc: amir73il@gmail.com, djwong@kernel.org, chandan.babu@oracle.com,
    leah.rumancik@gmail.com, Dave Chinner, Xiao Yang
Subject: [PATCH CANDIDATE v5.15 1/9] xfs: hoist refcount record merge predicates
Date: Fri, 4 Aug 2023 13:10:11 -0400
Message-Id: <20230804171019.1392900-1-tytso@mit.edu>
In-Reply-To: <20230802205747.GE358316@mit.edu>
References: <20230802205747.GE358316@mit.edu>

From: "Darrick J. Wong"

commit 9d720a5a658f5135861773f26e927449bef93d61 upstream.

Hoist these multiline conditionals into separate static inline helpers
to improve readability and set the stage for corruption fixes that will
be introduced in the next patch.

Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
Reviewed-by: Xiao Yang
---
 fs/xfs/libxfs/xfs_refcount.c | 129 ++++++++++++++++++++++++++++++-----
 1 file changed, 113 insertions(+), 16 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
index e5d767a7fc5d..fee4010b88bc 100644
--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -782,11 +782,119 @@ xfs_refcount_find_right_extents(
 /* Is this extent valid? */
 static inline bool
 xfs_refc_valid(
-	struct xfs_refcount_irec	*rc)
+	const struct xfs_refcount_irec	*rc)
 {
 	return rc->rc_startblock != NULLAGBLOCK;
 }
 
+static inline bool
+xfs_refc_want_merge_center(
+	const struct xfs_refcount_irec	*left,
+	const struct xfs_refcount_irec	*cleft,
+	const struct xfs_refcount_irec	*cright,
+	const struct xfs_refcount_irec	*right,
+	bool				cleft_is_cright,
+	enum xfs_refc_adjust_op		adjust,
+	unsigned long long		*ulenp)
+{
+	unsigned long long		ulen = left->rc_blockcount;
+
+	/*
+	 * To merge with a center record, both shoulder records must be
+	 * adjacent to the record we want to adjust.  This is only true if
+	 * find_left and find_right made all four records valid.
+	 */
+	if (!xfs_refc_valid(left) || !xfs_refc_valid(right) ||
+	    !xfs_refc_valid(cleft) || !xfs_refc_valid(cright))
+		return false;
+
+	/* There must only be one record for the entire range. */
+	if (!cleft_is_cright)
+		return false;
+
+	/* The shoulder record refcounts must match the new refcount. */
+	if (left->rc_refcount != cleft->rc_refcount + adjust)
+		return false;
+	if (right->rc_refcount != cleft->rc_refcount + adjust)
+		return false;
+
+	/*
+	 * The new record cannot exceed the max length. ulen is a ULL as the
+	 * individual record block counts can be up to (u32 - 1) in length
+	 * hence we need to catch u32 addition overflows here.
+	 */
+	ulen += cleft->rc_blockcount + right->rc_blockcount;
+	if (ulen >= MAXREFCEXTLEN)
+		return false;
+
+	*ulenp = ulen;
+	return true;
+}
+
+static inline bool
+xfs_refc_want_merge_left(
+	const struct xfs_refcount_irec	*left,
+	const struct xfs_refcount_irec	*cleft,
+	enum xfs_refc_adjust_op		adjust)
+{
+	unsigned long long		ulen = left->rc_blockcount;
+
+	/*
+	 * For a left merge, the left shoulder record must be adjacent to the
+	 * start of the range.  If this is true, find_left made left and cleft
+	 * contain valid contents.
+	 */
+	if (!xfs_refc_valid(left) || !xfs_refc_valid(cleft))
+		return false;
+
+	/* Left shoulder record refcount must match the new refcount. */
+	if (left->rc_refcount != cleft->rc_refcount + adjust)
+		return false;
+
+	/*
+	 * The new record cannot exceed the max length. ulen is a ULL as the
+	 * individual record block counts can be up to (u32 - 1) in length
+	 * hence we need to catch u32 addition overflows here.
+	 */
+	ulen += cleft->rc_blockcount;
+	if (ulen >= MAXREFCEXTLEN)
+		return false;
+
+	return true;
+}
+
+static inline bool
+xfs_refc_want_merge_right(
+	const struct xfs_refcount_irec	*cright,
+	const struct xfs_refcount_irec	*right,
+	enum xfs_refc_adjust_op		adjust)
+{
+	unsigned long long		ulen = right->rc_blockcount;
+
+	/*
+	 * For a right merge, the right shoulder record must be adjacent to the
+	 * end of the range.  If this is true, find_right made cright and right
+	 * contain valid contents.
+	 */
+	if (!xfs_refc_valid(right) || !xfs_refc_valid(cright))
+		return false;
+
+	/* Right shoulder record refcount must match the new refcount. */
+	if (right->rc_refcount != cright->rc_refcount + adjust)
+		return false;
+
+	/*
+	 * The new record cannot exceed the max length. ulen is a ULL as the
+	 * individual record block counts can be up to (u32 - 1) in length
+	 * hence we need to catch u32 addition overflows here.
+	 */
+	ulen += cright->rc_blockcount;
+	if (ulen >= MAXREFCEXTLEN)
+		return false;
+
+	return true;
+}
+
 /*
  * Try to merge with any extents on the boundaries of the adjustment range.
  */
@@ -828,23 +936,15 @@ xfs_refcount_merge_extents(
 		(cleft.rc_blockcount == cright.rc_blockcount);
 
 	/* Try to merge left, cleft, and right.  cleft must == cright. */
-	ulen = (unsigned long long)left.rc_blockcount + cleft.rc_blockcount +
-			right.rc_blockcount;
-	if (xfs_refc_valid(&left) && xfs_refc_valid(&right) &&
-	    xfs_refc_valid(&cleft) && xfs_refc_valid(&cright) && cequal &&
-	    left.rc_refcount == cleft.rc_refcount + adjust &&
-	    right.rc_refcount == cleft.rc_refcount + adjust &&
-	    ulen < MAXREFCEXTLEN) {
+	if (xfs_refc_want_merge_center(&left, &cleft, &cright, &right, cequal,
+				adjust, &ulen)) {
 		*shape_changed = true;
 		return xfs_refcount_merge_center_extents(cur, &left, &cleft,
 				&right, ulen, aglen);
 	}
 
 	/* Try to merge left and cleft. */
-	ulen = (unsigned long long)left.rc_blockcount + cleft.rc_blockcount;
-	if (xfs_refc_valid(&left) && xfs_refc_valid(&cleft) &&
-	    left.rc_refcount == cleft.rc_refcount + adjust &&
-	    ulen < MAXREFCEXTLEN) {
+	if (xfs_refc_want_merge_left(&left, &cleft, adjust)) {
 		*shape_changed = true;
 		error = xfs_refcount_merge_left_extent(cur, &left, &cleft,
 				agbno, aglen);
@@ -860,10 +960,7 @@ xfs_refcount_merge_extents(
 	}
 
 	/* Try to merge cright and right. */
-	ulen = (unsigned long long)right.rc_blockcount + cright.rc_blockcount;
-	if (xfs_refc_valid(&right) && xfs_refc_valid(&cright) &&
-	    right.rc_refcount == cright.rc_refcount + adjust &&
-	    ulen < MAXREFCEXTLEN) {
+	if (xfs_refc_want_merge_right(&cright, &right, adjust)) {
 		*shape_changed = true;
 		return xfs_refcount_merge_right_extent(cur, &right, &cright,
 				aglen);
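
A side note on the ulen logic carried into these helpers: the 64-bit
accumulation is what makes the MAXREFCEXTLEN test reliable.  A
stand-alone user-space sketch (illustrative only; the MAXREFCEXTLEN
value below is an assumed stand-in, not the on-disk constant)
demonstrates the u32 addition overflow the comment warns about:

/* Hypothetical sketch: why the merge predicates accumulate block
 * counts in an unsigned long long.  Each record's blockcount fits in
 * a u32, so summing two or three of them can wrap a u32 but never a
 * u64, letting the MAXREFCEXTLEN comparison catch the overflow. */
#include <stdint.h>
#include <stdio.h>

#define MAXREFCEXTLEN	((1ULL << 21) - 1)	/* assumed value for the sketch */

int main(void)
{
	uint32_t left = UINT32_MAX - 1, cleft = 10, right = 10;

	/* Buggy: the sum wraps in 32 bits and looks "small enough". */
	uint32_t wrapped = left + cleft + right;

	/* Correct: widen first, then compare against the limit. */
	unsigned long long ulen = (unsigned long long)left + cleft + right;

	printf("u32 sum: %u (wrapped, passes the check by accident)\n", wrapped);
	printf("u64 sum: %llu, mergeable: %s\n", ulen,
	       ulen < MAXREFCEXTLEN ? "yes" : "no");
	return 0;
}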

From patchwork Fri Aug 4 17:10:12 2023
From: "Theodore Ts'o"
To: linux-xfs@vger.kernel.org
Cc: amir73il@gmail.com, djwong@kernel.org, chandan.babu@oracle.com,
    leah.rumancik@gmail.com, Dave Chinner, Xiao Yang
Subject: [PATCH CANDIDATE v5.15 2/9] xfs: estimate post-merge refcounts correctly
Date: Fri, 4 Aug 2023 13:10:12 -0400
Message-Id: <20230804171019.1392900-2-tytso@mit.edu>
In-Reply-To: <20230804171019.1392900-1-tytso@mit.edu>
References: <20230802205747.GE358316@mit.edu>
 <20230804171019.1392900-1-tytso@mit.edu>

From: "Darrick J. Wong"

commit b25d1984aa884fc91a73a5a407b9ac976d441e9b upstream.

Upon enabling fsdax + reflink for XFS, xfs/179 began to report refcount
metadata corruptions after being run.  Specifically, xfs_repair noticed
single-block refcount records that could be combined but had not been.

The root cause of this is improper MAXREFCOUNT edge case handling in
xfs_refcount_merge_extents.  When we're trying to find candidates for a
refcount btree record merge, we compute the refcount attribute of the
merged record, but we fail to account for the fact that once a record
hits rc_refcount == MAXREFCOUNT, it is pinned that way forever.  Hence
the computed refcount is wrong, and we fail to merge the extents.

Fix this by adjusting the merge predicates to compute the adjusted
refcount correctly.

Fixes: 3172725814f9 ("xfs: adjust refcount of an extent of blocks in refcount btree")
Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
Reviewed-by: Xiao Yang
---
 fs/xfs/libxfs/xfs_refcount.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
index fee4010b88bc..e2dbd30b416a 100644
--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -787,6 +787,17 @@ xfs_refc_valid(
 	return rc->rc_startblock != NULLAGBLOCK;
 }
 
+static inline xfs_nlink_t
+xfs_refc_merge_refcount(
+	const struct xfs_refcount_irec	*irec,
+	enum xfs_refc_adjust_op		adjust)
+{
+	/* Once a record hits MAXREFCOUNT, it is pinned there forever */
+	if (irec->rc_refcount == MAXREFCOUNT)
+		return MAXREFCOUNT;
+	return irec->rc_refcount + adjust;
+}
+
 static inline bool
 xfs_refc_want_merge_center(
 	const struct xfs_refcount_irec	*left,
@@ -798,6 +809,7 @@ xfs_refc_want_merge_center(
 	unsigned long long		*ulenp)
 {
 	unsigned long long		ulen = left->rc_blockcount;
+	xfs_nlink_t			new_refcount;
 
 	/*
 	 * To merge with a center record, both shoulder records must be
@@ -813,9 +825,10 @@ xfs_refc_want_merge_center(
 		return false;
 
 	/* The shoulder record refcounts must match the new refcount. */
-	if (left->rc_refcount != cleft->rc_refcount + adjust)
+	new_refcount = xfs_refc_merge_refcount(cleft, adjust);
+	if (left->rc_refcount != new_refcount)
 		return false;
-	if (right->rc_refcount != cleft->rc_refcount + adjust)
+	if (right->rc_refcount != new_refcount)
 		return false;
 
 	/*
@@ -838,6 +851,7 @@ xfs_refc_want_merge_left(
 	enum xfs_refc_adjust_op		adjust)
 {
 	unsigned long long		ulen = left->rc_blockcount;
+	xfs_nlink_t			new_refcount;
 
 	/*
 	 * For a left merge, the left shoulder record must be adjacent to the
@@ -848,7 +862,8 @@ xfs_refc_want_merge_left(
 		return false;
 
 	/* Left shoulder record refcount must match the new refcount. */
-	if (left->rc_refcount != cleft->rc_refcount + adjust)
+	new_refcount = xfs_refc_merge_refcount(cleft, adjust);
+	if (left->rc_refcount != new_refcount)
 		return false;
 
 	/*
@@ -870,6 +885,7 @@ xfs_refc_want_merge_right(
 	enum xfs_refc_adjust_op		adjust)
 {
 	unsigned long long		ulen = right->rc_blockcount;
+	xfs_nlink_t			new_refcount;
 
 	/*
 	 * For a right merge, the right shoulder record must be adjacent to the
@@ -880,7 +896,8 @@ xfs_refc_want_merge_right(
 		return false;
 
 	/* Right shoulder record refcount must match the new refcount. */
-	if (right->rc_refcount != cright->rc_refcount + adjust)
+	new_refcount = xfs_refc_merge_refcount(cright, adjust);
+	if (right->rc_refcount != new_refcount)
 		return false;
 
 	/*
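
To see why the clamping matters, the pinning rule can be simulated
outside the kernel.  A minimal user-space sketch (hypothetical code;
MAXREFCOUNT is assumed here to be the all-ones u32 value) shows the
naive prediction failing exactly where the commit message says it does:

/* Hypothetical sketch of the MAXREFCOUNT pinning rule.  Once a
 * record's refcount reaches the maximum it stays there, so the
 * predicted post-adjust refcount must be clamped the same way. */
#include <stdint.h>
#include <stdio.h>

#define MAXREFCOUNT	UINT32_MAX	/* assumed stand-in for the sketch */

static uint32_t merge_refcount(uint32_t refcount, int adjust)
{
	if (refcount == MAXREFCOUNT)	/* pinned: adjustments are ignored */
		return MAXREFCOUNT;
	return refcount + adjust;
}

int main(void)
{
	uint32_t cleft = MAXREFCOUNT;	/* center record already pinned */
	uint32_t left = MAXREFCOUNT;	/* shoulder record, also pinned */
	int adjust = 1;

	/* Naive prediction wraps to 0 and never matches the shoulder. */
	printf("naive:   %s\n", left == cleft + adjust ? "merge" : "no merge");
	/* Clamped prediction matches, so the records merge as they should. */
	printf("clamped: %s\n",
	       left == merge_refcount(cleft, adjust) ? "merge" : "no merge");
	return 0;
}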

From patchwork Fri Aug 4 17:10:13 2023
From: "Theodore Ts'o"
To: linux-xfs@vger.kernel.org
Cc: amir73il@gmail.com, djwong@kernel.org, chandan.babu@oracle.com,
    leah.rumancik@gmail.com, Gao Xiang
Subject: [PATCH CANDIDATE v5.15 3/9] xfs: add missing cmap->br_state = XFS_EXT_NORM update
Date: Fri, 4 Aug 2023 13:10:13 -0400
Message-Id: <20230804171019.1392900-3-tytso@mit.edu>
In-Reply-To: <20230804171019.1392900-1-tytso@mit.edu>
References: <20230802205747.GE358316@mit.edu>
 <20230804171019.1392900-1-tytso@mit.edu>

From: Gao Xiang

commit 1a39ae415c1be1e46f5b3f97d438c7c4adc22b63 upstream.

COW extents are already converted into written real extents after
xfs_reflink_convert_cow_locked(), therefore cmap->br_state should
reflect it.  Otherwise, another unnecessary unwritten conversion is
triggered in xfs_dio_write_end_io() for direct I/O cases.

Signed-off-by: Gao Xiang
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
---
 fs/xfs/xfs_reflink.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 36832e4bc803..628ce65d02bb 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -425,7 +425,10 @@ xfs_reflink_allocate_cow(
 	if (!convert_now || cmap->br_state == XFS_EXT_NORM)
 		return 0;
 	trace_xfs_reflink_convert_cow(ip, cmap);
-	return xfs_reflink_convert_cow_locked(ip, offset_fsb, count_fsb);
+	error = xfs_reflink_convert_cow_locked(ip, offset_fsb, count_fsb);
+	if (!error)
+		cmap->br_state = XFS_EXT_NORM;
+	return error;
 
 out_trans_cancel:
 	xfs_trans_cancel(tp);

From patchwork Fri Aug 4 17:10:14 2023
From: "Theodore Ts'o"
To: linux-xfs@vger.kernel.org
Cc: amir73il@gmail.com, djwong@kernel.org, chandan.babu@oracle.com,
    leah.rumancik@gmail.com, Wengang Wang
Subject: [PATCH CANDIDATE v5.15 4/9] xfs: Fix false ENOSPC when performing direct write on a delalloc extent in cow fork
Date: Fri, 4 Aug 2023 13:10:14 -0400
Message-Id: <20230804171019.1392900-4-tytso@mit.edu>
In-Reply-To: <20230804171019.1392900-1-tytso@mit.edu>
References: <20230802205747.GE358316@mit.edu>
 <20230804171019.1392900-1-tytso@mit.edu>

From: Chandan Babu R

commit d62113303d691bcd8d0675ae4ac63e7769afc56c upstream.
On a highly fragmented filesystem, a Direct IO write can fail with an
-ENOSPC error even though the filesystem has a sufficient number of
free blocks.  This occurs if the file offset range on which the write
operation is being performed has a delalloc extent in the cow fork and
this delalloc extent begins well before the Direct IO range.

In such a scenario, xfs_reflink_allocate_cow() invokes
xfs_bmapi_write() to allocate the blocks mapped by the delalloc extent.
The extent thus allocated may not cover the beginning of the file
offset range on which the Direct IO write was issued.  Hence
xfs_reflink_allocate_cow() ends up returning -ENOSPC.

The following script reliably recreates the bug described above.

  #!/usr/bin/bash

  device=/dev/loop0
  shortdev=$(basename $device)
  mntpnt=/mnt/
  file1=${mntpnt}/file1
  file2=${mntpnt}/file2
  fragmentedfile=${mntpnt}/fragmentedfile
  punchprog=/root/repos/xfstests-dev/src/punch-alternating
  errortag=/sys/fs/xfs/${shortdev}/errortag/bmap_alloc_minlen_extent

  umount $device > /dev/null 2>&1

  echo "Create FS"
  mkfs.xfs -f -m reflink=1 $device > /dev/null 2>&1
  if [[ $? != 0 ]]; then
          echo "mkfs failed."
          exit 1
  fi

  echo "Mount FS"
  mount $device $mntpnt > /dev/null 2>&1
  if [[ $? != 0 ]]; then
          echo "mount failed."
          exit 1
  fi

  echo "Create source file"
  xfs_io -f -c "pwrite 0 32M" $file1 > /dev/null 2>&1

  sync

  echo "Create Reflinked file"
  xfs_io -f -c "reflink $file1" $file2 &>/dev/null

  echo "Set cowextsize"
  xfs_io -c "cowextsize 16M" $file1 > /dev/null 2>&1

  echo "Fragment FS"
  xfs_io -f -c "pwrite 0 64M" $fragmentedfile > /dev/null 2>&1
  sync
  $punchprog $fragmentedfile

  echo "Allocate block sized extent from now onwards"
  echo -n 1 > $errortag

  echo "Create 16MiB delalloc extent in CoW fork"
  xfs_io -c "pwrite 0 4k" $file1 > /dev/null 2>&1

  sync

  echo "Direct I/O write at offset 12k"
  xfs_io -d -c "pwrite 12k 8k" $file1

This commit fixes the bug by invoking xfs_bmapi_write() in a loop until
disk blocks are allocated for at least the starting file offset of the
Direct IO write range.

Fixes: 3c68d44a2b49 ("xfs: allocate direct I/O COW blocks in iomap_begin")
Reported-and-Root-caused-by: Wengang Wang
Signed-off-by: Chandan Babu R
Reviewed-by: Darrick J. Wong
[djwong: slight editing to make the locking less grody, and fix some style things]
Signed-off-by: Darrick J. Wong
---
 fs/xfs/xfs_reflink.c | 198 +++++++++++++++++++++++++++++++++++--------
 1 file changed, 163 insertions(+), 35 deletions(-)

diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 628ce65d02bb..793bdf5ac2f7 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -340,9 +340,41 @@ xfs_find_trim_cow_extent(
 	return 0;
 }
 
-/* Allocate all CoW reservations covering a range of blocks in a file. */
-int
-xfs_reflink_allocate_cow(
+static int
+xfs_reflink_convert_unwritten(
+	struct xfs_inode	*ip,
+	struct xfs_bmbt_irec	*imap,
+	struct xfs_bmbt_irec	*cmap,
+	bool			convert_now)
+{
+	xfs_fileoff_t		offset_fsb = imap->br_startoff;
+	xfs_filblks_t		count_fsb = imap->br_blockcount;
+	int			error;
+
+	/*
+	 * cmap might larger than imap due to cowextsize hint.
+	 */
+	xfs_trim_extent(cmap, offset_fsb, count_fsb);
+
+	/*
+	 * COW fork extents are supposed to remain unwritten until we're ready
+	 * to initiate a disk write.  For direct I/O we are going to write the
+	 * data and need the conversion, but for buffered writes we're done.
+	 */
+	if (!convert_now || cmap->br_state == XFS_EXT_NORM)
+		return 0;
+
+	trace_xfs_reflink_convert_cow(ip, cmap);
+
+	error = xfs_reflink_convert_cow_locked(ip, offset_fsb, count_fsb);
+	if (!error)
+		cmap->br_state = XFS_EXT_NORM;
+
+	return error;
+}
+
+static int
+xfs_reflink_fill_cow_hole(
 	struct xfs_inode	*ip,
 	struct xfs_bmbt_irec	*imap,
 	struct xfs_bmbt_irec	*cmap,
@@ -351,25 +383,12 @@ xfs_reflink_allocate_cow(
 	bool			convert_now)
 {
 	struct xfs_mount	*mp = ip->i_mount;
-	xfs_fileoff_t		offset_fsb = imap->br_startoff;
-	xfs_filblks_t		count_fsb = imap->br_blockcount;
 	struct xfs_trans	*tp;
-	int			nimaps, error = 0;
-	bool			found;
 	xfs_filblks_t		resaligned;
-	xfs_extlen_t		resblks = 0;
-
-	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
-	if (!ip->i_cowfp) {
-		ASSERT(!xfs_is_reflink_inode(ip));
-		xfs_ifork_init_cow(ip);
-	}
-
-	error = xfs_find_trim_cow_extent(ip, imap, cmap, shared, &found);
-	if (error || !*shared)
-		return error;
-	if (found)
-		goto convert;
+	xfs_extlen_t		resblks;
+	int			nimaps;
+	int			error;
+	bool			found;
 
 	resaligned = xfs_aligned_fsb_count(imap->br_startoff,
 		imap->br_blockcount, xfs_get_cowextsz_hint(ip));
@@ -385,17 +404,17 @@ xfs_reflink_allocate_cow(
 
 	*lockmode = XFS_ILOCK_EXCL;
 
-	/*
-	 * Check for an overlapping extent again now that we dropped the ilock.
-	 */
 	error = xfs_find_trim_cow_extent(ip, imap, cmap, shared, &found);
 	if (error || !*shared)
 		goto out_trans_cancel;
+
 	if (found) {
 		xfs_trans_cancel(tp);
 		goto convert;
 	}
 
+	ASSERT(cmap->br_startoff > imap->br_startoff);
+
 	/* Allocate the entire reservation as unwritten blocks. */
 	nimaps = 1;
 	error = xfs_bmapi_write(tp, ip, imap->br_startoff, imap->br_blockcount,
@@ -415,26 +434,135 @@ xfs_reflink_allocate_cow(
 	 */
 	if (nimaps == 0)
 		return -ENOSPC;
+
 convert:
-	xfs_trim_extent(cmap, offset_fsb, count_fsb);
-	/*
-	 * COW fork extents are supposed to remain unwritten until we're ready
-	 * to initiate a disk write.  For direct I/O we are going to write the
-	 * data and need the conversion, but for buffered writes we're done.
-	 */
-	if (!convert_now || cmap->br_state == XFS_EXT_NORM)
-		return 0;
-	trace_xfs_reflink_convert_cow(ip, cmap);
-	error = xfs_reflink_convert_cow_locked(ip, offset_fsb, count_fsb);
-	if (!error)
-		cmap->br_state = XFS_EXT_NORM;
+	return xfs_reflink_convert_unwritten(ip, imap, cmap, convert_now);
+
+out_trans_cancel:
+	xfs_trans_cancel(tp);
 	return error;
+}
+
+static int
+xfs_reflink_fill_delalloc(
+	struct xfs_inode	*ip,
+	struct xfs_bmbt_irec	*imap,
+	struct xfs_bmbt_irec	*cmap,
+	bool			*shared,
+	uint			*lockmode,
+	bool			convert_now)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_trans	*tp;
+	int			nimaps;
+	int			error;
+	bool			found;
+
+	do {
+		xfs_iunlock(ip, *lockmode);
+		*lockmode = 0;
+
+		error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_write, 0, 0,
+				false, &tp);
+		if (error)
+			return error;
+
+		*lockmode = XFS_ILOCK_EXCL;
+
+		error = xfs_find_trim_cow_extent(ip, imap, cmap, shared,
+				&found);
+		if (error || !*shared)
+			goto out_trans_cancel;
+
+		if (found) {
+			xfs_trans_cancel(tp);
+			break;
+		}
+
+		ASSERT(isnullstartblock(cmap->br_startblock) ||
+		       cmap->br_startblock == DELAYSTARTBLOCK);
+
+		/*
+		 * Replace delalloc reservation with an unwritten extent.
+		 */
+		nimaps = 1;
+		error = xfs_bmapi_write(tp, ip, cmap->br_startoff,
+				cmap->br_blockcount,
+				XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC, 0,
+				cmap, &nimaps);
+		if (error)
+			goto out_trans_cancel;
+
+		xfs_inode_set_cowblocks_tag(ip);
+		error = xfs_trans_commit(tp);
+		if (error)
+			return error;
+
+		/*
+		 * Allocation succeeded but the requested range was not even
+		 * partially satisfied?  Bail out!
+		 */
+		if (nimaps == 0)
+			return -ENOSPC;
+	} while (cmap->br_startoff + cmap->br_blockcount <= imap->br_startoff);
+
+	return xfs_reflink_convert_unwritten(ip, imap, cmap, convert_now);
 
 out_trans_cancel:
 	xfs_trans_cancel(tp);
 	return error;
 }
 
+/* Allocate all CoW reservations covering a range of blocks in a file. */
+int
+xfs_reflink_allocate_cow(
+	struct xfs_inode	*ip,
+	struct xfs_bmbt_irec	*imap,
+	struct xfs_bmbt_irec	*cmap,
+	bool			*shared,
+	uint			*lockmode,
+	bool			convert_now)
+{
+	int			error;
+	bool			found;
+
+	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
+	if (!ip->i_cowfp) {
+		ASSERT(!xfs_is_reflink_inode(ip));
+		xfs_ifork_init_cow(ip);
+	}
+
+	error = xfs_find_trim_cow_extent(ip, imap, cmap, shared, &found);
+	if (error || !*shared)
+		return error;
+
+	/* CoW fork has a real extent */
+	if (found)
+		return xfs_reflink_convert_unwritten(ip, imap, cmap,
+				convert_now);
+
+	/*
+	 * CoW fork does not have an extent and data extent is shared.
+	 * Allocate a real extent in the CoW fork.
+	 */
+	if (cmap->br_startoff > imap->br_startoff)
+		return xfs_reflink_fill_cow_hole(ip, imap, cmap, shared,
+				lockmode, convert_now);
+
+	/*
+	 * CoW fork has a delalloc reservation. Replace it with a real extent.
+	 * There may or may not be a data fork mapping.
+	 */
+	if (isnullstartblock(cmap->br_startblock) ||
+	    cmap->br_startblock == DELAYSTARTBLOCK)
+		return xfs_reflink_fill_delalloc(ip, imap, cmap, shared,
+				lockmode, convert_now);
+
+	/* Shouldn't get here. */
+	ASSERT(0);
+	return -EFSCORRUPTED;
+}
+
 /*
  * Cancel CoW reservations for some block range of an inode.
  *
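
The structure of the xfs_reflink_fill_delalloc() loop above is the crux
of the fix: keep converting pieces of the delalloc reservation until
the allocated extent covers the start of the Direct IO range.  A toy
user-space sketch of just that termination condition (hypothetical
block numbers; a stand-in for xfs_bmapi_write(), not XFS code):

/* Toy model of the fill-delalloc loop's exit rule: loop until the
 * converted extent (startoff + blockcount) reaches the write's start. */
#include <stdio.h>

struct irec { unsigned long long startoff, blockcount; };

int main(void)
{
	/* Direct write at 12k with 4k blocks = file block 3 (per the script). */
	struct irec imap = { .startoff = 3, .blockcount = 2 };
	struct irec cmap;
	unsigned long long next = 0;	/* delalloc conversion cursor */

	do {
		/* Stand-in for xfs_bmapi_write(): a fragmented filesystem
		 * may hand back only one block per transaction. */
		cmap.startoff = next++;
		cmap.blockcount = 1;
		printf("converted file block %llu\n", cmap.startoff);
	} while (cmap.startoff + cmap.blockcount <= imap.startoff);

	/* cmap now covers imap.startoff, so the write no longer sees ENOSPC. */
	return 0;
}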
 *

From patchwork Fri Aug 4 17:10:15 2023
From: "Theodore Ts'o"
To: linux-xfs@vger.kernel.org
Cc: amir73il@gmail.com, djwong@kernel.org, chandan.babu@oracle.com,
    leah.rumancik@gmail.com, Christoph Hellwig
Subject: [PATCH CANDIDATE v5.15 5/9] xfs: stabilize the dirent name transformation function used for ascii-ci dir hash computation
Date: Fri, 4 Aug 2023 13:10:15 -0400
Message-Id: <20230804171019.1392900-5-tytso@mit.edu>
In-Reply-To: <20230804171019.1392900-1-tytso@mit.edu>
References: <20230802205747.GE358316@mit.edu>
 <20230804171019.1392900-1-tytso@mit.edu>

From: "Darrick J. Wong"

commit a9248538facc3d9e769489e50a544509c2f9cebe upstream.

Back in the old days, the "ascii-ci" feature was created to implement
case-insensitive directory entry lookups for latin1-encoded names and
remove the large overhead of Samba's case-insensitive lookup code.
UTF8 names were not allowed, but nobody explicitly wrote in the
documentation that this was only expected to work if the system used
latin1 names.  The kernel tolower function was selected to prepare
names for hashed lookups.

There's a major discrepancy in the function that computes directory
entry hashes for filesystems that have ASCII case-insensitive lookups
enabled.  The root of this is that the kernel and glibc's tolower
implementations have differing behavior for extended ASCII accented
characters.
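
For reference, a minimal user-space program in the spirit of the one
described below can reproduce the comparison.  This is an illustrative
sketch, not the author's original code: the glibc side assumes the
default C locale, and the "kernel-style" side reimplements the byte
ranges that the patch encodes in xfs_ascii_ci_need_xfrm() further down
(the real kernel ctype table may differ on an odd byte such as 0xd7):

/* Sketch: print every byte whose case-folded value differs from the
 * input, once for glibc tolower() and once for a kernel-style fold
 * rebuilt from the ranges used by the patch below. */
#include <ctype.h>
#include <stdio.h>

static unsigned char kernel_style_tolower(unsigned char c)
{
	if ((c >= 0x41 && c <= 0x5a) ||		/* A-Z */
	    (c >= 0xc0 && c <= 0xd6) ||		/* latin A-O with accents */
	    (c >= 0xd8 && c <= 0xde))		/* latin O-Y with accents */
		return c - 'A' + 'a';
	return c;
}

int main(void)
{
	int c;

	printf("glibc tolower:");
	for (c = 0; c < 256; c++)
		if (tolower(c) != c)
			printf(" %d:%c", c, c);

	printf("\nkernel-style tolower:");
	for (c = 0; c < 256; c++)
		if (kernel_style_tolower(c) != c)
			printf(" %d:%c", c, c);
	printf("\n");
	return 0;
}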
I wrote a program to spit out characters for which the tolower()
return value is different from the input:

glibc tolower:
65:A 66:B 67:C 68:D 69:E 70:F 71:G 72:H 73:I 74:J 75:K 76:L 77:M 78:N
79:O 80:P 81:Q 82:R 83:S 84:T 85:U 86:V 87:W 88:X 89:Y 90:Z

kernel tolower:
65:A 66:B 67:C 68:D 69:E 70:F 71:G 72:H 73:I 74:J 75:K 76:L 77:M 78:N
79:O 80:P 81:Q 82:R 83:S 84:T 85:U 86:V 87:W 88:X 89:Y 90:Z
192:À 193:Á 194:Â 195:Ã 196:Ä 197:Å 198:Æ 199:Ç 200:È 201:É 202:Ê 203:Ë
204:Ì 205:Í 206:Î 207:Ï 208:Ð 209:Ñ 210:Ò 211:Ó 212:Ô 213:Õ 214:Ö 215:×
216:Ø 217:Ù 218:Ú 219:Û 220:Ü 221:Ý 222:Þ

Which means that the kernel and userspace do not agree on the hash
value for a directory filename that contains those higher values.  The
hash values are written into the leaf index block of directories that
are larger than two blocks in size, which means that xfs_repair will
flag these directories as having corrupted hash indexes and rewrite the
index with hash values that the kernel now will not recognize.

Because the ascii-ci feature is not frequently enabled and the kernel
touches filesystems far more frequently than xfs_repair does, fix this
by encoding the kernel's toupper predicate and tolower functions into
libxfs.  Give the new functions less provocative names to make it
really obvious that this is a pre-hash name preparation function, and
nothing else.  This change makes userspace's behavior consistent with
the kernel.

Found by auditing obfuscate_name in xfs_metadump as part of working on
parent pointers, wondering how it could possibly work correctly with ci
filesystems, writing a test tool to create a directory with
hash-colliding names, and watching xfs_repair flag it.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 fs/xfs/libxfs/xfs_dir2.c |  5 +++--
 fs/xfs/libxfs/xfs_dir2.h | 31 +++++++++++++++++++++++++++++++
 2 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
index 50546eadaae2..18621912e47b 100644
--- a/fs/xfs/libxfs/xfs_dir2.c
+++ b/fs/xfs/libxfs/xfs_dir2.c
@@ -60,7 +60,7 @@ xfs_ascii_ci_hashname(
 	int			i;
 
 	for (i = 0, hash = 0; i < name->len; i++)
-		hash = tolower(name->name[i]) ^ rol32(hash, 7);
+		hash = xfs_ascii_ci_xfrm(name->name[i]) ^ rol32(hash, 7);
 
 	return hash;
 }
@@ -81,7 +81,8 @@ xfs_ascii_ci_compname(
 	for (i = 0; i < len; i++) {
 		if (args->name[i] == name[i])
 			continue;
-		if (tolower(args->name[i]) != tolower(name[i]))
+		if (xfs_ascii_ci_xfrm(args->name[i]) !=
+		    xfs_ascii_ci_xfrm(name[i]))
 			return XFS_CMP_DIFFERENT;
 		result = XFS_CMP_CASE;
 	}
diff --git a/fs/xfs/libxfs/xfs_dir2.h b/fs/xfs/libxfs/xfs_dir2.h
index d03e6098ded9..1b877dc0ccf7 100644
--- a/fs/xfs/libxfs/xfs_dir2.h
+++ b/fs/xfs/libxfs/xfs_dir2.h
@@ -248,4 +248,35 @@ unsigned int xfs_dir3_data_end_offset(struct xfs_da_geometry *geo,
 		struct xfs_dir2_data_hdr *hdr);
 bool xfs_dir2_namecheck(const void *name, size_t length);
 
+/*
+ * The "ascii-ci" feature was created to speed up case-insensitive lookups for
+ * a Samba product.  Because of the inherent problems with CI and UTF-8
+ * encoding, etc, it was decided that Samba would be configured to export
+ * latin1/iso 8859-1 encodings as that covered >90% of the target markets for
+ * the product.  Hence the "ascii-ci" casefolding code could be encoded into
+ * the XFS directory operations and remove all the overhead of casefolding from
+ * Samba.
+ *
+ * To provide consistent hashing behavior between the userspace and kernel,
+ * these functions prepare names for hashing by transforming specific bytes
+ * to other bytes.  Robustness with other encodings is not guaranteed.
+ */
+static inline bool xfs_ascii_ci_need_xfrm(unsigned char c)
+{
+	if (c >= 0x41 && c <= 0x5a)	/* A-Z */
+		return true;
+	if (c >= 0xc0 && c <= 0xd6)	/* latin A-O with accents */
+		return true;
+	if (c >= 0xd8 && c <= 0xde)	/* latin O-Y with accents */
+		return true;
+	return false;
+}
+
+static inline unsigned char xfs_ascii_ci_xfrm(unsigned char c)
+{
+	if (xfs_ascii_ci_need_xfrm(c))
+		c -= 'A' - 'a';
+	return c;
+}
+
 #endif /* __XFS_DIR2_H__ */
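
Because the new helpers are pure functions of a single byte, their
contract is easy to check outside the kernel.  A stand-alone harness (a
sketch; the two helpers are copied from the dir2.h hunk above, while
the checks themselves are illustrative assumptions):

/* Sanity-check the casefolding transform introduced by the patch:
 * folding is idempotent and only touches the three documented ranges. */
#include <assert.h>
#include <stdio.h>

static inline int xfs_ascii_ci_need_xfrm(unsigned char c)
{
	if (c >= 0x41 && c <= 0x5a)	/* A-Z */
		return 1;
	if (c >= 0xc0 && c <= 0xd6)	/* latin A-O with accents */
		return 1;
	if (c >= 0xd8 && c <= 0xde)	/* latin O-Y with accents */
		return 1;
	return 0;
}

static inline unsigned char xfs_ascii_ci_xfrm(unsigned char c)
{
	if (xfs_ascii_ci_need_xfrm(c))
		c -= 'A' - 'a';
	return c;
}

int main(void)
{
	int c;

	for (c = 0; c < 256; c++) {
		unsigned char once = xfs_ascii_ci_xfrm(c);

		/* Idempotent: folding twice changes nothing further. */
		assert(xfs_ascii_ci_xfrm(once) == once);
		/* Bytes outside the three ranges pass through untouched. */
		if (!xfs_ascii_ci_need_xfrm(c))
			assert(once == (unsigned char)c);
	}
	printf("xfs_ascii_ci_xfrm: all 256 byte values behave as documented\n");
	return 0;
}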

From patchwork Fri Aug 4 17:10:16 2023
From: "Theodore Ts'o"
To: linux-xfs@vger.kernel.org
Cc: amir73il@gmail.com, djwong@kernel.org, chandan.babu@oracle.com,
    leah.rumancik@gmail.com, Christoph Hellwig, Dave Chinner
Subject: [PATCH CANDIDATE v5.15 6/9] xfs: use the directory name hash function for dir scrubbing
Date: Fri, 4 Aug 2023 13:10:16 -0400
Message-Id: <20230804171019.1392900-6-tytso@mit.edu>
In-Reply-To: <20230804171019.1392900-1-tytso@mit.edu>
References: <20230802205747.GE358316@mit.edu>
 <20230804171019.1392900-1-tytso@mit.edu>

From: "Darrick J. Wong"

commit 9dceccc5822f2ecea12a89f24d7cad1f3e5eab7c upstream.

The directory code has a directory-specific hash computation function
that includes a modified hash function for case-insensitive lookups.
Hence we must use that function (and not the raw da_hashname) when
checking the dabtree structure.

Found by accidentally breaking xfs/188 to create an abnormally huge
case-insensitive directory and watching scrub break.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
Reviewed-by: Dave Chinner
---
 fs/xfs/scrub/dir.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c
index 200a63f58fe7..829dadd400b0 100644
--- a/fs/xfs/scrub/dir.c
+++ b/fs/xfs/scrub/dir.c
@@ -201,6 +201,7 @@ xchk_dir_rec(
 	struct xchk_da_btree	*ds,
 	int			level)
 {
+	struct xfs_name		dname = { };
 	struct xfs_da_state_blk	*blk = &ds->state->path.blk[level];
 	struct xfs_mount	*mp = ds->state->mp;
 	struct xfs_inode	*dp = ds->dargs.dp;
@@ -297,7 +298,11 @@ xchk_dir_rec(
 		xchk_fblock_set_corrupt(ds->sc, XFS_DATA_FORK, rec_bno);
 		goto out_relse;
 	}
-	calc_hash = xfs_da_hashname(dent->name, dent->namelen);
+
+	/* Does the directory hash match? */
+	dname.name = dent->name;
+	dname.len = dent->namelen;
+	calc_hash = xfs_dir2_hashname(mp, &dname);
 	if (calc_hash != hash)
 		xchk_fblock_set_corrupt(ds->sc, XFS_DATA_FORK, rec_bno);

From patchwork Fri Aug 4 17:10:17 2023
From: "Theodore Ts'o"
To: linux-xfs@vger.kernel.org
Cc: amir73il@gmail.com, djwong@kernel.org, chandan.babu@oracle.com,
    leah.rumancik@gmail.com, Hironori Shiina, Hironori Shiina
Subject: [PATCH CANDIDATE v5.15 7/9] xfs: get root inode correctly at bulkstat
Date: Fri, 4 Aug 2023 13:10:17 -0400
Message-Id: <20230804171019.1392900-7-tytso@mit.edu>
In-Reply-To: <20230804171019.1392900-1-tytso@mit.edu>
References: <20230802205747.GE358316@mit.edu>
 <20230804171019.1392900-1-tytso@mit.edu>

From: Hironori Shiina

commit 817644fa4525258992f17fecf4f1d6cdd2e1b731 upstream.

The root inode number should be set to `breq->startino` for getting
stat information of the root when XFS_BULK_IREQ_SPECIAL_ROOT is used.
Otherwise, the inode search is started from 1
(XFS_BULK_IREQ_SPECIAL_ROOT) and the inode with the lowest number in a
filesystem is returned.

Fixes: bf3cb3944792 ("xfs: allow single bulkstat of special inodes")
Signed-off-by: Hironori Shiina
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
---
 fs/xfs/xfs_ioctl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index bcc3c18c8080..17037c2b2daf 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -834,7 +834,7 @@ xfs_bulkstat_fmt(
 static int
 xfs_bulk_ireq_setup(
 	struct xfs_mount	*mp,
-	struct xfs_bulk_ireq	*hdr,
+	const struct xfs_bulk_ireq *hdr,
 	struct xfs_ibulk	*breq,
 	void __user		*ubuffer)
 {
@@ -860,7 +860,7 @@ xfs_bulk_ireq_setup(
 
 	switch (hdr->ino) {
 	case XFS_BULK_IREQ_SPECIAL_ROOT:
-		hdr->ino = mp->m_sb.sb_rootino;
+		breq->startino = mp->m_sb.sb_rootino;
 		break;
 	default:
 		return -EINVAL;

From patchwork Fri Aug 4 17:10:18 2023
From: "Theodore Ts'o"
To: linux-xfs@vger.kernel.org
Cc: amir73il@gmail.com, djwong@kernel.org, chandan.babu@oracle.com,
    leah.rumancik@gmail.com, Dave Chinner
Subject: [PATCH CANDIDATE v5.15 8/9] xfs: bound maximum wait time for inodegc work
Date: Fri, 4 Aug 2023 13:10:18 -0400
Message-Id: <20230804171019.1392900-8-tytso@mit.edu>
In-Reply-To: <20230804171019.1392900-1-tytso@mit.edu>
References: <20230802205747.GE358316@mit.edu>
 <20230804171019.1392900-1-tytso@mit.edu>

From: Dave Chinner

commit 7cf2b0f9611b9971d663e1fc3206eeda3b902922 upstream.

Currently inodegc work can sit queued on the per-cpu queue until the
workqueue is either flushed or the queue reaches a depth that triggers
work queuing (and later throttling).  This means that we could queue
work that waits for a long time for some other event to trigger
flushing.

Hence instead of just queueing work at a specific depth, use a delayed
work that queues the work at a bound time.  We can still schedule the
work immediately at a given depth, but we no longer need to worry about
leaving a number of items on the list that won't get processed until
external events prevail.

Signed-off-by: Dave Chinner
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
---
 fs/xfs/xfs_icache.c | 36 ++++++++++++++++++++++--------------
 fs/xfs/xfs_mount.h  |  2 +-
 fs/xfs/xfs_super.c  |  2 +-
 3 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 5e44d7bbd8fc..2c3ef553f5ef 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -458,7 +458,7 @@ xfs_inodegc_queue_all(
 	for_each_online_cpu(cpu) {
 		gc = per_cpu_ptr(mp->m_inodegc, cpu);
 		if (!llist_empty(&gc->list))
-			queue_work_on(cpu, mp->m_inodegc_wq, &gc->work);
+			mod_delayed_work_on(cpu, mp->m_inodegc_wq, &gc->work, 0);
 	}
 }
 
@@ -1851,8 +1851,8 @@ void
 xfs_inodegc_worker(
 	struct work_struct	*work)
 {
-	struct xfs_inodegc	*gc = container_of(work, struct xfs_inodegc,
-							work);
+	struct xfs_inodegc	*gc = container_of(to_delayed_work(work),
+						struct xfs_inodegc, work);
 	struct llist_node	*node = llist_del_all(&gc->list);
 	struct xfs_inode	*ip, *n;
 
@@ -2021,6 +2021,7 @@ xfs_inodegc_queue(
 	struct xfs_inodegc	*gc;
 	int			items;
 	unsigned int		shrinker_hits;
+	unsigned long		queue_delay = 1;
 
 	trace_xfs_inode_set_need_inactive(ip);
 	spin_lock(&ip->i_flags_lock);
@@ -2032,19 +2033,26 @@ xfs_inodegc_queue(
 	items = READ_ONCE(gc->items);
 	WRITE_ONCE(gc->items, items + 1);
 	shrinker_hits = READ_ONCE(gc->shrinker_hits);
-	put_cpu_ptr(gc);
 
-	if (!xfs_is_inodegc_enabled(mp))
+	/*
+	 * We queue the work while holding the current CPU so that the work
+	 * is scheduled to run on this CPU.
+	 */
+	if (!xfs_is_inodegc_enabled(mp)) {
+		put_cpu_ptr(gc);
 		return;
-
-	if (xfs_inodegc_want_queue_work(ip, items)) {
-		trace_xfs_inodegc_queue(mp, __return_address);
-		queue_work(mp->m_inodegc_wq, &gc->work);
 	}
 
+	if (xfs_inodegc_want_queue_work(ip, items))
+		queue_delay = 0;
+
+	trace_xfs_inodegc_queue(mp, __return_address);
+	mod_delayed_work(mp->m_inodegc_wq, &gc->work, queue_delay);
+	put_cpu_ptr(gc);
+
 	if (xfs_inodegc_want_flush_work(ip, items, shrinker_hits)) {
 		trace_xfs_inodegc_throttle(mp, __return_address);
-		flush_work(&gc->work);
+		flush_delayed_work(&gc->work);
 	}
 }
 
@@ -2061,7 +2069,7 @@ xfs_inodegc_cpu_dead(
 	unsigned int		count = 0;
 
 	dead_gc = per_cpu_ptr(mp->m_inodegc, dead_cpu);
-	cancel_work_sync(&dead_gc->work);
+	cancel_delayed_work_sync(&dead_gc->work);
 
 	if (llist_empty(&dead_gc->list))
 		return;
@@ -2080,12 +2088,12 @@ xfs_inodegc_cpu_dead(
 	llist_add_batch(first, last, &gc->list);
 	count += READ_ONCE(gc->items);
 	WRITE_ONCE(gc->items, count);
-	put_cpu_ptr(gc);
 
 	if (xfs_is_inodegc_enabled(mp)) {
 		trace_xfs_inodegc_queue(mp, __return_address);
-		queue_work(mp->m_inodegc_wq, &gc->work);
+		mod_delayed_work(mp->m_inodegc_wq, &gc->work, 0);
 	}
+	put_cpu_ptr(gc);
 }
 
 /*
@@ -2180,7 +2188,7 @@ xfs_inodegc_shrinker_scan(
 			unsigned int	h = READ_ONCE(gc->shrinker_hits);
 
 			WRITE_ONCE(gc->shrinker_hits, h + 1);
-			queue_work_on(cpu, mp->m_inodegc_wq, &gc->work);
+			mod_delayed_work_on(cpu, mp->m_inodegc_wq, &gc->work, 0);
 			no_items = false;
 		}
 	}
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 86564295fce6..3d58938a6f75 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -61,7 +61,7 @@ struct xfs_error_cfg {
  */
 struct xfs_inodegc {
 	struct llist_head	list;
-	struct work_struct	work;
+	struct delayed_work	work;
 
 	/* approximate count of inodes in the list */
 	unsigned int		items;
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index df1d6be61bfa..8fe6ca9208de 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1061,7 +1061,7 @@ xfs_inodegc_init_percpu(
 		gc = per_cpu_ptr(mp->m_inodegc, cpu);
 		init_llist_head(&gc->list);
 		gc->items = 0;
-		INIT_WORK(&gc->work, xfs_inodegc_worker);
+		INIT_DELAYED_WORK(&gc->work, xfs_inodegc_worker);
 	}
 	return 0;
 }
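
The mechanical part of this patch is the standard conversion from
work_struct to delayed_work.  A condensed, generic sketch of the
pattern (kernel-style code with hypothetical names and an assumed depth
threshold; not the XFS implementation and not runnable stand-alone):

/* Generic bounded-delay queueing: work always runs within one tick,
 * and deep queues are expedited to run immediately. */
#include <linux/workqueue.h>

struct my_gc {
	struct delayed_work	work;
	unsigned int		items;
};

static void my_gc_worker(struct work_struct *work)
{
	struct my_gc *gc = container_of(to_delayed_work(work),
					struct my_gc, work);
	/* ... process the gc->items queued objects ... */
}

static void my_gc_queue(struct workqueue_struct *wq, struct my_gc *gc)
{
	unsigned long delay = 1;	/* bounded wait: at most one tick */

	if (gc->items > 32)		/* depth threshold is an assumption */
		delay = 0;		/* expedite deep queues */

	/* mod_delayed_work() also shortens the timer of an already queued
	 * item, so an expedite request is never lost. */
	mod_delayed_work(wq, &gc->work, delay);
}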

From patchwork Fri Aug 4 17:10:19 2023
From: "Theodore Ts'o"
To: linux-xfs@vger.kernel.org
Cc: amir73il@gmail.com, djwong@kernel.org, chandan.babu@oracle.com,
    leah.rumancik@gmail.com, Dave Chinner, Chris Dunlop
Subject: [PATCH CANDIDATE v5.15 9/9] xfs: introduce xfs_inodegc_push()
Date: Fri, 4 Aug 2023 13:10:19 -0400
Message-Id: <20230804171019.1392900-9-tytso@mit.edu>
In-Reply-To: <20230804171019.1392900-1-tytso@mit.edu>
References: <20230802205747.GE358316@mit.edu>
 <20230804171019.1392900-1-tytso@mit.edu>

From: Dave Chinner

commit 5e672cd69f0a534a445df4372141fd0d1d00901d upstream.

The current blocking mechanism for pushing the inodegc queue out to
disk can result in systems becoming unusable when there is a long
running inodegc operation.  This is because the statfs() implementation
currently issues a blocking flush of the inodegc queue and a
significant number of common system utilities will call statfs() to
discover something about the underlying filesystem.

This can result in userspace operations getting stuck on inodegc
progress, and when trying to remove a heavily reflinked file on slow
storage with a full journal, this can result in delays measuring in
hours.

Avoid this problem by adding a "push" function that expedites the
flushing of the inodegc queue, but doesn't wait for it to complete.

Convert xfs_fs_statfs() and xfs_qm_scall_getquota() to use this
mechanism so they don't block but still ensure that queued operations
are expedited.

Fixes: ab23a7768739 ("xfs: per-cpu deferred inode inactivation queues")
Reported-by: Chris Dunlop
Signed-off-by: Dave Chinner
[djwong: fix _getquota_next to use _inodegc_push too]
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
Acked-by: Darrick J. Wong
---
 fs/xfs/xfs_icache.c      | 20 +++++++++++++++-----
 fs/xfs/xfs_icache.h      |  1 +
 fs/xfs/xfs_qm_syscalls.c |  9 ++++++---
 fs/xfs/xfs_super.c       |  7 +++++--
 fs/xfs/xfs_trace.h       |  1 +
 5 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 2c3ef553f5ef..e9ebfe6f8015 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1872,19 +1872,29 @@ xfs_inodegc_worker(
 }
 
 /*
- * Force all currently queued inode inactivation work to run immediately and
- * wait for the work to finish.
+ * Expedite all pending inodegc work to run immediately. This does not wait for
+ * completion of the work.
  */
 void
-xfs_inodegc_flush(
+xfs_inodegc_push(
 	struct xfs_mount	*mp)
 {
 	if (!xfs_is_inodegc_enabled(mp))
 		return;
+	trace_xfs_inodegc_push(mp, __return_address);
+	xfs_inodegc_queue_all(mp);
+}
 
+/*
+ * Force all currently queued inode inactivation work to run immediately and
+ * wait for the work to finish.
+ */
+void
+xfs_inodegc_flush(
+	struct xfs_mount	*mp)
+{
+	xfs_inodegc_push(mp);
 	trace_xfs_inodegc_flush(mp, __return_address);
-
-	xfs_inodegc_queue_all(mp);
 	flush_workqueue(mp->m_inodegc_wq);
 }
 
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index 2e4cfddf8b8e..6cd180721659 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -76,6 +76,7 @@ void xfs_blockgc_stop(struct xfs_mount *mp);
 void xfs_blockgc_start(struct xfs_mount *mp);
 
 void xfs_inodegc_worker(struct work_struct *work);
+void xfs_inodegc_push(struct xfs_mount *mp);
 void xfs_inodegc_flush(struct xfs_mount *mp);
 void xfs_inodegc_stop(struct xfs_mount *mp);
 void xfs_inodegc_start(struct xfs_mount *mp);
diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
index 47fe60e1a887..322a111dfbc0 100644
--- a/fs/xfs/xfs_qm_syscalls.c
+++ b/fs/xfs/xfs_qm_syscalls.c
@@ -481,9 +481,12 @@ xfs_qm_scall_getquota(
 	struct xfs_dquot	*dqp;
 	int			error;
 
-	/* Flush inodegc work at the start of a quota reporting scan. */
+	/*
+	 * Expedite pending inodegc work at the start of a quota reporting
+	 * scan but don't block waiting for it to complete.
+	 */
 	if (id == 0)
-		xfs_inodegc_flush(mp);
+		xfs_inodegc_push(mp);
 
 	/*
 	 * Try to get the dquot. We don't want it allocated on disk, so don't
@@ -525,7 +528,7 @@ xfs_qm_scall_getquota_next(
 
 	/* Flush inodegc work at the start of a quota reporting scan. */
 	if (*id == 0)
-		xfs_inodegc_flush(mp);
+		xfs_inodegc_push(mp);
 
 	error = xfs_qm_dqget_next(mp, *id, type, &dqp);
 	if (error)
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 8fe6ca9208de..9b3af7611eaa 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -795,8 +795,11 @@ xfs_fs_statfs(
 	xfs_extlen_t		lsize;
 	int64_t			ffree;
 
-	/* Wait for whatever inactivations are in progress. */
-	xfs_inodegc_flush(mp);
+	/*
+	 * Expedite background inodegc but don't wait. We do not want to block
+	 * here waiting hours for a billion extent file to be truncated.
+	 */
+	xfs_inodegc_push(mp);
 
 	statp->f_type = XFS_SUPER_MAGIC;
 	statp->f_namelen = MAXNAMELEN - 1;
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 1033a95fbf8e..ebd17ddba024 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -240,6 +240,7 @@ DEFINE_EVENT(xfs_fs_class, name, \
 	TP_PROTO(struct xfs_mount *mp, void *caller_ip), \
 	TP_ARGS(mp, caller_ip))
 DEFINE_FS_EVENT(xfs_inodegc_flush);
+DEFINE_FS_EVENT(xfs_inodegc_push);
 DEFINE_FS_EVENT(xfs_inodegc_start);
 DEFINE_FS_EVENT(xfs_inodegc_stop);
 DEFINE_FS_EVENT(xfs_inodegc_queue);