From patchwork Fri Jun 17 10:06:31 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Amir Goldstein X-Patchwork-Id: 12885427 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D4718C433EF for ; Fri, 17 Jun 2022 10:07:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1381585AbiFQKH2 (ORCPT ); Fri, 17 Jun 2022 06:07:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51164 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1381696AbiFQKHR (ORCPT ); Fri, 17 Jun 2022 06:07:17 -0400 Received: from mail-wr1-x436.google.com (mail-wr1-x436.google.com [IPv6:2a00:1450:4864:20::436]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D403F69CE2; Fri, 17 Jun 2022 03:06:49 -0700 (PDT) Received: by mail-wr1-x436.google.com with SMTP id n1so4855416wrg.12; Fri, 17 Jun 2022 03:06:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=aeFrWkB9bAtH1ohe/wgGvpaJlZDS3CmcylgXaGBlt54=; b=KOaQZ5POGis26lUZ50+UBc0xDSnzow02/3JLsQNYGscEugNsvMmOqifQXdhujixGOE +n3meouiYo96fH06XhO3p2ghSH0ySNk4wWJoqhOinPr/mSPgPQpQIdAqiqhdRnGbaEzX hIux1pDg+ouCaiBMzr2u4/9WbuC30NZpQGm1RYVQDOcaEFLHTgUknG+Q8lYggLFEl7sI /5fpIFiB6D7hktqiAZedeHdke+n2wlHITYjWfzJE1dDQ3dpL2ANi5ht6DTRVgraBzsiC WokvF73cQtbqLOofVlRWjRfoxYbRP2Xb6yQELe5G0ZTQhhskdZprhbsZ2Vy03G1Cn4nu UQOg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=aeFrWkB9bAtH1ohe/wgGvpaJlZDS3CmcylgXaGBlt54=; b=d2ZXXjgx5LwFBB0bfV3F+m3fNl8/aywHc0gTwpyR/JOBFxVDRj93rrNS4hx1qPoWrV avgJ2RD2CqcDG094XjkWgqCQtlpC6TAgL/ZkuU5hARvNX1kYvSLnsRUzDApz5vwZ6W1L QksDHfnzQ+eHJKSm552Mz0GPwjqYURydMVXn7be0iyZKiq72Otyx83JIqYVBaXp0kZUg ZYlXjo89gBKXkp/S8Fmfe2ITgrC5zbr/yPr33y3LzD66sYmrLBgZL24wJOLM2prdwPuW TDBadBy8CgHks6f/Q5zFSaK/caUafWLgW49usaZVGwBGdQI9xvPI5TMM8e4G//gF3jSU dqMQ== X-Gm-Message-State: AJIora+EjptPzyXqQeB5aF5WRBVlAuY9AQxhjgPu0Sn+q8Md0fOdNf3W Qu9kdxT4J7tdsDHfJ/Ih/YExs+g5dy27fQ== X-Google-Smtp-Source: AGRyM1tdf8q4Aj4fFkfDT/W2StMXWq4R3JP4G4LFK8mSKLXV6wzW6NlyXEvElGENaLZ62EOrSNpGEA== X-Received: by 2002:a5d:6786:0:b0:215:3cb5:b16c with SMTP id v6-20020a5d6786000000b002153cb5b16cmr8625426wru.6.1655460408012; Fri, 17 Jun 2022 03:06:48 -0700 (PDT) Received: from localhost.localdomain ([77.137.66.49]) by smtp.gmail.com with ESMTPSA id m42-20020a05600c3b2a00b003973435c517sm5265534wms.0.2022.06.17.03.06.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 17 Jun 2022 03:06:47 -0700 (PDT) From: Amir Goldstein To: "Darrick J . Wong" Cc: Leah Rumancik , Luis Chamberlain , Dave Chinner , Christoph Hellwig , Christian Brauner , linux-xfs@vger.kernel.org, fstests@vger.kernel.org, Rustam Kovhaev Subject: [PATCH 5.10 CANDIDATE 01/11] xfs: use kmem_cache_free() for kmem_cache objects Date: Fri, 17 Jun 2022 13:06:31 +0300 Message-Id: <20220617100641.1653164-2-amir73il@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220617100641.1653164-1-amir73il@gmail.com> References: <20220617100641.1653164-1-amir73il@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: fstests@vger.kernel.org From: Rustam Kovhaev commit c30a0cbd07ecc0eec7b3cd568f7b1c7bb7913f93 upstream. For kmalloc() allocations SLOB prepends the blocks with a 4-byte header, and it puts the size of the allocated blocks in that header. Blocks allocated with kmem_cache_alloc() allocations do not have that header. SLOB explodes when you allocate memory with kmem_cache_alloc() and then try to free it with kfree() instead of kmem_cache_free(). SLOB will assume that there is a header when there is none, read some garbage to size variable and corrupt the adjacent objects, which eventually leads to hang or panic. Let's make XFS work with SLOB by using proper free function. Fixes: 9749fee83f38 ("xfs: enable the xfs_defer mechanism to process extents to free") Signed-off-by: Rustam Kovhaev Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Amir Goldstein --- fs/xfs/xfs_extfree_item.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c index 5c0395256bd1..11474770d630 100644 --- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -482,7 +482,7 @@ xfs_extent_free_finish_item( free->xefi_startblock, free->xefi_blockcount, &free->xefi_oinfo, free->xefi_skip_discard); - kmem_free(free); + kmem_cache_free(xfs_bmap_free_item_zone, free); return error; } @@ -502,7 +502,7 @@ xfs_extent_free_cancel_item( struct xfs_extent_free_item *free; free = container_of(item, struct xfs_extent_free_item, xefi_list); - kmem_free(free); + kmem_cache_free(xfs_bmap_free_item_zone, free); } const struct xfs_defer_op_type xfs_extent_free_defer_type = { @@ -564,7 +564,7 @@ xfs_agfl_free_finish_item( extp->ext_len = free->xefi_blockcount; efdp->efd_next_extent++; - kmem_free(free); + kmem_cache_free(xfs_bmap_free_item_zone, free); return error; } From patchwork Fri Jun 17 10:06:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Amir Goldstein X-Patchwork-Id: 12885428 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6BA5CC433EF for ; Fri, 17 Jun 2022 10:07:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1381545AbiFQKHe (ORCPT ); Fri, 17 Jun 2022 06:07:34 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51284 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1381596AbiFQKHS (ORCPT ); Fri, 17 Jun 2022 06:07:18 -0400 Received: from mail-wr1-x436.google.com (mail-wr1-x436.google.com [IPv6:2a00:1450:4864:20::436]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3E9D76A418; Fri, 17 Jun 2022 03:06:51 -0700 (PDT) Received: by mail-wr1-x436.google.com with SMTP id c21so5130440wrb.1; Fri, 17 Jun 2022 03:06:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=XIj+XIGeyKI5FSWHzvU/4VEeWwx2VX+90KKkI9DaRug=; b=N5KFvKm5imKJ9UeSYvVSn3JV1xNsIncONcQSXbfCX2umx4x1RAowPzMgYaSC/tSPJV RuR277iVrBxWdrhieIhWAd+vSGUoX+ubWLn9LmARgETWjpe3NHep1bnbQ8FWXnei1xc/ zQkYrA3/Usp3oMimmGSZpbfZCIguTajxIasfmwfZ4M0O6FlUXvPaF4E4ibAixystE+E+ hSt1jKJDlGoRnQdqoTOkPS1C7N/rq38Ayj0b2/GCMaormoFQGaca/s9E8Bj2hY/igHLo 3FcivylLDlMbBg3aMEbFAkJ62HNocZbBIy81e6rWZaD7Hmkgst60hyQ6ZM3JAHAT/pIV kJjw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=XIj+XIGeyKI5FSWHzvU/4VEeWwx2VX+90KKkI9DaRug=; b=a8+sAIgzZHv/jkMok82qRMHoGKjJRcCEBNfwjYtIWB0K/GDH7xasnVBEyEqDu/9K+D xhzCJzd9vTSgm8dw5A8JLCCUR8AokWaEUiKTDCGFOBCuEd5DNTO8iYfjVjt6wQuM+sW9 jpV3ZH6IaZx4iwxu4Tol4oeg+kAtB73CFHSVPfeFk3Hie0o01g1lyOFOCzdDIIbtRpMe zr9C95PVtWZq5j2YzT/aWFedRZUfsg4emArQZiFf06UKWIJreGKDhmFpDKWQKMs1sJ1i xApk4detdCnq3bNfXpBhOXJ48k/YZKTKADsDwHG+BBNA8oMXrbrkou4Rc14ezTVaO962 7APQ== X-Gm-Message-State: AJIora+BdsfCRF3xnvjxhq1q3eQvNwGsHns57vDtBQUN/eVPdI6Dczwh jWPblkYGpTfSIP91DmsLbyU= X-Google-Smtp-Source: AGRyM1s/6TsX5UTwruLCfYYy6dUoMxQyGlyfGGWDdaxtkgvTqyqeN75pp8Hy+SFw1evyVe/iDxHJHA== X-Received: by 2002:a5d:5581:0:b0:20f:fc51:7754 with SMTP id i1-20020a5d5581000000b0020ffc517754mr8427465wrv.413.1655460409678; Fri, 17 Jun 2022 03:06:49 -0700 (PDT) Received: from localhost.localdomain ([77.137.66.49]) by smtp.gmail.com with ESMTPSA id m42-20020a05600c3b2a00b003973435c517sm5265534wms.0.2022.06.17.03.06.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 17 Jun 2022 03:06:49 -0700 (PDT) From: Amir Goldstein To: "Darrick J . Wong" Cc: Leah Rumancik , Luis Chamberlain , Dave Chinner , Christoph Hellwig , Christian Brauner , linux-xfs@vger.kernel.org, fstests@vger.kernel.org, Brian Foster Subject: [PATCH 5.10 CANDIDATE 02/11] xfs: punch out data fork delalloc blocks on COW writeback failure Date: Fri, 17 Jun 2022 13:06:32 +0300 Message-Id: <20220617100641.1653164-3-amir73il@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220617100641.1653164-1-amir73il@gmail.com> References: <20220617100641.1653164-1-amir73il@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: fstests@vger.kernel.org From: Brian Foster commit 5ca5916b6bc93577c360c06cb7cdf71adb9b5faf upstream. If writeback I/O to a COW extent fails, the COW fork blocks are punched out and the data fork blocks left alone. It is possible for COW fork blocks to overlap non-shared data fork blocks (due to cowextsz hint prealloc), however, and writeback unconditionally maps to the COW fork whenever blocks exist at the corresponding offset of the page undergoing writeback. This means it's quite possible for a COW fork extent to overlap delalloc data fork blocks, writeback to convert and map to the COW fork blocks, writeback to fail, and finally for ioend completion to cancel the COW fork blocks and leave stale data fork delalloc blocks around in the inode. The blocks are effectively stale because writeback failure also discards dirty page state. If this occurs, it is likely to trigger assert failures, free space accounting corruption and failures in unrelated file operations. For example, a subsequent reflink attempt of the affected file to a new target file will trip over the stale delalloc in the source file and fail. Several of these issues are occasionally reproduced by generic/648, but are reproducible on demand with the right sequence of operations and timely I/O error injection. To fix this problem, update the ioend failure path to also punch out underlying data fork delalloc blocks on I/O error. This is analogous to the writeback submission failure path in xfs_discard_page() where we might fail to map data fork delalloc blocks and consistent with the successful COW writeback completion path, which is responsible for unmapping from the data fork and remapping in COW fork blocks. Fixes: 787eb485509f ("xfs: fix and streamline error handling in xfs_end_io") Signed-off-by: Brian Foster Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Amir Goldstein --- fs/xfs/xfs_aops.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c index b4186d666157..953de843d9c3 100644 --- a/fs/xfs/xfs_aops.c +++ b/fs/xfs/xfs_aops.c @@ -145,6 +145,7 @@ xfs_end_ioend( struct iomap_ioend *ioend) { struct xfs_inode *ip = XFS_I(ioend->io_inode); + struct xfs_mount *mp = ip->i_mount; xfs_off_t offset = ioend->io_offset; size_t size = ioend->io_size; unsigned int nofs_flag; @@ -160,18 +161,26 @@ xfs_end_ioend( /* * Just clean up the in-memory strutures if the fs has been shut down. */ - if (XFS_FORCED_SHUTDOWN(ip->i_mount)) { + if (XFS_FORCED_SHUTDOWN(mp)) { error = -EIO; goto done; } /* - * Clean up any COW blocks on an I/O error. + * Clean up all COW blocks and underlying data fork delalloc blocks on + * I/O error. The delalloc punch is required because this ioend was + * mapped to blocks in the COW fork and the associated pages are no + * longer dirty. If we don't remove delalloc blocks here, they become + * stale and can corrupt free space accounting on unmount. */ error = blk_status_to_errno(ioend->io_bio->bi_status); if (unlikely(error)) { - if (ioend->io_flags & IOMAP_F_SHARED) + if (ioend->io_flags & IOMAP_F_SHARED) { xfs_reflink_cancel_cow_range(ip, offset, size, true); + xfs_bmap_punch_delalloc_range(ip, + XFS_B_TO_FSBT(mp, offset), + XFS_B_TO_FSB(mp, size)); + } goto done; } From patchwork Fri Jun 17 10:06:33 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Amir Goldstein X-Patchwork-Id: 12885437 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5E4FFCCA47C for ; Fri, 17 Jun 2022 10:07:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1381618AbiFQKHs (ORCPT ); Fri, 17 Jun 2022 06:07:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52084 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1381638AbiFQKHT (ORCPT ); Fri, 17 Jun 2022 06:07:19 -0400 Received: from mail-wm1-x335.google.com (mail-wm1-x335.google.com [IPv6:2a00:1450:4864:20::335]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0FA7669CF9; Fri, 17 Jun 2022 03:06:53 -0700 (PDT) Received: by mail-wm1-x335.google.com with SMTP id p6-20020a05600c1d8600b0039c630b8d96so4215941wms.1; Fri, 17 Jun 2022 03:06:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=CYHDL8iVH4k/h/u1jXtZX43fHxv3dP+zew3PELKoGzk=; b=ioBDdt23uQpSGJQwc3jXQrulPuAMXcKKDMfK7nC728S1KHXxtchsYf7y5dsVG3iaWa AWkwIgUzGlt073oriq9xtCnvbVEt7WMJ4XZjughCWrG+xmm+xED4jidvsM7WCE5WSKm1 BfT3LAo+B2V5L8H1aYz4fVOtUEegPYheWHWTT85DrdRnxrsZX97TcAfObOX1+OgnesYG zgRW7cXKyWol0iH1YRuRzV4RztOuA+UHsxZbLc069qU8kVYITyc0CjcGgn2yexx4iUqO gg9e9VL2iywga71iaRDqO/f+ML5xGUEpiptekucNV+hLLEWLWs3Ds0tZy2DPHL0+vd34 5itg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=CYHDL8iVH4k/h/u1jXtZX43fHxv3dP+zew3PELKoGzk=; b=xqlQ/F6aOrOr6j1NqwZV+EzEXq1U76E8Hng5nImwCBxU5KMpdMKQzxuyYq6sYJlnX8 Qrr/23lowj7rMHTpja31Cgm9TX/V2cq+3xx2Z7LhehhzEg1MbelU0igGOmo3fz5Jn5kD dMrvvzIK7jq7pmX8pCEzQnn3nxy2HrAxkTmHEfZKH+tYc4l+/4ItybnY+FioGdS/tAyB FZeXtmnT3KciGnJWjSRbE7ek/p3NcGgm44P2zAK7K0Noivms+ErfZ8UPeoGpqAcLxPB6 cUohqQ2/8vRWGniKNqTluL2YIBOZ1tRrXVdeyQXyLFgpP5IanvwS5QBI8fDrjbVq6EeG tvhQ== X-Gm-Message-State: AOAM532AuRuDYwrj9cx/5vyKlaevcD79CgVaRGUOCHqLVTm0aOADUuH3 GY4M7pvjq+MEQkPe6oz6QTE= X-Google-Smtp-Source: ABdhPJy1Ge4VXnDs9/BWX3PjBGKOVRNi9H78DxT9xt1r5sjvRm/T4yAxbTqZqhjH1mWW5rXUWmCuVw== X-Received: by 2002:a1c:7317:0:b0:399:e654:3c92 with SMTP id d23-20020a1c7317000000b00399e6543c92mr20270880wmb.49.1655460411385; Fri, 17 Jun 2022 03:06:51 -0700 (PDT) Received: from localhost.localdomain ([77.137.66.49]) by smtp.gmail.com with ESMTPSA id m42-20020a05600c3b2a00b003973435c517sm5265534wms.0.2022.06.17.03.06.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 17 Jun 2022 03:06:50 -0700 (PDT) From: Amir Goldstein To: "Darrick J . Wong" Cc: Leah Rumancik , Luis Chamberlain , Dave Chinner , Christoph Hellwig , Christian Brauner , linux-xfs@vger.kernel.org, fstests@vger.kernel.org, Yang Xu Subject: [PATCH 5.10 CANDIDATE 03/11] xfs: Fix the free logic of state in xfs_attr_node_hasname Date: Fri, 17 Jun 2022 13:06:33 +0300 Message-Id: <20220617100641.1653164-4-amir73il@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220617100641.1653164-1-amir73il@gmail.com> References: <20220617100641.1653164-1-amir73il@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: fstests@vger.kernel.org From: Yang Xu commit a1de97fe296c52eafc6590a3506f4bbd44ecb19a upstream. When testing xfstests xfs/126 on lastest upstream kernel, it will hang on some machine. Adding a getxattr operation after xattr corrupted, I can reproduce it 100%. The deadlock as below: [983.923403] task:setfattr state:D stack: 0 pid:17639 ppid: 14687 flags:0x00000080 [ 983.923405] Call Trace: [ 983.923410] __schedule+0x2c4/0x700 [ 983.923412] schedule+0x37/0xa0 [ 983.923414] schedule_timeout+0x274/0x300 [ 983.923416] __down+0x9b/0xf0 [ 983.923451] ? xfs_buf_find.isra.29+0x3c8/0x5f0 [xfs] [ 983.923453] down+0x3b/0x50 [ 983.923471] xfs_buf_lock+0x33/0xf0 [xfs] [ 983.923490] xfs_buf_find.isra.29+0x3c8/0x5f0 [xfs] [ 983.923508] xfs_buf_get_map+0x4c/0x320 [xfs] [ 983.923525] xfs_buf_read_map+0x53/0x310 [xfs] [ 983.923541] ? xfs_da_read_buf+0xcf/0x120 [xfs] [ 983.923560] xfs_trans_read_buf_map+0x1cf/0x360 [xfs] [ 983.923575] ? xfs_da_read_buf+0xcf/0x120 [xfs] [ 983.923590] xfs_da_read_buf+0xcf/0x120 [xfs] [ 983.923606] xfs_da3_node_read+0x1f/0x40 [xfs] [ 983.923621] xfs_da3_node_lookup_int+0x69/0x4a0 [xfs] [ 983.923624] ? kmem_cache_alloc+0x12e/0x270 [ 983.923637] xfs_attr_node_hasname+0x6e/0xa0 [xfs] [ 983.923651] xfs_has_attr+0x6e/0xd0 [xfs] [ 983.923664] xfs_attr_set+0x273/0x320 [xfs] [ 983.923683] xfs_xattr_set+0x87/0xd0 [xfs] [ 983.923686] __vfs_removexattr+0x4d/0x60 [ 983.923688] __vfs_removexattr_locked+0xac/0x130 [ 983.923689] vfs_removexattr+0x4e/0xf0 [ 983.923690] removexattr+0x4d/0x80 [ 983.923693] ? __check_object_size+0xa8/0x16b [ 983.923695] ? strncpy_from_user+0x47/0x1a0 [ 983.923696] ? getname_flags+0x6a/0x1e0 [ 983.923697] ? _cond_resched+0x15/0x30 [ 983.923699] ? __sb_start_write+0x1e/0x70 [ 983.923700] ? mnt_want_write+0x28/0x50 [ 983.923701] path_removexattr+0x9b/0xb0 [ 983.923702] __x64_sys_removexattr+0x17/0x20 [ 983.923704] do_syscall_64+0x5b/0x1a0 [ 983.923705] entry_SYSCALL_64_after_hwframe+0x65/0xca [ 983.923707] RIP: 0033:0x7f080f10ee1b When getxattr calls xfs_attr_node_get function, xfs_da3_node_lookup_int fails with EFSCORRUPTED in xfs_attr_node_hasname because we have use blocktrash to random it in xfs/126. So it free state in internal and xfs_attr_node_get doesn't do xfs_buf_trans release job. Then subsequent removexattr will hang because of it. This bug was introduced by kernel commit 07120f1abdff ("xfs: Add xfs_has_attr and subroutines"). It adds xfs_attr_node_hasname helper and said caller will be responsible for freeing the state in this case. But xfs_attr_node_hasname will free state itself instead of caller if xfs_da3_node_lookup_int fails. Fix this bug by moving the step of free state into caller. [amir: this text from original commit is not relevant for 5.10 backport: Also, use "goto error/out" instead of returning error directly in xfs_attr_node_addname_find_attr and xfs_attr_node_removename_setup function because we should free state ourselves. ] Fixes: 07120f1abdff ("xfs: Add xfs_has_attr and subroutines") Signed-off-by: Yang Xu Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Amir Goldstein --- fs/xfs/libxfs/xfs_attr.c | 13 +++++-------- 1 file changed, 5 insertions(+), 8 deletions(-) diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c index 96ac7e562b87..fcca36bbd997 100644 --- a/fs/xfs/libxfs/xfs_attr.c +++ b/fs/xfs/libxfs/xfs_attr.c @@ -876,21 +876,18 @@ xfs_attr_node_hasname( state = xfs_da_state_alloc(args); if (statep != NULL) - *statep = NULL; + *statep = state; /* * Search to see if name exists, and get back a pointer to it. */ error = xfs_da3_node_lookup_int(state, &retval); - if (error) { - xfs_da_state_free(state); - return error; - } + if (error) + retval = error; - if (statep != NULL) - *statep = state; - else + if (!statep) xfs_da_state_free(state); + return retval; } From patchwork Fri Jun 17 10:06:34 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Amir Goldstein X-Patchwork-Id: 12885434 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0D936CCA479 for ; Fri, 17 Jun 2022 10:07:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1381623AbiFQKHm (ORCPT ); Fri, 17 Jun 2022 06:07:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52096 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1381641AbiFQKHU (ORCPT ); Fri, 17 Jun 2022 06:07:20 -0400 Received: from mail-wr1-x431.google.com (mail-wr1-x431.google.com [IPv6:2a00:1450:4864:20::431]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3A5A669B6B; Fri, 17 Jun 2022 03:06:55 -0700 (PDT) Received: by mail-wr1-x431.google.com with SMTP id w17so5096881wrg.7; Fri, 17 Jun 2022 03:06:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=m11EV4fTYjQr+CF4ItvJAnTX7XtYwLnaXp2qOkxkWzo=; b=h9cmdzTF46gHaG9ANVIwEyBi0nsK7WcVU4jjz/jyFa8UKRs1vhi8NvNlme6rM+dfkB jKuoSF3krJmqCuShZLggGEcobVaCaarF40f1flhARO+nv5nMmA5d66RfyZmpWLnp/5qs bfYULMN65DzIjo9CqhLH7FptX74OEsosOIfMZP3MNIwm/PiGbjJLLYzUY5iV/4jfm3Mh cJZCjJxqWf5FhDvrEq5EmwBm7UPtENO+W/yxxm5iRwYrtmrFrsgZdUlRGjyxo/WpfXoy OWcK/rdge4T8eP0XXoMLrOkR+q9y1RkNSo9STWwpzcMufgzAAx3jktXF8VAb6tuP8Wpq oHlg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=m11EV4fTYjQr+CF4ItvJAnTX7XtYwLnaXp2qOkxkWzo=; b=VY0gFrkEc8fTx0lh5zYN0VGYpYVSusQJQmlNt2I+rckBjk+B1lCd+PLZ6jhuh5/yWi nu1bcMG4hMpKyn9VhB5+0KGxBlpW89g3cuqScukMw6wA22F/vZlkfct8av8WD82yJKBC BUCVg7pwSTDc2DdD9fkUeY1PGGzD1KJnZwFAF8rTjno6tQ0y1sQKFa4FnnDuDhgSbP29 1KXofTy4L+s4Xn972JIcrC8htjFXIY5SWi6qAusyFcP0tZw7G6wtqm5LBw2wTTK8zh7R QpIW/8flV89+4vyhhCzw30s1b3OaRrye9GYVz0gDOGyKVbU35HxuLb8eu1jBoyeKM6X2 Fy+A== X-Gm-Message-State: AJIora/ijN4BgQwZGVzwS6Afm2+P98v2D9+f9PoWslq58/VYFHCTamoA J8+XLl+K2ML0nnXIJZ/qK5I= X-Google-Smtp-Source: AGRyM1v6cRBnhrJNK4NC/q0Lx0MgNcl4vveBlH1xqVrddaUIrQ21amVx9/iioJGJQ94d3jY9BWU9pQ== X-Received: by 2002:adf:b644:0:b0:210:1fde:a513 with SMTP id i4-20020adfb644000000b002101fdea513mr9123878wre.604.1655460413730; Fri, 17 Jun 2022 03:06:53 -0700 (PDT) Received: from localhost.localdomain ([77.137.66.49]) by smtp.gmail.com with ESMTPSA id m42-20020a05600c3b2a00b003973435c517sm5265534wms.0.2022.06.17.03.06.51 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 17 Jun 2022 03:06:53 -0700 (PDT) From: Amir Goldstein To: "Darrick J . Wong" Cc: Leah Rumancik , Luis Chamberlain , Dave Chinner , Christoph Hellwig , Christian Brauner , linux-xfs@vger.kernel.org, fstests@vger.kernel.org, Dave Chinner , Chandan Babu R Subject: [PATCH 5.10 CANDIDATE 04/11] xfs: remove all COW fork extents when remounting readonly Date: Fri, 17 Jun 2022 13:06:34 +0300 Message-Id: <20220617100641.1653164-5-amir73il@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220617100641.1653164-1-amir73il@gmail.com> References: <20220617100641.1653164-1-amir73il@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: fstests@vger.kernel.org From: "Darrick J. Wong" commit 089558bc7ba785c03815a49c89e28ad9b8de51f9 upstream. [backport xfs_icwalk -> xfs_eofblocks for 5.10.y] As part of multiple customer escalations due to file data corruption after copy on write operations, I wrote some fstests that use fsstress to hammer on COW to shake things loose. Regrettably, I caught some filesystem shutdowns due to incorrect rmap operations with the following loop: mount # (0) fsstress & # (1) while true; do fsstress mount -o remount,ro # (2) fsstress mount -o remount,rw # (3) done When (2) happens, notice that (1) is still running. xfs_remount_ro will call xfs_blockgc_stop to walk the inode cache to free all the COW extents, but the blockgc mechanism races with (1)'s reader threads to take IOLOCKs and loses, which means that it doesn't clean them all out. Call such a file (A). When (3) happens, xfs_remount_rw calls xfs_reflink_recover_cow, which walks the ondisk refcount btree and frees any COW extent that it finds. This function does not check the inode cache, which means that incore COW forks of inode (A) is now inconsistent with the ondisk metadata. If one of those former COW extents are allocated and mapped into another file (B) and someone triggers a COW to the stale reservation in (A), A's dirty data will be written into (B) and once that's done, those blocks will be transferred to (A)'s data fork without bumping the refcount. The results are catastrophic -- file (B) and the refcount btree are now corrupt. Solve this race by forcing the xfs_blockgc_free_space to run synchronously, which causes xfs_icwalk to return to inodes that were skipped because the blockgc code couldn't take the IOLOCK. This is safe to do here because the VFS has already prohibited new writer threads. Fixes: 10ddf64e420f ("xfs: remove leftover CoW reservations when remounting ro") Signed-off-by: Darrick J. Wong Reviewed-by: Dave Chinner Reviewed-by: Chandan Babu R Signed-off-by: Amir Goldstein --- fs/xfs/xfs_super.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index d220a63d7883..6323974d6b3e 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -1711,7 +1711,10 @@ static int xfs_remount_ro( struct xfs_mount *mp) { - int error; + struct xfs_eofblocks eofb = { + .eof_flags = XFS_EOF_FLAGS_SYNC, + }; + int error; /* * Cancel background eofb scanning so it cannot race with the final @@ -1719,8 +1722,13 @@ xfs_remount_ro( */ xfs_stop_block_reaping(mp); - /* Get rid of any leftover CoW reservations... */ - error = xfs_icache_free_cowblocks(mp, NULL); + /* + * Clear out all remaining COW staging extents and speculative post-EOF + * preallocations so that we don't leave inodes requiring inactivation + * cleanups during reclaim on a read-only mount. We must process every + * cached inode, so this requires a synchronous cache scan. + */ + error = xfs_icache_free_cowblocks(mp, &eofb); if (error) { xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); return error; From patchwork Fri Jun 17 10:06:35 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Amir Goldstein X-Patchwork-Id: 12885429 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 17EB0CCA479 for ; Fri, 17 Jun 2022 10:07:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1381640AbiFQKHf (ORCPT ); Fri, 17 Jun 2022 06:07:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51470 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1381643AbiFQKHU (ORCPT ); Fri, 17 Jun 2022 06:07:20 -0400 Received: from mail-wr1-x436.google.com (mail-wr1-x436.google.com [IPv6:2a00:1450:4864:20::436]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B860269CEC; Fri, 17 Jun 2022 03:06:56 -0700 (PDT) Received: by mail-wr1-x436.google.com with SMTP id x17so5098505wrg.6; Fri, 17 Jun 2022 03:06:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=47jpAIOuXq9C8yduL/rogZSE3IT97eVSEXgeU3hQDGc=; b=E8LEjur6pUGNeqET9KBTyDi2DDGKJwsVxRgcjPe+MV7DLr6TRNnDUKQjhYI4jlrSw2 4FNvYDWWiZWuNE5sliJEhKO3uQmyekI4foZh+7G2zJ4r6i0keI2utSFvMwClQXnBIRv/ XkRedZDDa8r9xbyYzpjd10+U+5BAwQE9FY9UOrmedNIEijJjhCoNPij/WiFY8bHLLzGA G4TP5dNCd84d+YLNfMqyIFWpfEeoSbqvQCI+UG5Kh3Z00kisfVh0Mogj7qH7t8TpGBkM aKyN8SXzj8UEW8lQI3G8ubHMWBlBqlCr4BAs7qOT3EG73HGRiVmyTISR3AIPflkc+28w gS/g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=47jpAIOuXq9C8yduL/rogZSE3IT97eVSEXgeU3hQDGc=; b=gLUmCQLlgABUlV7zkVmHMp0JKteTwf5gQ3hYioxer74xCqd2S6/FKMMeVMSHmwgGan Ql+5k9nMpoCwfVBzfE50QG6tZy8Fnbqd56oE+wN9aKkvq57/1gUR0ulI/lLApxRexvYe WL23SACgD1OuazVSO+Hq1hTuQa6k04CQO5dFaogVaZCF/PGQL0s5XzXNWWM0Bye1q8Jb hP/5Nl6A35hLpiNBGaCsWJ0KXRodU1Uc0CLgbXA25y0pB4hDJuNZSeAInM0fjgyijgUb V63k3Am5TBKtFrqsk5Fdpdn/aY/QIwzJ69l3r3iB6yTxLhVfJw7zPYeEtEKIRp0Nd/2W uuLw== X-Gm-Message-State: AJIora+K0yjwoI3Ls1mcMDrfS6J/Mt/J/NG3/XOLZCTMHe0mHJEpAU39 ZLzK+8vlxJNjfH+LuMHpjWs= X-Google-Smtp-Source: AGRyM1upm3iDDnzGgTzFpwshxYvWBSHaWUDPJPc2blLKHU1q/2yuBwIf7bfTm8BA+PwflkL1RxRgRw== X-Received: by 2002:adf:eb4d:0:b0:218:45b6:8fdb with SMTP id u13-20020adfeb4d000000b0021845b68fdbmr8959402wrn.77.1655460415283; Fri, 17 Jun 2022 03:06:55 -0700 (PDT) Received: from localhost.localdomain ([77.137.66.49]) by smtp.gmail.com with ESMTPSA id m42-20020a05600c3b2a00b003973435c517sm5265534wms.0.2022.06.17.03.06.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 17 Jun 2022 03:06:54 -0700 (PDT) From: Amir Goldstein To: "Darrick J . Wong" Cc: Leah Rumancik , Luis Chamberlain , Dave Chinner , Christoph Hellwig , Christian Brauner , linux-xfs@vger.kernel.org, fstests@vger.kernel.org, Dave Chinner Subject: [PATCH 5.10 CANDIDATE 05/11] xfs: check sb_meta_uuid for dabuf buffer recovery Date: Fri, 17 Jun 2022 13:06:35 +0300 Message-Id: <20220617100641.1653164-6-amir73il@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220617100641.1653164-1-amir73il@gmail.com> References: <20220617100641.1653164-1-amir73il@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: fstests@vger.kernel.org From: Dave Chinner commit 09654ed8a18cfd45027a67d6cbca45c9ea54feab upstream. Got a report that a repeated crash test of a container host would eventually fail with a log recovery error preventing the system from mounting the root filesystem. It manifested as a directory leaf node corruption on writeback like so: XFS (loop0): Mounting V5 Filesystem XFS (loop0): Starting recovery (logdev: internal) XFS (loop0): Metadata corruption detected at xfs_dir3_leaf_check_int+0x99/0xf0, xfs_dir3_leaf1 block 0x12faa158 XFS (loop0): Unmount and run xfs_repair XFS (loop0): First 128 bytes of corrupted metadata buffer: 00000000: 00 00 00 00 00 00 00 00 3d f1 00 00 e1 9e d5 8b ........=....... 00000010: 00 00 00 00 12 fa a1 58 00 00 00 29 00 00 1b cc .......X...).... 00000020: 91 06 78 ff f7 7e 4a 7d 8d 53 86 f2 ac 47 a8 23 ..x..~J}.S...G.# 00000030: 00 00 00 00 17 e0 00 80 00 43 00 00 00 00 00 00 .........C...... 00000040: 00 00 00 2e 00 00 00 08 00 00 17 2e 00 00 00 0a ................ 00000050: 02 35 79 83 00 00 00 30 04 d3 b4 80 00 00 01 50 .5y....0.......P 00000060: 08 40 95 7f 00 00 02 98 08 41 fe b7 00 00 02 d4 .@.......A...... 00000070: 0d 62 ef a7 00 00 01 f2 14 50 21 41 00 00 00 0c .b.......P!A.... XFS (loop0): Corruption of in-memory data (0x8) detected at xfs_do_force_shutdown+0x1a/0x20 (fs/xfs/xfs_buf.c:1514). Shutting down. XFS (loop0): Please unmount the filesystem and rectify the problem(s) XFS (loop0): log mount/recovery failed: error -117 XFS (loop0): log mount failed Tracing indicated that we were recovering changes from a transaction at LSN 0x29/0x1c16 into a buffer that had an LSN of 0x29/0x1d57. That is, log recovery was overwriting a buffer with newer changes on disk than was in the transaction. Tracing indicated that we were hitting the "recovery immediately" case in xfs_buf_log_recovery_lsn(), and hence it was ignoring the LSN in the buffer. The code was extracting the LSN correctly, then ignoring it because the UUID in the buffer did not match the superblock UUID. The problem arises because the UUID check uses the wrong UUID - it should be checking the sb_meta_uuid, not sb_uuid. This filesystem has sb_uuid != sb_meta_uuid (which is fine), and the buffer has the correct matching sb_meta_uuid in it, it's just the code checked it against the wrong superblock uuid. The is no corruption in the filesystem, and failing to recover the buffer due to a write verifier failure means the recovery bug did not propagate the corruption to disk. Hence there is no corruption before or after this bug has manifested, the impact is limited simply to an unmountable filesystem.... This was missed back in 2015 during an audit of incorrect sb_uuid usage that resulted in commit fcfbe2c4ef42 ("xfs: log recovery needs to validate against sb_meta_uuid") that fixed the magic32 buffers to validate against sb_meta_uuid instead of sb_uuid. It missed the magicda buffers.... Fixes: ce748eaa65f2 ("xfs: create new metadata UUID field and incompat flag") Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Amir Goldstein --- fs/xfs/xfs_buf_item_recover.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c index d44e8b4a3391..1d649462d731 100644 --- a/fs/xfs/xfs_buf_item_recover.c +++ b/fs/xfs/xfs_buf_item_recover.c @@ -805,7 +805,7 @@ xlog_recover_get_buf_lsn( } if (lsn != (xfs_lsn_t)-1) { - if (!uuid_equal(&mp->m_sb.sb_uuid, uuid)) + if (!uuid_equal(&mp->m_sb.sb_meta_uuid, uuid)) goto recover_immediately; return lsn; } From patchwork Fri Jun 17 10:06:36 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Amir Goldstein X-Patchwork-Id: 12885430 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 59771CCA480 for ; Fri, 17 Jun 2022 10:07:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1381610AbiFQKHf (ORCPT ); Fri, 17 Jun 2022 06:07:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51522 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1381574AbiFQKHU (ORCPT ); Fri, 17 Jun 2022 06:07:20 -0400 Received: from mail-wr1-x42f.google.com (mail-wr1-x42f.google.com [IPv6:2a00:1450:4864:20::42f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ADE1F6A011; Fri, 17 Jun 2022 03:06:58 -0700 (PDT) Received: by mail-wr1-x42f.google.com with SMTP id a15so5118721wrh.2; Fri, 17 Jun 2022 03:06:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=loriEoiVPoDrwPGkS8Q5XLOnTWcA/2w8hcb95i2ZPaE=; b=oqjAWdy8E9mBMlxIqQuip+44uzZRTrJPoz8xDEN/HE8QmYhSr73zFDJMjB/TVexTy7 5x1vpIELIHSsHuGZN+6VojK4fd/JLAELaVchUpzzvvWmm1P2gLR4DQkpjgr8PTcZfgVK A6b2uT9KT6Vj0Fb2C0FsNpw+cNFK0Qn/QigS3kn8lDvxmL9r3JHTTZev75yyTIzs+EzM u5u/wDHR8qLFnPf+UVpx603JjyBaDdioThvGPchLVkrjovWS3nk9XUpwLWQK2CIyecy4 /KsePWtJspWveevJOj+QHv9E5LbvwHKefKLPBCsvdPgLSpW6JeJDldtcAxJB+QBS423w 0UEA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=loriEoiVPoDrwPGkS8Q5XLOnTWcA/2w8hcb95i2ZPaE=; b=dK+CCzl66+FupfJp70vysql4exV5I7kK0WXkGrpK+Lgm9TjWkeCAUqsgGgz9dEJw1e TOah5u3FKqmUEeaunCBd+Sk0rPlXK/leFoq4jiWZ1Ypp+fJehWOANVq4bgNGsfjp6MsR j7AUVf2WCQBjTy4c1qXsjhZT0Lu6Z5XyYdiMk7iGFxizsX5Wl/2cjDqxao8GTtBTwolF +n9wm8/W37+43r3nqjLKPSxjPL5rFAg/aWu9n1yW7i7hK/YDIH6JXECFW8O33Og1b6KM E41boyj4FYZ9libLr536AstJ5+roBQxM6TUa4ac4+eUUGv8GtTbOc+TQ8+4vmKzWMobp AUWg== X-Gm-Message-State: AJIora8ttp3ysjvizmWCeJYjUhNfecKGyCXdPgMG5/85r0QkGt5QjbhZ OQtuuaoYKPw6WOqnw6kfIo4= X-Google-Smtp-Source: AGRyM1ubRH7Q0YRz2AFsOMHjK2er0Q4X1+tqKqDrw1fcjgRMwfTKB6D3iVTKi44Vel9uTJ5hDj+5jg== X-Received: by 2002:a5d:6a0e:0:b0:213:1f7f:e1cc with SMTP id m14-20020a5d6a0e000000b002131f7fe1ccmr8661795wru.31.1655460417077; Fri, 17 Jun 2022 03:06:57 -0700 (PDT) Received: from localhost.localdomain ([77.137.66.49]) by smtp.gmail.com with ESMTPSA id m42-20020a05600c3b2a00b003973435c517sm5265534wms.0.2022.06.17.03.06.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 17 Jun 2022 03:06:56 -0700 (PDT) From: Amir Goldstein To: "Darrick J . Wong" Cc: Leah Rumancik , Luis Chamberlain , Dave Chinner , Christoph Hellwig , Christian Brauner , linux-xfs@vger.kernel.org, fstests@vger.kernel.org, Brian Foster , Dave Chinner Subject: [PATCH 5.10 CANDIDATE 06/11] xfs: refactor xfs_file_fsync Date: Fri, 17 Jun 2022 13:06:36 +0300 Message-Id: <20220617100641.1653164-7-amir73il@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220617100641.1653164-1-amir73il@gmail.com> References: <20220617100641.1653164-1-amir73il@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: fstests@vger.kernel.org From: Christoph Hellwig commit f22c7f87777361f94aa17f746fbadfa499248dc8 upstream. [backported for dependency] Factor out the log syncing logic into two helpers to make the code easier to read and more maintainable. Signed-off-by: Christoph Hellwig Reviewed-by: Brian Foster Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Reviewed-by: Dave Chinner Signed-off-by: Amir Goldstein --- fs/xfs/xfs_file.c | 81 +++++++++++++++++++++++++++++------------------ 1 file changed, 50 insertions(+), 31 deletions(-) diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 5b0f93f73837..414d856e2e75 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -118,6 +118,54 @@ xfs_dir_fsync( return xfs_log_force_inode(ip); } +static xfs_lsn_t +xfs_fsync_lsn( + struct xfs_inode *ip, + bool datasync) +{ + if (!xfs_ipincount(ip)) + return 0; + if (datasync && !(ip->i_itemp->ili_fsync_fields & ~XFS_ILOG_TIMESTAMP)) + return 0; + return ip->i_itemp->ili_last_lsn; +} + +/* + * All metadata updates are logged, which means that we just have to flush the + * log up to the latest LSN that touched the inode. + * + * If we have concurrent fsync/fdatasync() calls, we need them to all block on + * the log force before we clear the ili_fsync_fields field. This ensures that + * we don't get a racing sync operation that does not wait for the metadata to + * hit the journal before returning. If we race with clearing ili_fsync_fields, + * then all that will happen is the log force will do nothing as the lsn will + * already be on disk. We can't race with setting ili_fsync_fields because that + * is done under XFS_ILOCK_EXCL, and that can't happen because we hold the lock + * shared until after the ili_fsync_fields is cleared. + */ +static int +xfs_fsync_flush_log( + struct xfs_inode *ip, + bool datasync, + int *log_flushed) +{ + int error = 0; + xfs_lsn_t lsn; + + xfs_ilock(ip, XFS_ILOCK_SHARED); + lsn = xfs_fsync_lsn(ip, datasync); + if (lsn) { + error = xfs_log_force_lsn(ip->i_mount, lsn, XFS_LOG_SYNC, + log_flushed); + + spin_lock(&ip->i_itemp->ili_lock); + ip->i_itemp->ili_fsync_fields = 0; + spin_unlock(&ip->i_itemp->ili_lock); + } + xfs_iunlock(ip, XFS_ILOCK_SHARED); + return error; +} + STATIC int xfs_file_fsync( struct file *file, @@ -125,13 +173,10 @@ xfs_file_fsync( loff_t end, int datasync) { - struct inode *inode = file->f_mapping->host; - struct xfs_inode *ip = XFS_I(inode); - struct xfs_inode_log_item *iip = ip->i_itemp; + struct xfs_inode *ip = XFS_I(file->f_mapping->host); struct xfs_mount *mp = ip->i_mount; int error = 0; int log_flushed = 0; - xfs_lsn_t lsn = 0; trace_xfs_file_fsync(ip); @@ -155,33 +200,7 @@ xfs_file_fsync( else if (mp->m_logdev_targp != mp->m_ddev_targp) xfs_blkdev_issue_flush(mp->m_ddev_targp); - /* - * All metadata updates are logged, which means that we just have to - * flush the log up to the latest LSN that touched the inode. If we have - * concurrent fsync/fdatasync() calls, we need them to all block on the - * log force before we clear the ili_fsync_fields field. This ensures - * that we don't get a racing sync operation that does not wait for the - * metadata to hit the journal before returning. If we race with - * clearing the ili_fsync_fields, then all that will happen is the log - * force will do nothing as the lsn will already be on disk. We can't - * race with setting ili_fsync_fields because that is done under - * XFS_ILOCK_EXCL, and that can't happen because we hold the lock shared - * until after the ili_fsync_fields is cleared. - */ - xfs_ilock(ip, XFS_ILOCK_SHARED); - if (xfs_ipincount(ip)) { - if (!datasync || - (iip->ili_fsync_fields & ~XFS_ILOG_TIMESTAMP)) - lsn = iip->ili_last_lsn; - } - - if (lsn) { - error = xfs_log_force_lsn(mp, lsn, XFS_LOG_SYNC, &log_flushed); - spin_lock(&iip->ili_lock); - iip->ili_fsync_fields = 0; - spin_unlock(&iip->ili_lock); - } - xfs_iunlock(ip, XFS_ILOCK_SHARED); + error = xfs_fsync_flush_log(ip, datasync, &log_flushed); /* * If we only have a single device, and the log force about was From patchwork Fri Jun 17 10:06:37 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Amir Goldstein X-Patchwork-Id: 12885433 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 36876C43334 for ; Fri, 17 Jun 2022 10:07:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1381724AbiFQKHl (ORCPT ); Fri, 17 Jun 2022 06:07:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51924 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1381648AbiFQKHV (ORCPT ); Fri, 17 Jun 2022 06:07:21 -0400 Received: from mail-wr1-x433.google.com (mail-wr1-x433.google.com [IPv6:2a00:1450:4864:20::433]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 94E646A409; Fri, 17 Jun 2022 03:07:00 -0700 (PDT) Received: by mail-wr1-x433.google.com with SMTP id a15so5118822wrh.2; Fri, 17 Jun 2022 03:07:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=2mKJAGzNneCtsX1D90Vh4DpYd9rQyh4v6Mzb9MdLoH4=; b=OUFiz/ykZtP8gzs8hWarj9VfHH2SOjSbLRUD8JIy3NOUlzRyfTO8gfc5Na5QLV9JB7 0I/GNVy1e+aI3ePeKfo7p0G5XjBbAq8X7IBJrK9PLPgqkHaQfJD13mf0bxRQRHFro6fK i9Z8qIjrjsjmrL2GaM/F8Tfs+fFMShK6kuk692O8kmIKa5i9rmUXOf+CYoqamW45YhLb Dg7f85pgWPUox3xbvYKcAUuzKMueEzsmdGwcMklLWU6vbGiyynKySvMz8M9BTipMYhjI ox1SutRM1jfIsJkRCpPEEmB5Y/0DB5REJg/yb5sIWVb0/UtOy/n1JNDoTYwfXTs3FHT+ 788w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=2mKJAGzNneCtsX1D90Vh4DpYd9rQyh4v6Mzb9MdLoH4=; b=uH9z5Tb/YPfuF4ayhvGKTfWbkLV+BRaX8OMBThqOM0JhuhC1uAqR8bsfi5QV63zvJ+ Dg2aU4vBGgcUY4aJoQaZUhT5LRvz/x1j4nnLP+eU3cNMsJiybrp8xfS1RJERbmtN6VFf ff6T05MqBh6ecYO1I91mDz9yDUr6k2yby8oQt73fLAhBybeqd6ST5Hchan6B4QsQ0WWv d1Xp8ueSdskoctz5y3TpTqy3AfihmqX9Z5UnQ3aF/57FAsaoTLEmIHgGipDKpBIswl83 wIXqkbzZgZHDRSxmkjjHyKO5XWgzSCj3OwHugowxp81wXA9TWmzDME6acizV4rjOFpsa 6Z9g== X-Gm-Message-State: AJIora+CdFV73EKcepMZqbJ/NOOPZoTOAanyZ1btelS8yL0jsA74o+wR 05CtGKZhgmyfesYxhpExRGW8I50jd8qfdA== X-Google-Smtp-Source: AGRyM1vdIFPRkXE2NSGUbmX6wZnL4yuszGcQU6/n1sdSSJOOKFe2b+s5OXMrrD4K94RCwU72/VQdvA== X-Received: by 2002:adf:c64c:0:b0:21a:7a3:54a4 with SMTP id u12-20020adfc64c000000b0021a07a354a4mr8692659wrg.163.1655460418975; Fri, 17 Jun 2022 03:06:58 -0700 (PDT) Received: from localhost.localdomain ([77.137.66.49]) by smtp.gmail.com with ESMTPSA id m42-20020a05600c3b2a00b003973435c517sm5265534wms.0.2022.06.17.03.06.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 17 Jun 2022 03:06:58 -0700 (PDT) From: Amir Goldstein To: "Darrick J . Wong" Cc: Leah Rumancik , Luis Chamberlain , Dave Chinner , Christoph Hellwig , Christian Brauner , linux-xfs@vger.kernel.org, fstests@vger.kernel.org, Dave Chinner , Brian Foster , Allison Henderson Subject: [PATCH 5.10 CANDIDATE 07/11] xfs: xfs_log_force_lsn isn't passed a LSN Date: Fri, 17 Jun 2022 13:06:37 +0300 Message-Id: <20220617100641.1653164-8-amir73il@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220617100641.1653164-1-amir73il@gmail.com> References: <20220617100641.1653164-1-amir73il@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: fstests@vger.kernel.org From: Dave Chinner commit 5f9b4b0de8dc2fb8eb655463b438001c111570fe upstream. [backported from CIL scalability series for dependency] In doing an investigation into AIL push stalls, I was looking at the log force code to see if an async CIL push could be done instead. This lead me to xfs_log_force_lsn() and looking at how it works. xfs_log_force_lsn() is only called from inode synchronisation contexts such as fsync(), and it takes the ip->i_itemp->ili_last_lsn value as the LSN to sync the log to. This gets passed to xlog_cil_force_lsn() via xfs_log_force_lsn() to flush the CIL to the journal, and then used by xfs_log_force_lsn() to flush the iclogs to the journal. The problem is that ip->i_itemp->ili_last_lsn does not store a log sequence number. What it stores is passed to it from the ->iop_committing method, which is called by xfs_log_commit_cil(). The value this passes to the iop_committing method is the CIL context sequence number that the item was committed to. As it turns out, xlog_cil_force_lsn() converts the sequence to an actual commit LSN for the related context and returns that to xfs_log_force_lsn(). xfs_log_force_lsn() overwrites it's "lsn" variable that contained a sequence with an actual LSN and then uses that to sync the iclogs. This caused me some confusion for a while, even though I originally wrote all this code a decade ago. ->iop_committing is only used by a couple of log item types, and only inode items use the sequence number it is passed. Let's clean up the API, CIL structures and inode log item to call it a sequence number, and make it clear that the high level code is using CIL sequence numbers and not on-disk LSNs for integrity synchronisation purposes. Signed-off-by: Dave Chinner Reviewed-by: Brian Foster Reviewed-by: Darrick J. Wong Reviewed-by: Allison Henderson Signed-off-by: Darrick J. Wong Signed-off-by: Amir Goldstein --- fs/xfs/libxfs/xfs_types.h | 1 + fs/xfs/xfs_buf_item.c | 2 +- fs/xfs/xfs_dquot_item.c | 2 +- fs/xfs/xfs_file.c | 14 +++++++------- fs/xfs/xfs_inode.c | 10 +++++----- fs/xfs/xfs_inode_item.c | 4 ++-- fs/xfs/xfs_inode_item.h | 2 +- fs/xfs/xfs_log.c | 27 ++++++++++++++------------- fs/xfs/xfs_log.h | 4 +--- fs/xfs/xfs_log_cil.c | 30 +++++++++++------------------- fs/xfs/xfs_log_priv.h | 15 +++++++-------- fs/xfs/xfs_trans.c | 6 +++--- fs/xfs/xfs_trans.h | 4 ++-- 13 files changed, 56 insertions(+), 65 deletions(-) diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h index 397d94775440..1ce06173c2f5 100644 --- a/fs/xfs/libxfs/xfs_types.h +++ b/fs/xfs/libxfs/xfs_types.h @@ -21,6 +21,7 @@ typedef int32_t xfs_suminfo_t; /* type of bitmap summary info */ typedef uint32_t xfs_rtword_t; /* word type for bitmap manipulations */ typedef int64_t xfs_lsn_t; /* log sequence number */ +typedef int64_t xfs_csn_t; /* CIL sequence number */ typedef uint32_t xfs_dablk_t; /* dir/attr block number (in file) */ typedef uint32_t xfs_dahash_t; /* dir/attr hash value */ diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c index 8c6e26d62ef2..5d6535370f87 100644 --- a/fs/xfs/xfs_buf_item.c +++ b/fs/xfs/xfs_buf_item.c @@ -632,7 +632,7 @@ xfs_buf_item_release( STATIC void xfs_buf_item_committing( struct xfs_log_item *lip, - xfs_lsn_t commit_lsn) + xfs_csn_t seq) { return xfs_buf_item_release(lip); } diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c index 8c1fdf37ee8f..8ed47b739b6c 100644 --- a/fs/xfs/xfs_dquot_item.c +++ b/fs/xfs/xfs_dquot_item.c @@ -188,7 +188,7 @@ xfs_qm_dquot_logitem_release( STATIC void xfs_qm_dquot_logitem_committing( struct xfs_log_item *lip, - xfs_lsn_t commit_lsn) + xfs_csn_t seq) { return xfs_qm_dquot_logitem_release(lip); } diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 414d856e2e75..4d6bf8d4974f 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -118,8 +118,8 @@ xfs_dir_fsync( return xfs_log_force_inode(ip); } -static xfs_lsn_t -xfs_fsync_lsn( +static xfs_csn_t +xfs_fsync_seq( struct xfs_inode *ip, bool datasync) { @@ -127,7 +127,7 @@ xfs_fsync_lsn( return 0; if (datasync && !(ip->i_itemp->ili_fsync_fields & ~XFS_ILOG_TIMESTAMP)) return 0; - return ip->i_itemp->ili_last_lsn; + return ip->i_itemp->ili_commit_seq; } /* @@ -150,12 +150,12 @@ xfs_fsync_flush_log( int *log_flushed) { int error = 0; - xfs_lsn_t lsn; + xfs_csn_t seq; xfs_ilock(ip, XFS_ILOCK_SHARED); - lsn = xfs_fsync_lsn(ip, datasync); - if (lsn) { - error = xfs_log_force_lsn(ip->i_mount, lsn, XFS_LOG_SYNC, + seq = xfs_fsync_seq(ip, datasync); + if (seq) { + error = xfs_log_force_seq(ip->i_mount, seq, XFS_LOG_SYNC, log_flushed); spin_lock(&ip->i_itemp->ili_lock); diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index e958b1c74561..1ae669d12301 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -2754,7 +2754,7 @@ xfs_iunpin( trace_xfs_inode_unpin_nowait(ip, _RET_IP_); /* Give the log a push to start the unpinning I/O */ - xfs_log_force_lsn(ip->i_mount, ip->i_itemp->ili_last_lsn, 0, NULL); + xfs_log_force_seq(ip->i_mount, ip->i_itemp->ili_commit_seq, 0, NULL); } @@ -3717,16 +3717,16 @@ int xfs_log_force_inode( struct xfs_inode *ip) { - xfs_lsn_t lsn = 0; + xfs_csn_t seq = 0; xfs_ilock(ip, XFS_ILOCK_SHARED); if (xfs_ipincount(ip)) - lsn = ip->i_itemp->ili_last_lsn; + seq = ip->i_itemp->ili_commit_seq; xfs_iunlock(ip, XFS_ILOCK_SHARED); - if (!lsn) + if (!seq) return 0; - return xfs_log_force_lsn(ip->i_mount, lsn, XFS_LOG_SYNC, NULL); + return xfs_log_force_seq(ip->i_mount, seq, XFS_LOG_SYNC, NULL); } /* diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c index 6ff91e5bf3cd..3aba4559469f 100644 --- a/fs/xfs/xfs_inode_item.c +++ b/fs/xfs/xfs_inode_item.c @@ -617,9 +617,9 @@ xfs_inode_item_committed( STATIC void xfs_inode_item_committing( struct xfs_log_item *lip, - xfs_lsn_t commit_lsn) + xfs_csn_t seq) { - INODE_ITEM(lip)->ili_last_lsn = commit_lsn; + INODE_ITEM(lip)->ili_commit_seq = seq; return xfs_inode_item_release(lip); } diff --git a/fs/xfs/xfs_inode_item.h b/fs/xfs/xfs_inode_item.h index 4b926e32831c..403b45ab9aa2 100644 --- a/fs/xfs/xfs_inode_item.h +++ b/fs/xfs/xfs_inode_item.h @@ -33,7 +33,7 @@ struct xfs_inode_log_item { unsigned int ili_fields; /* fields to be logged */ unsigned int ili_fsync_fields; /* logged since last fsync */ xfs_lsn_t ili_flush_lsn; /* lsn at last flush */ - xfs_lsn_t ili_last_lsn; /* lsn at last transaction */ + xfs_csn_t ili_commit_seq; /* last transaction commit */ }; static inline int xfs_inode_clean(struct xfs_inode *ip) diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index b445e63cbc3c..05791456adbb 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -3210,14 +3210,13 @@ xfs_log_force( } static int -__xfs_log_force_lsn( - struct xfs_mount *mp, +xlog_force_lsn( + struct xlog *log, xfs_lsn_t lsn, uint flags, int *log_flushed, bool already_slept) { - struct xlog *log = mp->m_log; struct xlog_in_core *iclog; spin_lock(&log->l_icloglock); @@ -3250,8 +3249,6 @@ __xfs_log_force_lsn( if (!already_slept && (iclog->ic_prev->ic_state == XLOG_STATE_WANT_SYNC || iclog->ic_prev->ic_state == XLOG_STATE_SYNCING)) { - XFS_STATS_INC(mp, xs_log_force_sleep); - xlog_wait(&iclog->ic_prev->ic_write_wait, &log->l_icloglock); return -EAGAIN; @@ -3289,25 +3286,29 @@ __xfs_log_force_lsn( * to disk, that thread will wake up all threads waiting on the queue. */ int -xfs_log_force_lsn( +xfs_log_force_seq( struct xfs_mount *mp, - xfs_lsn_t lsn, + xfs_csn_t seq, uint flags, int *log_flushed) { + struct xlog *log = mp->m_log; + xfs_lsn_t lsn; int ret; - ASSERT(lsn != 0); + ASSERT(seq != 0); XFS_STATS_INC(mp, xs_log_force); - trace_xfs_log_force(mp, lsn, _RET_IP_); + trace_xfs_log_force(mp, seq, _RET_IP_); - lsn = xlog_cil_force_lsn(mp->m_log, lsn); + lsn = xlog_cil_force_seq(log, seq); if (lsn == NULLCOMMITLSN) return 0; - ret = __xfs_log_force_lsn(mp, lsn, flags, log_flushed, false); - if (ret == -EAGAIN) - ret = __xfs_log_force_lsn(mp, lsn, flags, log_flushed, true); + ret = xlog_force_lsn(log, lsn, flags, log_flushed, false); + if (ret == -EAGAIN) { + XFS_STATS_INC(mp, xs_log_force_sleep); + ret = xlog_force_lsn(log, lsn, flags, log_flushed, true); + } return ret; } diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h index 98c913da7587..a1089f8b7169 100644 --- a/fs/xfs/xfs_log.h +++ b/fs/xfs/xfs_log.h @@ -106,7 +106,7 @@ struct xfs_item_ops; struct xfs_trans; int xfs_log_force(struct xfs_mount *mp, uint flags); -int xfs_log_force_lsn(struct xfs_mount *mp, xfs_lsn_t lsn, uint flags, +int xfs_log_force_seq(struct xfs_mount *mp, xfs_csn_t seq, uint flags, int *log_forced); int xfs_log_mount(struct xfs_mount *mp, struct xfs_buftarg *log_target, @@ -132,8 +132,6 @@ bool xfs_log_writable(struct xfs_mount *mp); struct xlog_ticket *xfs_log_ticket_get(struct xlog_ticket *ticket); void xfs_log_ticket_put(struct xlog_ticket *ticket); -void xfs_log_commit_cil(struct xfs_mount *mp, struct xfs_trans *tp, - xfs_lsn_t *commit_lsn, bool regrant); void xlog_cil_process_committed(struct list_head *list); bool xfs_log_item_in_current_chkpt(struct xfs_log_item *lip); diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c index cd5c04dabe2e..88730883bb70 100644 --- a/fs/xfs/xfs_log_cil.c +++ b/fs/xfs/xfs_log_cil.c @@ -777,7 +777,7 @@ xlog_cil_push_work( * that higher sequences will wait for us to write out a commit record * before they do. * - * xfs_log_force_lsn requires us to mirror the new sequence into the cil + * xfs_log_force_seq requires us to mirror the new sequence into the cil * structure atomically with the addition of this sequence to the * committing list. This also ensures that we can do unlocked checks * against the current sequence in log forces without risking @@ -1020,16 +1020,14 @@ xlog_cil_empty( * allowed again. */ void -xfs_log_commit_cil( - struct xfs_mount *mp, +xlog_cil_commit( + struct xlog *log, struct xfs_trans *tp, - xfs_lsn_t *commit_lsn, + xfs_csn_t *commit_seq, bool regrant) { - struct xlog *log = mp->m_log; struct xfs_cil *cil = log->l_cilp; struct xfs_log_item *lip, *next; - xfs_lsn_t xc_commit_lsn; /* * Do all necessary memory allocation before we lock the CIL. @@ -1043,10 +1041,6 @@ xfs_log_commit_cil( xlog_cil_insert_items(log, tp); - xc_commit_lsn = cil->xc_ctx->sequence; - if (commit_lsn) - *commit_lsn = xc_commit_lsn; - if (regrant && !XLOG_FORCED_SHUTDOWN(log)) xfs_log_ticket_regrant(log, tp->t_ticket); else @@ -1069,8 +1063,10 @@ xfs_log_commit_cil( list_for_each_entry_safe(lip, next, &tp->t_items, li_trans) { xfs_trans_del_item(lip); if (lip->li_ops->iop_committing) - lip->li_ops->iop_committing(lip, xc_commit_lsn); + lip->li_ops->iop_committing(lip, cil->xc_ctx->sequence); } + if (commit_seq) + *commit_seq = cil->xc_ctx->sequence; /* xlog_cil_push_background() releases cil->xc_ctx_lock */ xlog_cil_push_background(log); @@ -1087,9 +1083,9 @@ xfs_log_commit_cil( * iclog flush is necessary following this call. */ xfs_lsn_t -xlog_cil_force_lsn( +xlog_cil_force_seq( struct xlog *log, - xfs_lsn_t sequence) + xfs_csn_t sequence) { struct xfs_cil *cil = log->l_cilp; struct xfs_cil_ctx *ctx; @@ -1185,21 +1181,17 @@ bool xfs_log_item_in_current_chkpt( struct xfs_log_item *lip) { - struct xfs_cil_ctx *ctx; + struct xfs_cil_ctx *ctx = lip->li_mountp->m_log->l_cilp->xc_ctx; if (list_empty(&lip->li_cil)) return false; - ctx = lip->li_mountp->m_log->l_cilp->xc_ctx; - /* * li_seq is written on the first commit of a log item to record the * first checkpoint it is written to. Hence if it is different to the * current sequence, we're in a new checkpoint. */ - if (XFS_LSN_CMP(lip->li_seq, ctx->sequence) != 0) - return false; - return true; + return lip->li_seq == ctx->sequence; } /* diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h index 1c6fdbf3d506..42cd1602ac25 100644 --- a/fs/xfs/xfs_log_priv.h +++ b/fs/xfs/xfs_log_priv.h @@ -230,7 +230,7 @@ struct xfs_cil; struct xfs_cil_ctx { struct xfs_cil *cil; - xfs_lsn_t sequence; /* chkpt sequence # */ + xfs_csn_t sequence; /* chkpt sequence # */ xfs_lsn_t start_lsn; /* first LSN of chkpt commit */ xfs_lsn_t commit_lsn; /* chkpt commit record lsn */ struct xlog_ticket *ticket; /* chkpt ticket */ @@ -268,10 +268,10 @@ struct xfs_cil { struct xfs_cil_ctx *xc_ctx; spinlock_t xc_push_lock ____cacheline_aligned_in_smp; - xfs_lsn_t xc_push_seq; + xfs_csn_t xc_push_seq; struct list_head xc_committing; wait_queue_head_t xc_commit_wait; - xfs_lsn_t xc_current_sequence; + xfs_csn_t xc_current_sequence; struct work_struct xc_push_work; wait_queue_head_t xc_push_wait; /* background push throttle */ } ____cacheline_aligned_in_smp; @@ -547,19 +547,18 @@ int xlog_cil_init(struct xlog *log); void xlog_cil_init_post_recovery(struct xlog *log); void xlog_cil_destroy(struct xlog *log); bool xlog_cil_empty(struct xlog *log); +void xlog_cil_commit(struct xlog *log, struct xfs_trans *tp, + xfs_csn_t *commit_seq, bool regrant); /* * CIL force routines */ -xfs_lsn_t -xlog_cil_force_lsn( - struct xlog *log, - xfs_lsn_t sequence); +xfs_lsn_t xlog_cil_force_seq(struct xlog *log, xfs_csn_t sequence); static inline void xlog_cil_force(struct xlog *log) { - xlog_cil_force_lsn(log, log->l_cilp->xc_current_sequence); + xlog_cil_force_seq(log, log->l_cilp->xc_current_sequence); } /* diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index 36166bae24a6..73a1de7ceefc 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -832,7 +832,7 @@ __xfs_trans_commit( bool regrant) { struct xfs_mount *mp = tp->t_mountp; - xfs_lsn_t commit_lsn = -1; + xfs_csn_t commit_seq = 0; int error = 0; int sync = tp->t_flags & XFS_TRANS_SYNC; @@ -874,7 +874,7 @@ __xfs_trans_commit( xfs_trans_apply_sb_deltas(tp); xfs_trans_apply_dquot_deltas(tp); - xfs_log_commit_cil(mp, tp, &commit_lsn, regrant); + xlog_cil_commit(mp->m_log, tp, &commit_seq, regrant); xfs_trans_free(tp); @@ -883,7 +883,7 @@ __xfs_trans_commit( * log out now and wait for it. */ if (sync) { - error = xfs_log_force_lsn(mp, commit_lsn, XFS_LOG_SYNC, NULL); + error = xfs_log_force_seq(mp, commit_seq, XFS_LOG_SYNC, NULL); XFS_STATS_INC(mp, xs_trans_sync); } else { XFS_STATS_INC(mp, xs_trans_async); diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h index 075eeade4f7d..97485559008b 100644 --- a/fs/xfs/xfs_trans.h +++ b/fs/xfs/xfs_trans.h @@ -43,7 +43,7 @@ struct xfs_log_item { struct list_head li_cil; /* CIL pointers */ struct xfs_log_vec *li_lv; /* active log vector */ struct xfs_log_vec *li_lv_shadow; /* standby vector */ - xfs_lsn_t li_seq; /* CIL commit seq */ + xfs_csn_t li_seq; /* CIL commit seq */ }; /* @@ -69,7 +69,7 @@ struct xfs_item_ops { void (*iop_pin)(struct xfs_log_item *); void (*iop_unpin)(struct xfs_log_item *, int remove); uint (*iop_push)(struct xfs_log_item *, struct list_head *); - void (*iop_committing)(struct xfs_log_item *, xfs_lsn_t commit_lsn); + void (*iop_committing)(struct xfs_log_item *lip, xfs_csn_t seq); void (*iop_release)(struct xfs_log_item *); xfs_lsn_t (*iop_committed)(struct xfs_log_item *, xfs_lsn_t); int (*iop_recover)(struct xfs_log_item *lip, From patchwork Fri Jun 17 10:06:38 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Amir Goldstein X-Patchwork-Id: 12885432 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 53526CCA479 for ; Fri, 17 Jun 2022 10:07:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1381718AbiFQKHi (ORCPT ); Fri, 17 Jun 2022 06:07:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51994 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1381607AbiFQKHW (ORCPT ); Fri, 17 Jun 2022 06:07:22 -0400 Received: from mail-wm1-x330.google.com (mail-wm1-x330.google.com [IPv6:2a00:1450:4864:20::330]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4FA196A425; Fri, 17 Jun 2022 03:07:02 -0700 (PDT) Received: by mail-wm1-x330.google.com with SMTP id p6-20020a05600c1d8600b0039c630b8d96so4216141wms.1; Fri, 17 Jun 2022 03:07:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=fRsuzpmsJTmqxEUSle46OYeatOffzA49CzqyNi+9re0=; b=BLoFGTcOUkCGrG8UpEV2vv+ROf+cTEOO+XNFbJarjvmNYa8mOz2+i48MzYBN6IAq/1 yCTu3Z5pZB2h80M3zvUnDlVyfm+ou2+jpvuDcFXp705Swhg7/FwYhifwiIHpPK56ArYi +ebRVCFPTfxTmGaf3iA1g+VzEJKj2IL72sIrQVOCPLr4MeH3ZFRtnW22mStqtsEknY8C mYR0jGJJX0xHXpccjDBSOEgIjhaoaYjbl9tQw1nMZNh4NLDmNIZ3uUICZ7HrQwNqtOm3 FL0A3/UncPCrj3Bs3xyI+NbGFmKftX0MLlnIoQd/MVEt4OUt61PvQtNWOlxzVuLg+7Nt YdSg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=fRsuzpmsJTmqxEUSle46OYeatOffzA49CzqyNi+9re0=; b=CpZCsOD/5V93scPllJSwK5TuFW9WJCdi80ZvtfDDrpGWjZpu/w8ZAdtPmIa8vDhoV0 jp3B1mHVfV0eWp5Kr2kDXInVMdDjfRX3/hiK61FjRmiAlvh8xH6iK2PQ+tnozn9l0j7e 8Y10+xqxMi0lSvXa8Wd9PDbV5NDDARgW111tZH08AyfaAflVrZgSgWHmaJE5uN30ZiQB NcbFlkoD2eAuBCTan7dujw8cIkuw+b+rxP+1TtqYNSZ1xZGqKH/XUlzOAmHhAIniiYBN aundlDOUfWUOjbjvdCZ0cwRi61rIB3CxnsxJjb9DG7v61bOKzTKXe0q/j9evAKzpZ0z9 RY4A== X-Gm-Message-State: AJIora/+uYVlDHqAM01UvK4mHM2rjrA+tXiwBCN59hr+G1l8696knOKT 0SLXUH2rZeT8KYBAnqvUc+c= X-Google-Smtp-Source: AGRyM1tEMTxriJkwUDy7ds6fctTwWdn3I1lg8Zs9HpPTHOeIMEcMf+u5/MMO8UY5+epMVJgV4Lti3Q== X-Received: by 2002:a05:600c:190d:b0:39c:8216:f53d with SMTP id j13-20020a05600c190d00b0039c8216f53dmr9607547wmq.108.1655460420590; Fri, 17 Jun 2022 03:07:00 -0700 (PDT) Received: from localhost.localdomain ([77.137.66.49]) by smtp.gmail.com with ESMTPSA id m42-20020a05600c3b2a00b003973435c517sm5265534wms.0.2022.06.17.03.06.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 17 Jun 2022 03:07:00 -0700 (PDT) From: Amir Goldstein To: "Darrick J . Wong" Cc: Leah Rumancik , Luis Chamberlain , Dave Chinner , Christoph Hellwig , Christian Brauner , linux-xfs@vger.kernel.org, fstests@vger.kernel.org, Dave Chinner Subject: [PATCH 5.10 CANDIDATE 08/11] xfs: prevent UAF in xfs_log_item_in_current_chkpt Date: Fri, 17 Jun 2022 13:06:38 +0300 Message-Id: <20220617100641.1653164-9-amir73il@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220617100641.1653164-1-amir73il@gmail.com> References: <20220617100641.1653164-1-amir73il@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: fstests@vger.kernel.org From: "Darrick J. Wong" commit f8d92a66e810acbef6ddbc0bd0cbd9b117ce8acd upstream. While I was running with KASAN and lockdep enabled, I stumbled upon an KASAN report about a UAF to a freed CIL checkpoint. Looking at the comment for xfs_log_item_in_current_chkpt, it seems pretty obvious to me that the original patch to xfs_defer_finish_noroll should have done something to lock the CIL to prevent it from switching the CIL contexts while the predicate runs. For upper level code that needs to know if a given log item is new enough not to need relogging, add a new wrapper that takes the CIL context lock long enough to sample the current CIL context. This is kind of racy in that the CIL can switch the contexts immediately after sampling, but that's ok because the consequence is that the defer ops code is a little slow to relog items. ================================================================== BUG: KASAN: use-after-free in xfs_log_item_in_current_chkpt+0x139/0x160 [xfs] Read of size 8 at addr ffff88804ea5f608 by task fsstress/527999 CPU: 1 PID: 527999 Comm: fsstress Tainted: G D 5.16.0-rc4-xfsx #rc4 Call Trace: dump_stack_lvl+0x45/0x59 print_address_description.constprop.0+0x1f/0x140 kasan_report.cold+0x83/0xdf xfs_log_item_in_current_chkpt+0x139/0x160 xfs_defer_finish_noroll+0x3bb/0x1e30 __xfs_trans_commit+0x6c8/0xcf0 xfs_reflink_remap_extent+0x66f/0x10e0 xfs_reflink_remap_blocks+0x2dd/0xa90 xfs_file_remap_range+0x27b/0xc30 vfs_dedupe_file_range_one+0x368/0x420 vfs_dedupe_file_range+0x37c/0x5d0 do_vfs_ioctl+0x308/0x1260 __x64_sys_ioctl+0xa1/0x170 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f2c71a2950b Code: 0f 1e fa 48 8b 05 85 39 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 55 39 0d 00 f7 d8 64 89 01 48 RSP: 002b:00007ffe8c0e03c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00005600862a8740 RCX: 00007f2c71a2950b RDX: 00005600862a7be0 RSI: 00000000c0189436 RDI: 0000000000000004 RBP: 000000000000000b R08: 0000000000000027 R09: 0000000000000003 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000005a R13: 00005600862804a8 R14: 0000000000016000 R15: 00005600862a8a20 Allocated by task 464064: kasan_save_stack+0x1e/0x50 __kasan_kmalloc+0x81/0xa0 kmem_alloc+0xcd/0x2c0 [xfs] xlog_cil_ctx_alloc+0x17/0x1e0 [xfs] xlog_cil_push_work+0x141/0x13d0 [xfs] process_one_work+0x7f6/0x1380 worker_thread+0x59d/0x1040 kthread+0x3b0/0x490 ret_from_fork+0x1f/0x30 Freed by task 51: kasan_save_stack+0x1e/0x50 kasan_set_track+0x21/0x30 kasan_set_free_info+0x20/0x30 __kasan_slab_free+0xed/0x130 slab_free_freelist_hook+0x7f/0x160 kfree+0xde/0x340 xlog_cil_committed+0xbfd/0xfe0 [xfs] xlog_cil_process_committed+0x103/0x1c0 [xfs] xlog_state_do_callback+0x45d/0xbd0 [xfs] xlog_ioend_work+0x116/0x1c0 [xfs] process_one_work+0x7f6/0x1380 worker_thread+0x59d/0x1040 kthread+0x3b0/0x490 ret_from_fork+0x1f/0x30 Last potentially related work creation: kasan_save_stack+0x1e/0x50 __kasan_record_aux_stack+0xb7/0xc0 insert_work+0x48/0x2e0 __queue_work+0x4e7/0xda0 queue_work_on+0x69/0x80 xlog_cil_push_now.isra.0+0x16b/0x210 [xfs] xlog_cil_force_seq+0x1b7/0x850 [xfs] xfs_log_force_seq+0x1c7/0x670 [xfs] xfs_file_fsync+0x7c1/0xa60 [xfs] __x64_sys_fsync+0x52/0x80 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae The buggy address belongs to the object at ffff88804ea5f600 which belongs to the cache kmalloc-256 of size 256 The buggy address is located 8 bytes inside of 256-byte region [ffff88804ea5f600, ffff88804ea5f700) The buggy address belongs to the page: page:ffffea00013a9780 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88804ea5ea00 pfn:0x4ea5e head:ffffea00013a9780 order:1 compound_mapcount:0 flags: 0x4fff80000010200(slab|head|node=1|zone=1|lastcpupid=0xfff) raw: 04fff80000010200 ffffea0001245908 ffffea00011bd388 ffff888004c42b40 raw: ffff88804ea5ea00 0000000000100009 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88804ea5f500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88804ea5f580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88804ea5f600: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88804ea5f680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88804ea5f700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ================================================================== Fixes: 4e919af7827a ("xfs: periodically relog deferred intent items") Signed-off-by: Darrick J. Wong Reviewed-by: Dave Chinner Signed-off-by: Amir Goldstein --- fs/xfs/xfs_log_cil.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c index 88730883bb70..fbe160d5e9b9 100644 --- a/fs/xfs/xfs_log_cil.c +++ b/fs/xfs/xfs_log_cil.c @@ -1179,9 +1179,9 @@ xlog_cil_force_seq( */ bool xfs_log_item_in_current_chkpt( - struct xfs_log_item *lip) + struct xfs_log_item *lip) { - struct xfs_cil_ctx *ctx = lip->li_mountp->m_log->l_cilp->xc_ctx; + struct xfs_cil *cil = lip->li_mountp->m_log->l_cilp; if (list_empty(&lip->li_cil)) return false; @@ -1191,7 +1191,7 @@ xfs_log_item_in_current_chkpt( * first checkpoint it is written to. Hence if it is different to the * current sequence, we're in a new checkpoint. */ - return lip->li_seq == ctx->sequence; + return lip->li_seq == READ_ONCE(cil->xc_current_sequence); } /* From patchwork Fri Jun 17 10:06:39 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Amir Goldstein X-Patchwork-Id: 12885431 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4C806C433EF for ; Fri, 17 Jun 2022 10:07:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1381654AbiFQKHg (ORCPT ); Fri, 17 Jun 2022 06:07:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52702 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1381605AbiFQKHW (ORCPT ); Fri, 17 Jun 2022 06:07:22 -0400 Received: from mail-wm1-x332.google.com (mail-wm1-x332.google.com [IPv6:2a00:1450:4864:20::332]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 99CE96A42C; Fri, 17 Jun 2022 03:07:03 -0700 (PDT) Received: by mail-wm1-x332.google.com with SMTP id l2-20020a05600c4f0200b0039c55c50482so4117094wmq.0; Fri, 17 Jun 2022 03:07:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=4KLwIj5Vu7AZlMJL3ChxFNmM9rxj5CAzgdTye3eR8PQ=; b=WTrBLahFEki5jxOMZWYPRoyceiXgV9iw1qPlrUbx/Mkaxnp8p2f0vzfygdGGwhF4fd GIPcRHiYlWWPrHrQwRdkRD28EtNIf6fTO3fz1jyES+edEpaj0gZ5ycUbWBqALzBcp21Y hl1m1KIqOVgqo9IIltisvb1S6sq9/ieMkYWbe+pz7FazBOSaQY9975rAENmg0qJcNh7p pshYupgu366+jJAsJOzhiq1HcqwofB51JgyoE8JORbZKUBL1aFFClbbKpiKVoo2RrsUg y/6KK8AVnuiIEtxIepRmDYSJxDbgygeo4TdeBtIny2342ILwyyvxDC8jnYXIufCbc9Tv Tv2Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=4KLwIj5Vu7AZlMJL3ChxFNmM9rxj5CAzgdTye3eR8PQ=; b=qjVBD1dufAOOq7g+H437RRn84O/QiSvtmNZUIwE//3R4jNbuCgpqpbmLvTpHQcFYeN u9lJoIc9rwNYUUj3HEMjbtrPZ8/gvQ7T30XBFtyzZzlaf4Ot2G/h/ejrkOs7Yhiqi5ET jIqvy2DrtzqYUUw4HwNy3qdK25Do5WABglr1ydIVPZflOBneQlDY+1P3V9gcQXI+ZLab gf6W1WrQqmlcxZ9rH6R83rMI59mQYcIZ21fTg90VSrzLMmSIt11T7QiZ4+SSwo5rFURz Aiv/v8eL2+jDxl63M+yVMBUR9W5wJ7k4sa6lpN9TA945Iev6Tf/6oGw3DMxBv351YIUz ynpA== X-Gm-Message-State: AOAM5324sxmCBf7ucw0GSEJQiAcnprlWnqPnvtRRCIIiXT8nd7qATbNm ULkld2UpmFmfbdhc5uRevVsknsoHdqcO/A== X-Google-Smtp-Source: ABdhPJwrIaptM1v8O+UzyLACJ6TPhS00hzw3rvxFC9pDVzrS8iaoiFH3JVDsH2jk+cwZtT5FwsPSRA== X-Received: by 2002:a7b:cc13:0:b0:38e:67e3:db47 with SMTP id f19-20020a7bcc13000000b0038e67e3db47mr20197606wmh.133.1655460422202; Fri, 17 Jun 2022 03:07:02 -0700 (PDT) Received: from localhost.localdomain ([77.137.66.49]) by smtp.gmail.com with ESMTPSA id m42-20020a05600c3b2a00b003973435c517sm5265534wms.0.2022.06.17.03.07.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 17 Jun 2022 03:07:01 -0700 (PDT) From: Amir Goldstein To: "Darrick J . Wong" Cc: Leah Rumancik , Luis Chamberlain , Dave Chinner , Christoph Hellwig , Christian Brauner , linux-xfs@vger.kernel.org, fstests@vger.kernel.org, Dave Chinner Subject: [PATCH 5.10 CANDIDATE 09/11] xfs: only bother with sync_filesystem during readonly remount Date: Fri, 17 Jun 2022 13:06:39 +0300 Message-Id: <20220617100641.1653164-10-amir73il@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220617100641.1653164-1-amir73il@gmail.com> References: <20220617100641.1653164-1-amir73il@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: fstests@vger.kernel.org From: "Darrick J. Wong" commit b97cca3ba9098522e5a1c3388764ead42640c1a5 upstream. In commit 02b9984d6408, we pushed a sync_filesystem() call from the VFS into xfs_fs_remount. The only time that we ever need to push dirty file data or metadata to disk for a remount is if we're remounting the filesystem read only, so this really could be moved to xfs_remount_ro. Once we've moved the call site, actually check the return value from sync_filesystem. Fixes: 02b9984d6408 ("fs: push sync_filesystem() down to the file system's remount_fs()") Signed-off-by: Darrick J. Wong Reviewed-by: Dave Chinner Signed-off-by: Amir Goldstein --- fs/xfs/xfs_super.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index 6323974d6b3e..dd0439ae6732 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -1716,6 +1716,11 @@ xfs_remount_ro( }; int error; + /* Flush all the dirty data to disk. */ + error = sync_filesystem(mp->m_super); + if (error) + return error; + /* * Cancel background eofb scanning so it cannot race with the final * log force+buftarg wait and deadlock the remount. @@ -1786,8 +1791,6 @@ xfs_fc_reconfigure( if (error) return error; - sync_filesystem(mp->m_super); - /* inode32 -> inode64 */ if ((mp->m_flags & XFS_MOUNT_SMALL_INUMS) && !(new_mp->m_flags & XFS_MOUNT_SMALL_INUMS)) { From patchwork Fri Jun 17 10:06:40 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Amir Goldstein X-Patchwork-Id: 12885436 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7820BC433EF for ; Fri, 17 Jun 2022 10:07:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1381614AbiFQKHr (ORCPT ); Fri, 17 Jun 2022 06:07:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52730 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1381581AbiFQKHX (ORCPT ); Fri, 17 Jun 2022 06:07:23 -0400 Received: from mail-wm1-x331.google.com (mail-wm1-x331.google.com [IPv6:2a00:1450:4864:20::331]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 37F676A436; Fri, 17 Jun 2022 03:07:05 -0700 (PDT) Received: by mail-wm1-x331.google.com with SMTP id z9so2058826wmf.3; Fri, 17 Jun 2022 03:07:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=VPwd3V/mnJTePijepg+bg6gI8KKUf3jNlg+9LNT8k34=; b=E5Y+jdBNSP6FJgv01znY/FqvVxDVRzuTA0E6/KWltR7AQvsYNt9YDm+rL/qzpI0/LE p1moFKNmkPA05tehp9CVBxKJm3rQW102IV6rKzYRosIjOjT/bReYMDvUTmD13zZzPEit FWvtfBWCAyL8MsTMeynedbm0jE4TkmTvB5N/1r1G/ydySmOz56/2DBXnMT5EV8REKPGn TNkPOCj7//At1V+B+0XfJiO1t3+V8ovSVI2ZfBJ8O3c2/aCRHWzP5sSVcKPiXG8Hou3U TSlodWgAR1UZC1fvgWEVziy5uKo8H4Reo5gyTuGwzv2NAvXPtYrPmPDZJFPNMLdvS/ZN hcvA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=VPwd3V/mnJTePijepg+bg6gI8KKUf3jNlg+9LNT8k34=; b=gZ7abM8lderM5aLFJNK0B08/Fq9fE9Bx4sBkAUhvYgX9macsfM6FSKZ3l2+vtP+4w3 9YvG25EY5fC7hxcG/RvhTe2Dn2L5WMHP35Mj5oUSslFy46WLEbKNEvaNWp6bhbzaAq9y 0m6I1Yh/oeCs+hI8Jh7BAK+JmJXZdSBKkwlwkzfdyF9NS35Vrto+90k0i5opZoIkx8c6 KlXIm1RppHXLuJFBYhRkkbAv+wqsMmHDaQJDyYe1k0J1D9UsixbGOKJQ/HrGcF05zZYv IjAl/81Bd6DZ+o2Zs+Gu96nYducVInK07gJ4gEpRDd4Qaz3Fk4kKNVFP2RVAPAiyhQfm hBjw== X-Gm-Message-State: AJIora/TlIls7V3usfJzVDKefYlqNLg98nfcq1jdhnO93NqfHIS7E4YW JWnp+L/u+CXpgifue1xulKw= X-Google-Smtp-Source: AGRyM1u2GxiarAuwK9CPF5PGhoMntxA+rwQDI1ys2Luxn5j5Y3sG+3sii7xe9daGZ4qK12PklJWb/w== X-Received: by 2002:a05:600c:b51:b0:39d:b58f:67bf with SMTP id k17-20020a05600c0b5100b0039db58f67bfmr9391616wmr.195.1655460423753; Fri, 17 Jun 2022 03:07:03 -0700 (PDT) Received: from localhost.localdomain ([77.137.66.49]) by smtp.gmail.com with ESMTPSA id m42-20020a05600c3b2a00b003973435c517sm5265534wms.0.2022.06.17.03.07.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 17 Jun 2022 03:07:03 -0700 (PDT) From: Amir Goldstein To: "Darrick J . Wong" Cc: Leah Rumancik , Luis Chamberlain , Dave Chinner , Christoph Hellwig , Christian Brauner , linux-xfs@vger.kernel.org, fstests@vger.kernel.org Subject: [PATCH 5.10 CANDIDATE 10/11] xfs: fix up non-directory creation in SGID directories Date: Fri, 17 Jun 2022 13:06:40 +0300 Message-Id: <20220617100641.1653164-11-amir73il@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220617100641.1653164-1-amir73il@gmail.com> References: <20220617100641.1653164-1-amir73il@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: fstests@vger.kernel.org From: Christoph Hellwig commit 01ea173e103edd5ec41acec65b9261b87e123fc2 upstream. XFS always inherits the SGID bit if it is set on the parent inode, while the generic inode_init_owner does not do this in a few cases where it can create a possible security problem, see commit 0fa3ecd87848 ("Fix up non-directory creation in SGID directories") for details. Switch XFS to use the generic helper for the normal path to fix this, just keeping the simple field inheritance open coded for the case of the non-sgid case with the bsdgrpid mount option. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Christian Brauner Signed-off-by: Christoph Hellwig Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Amir Goldstein --- fs/xfs/xfs_inode.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 1ae669d12301..ae86fbdc1bab 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -802,6 +802,7 @@ xfs_ialloc( xfs_buf_t **ialloc_context, xfs_inode_t **ipp) { + struct inode *dir = pip ? VFS_I(pip) : NULL; struct xfs_mount *mp = tp->t_mountp; xfs_ino_t ino; xfs_inode_t *ip; @@ -847,18 +848,17 @@ xfs_ialloc( return error; ASSERT(ip != NULL); inode = VFS_I(ip); - inode->i_mode = mode; set_nlink(inode, nlink); - inode->i_uid = current_fsuid(); inode->i_rdev = rdev; ip->i_d.di_projid = prid; - if (pip && XFS_INHERIT_GID(pip)) { - inode->i_gid = VFS_I(pip)->i_gid; - if ((VFS_I(pip)->i_mode & S_ISGID) && S_ISDIR(mode)) - inode->i_mode |= S_ISGID; + if (dir && !(dir->i_mode & S_ISGID) && + (mp->m_flags & XFS_MOUNT_GRPID)) { + inode->i_uid = current_fsuid(); + inode->i_gid = dir->i_gid; + inode->i_mode = mode; } else { - inode->i_gid = current_fsgid(); + inode_init_owner(inode, dir, mode); } /* From patchwork Fri Jun 17 10:06:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Amir Goldstein X-Patchwork-Id: 12885435 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2E916CCA482 for ; Fri, 17 Jun 2022 10:07:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1381641AbiFQKHo (ORCPT ); Fri, 17 Jun 2022 06:07:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50984 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1381669AbiFQKHY (ORCPT ); Fri, 17 Jun 2022 06:07:24 -0400 Received: from mail-wm1-x329.google.com (mail-wm1-x329.google.com [IPv6:2a00:1450:4864:20::329]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EE8B46A43D; Fri, 17 Jun 2022 03:07:06 -0700 (PDT) Received: by mail-wm1-x329.google.com with SMTP id m32-20020a05600c3b2000b0039756bb41f2so2138305wms.3; Fri, 17 Jun 2022 03:07:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=+F6aTIXlSin5tx3PF0RUSLMxKLKa7f7bpvoyxuqglDY=; b=Y+Zd61v6ktTBXhWUVgV12t8pNlvHlYfwGz9eH2kZS7s66YkpVym6RrEISo/3SfJCG6 XIjlSs4CQx21GfhrRhAJR6YMHSi6nyuyRucOc60xyD242QdoOuvJmPiiij9BBVPloHds iY0Z9A+a1n0DitsF9tusC8gjVHvIzELsLA70gZsOQCYFMhUSD6v9DTyzOtuX8Seyg4GA zk3DgJ1CyYGWBn37qFbma0BdIxP6Y7lraoROjxNl/mDt1s06IUfE6F92GYUG7oED6JxP 1oJ210g5cS4L0uKSpVP6fXH5XZpyxZudBwEkBXcAPUy/bxIgkt7cgdDMX6q3gSMFJikR djFQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=+F6aTIXlSin5tx3PF0RUSLMxKLKa7f7bpvoyxuqglDY=; b=w/RfiICczQrailgO21KyQoAtCusg7rHVUzQNi7Lm4MQg3nvYvOTOAM1pOpFOOuEdYY 4xXoRgZWsFCjlo6qfEzyGO08T6innhZlh6dbuKJZuzB0qH0nVF1mVshNg93ANQkrfFxS MsW8gKv3E+bZLUHZNtzg4Awj3T09jYqXGASAsUwyVxFbeEBYXzMVqV28CIHh/PzQPEyR 5KS3MOkuY+CYCSUEyHP27irpUdkE/zemvq3XCgW5XQBjlpfrOJPog3YQbO6v+qy/aIKL rS1+79OHjmxoDpuYS2ScJjcOjHwAU14/uQgPITFZDUXbaNs8m8ttdKdo9gno3yU+hk5l hRhg== X-Gm-Message-State: AOAM531S7mNHjRWHhhyyaFhk0BdT5+cBIqo3RkpKxoo+4t+twFEP20VQ oUBkNnHFJ+JRCsMtq2KCwUg= X-Google-Smtp-Source: ABdhPJyCJKuxVWiduVnPGF4s7hzJEkNISVOvHq0ip+JFdNiwuQSNXJ0Ejm0J/7D8gaeOt+8IvkGWxg== X-Received: by 2002:a7b:c1c1:0:b0:39c:58c4:c701 with SMTP id a1-20020a7bc1c1000000b0039c58c4c701mr20609265wmj.117.1655460425390; Fri, 17 Jun 2022 03:07:05 -0700 (PDT) Received: from localhost.localdomain ([77.137.66.49]) by smtp.gmail.com with ESMTPSA id m42-20020a05600c3b2a00b003973435c517sm5265534wms.0.2022.06.17.03.07.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 17 Jun 2022 03:07:05 -0700 (PDT) From: Amir Goldstein To: "Darrick J . Wong" Cc: Leah Rumancik , Luis Chamberlain , Dave Chinner , Christoph Hellwig , Christian Brauner , linux-xfs@vger.kernel.org, fstests@vger.kernel.org, Dave Chinner , Christian Brauner Subject: [PATCH 5.10 CANDIDATE 11/11] xfs: use setattr_copy to set vfs inode attributes Date: Fri, 17 Jun 2022 13:06:41 +0300 Message-Id: <20220617100641.1653164-12-amir73il@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220617100641.1653164-1-amir73il@gmail.com> References: <20220617100641.1653164-1-amir73il@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: fstests@vger.kernel.org From: "Darrick J. Wong" commit e014f37db1a2d109afa750042ac4d69cf3e3d88e upstream. [remove userns argument of setattr_copy() for backport] Filipe Manana pointed out that XFS' behavior w.r.t. setuid/setgid revocation isn't consistent with btrfs[1] or ext4. Those two filesystems use the VFS function setattr_copy to convey certain attributes from struct iattr into the VFS inode structure. Andrey Zhadchenko reported[2] that XFS uses the wrong user namespace to decide if it should clear setgid and setuid on a file attribute update. This is a second symptom of the problem that Filipe noticed. XFS, on the other hand, open-codes setattr_copy in xfs_setattr_mode, xfs_setattr_nonsize, and xfs_setattr_time. Regrettably, setattr_copy is /not/ a simple copy function; it contains additional logic to clear the setgid bit when setting the mode, and XFS' version no longer matches. The VFS implements its own setuid/setgid stripping logic, which establishes consistent behavior. It's a tad unfortunate that it's scattered across notify_change, should_remove_suid, and setattr_copy but XFS should really follow the Linux VFS. Adapt XFS to use the VFS functions and get rid of the old functions. [1] https://lore.kernel.org/fstests/CAL3q7H47iNQ=Wmk83WcGB-KBJVOEtR9+qGczzCeXJ9Y2KCV25Q@mail.gmail.com/ [2] https://lore.kernel.org/linux-xfs/20220221182218.748084-1-andrey.zhadchenko@virtuozzo.com/ Fixes: 7fa294c8991c ("userns: Allow chown and setgid preservation") Signed-off-by: Darrick J. Wong Reviewed-by: Dave Chinner Reviewed-by: Christoph Hellwig Reviewed-by: Christian Brauner Signed-off-by: Amir Goldstein --- fs/xfs/xfs_iops.c | 56 +++-------------------------------------------- fs/xfs/xfs_pnfs.c | 3 ++- 2 files changed, 5 insertions(+), 54 deletions(-) diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c index b7f7b31a77d5..5711c8c12625 100644 --- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c @@ -595,37 +595,6 @@ xfs_vn_getattr( return 0; } -static void -xfs_setattr_mode( - struct xfs_inode *ip, - struct iattr *iattr) -{ - struct inode *inode = VFS_I(ip); - umode_t mode = iattr->ia_mode; - - ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL)); - - inode->i_mode &= S_IFMT; - inode->i_mode |= mode & ~S_IFMT; -} - -void -xfs_setattr_time( - struct xfs_inode *ip, - struct iattr *iattr) -{ - struct inode *inode = VFS_I(ip); - - ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL)); - - if (iattr->ia_valid & ATTR_ATIME) - inode->i_atime = iattr->ia_atime; - if (iattr->ia_valid & ATTR_CTIME) - inode->i_ctime = iattr->ia_ctime; - if (iattr->ia_valid & ATTR_MTIME) - inode->i_mtime = iattr->ia_mtime; -} - static int xfs_vn_change_ok( struct dentry *dentry, @@ -740,16 +709,6 @@ xfs_setattr_nonsize( goto out_cancel; } - /* - * CAP_FSETID overrides the following restrictions: - * - * The set-user-ID and set-group-ID bits of a file will be - * cleared upon successful return from chown() - */ - if ((inode->i_mode & (S_ISUID|S_ISGID)) && - !capable(CAP_FSETID)) - inode->i_mode &= ~(S_ISUID|S_ISGID); - /* * Change the ownerships and register quota modifications * in the transaction. @@ -761,7 +720,6 @@ xfs_setattr_nonsize( olddquot1 = xfs_qm_vop_chown(tp, ip, &ip->i_udquot, udqp); } - inode->i_uid = uid; } if (!gid_eq(igid, gid)) { if (XFS_IS_QUOTA_RUNNING(mp) && XFS_IS_GQUOTA_ON(mp)) { @@ -772,15 +730,10 @@ xfs_setattr_nonsize( olddquot2 = xfs_qm_vop_chown(tp, ip, &ip->i_gdquot, gdqp); } - inode->i_gid = gid; } } - if (mask & ATTR_MODE) - xfs_setattr_mode(ip, iattr); - if (mask & (ATTR_ATIME|ATTR_CTIME|ATTR_MTIME)) - xfs_setattr_time(ip, iattr); - + setattr_copy(inode, iattr); xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); XFS_STATS_INC(mp, xs_ig_attrchg); @@ -1025,11 +978,8 @@ xfs_setattr_size( xfs_inode_clear_eofblocks_tag(ip); } - if (iattr->ia_valid & ATTR_MODE) - xfs_setattr_mode(ip, iattr); - if (iattr->ia_valid & (ATTR_ATIME|ATTR_CTIME|ATTR_MTIME)) - xfs_setattr_time(ip, iattr); - + ASSERT(!(iattr->ia_valid & (ATTR_UID | ATTR_GID))); + setattr_copy(inode, iattr); xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); XFS_STATS_INC(mp, xs_ig_attrchg); diff --git a/fs/xfs/xfs_pnfs.c b/fs/xfs/xfs_pnfs.c index f3082a957d5e..ae61094bc9d1 100644 --- a/fs/xfs/xfs_pnfs.c +++ b/fs/xfs/xfs_pnfs.c @@ -283,7 +283,8 @@ xfs_fs_commit_blocks( xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL); xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); - xfs_setattr_time(ip, iattr); + ASSERT(!(iattr->ia_valid & (ATTR_UID | ATTR_GID))); + setattr_copy(inode, iattr); if (update_isize) { i_size_write(inode, iattr->ia_size); ip->i_d.di_size = iattr->ia_size;