From patchwork Mon May 1 18:26:58 2023
Subject: [PATCH 1/4] xfs: don't unconditionally null args->pag in xfs_bmap_btalloc_at_eof
From: "Darrick J. Wong"
To: david@fromorbit.com, djwong@kernel.org
Cc: Dave Chinner, linux-xfs@vger.kernel.org
Date: Mon, 01 May 2023 11:26:58 -0700
Message-ID: <168296561879.290030.9885864692870487053.stgit@frogsfrogsfrogs>

From: Darrick J. Wong

xfs/170 on a filesystem with su=128k,sw=4 produces this splat:

BUG: kernel NULL pointer dereference, address: 0000000000000010
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP
CPU: 1 PID: 4022907 Comm: dd Tainted: G W 6.3.0-xfsx #2 6ebeeffbe9577d32
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20171121_152543-x86-ol7-bu
RIP: 0010:xfs_perag_rele+0x10/0x70 [xfs]
RSP: 0018:ffffc90001e43858 EFLAGS: 00010217
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000100
RDX: ffffffffa054e717 RSI: 0000000000000005 RDI: 0000000000000000
RBP: ffff888194eea000 R08: 0000000000000000 R09: 0000000000000037
R10: ffff888100ac1cb0 R11: 0000000000000018 R12: 0000000000000000
R13: ffffc90001e43a38 R14: ffff888194eea000 R15: ffff888194eea000
FS:  00007f93d1a0e740(0000) GS:ffff88843fc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000010 CR3: 000000018a34f000 CR4: 00000000003506e0
Call Trace:
 xfs_bmap_btalloc+0x1a7/0x5d0 [xfs f85291d6841cbb3dc740083f1f331c0327394518]
 xfs_bmapi_allocate+0xee/0x470 [xfs f85291d6841cbb3dc740083f1f331c0327394518]
 xfs_bmapi_write+0x539/0x9e0 [xfs f85291d6841cbb3dc740083f1f331c0327394518]
 xfs_iomap_write_direct+0x1bb/0x2b0 [xfs f85291d6841cbb3dc740083f1f331c0327394518]
 xfs_direct_write_iomap_begin+0x51c/0x710 [xfs f85291d6841cbb3dc740083f1f331c0327394518]
 iomap_iter+0x132/0x2f0
 __iomap_dio_rw+0x2f8/0x840
 iomap_dio_rw+0xe/0x30
 xfs_file_dio_write_aligned+0xad/0x180 [xfs f85291d6841cbb3dc740083f1f331c0327394518]
 xfs_file_write_iter+0xfb/0x190 [xfs f85291d6841cbb3dc740083f1f331c0327394518]
 vfs_write+0x2eb/0x410
 ksys_write+0x65/0xe0
 do_syscall_64+0x2b/0x80

This crash occurs under the "out_low_space" label.  We grabbed a perag
reference, passed it via args->pag into xfs_bmap_btalloc_at_eof, and
afterwards args->pag is NULL.  Fix xfs_bmap_btalloc_at_eof so that it
no longer clobbers args->pag when the caller passed one in.

Fixes: 85843327094f ("xfs: factor xfs_bmap_btalloc()")
Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
---
 fs/xfs/libxfs/xfs_bmap.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index b512de0540d5..cd8870a16fd1 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3494,8 +3494,10 @@ xfs_bmap_btalloc_at_eof(
 	if (!caller_pag)
 		args->pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, ap->blkno));
 	error = xfs_alloc_vextent_exact_bno(args, ap->blkno);
-	if (!caller_pag)
+	if (!caller_pag) {
 		xfs_perag_put(args->pag);
+		args->pag = NULL;
+	}
 	if (error)
 		return error;
 
@@ -3505,7 +3507,6 @@ xfs_bmap_btalloc_at_eof(
 	 * Exact allocation failed. Reset to try an aligned allocation
 	 * according to the original allocation specification.
 	 */
-	args->pag = NULL;
 	args->alignment = stripe_align;
 	args->minlen = nextminlen;
 	args->minalignslop = 0;
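The reference-ownership rule behind this fix can be illustrated with a small user-space model. This is a hypothetical sketch, not XFS code: `struct perag_model`, `struct alloc_args_model`, `perag_get()`, `perag_put()` and `exact_alloc_at_eof()` are invented stand-ins for the per-AG reference counting and for xfs_bmap_btalloc_at_eof. The helper clears args->pag only when it took the reference itself, so a reference passed in by the caller is still present when the caller later drops it (the splat shows a NULL dereference in xfs_perag_rele, called from xfs_bmap_btalloc).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted per-AG structure. */
struct perag_model {
	int	refcount;
};

/* Hypothetical stand-in for the allocation arguments (only ->pag matters). */
struct alloc_args_model {
	struct perag_model	*pag;
};

/* Take a reference and return the object, like a "get" helper. */
static struct perag_model *perag_get(struct perag_model *pag)
{
	pag->refcount++;
	return pag;
}

/* Drop a reference; being handed NULL here is the failure mode being fixed. */
static void perag_put(struct perag_model *pag)
{
	assert(pag != NULL);
	assert(pag->refcount > 0);
	pag->refcount--;
}

/*
 * Models the fixed helper: borrow the caller's reference if there is one,
 * otherwise take a private reference for the exact-bno attempt and drop it
 * again.  Only a private reference is cleared from args->pag.
 */
static int exact_alloc_at_eof(struct alloc_args_model *args,
			      struct perag_model *ag)
{
	struct perag_model	*caller_pag = args->pag;
	int			error = 0;

	if (!caller_pag)
		args->pag = perag_get(ag);

	/* ... the exact-bno allocation attempt would set error here ... */

	if (!caller_pag) {
		perag_put(args->pag);
		args->pag = NULL;
	}
	return error;
}
```

With a caller-supplied reference, args->pag is left untouched and the caller remains responsible for the matching put; without one, the helper balances its own get/put and leaves args->pag NULL.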
From patchwork Mon May 1 18:27:04 2023
Subject: [PATCH 2/4] xfs: set bnobt/cntbt numrecs correctly when formatting new AGs
From: "Darrick J. Wong"
To: david@fromorbit.com, djwong@kernel.org
Cc: linux-xfs@vger.kernel.org
Date: Mon, 01 May 2023 11:27:04 -0700
Message-ID: <168296562443.290030.11898351600272300988.stgit@frogsfrogsfrogs>

From: Darrick J. Wong

Through generic/300, I discovered that mkfs.xfs creates corrupt
filesystems when given these parameters:

# mkfs.xfs -d size=512M /dev/sda -f -d su=128k,sw=4 --unsupported
Filesystems formatted with --unsupported are not supported!!
meta-data=/dev/sda               isize=512    agcount=8, agsize=16352 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=1
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=1
data     =                       bsize=4096   blocks=130816, imaxpct=25
         =                       sunit=32     swidth=128 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=8192, version=2
         =                       sectsz=512   sunit=32 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
         =                       rgcount=0    rgsize=0 blks
Discarding blocks...Done.
# xfs_repair -n /dev/sda
Phase 1 - find and verify superblock...
        - reporting progress in intervals of 15 minutes
Phase 2 - using internal log
        - zero log...
        - 16:30:50: zeroing log - 16320 of 16320 blocks done
        - scan filesystem freespace and inode maps...
agf_freeblks 25, counted 0 in ag 4
sb_fdblocks 8823, counted 8798

The root cause of this problem is the numrecs handling in
xfs_freesp_init_recs, which is used to initialize a new AG.  Prior to
calling the function, we set up the new bnobt block with numrecs == 1
and rely on _freesp_init_recs to format that new record.  If the last
record created has a blockcount of zero, then it sets numrecs = 0.

That last step is wrong when all of the following hold: the AG contains
the log, stripe alignment means the log does not start immediately
after the initial blocks, and the end of the log coincides exactly with
the end of the AG.  In this case, we actually formatted a single bnobt
record to cover the free space before the start of the (stripe aligned)
log, and incremented arec to try to format a second record.  That
second record turned out to be unnecessary, so what we really want is
to leave numrecs at 1.

The numrecs handling itself is overly complicated because a different
function sets numrecs == 1.  Change the bnobt and cntbt creation code
to start with numrecs set to zero and only increment it after
successfully formatting a free space extent into the btree block.

Fixes: f327a00745ff ("xfs: account for log space when formatting new AGs")
Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
---
 fs/xfs/libxfs/xfs_ag.c |   19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
index 1b078bbbf225..9b373a0c7aaf 100644
--- a/fs/xfs/libxfs/xfs_ag.c
+++ b/fs/xfs/libxfs/xfs_ag.c
@@ -495,10 +495,12 @@ xfs_freesp_init_recs(
 		ASSERT(start >= mp->m_ag_prealloc_blocks);
 		if (start != mp->m_ag_prealloc_blocks) {
 			/*
-			 * Modify first record to pad stripe align of log
+			 * Modify first record to pad stripe align of log and
+			 * bump the record count.
 			 */
 			arec->ar_blockcount = cpu_to_be32(start -
 						mp->m_ag_prealloc_blocks);
+			be16_add_cpu(&block->bb_numrecs, 1);
 			nrec = arec + 1;
 
 			/*
@@ -509,7 +511,6 @@ xfs_freesp_init_recs(
 					be32_to_cpu(arec->ar_startblock) +
 					be32_to_cpu(arec->ar_blockcount));
 			arec = nrec;
-			be16_add_cpu(&block->bb_numrecs, 1);
 		}
 		/*
 		 * Change record start to after the internal log
@@ -518,15 +519,13 @@ xfs_freesp_init_recs(
 	}
 
 	/*
-	 * Calculate the record block count and check for the case where
-	 * the log might have consumed all available space in the AG. If
-	 * so, reset the record count to 0 to avoid exposure of an invalid
-	 * record start block.
+	 * Calculate the block count of this record; if it is nonzero,
+	 * increment the record count.
 	 */
 	arec->ar_blockcount = cpu_to_be32(id->agsize -
 					  be32_to_cpu(arec->ar_startblock));
-	if (!arec->ar_blockcount)
-		block->bb_numrecs = 0;
+	if (arec->ar_blockcount)
+		be16_add_cpu(&block->bb_numrecs, 1);
 }
 
 /*
@@ -538,7 +537,7 @@ xfs_bnoroot_init(
 	struct xfs_buf		*bp,
 	struct aghdr_init_data	*id)
 {
-	xfs_btree_init_block(mp, bp, XFS_BTNUM_BNO, 0, 1, id->agno);
+	xfs_btree_init_block(mp, bp, XFS_BTNUM_BNO, 0, 0, id->agno);
 	xfs_freesp_init_recs(mp, bp, id);
 }
 
@@ -548,7 +547,7 @@ xfs_cntroot_init(
 	struct xfs_buf		*bp,
 	struct aghdr_init_data	*id)
 {
-	xfs_btree_init_block(mp, bp, XFS_BTNUM_CNT, 0, 1, id->agno);
+	xfs_btree_init_block(mp, bp, XFS_BTNUM_CNT, 0, 0, id->agno);
 	xfs_freesp_init_recs(mp, bp, id);
 }
 
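A simplified model of the new counting rule may help when checking the edge case described in this patch. In this hypothetical sketch, `init_free_recs()` works on host-endian `struct freesp_rec` records instead of the on-disk big-endian bnobt records, and the block numbers used below are made up rather than taken from the mkfs output. numrecs starts at zero and is incremented only after a record of nonzero length has been written, so a log that ends exactly at the end of the AG no longer hides the padding record in front of it.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified, host-endian stand-in for a free-space record. */
struct freesp_rec {
	uint32_t	start;	/* first free block in the AG */
	uint32_t	count;	/* number of free blocks */
};

/*
 * Hypothetical model of the fixed logic: numrecs starts at zero and is
 * bumped only for records with a nonzero length.  recs must have room for
 * two entries.  Returns the number of valid records.
 */
static unsigned int init_free_recs(struct freesp_rec *recs, uint32_t agsize,
				   uint32_t prealloc, int has_log,
				   uint32_t log_start, uint32_t log_blocks)
{
	struct freesp_rec	*arec = &recs[0];
	unsigned int		numrecs = 0;

	arec->start = prealloc;
	if (has_log) {
		assert(log_start >= prealloc);
		if (log_start != prealloc) {
			/* Free space between the AG headers and the aligned log. */
			arec->count = log_start - prealloc;
			numrecs++;
			/* Second record starts where the log starts... */
			arec[1].start = arec->start + arec->count;
			arec++;
		}
		/* ...and is then moved past the end of the log. */
		arec->start += log_blocks;
	}
	/* Whatever remains up to the end of the AG, if anything. */
	arec->count = agsize - arec->start;
	if (arec->count)
		numrecs++;
	return numrecs;
}
```

For example, with agsize 1000, the first free block at 8, and a stripe-aligned log occupying blocks 32 to 999, the model returns one record covering blocks 8 to 31; the pre-patch logic would have reset numrecs to 0 and lost that record.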
From patchwork Mon May 1 18:27:10 2023
Subject: [PATCH 3/4] xfs: flush dirty data and drain directios before scrubbing cow fork
From: "Darrick J. Wong"
To: david@fromorbit.com, djwong@kernel.org
Cc: Dave Chinner, linux-xfs@vger.kernel.org
Date: Mon, 01 May 2023 11:27:10 -0700
Message-ID: <168296563007.290030.7849076336331509809.stgit@frogsfrogsfrogs>

From: Darrick J. Wong

When we're scrubbing the COW fork, we need to take MMAPLOCK_EXCL to
prevent page_mkwrite from modifying any inode state.  The ILOCK should
suffice to avoid confusing online fsck, but let's take the same locks
that we do everywhere else.

Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
---
 fs/xfs/scrub/bmap.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c
index 87ab9f95a487..69bc89d0fc68 100644
--- a/fs/xfs/scrub/bmap.c
+++ b/fs/xfs/scrub/bmap.c
@@ -42,12 +42,12 @@ xchk_setup_inode_bmap(
 	xfs_ilock(sc->ip, XFS_IOLOCK_EXCL);
 
 	/*
-	 * We don't want any ephemeral data fork updates sitting around
+	 * We don't want any ephemeral data/cow fork updates sitting around
 	 * while we inspect block mappings, so wait for directio to finish
 	 * and flush dirty data if we have delalloc reservations.
 	 */
 	if (S_ISREG(VFS_I(sc->ip)->i_mode) &&
-	    sc->sm->sm_type == XFS_SCRUB_TYPE_BMBTD) {
+	    sc->sm->sm_type != XFS_SCRUB_TYPE_BMBTA) {
 		struct address_space	*mapping = VFS_I(sc->ip)->i_mapping;
 
 		sc->ilock_flags |= XFS_MMAPLOCK_EXCL;
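The effect of the one-line condition change can be spelled out as a truth table. This is a hypothetical sketch: the `enum bmap_scrub_type` values only mirror the data, attr and COW fork mapping scrub types (XFS_SCRUB_TYPE_BMBTD, _BMBTA and _BMBTC) and do not use the kernel's numeric values. Before the patch only the data fork scrub of a regular file flushed dirty data and took MMAPLOCK_EXCL; afterwards the COW fork scrub does too, while the attr fork scrub is still excluded.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the three fork-mapping scrub types. */
enum bmap_scrub_type {
	SCRUB_BMBTD,	/* data fork mappings */
	SCRUB_BMBTA,	/* attr fork mappings */
	SCRUB_BMBTC,	/* cow fork mappings */
};

/* Condition before the patch: only the data fork scrub flushes. */
static bool flush_before_patch(bool is_reg, enum bmap_scrub_type t)
{
	return is_reg && t == SCRUB_BMBTD;
}

/* Condition after the patch: every fork except the attr fork. */
static bool flush_after_patch(bool is_reg, enum bmap_scrub_type t)
{
	return is_reg && t != SCRUB_BMBTA;
}
```

Writing the test as "not the attr fork" rather than listing the data and COW forks keeps the check to a single comparison, since those are the only three fork-mapping scrub types.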
From patchwork Mon May 1 18:27:15 2023
Subject: [PATCH 4/4] xfs: don't allocate into the data fork for an unshare request
From: "Darrick J. Wong"
To: david@fromorbit.com, djwong@kernel.org
Cc: Dave Chinner, linux-xfs@vger.kernel.org
Date: Mon, 01 May 2023 11:27:15 -0700
Message-ID: <168296563575.290030.8748741047509895798.stgit@frogsfrogsfrogs>

From: Darrick J. Wong

For an unshare request, we only have to take action if the data fork has
a shared mapping.  We don't care if someone else set up a cow operation.
If we find nothing in the data fork, return a hole to avoid allocating
space.

Note that fallocate will replace the delalloc reservation with an
unwritten extent anyway, so this has no user-visible effects outside of
avoiding unnecessary updates.

Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
---
 fs/xfs/xfs_iomap.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 285885c308bd..18c8f168b153 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1006,8 +1006,9 @@ xfs_buffered_write_iomap_begin(
 	if (eof)
 		imap.br_startoff = end_fsb; /* fake hole until the end */
 
-	/* We never need to allocate blocks for zeroing a hole. */
-	if ((flags & IOMAP_ZERO) && imap.br_startoff > offset_fsb) {
+	/* We never need to allocate blocks for zeroing or unsharing a hole. */
+	if ((flags & (IOMAP_UNSHARE | IOMAP_ZERO)) &&
+	    imap.br_startoff > offset_fsb) {
 		xfs_hole_to_iomap(ip, iomap, offset_fsb, imap.br_startoff);
 		goto out_unlock;
 	}
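The new early-return test can be modelled as a small predicate. In this hypothetical sketch the SK_IOMAP_* flag values are invented (the real IOMAP_ZERO and IOMAP_UNSHARE constants are defined by the kernel's iomap code), and `next_start` plays the role of imap.br_startoff: a value beyond `offset` means nothing is mapped at `offset` in the data fork, so a zeroing or unsharing request can be answered with a hole instead of an allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for illustration only. */
#define SK_IOMAP_WRITE		(1u << 0)
#define SK_IOMAP_ZERO		(1u << 1)
#define SK_IOMAP_UNSHARE	(1u << 2)

/*
 * next_start is the file block where the next data fork mapping begins
 * (or the fake-hole end when past EOF); next_start > offset means the
 * request starts in a hole.
 */
static bool report_hole_instead_of_allocating(unsigned int flags,
					      uint64_t offset,
					      uint64_t next_start)
{
	return (flags & (SK_IOMAP_UNSHARE | SK_IOMAP_ZERO)) &&
	       next_start > offset;
}
```

Testing both flags with one mask keeps zeroing and unsharing on the same path: neither operation has anything to do over a hole in the data fork, whereas an ordinary write must still allocate.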
From patchwork Mon May 1 21:24:34 2023
Date: Mon, 1 May 2023 14:24:34 -0700
From: "Darrick J. Wong"
To: david@fromorbit.com
Cc: Dave Chinner, linux-xfs@vger.kernel.org, yebin10@huawei.com
Subject: [PATCH 5/4] xfs: fix negative array access in xfs_getbmap
Message-ID: <20230501212434.GM59213@frogsfrogsfrogs>

From: Darrick J. Wong

In commit 8ee81ed581ff, Ye Bin complained about an ASSERT in the bmapx
code that trips if we encounter a delalloc extent after flushing the
pagecache to disk.  The ioctl code does not hold MMAPLOCK so it's
entirely possible that a racing write page fault can create a delalloc
extent after the file has been flushed.  The proposed solution was to
replace the assertion with an early return that avoids filling out the
bmap recordset with a delalloc entry if the caller didn't ask for it.

At the time, I recall thinking that the forward logic sounded ok, but
felt hesitant because I suspected that changing this code would cause
something /else/ to burst loose due to some other subtlety.

syzbot of course found that subtlety.  If all the extent mappings found
after the flush are delalloc mappings, we'll reach the end of the data
fork without ever incrementing bmv->bmv_entries.  This is new, since
before we'd have emitted the delalloc mappings even though the caller
didn't ask for them.  Once we reach the end, we'll try to set
BMV_OF_LAST on the -1st entry (because bmv_entries is zero) and go
corrupt something else in memory.  Yay.

I really dislike all these stupid patches that fiddle around with debug
code and break things that otherwise worked well enough.  Nobody was
complaining that calling XFS_IOC_BMAPX without BMV_IF_DELALLOC would
return BMV_OF_DELALLOC records, and now we've gone from "weird behavior
that nobody cared about" to "bad behavior that must be addressed
immediately".
Maybe I'll just ignore anything from Huawei from now on for my own sake.

Reported-by: syzbot+c103d3808a0de5faaf80@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-xfs/20230412024907.GP360889@frogsfrogsfrogs/
Fixes: 8ee81ed581ff ("xfs: fix BUG_ON in xfs_getbmap()")
Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
---
 fs/xfs/xfs_bmap_util.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index f032d3a4b727..fbb675563208 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -558,7 +558,9 @@ xfs_getbmap(
 		if (!xfs_iext_next_extent(ifp, &icur, &got)) {
 			xfs_fileoff_t	end = XFS_B_TO_FSB(mp, XFS_ISIZE(ip));
 
-			out[bmv->bmv_entries - 1].bmv_oflags |= BMV_OF_LAST;
+			if (bmv->bmv_entries > 0)
+				out[bmv->bmv_entries - 1].bmv_oflags |=
+								BMV_OF_LAST;
 
 			if (whichfork != XFS_ATTR_FORK && bno < end &&
 			    !xfs_getbmap_full(bmv)) {
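The guard added here is the usual pattern for tagging the last element of an output array that may be empty. In this hypothetical sketch, `struct bmap_rec_model` and `SK_OF_LAST` stand in for struct getbmapx and BMV_OF_LAST, and `entries` models bmv->bmv_entries as a signed int. When every mapping was skipped, entries is zero and the unguarded expression would index out[-1].

```c
#include <assert.h>
#include <stdbool.h>

#define SK_OF_LAST	0x1u	/* hypothetical stand-in for BMV_OF_LAST */

/* Hypothetical stand-in for one returned mapping record. */
struct bmap_rec_model {
	unsigned int	oflags;
};

/*
 * Tag the final emitted record.  'entries' counts records actually filled
 * in; it stays zero when every mapping was skipped (e.g. all delalloc and
 * the caller did not ask for delalloc records), in which case
 * out[entries - 1] would be out[-1] without the check.
 */
static bool mark_last_entry(struct bmap_rec_model *out, int entries)
{
	if (entries <= 0)
		return false;
	out[entries - 1].oflags |= SK_OF_LAST;
	return true;
}
```

Calling mark_last_entry(out, 0) leaves the array untouched, which matches the patched behaviour of simply not setting BMV_OF_LAST when no records were emitted.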