From patchwork Tue Nov 14 01:53:33 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13454713 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 21CB0C4332F for ; Tue, 14 Nov 2023 01:54:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232013AbjKNByD (ORCPT ); Mon, 13 Nov 2023 20:54:03 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43366 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232001AbjKNByC (ORCPT ); Mon, 13 Nov 2023 20:54:02 -0500 Received: from mail-pl1-x62d.google.com (mail-pl1-x62d.google.com [IPv6:2607:f8b0:4864:20::62d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D8C20D43 for ; Mon, 13 Nov 2023 17:53:58 -0800 (PST) Received: by mail-pl1-x62d.google.com with SMTP id d9443c01a7336-1cc9b626a96so37486365ad.2 for ; Mon, 13 Nov 2023 17:53:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1699926838; x=1700531638; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=feGnkf7hecFo0+lJM4+aii3zZm5IM2/wPCItdLGBNTc=; b=J0Q7f29R5CVtqanMb/Zhcj4hVgf2fx+QqkfnX6ssxyYxCWe0soVwTNvqpfKMEGQbfp wvuqcbta/1MKjjCw5CjNV2Iy4lqz+gHjJHXk8bNjlfjH2yxq1Bwl1SnxvrNOoFiSoM6/ C2pNGL3Y864fOkdlrA125ah8A7S24mTehRJWbq8Mw+MnOeH+WDsZ7vbmgBxe4nzlpBpt kD2r7tzEQ59tXekO7iokqvOXxycPXgMFR1tyBzPspa+WXCR8drGH8YzwCDjb8IFPwpfM iTuI3Upy5fg4yXQm4Nz5sz052781Upk+YrfKQrcEggCppihrTcNKWSRpxXpjJX6ACwzv pVKw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699926838; x=1700531638; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=feGnkf7hecFo0+lJM4+aii3zZm5IM2/wPCItdLGBNTc=; b=pEeW9oWBHw/W6thResh1SG08WwiPMne36lA2T+T1Usk5BIwvFFhy3zCpKR1j8Y5KZN BLPN1LLIdVUH1eHTBYqz/aMg4jsgBCYzz4UkjDlu3C50No8DS++VL36IwTpVUuik+Hvo xpZLrHtsM1EjiBflydL5c0z5PFXtVgq+gZtKjM0Bb4QlkJrG2kqtPn7+qMHegHqZUwDY CWAznfISQM8u7PEykLmsYZv3txpUcVcrC+SoRgjo88JoF0PQdneXKUu5tobM+ljnJyF/ hEHFz5A74v/t7avjFzg+KG5mntWVddHfff2EsFeVAMR5qdz4TefZVe853x0TECui6wb9 0urg== X-Gm-Message-State: AOJu0YyQWkRv+QI0Qm+6JTwnVXPhXUKCnsIjVyy6Zgg0152Wy5TMaHma 3DoYFSflm7QpliZcUcODYKTm57CfKpd0Qg== X-Google-Smtp-Source: AGHT+IFBiVJ1tv5npt7PmTe0+SXfFf/nhsjDwH6Q2gG1MONujliQyDjv2khfhfzQuelZHJgWlASfYw== X-Received: by 2002:a17:903:20f:b0:1cc:70dd:62e7 with SMTP id r15-20020a170903020f00b001cc70dd62e7mr946064plh.32.1699926838109; Mon, 13 Nov 2023 17:53:58 -0800 (PST) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:d177:a8ad:804f:74f1]) by smtp.gmail.com with ESMTPSA id a17-20020a170902ecd100b001c9cb2fb8d8sm4668592plh.49.2023.11.13.17.53.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 13 Nov 2023 17:53:57 -0800 (PST) From: Leah Rumancik To: linux-xfs@vger.kernel.org Cc: amir73il@gmail.com, chandan.babu@oracle.com, fred@cloudflare.com, Wengang Wang , "Darrick J . Wong" , Leah Rumancik Subject: [PATCH 5.15 CANDIDATE 12/17] xfs: Fix false ENOSPC when performing direct write on a delalloc extent in cow fork Date: Mon, 13 Nov 2023 17:53:33 -0800 Message-ID: <20231114015339.3922119-13-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.43.0.rc0.421.g78406f8d94-goog In-Reply-To: <20231114015339.3922119-1-leah.rumancik@gmail.com> References: <20231114015339.3922119-1-leah.rumancik@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Chandan Babu R [ Upstream commit d62113303d691bcd8d0675ae4ac63e7769afc56c ] On a higly fragmented filesystem a Direct IO write can fail with -ENOSPC error even though the filesystem has sufficient number of free blocks. This occurs if the file offset range on which the write operation is being performed has a delalloc extent in the cow fork and this delalloc extent begins much before the Direct IO range. In such a scenario, xfs_reflink_allocate_cow() invokes xfs_bmapi_write() to allocate the blocks mapped by the delalloc extent. The extent thus allocated may not cover the beginning of file offset range on which the Direct IO write was issued. Hence xfs_reflink_allocate_cow() ends up returning -ENOSPC. The following script reliably recreates the bug described above. #!/usr/bin/bash device=/dev/loop0 shortdev=$(basename $device) mntpnt=/mnt/ file1=${mntpnt}/file1 file2=${mntpnt}/file2 fragmentedfile=${mntpnt}/fragmentedfile punchprog=/root/repos/xfstests-dev/src/punch-alternating errortag=/sys/fs/xfs/${shortdev}/errortag/bmap_alloc_minlen_extent umount $device > /dev/null 2>&1 echo "Create FS" mkfs.xfs -f -m reflink=1 $device > /dev/null 2>&1 if [[ $? != 0 ]]; then echo "mkfs failed." exit 1 fi echo "Mount FS" mount $device $mntpnt > /dev/null 2>&1 if [[ $? != 0 ]]; then echo "mount failed." exit 1 fi echo "Create source file" xfs_io -f -c "pwrite 0 32M" $file1 > /dev/null 2>&1 sync echo "Create Reflinked file" xfs_io -f -c "reflink $file1" $file2 &>/dev/null echo "Set cowextsize" xfs_io -c "cowextsize 16M" $file1 > /dev/null 2>&1 echo "Fragment FS" xfs_io -f -c "pwrite 0 64M" $fragmentedfile > /dev/null 2>&1 sync $punchprog $fragmentedfile echo "Allocate block sized extent from now onwards" echo -n 1 > $errortag echo "Create 16MiB delalloc extent in CoW fork" xfs_io -c "pwrite 0 4k" $file1 > /dev/null 2>&1 sync echo "Direct I/O write at offset 12k" xfs_io -d -c "pwrite 12k 8k" $file1 This commit fixes the bug by invoking xfs_bmapi_write() in a loop until disk blocks are allocated for atleast the starting file offset of the Direct IO write range. Fixes: 3c68d44a2b49 ("xfs: allocate direct I/O COW blocks in iomap_begin") Reported-and-Root-caused-by: Wengang Wang Signed-off-by: Chandan Babu R Reviewed-by: Darrick J. Wong [djwong: slight editing to make the locking less grody, and fix some style things] Signed-off-by: Darrick J. Wong Signed-off-by: Leah Rumancik --- fs/xfs/xfs_reflink.c | 198 +++++++++++++++++++++++++++++++++++-------- 1 file changed, 163 insertions(+), 35 deletions(-) diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 628ce65d02bb..793bdf5ac2f7 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -340,9 +340,41 @@ xfs_find_trim_cow_extent( return 0; } -/* Allocate all CoW reservations covering a range of blocks in a file. */ -int -xfs_reflink_allocate_cow( +static int +xfs_reflink_convert_unwritten( + struct xfs_inode *ip, + struct xfs_bmbt_irec *imap, + struct xfs_bmbt_irec *cmap, + bool convert_now) +{ + xfs_fileoff_t offset_fsb = imap->br_startoff; + xfs_filblks_t count_fsb = imap->br_blockcount; + int error; + + /* + * cmap might larger than imap due to cowextsize hint. + */ + xfs_trim_extent(cmap, offset_fsb, count_fsb); + + /* + * COW fork extents are supposed to remain unwritten until we're ready + * to initiate a disk write. For direct I/O we are going to write the + * data and need the conversion, but for buffered writes we're done. + */ + if (!convert_now || cmap->br_state == XFS_EXT_NORM) + return 0; + + trace_xfs_reflink_convert_cow(ip, cmap); + + error = xfs_reflink_convert_cow_locked(ip, offset_fsb, count_fsb); + if (!error) + cmap->br_state = XFS_EXT_NORM; + + return error; +} + +static int +xfs_reflink_fill_cow_hole( struct xfs_inode *ip, struct xfs_bmbt_irec *imap, struct xfs_bmbt_irec *cmap, @@ -351,25 +383,12 @@ xfs_reflink_allocate_cow( bool convert_now) { struct xfs_mount *mp = ip->i_mount; - xfs_fileoff_t offset_fsb = imap->br_startoff; - xfs_filblks_t count_fsb = imap->br_blockcount; struct xfs_trans *tp; - int nimaps, error = 0; - bool found; xfs_filblks_t resaligned; - xfs_extlen_t resblks = 0; - - ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL)); - if (!ip->i_cowfp) { - ASSERT(!xfs_is_reflink_inode(ip)); - xfs_ifork_init_cow(ip); - } - - error = xfs_find_trim_cow_extent(ip, imap, cmap, shared, &found); - if (error || !*shared) - return error; - if (found) - goto convert; + xfs_extlen_t resblks; + int nimaps; + int error; + bool found; resaligned = xfs_aligned_fsb_count(imap->br_startoff, imap->br_blockcount, xfs_get_cowextsz_hint(ip)); @@ -385,17 +404,17 @@ xfs_reflink_allocate_cow( *lockmode = XFS_ILOCK_EXCL; - /* - * Check for an overlapping extent again now that we dropped the ilock. - */ error = xfs_find_trim_cow_extent(ip, imap, cmap, shared, &found); if (error || !*shared) goto out_trans_cancel; + if (found) { xfs_trans_cancel(tp); goto convert; } + ASSERT(cmap->br_startoff > imap->br_startoff); + /* Allocate the entire reservation as unwritten blocks. */ nimaps = 1; error = xfs_bmapi_write(tp, ip, imap->br_startoff, imap->br_blockcount, @@ -415,26 +434,135 @@ xfs_reflink_allocate_cow( */ if (nimaps == 0) return -ENOSPC; + convert: - xfs_trim_extent(cmap, offset_fsb, count_fsb); - /* - * COW fork extents are supposed to remain unwritten until we're ready - * to initiate a disk write. For direct I/O we are going to write the - * data and need the conversion, but for buffered writes we're done. - */ - if (!convert_now || cmap->br_state == XFS_EXT_NORM) - return 0; - trace_xfs_reflink_convert_cow(ip, cmap); - error = xfs_reflink_convert_cow_locked(ip, offset_fsb, count_fsb); - if (!error) - cmap->br_state = XFS_EXT_NORM; + return xfs_reflink_convert_unwritten(ip, imap, cmap, convert_now); + +out_trans_cancel: + xfs_trans_cancel(tp); return error; +} + +static int +xfs_reflink_fill_delalloc( + struct xfs_inode *ip, + struct xfs_bmbt_irec *imap, + struct xfs_bmbt_irec *cmap, + bool *shared, + uint *lockmode, + bool convert_now) +{ + struct xfs_mount *mp = ip->i_mount; + struct xfs_trans *tp; + int nimaps; + int error; + bool found; + + do { + xfs_iunlock(ip, *lockmode); + *lockmode = 0; + + error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_write, 0, 0, + false, &tp); + if (error) + return error; + + *lockmode = XFS_ILOCK_EXCL; + + error = xfs_find_trim_cow_extent(ip, imap, cmap, shared, + &found); + if (error || !*shared) + goto out_trans_cancel; + + if (found) { + xfs_trans_cancel(tp); + break; + } + + ASSERT(isnullstartblock(cmap->br_startblock) || + cmap->br_startblock == DELAYSTARTBLOCK); + + /* + * Replace delalloc reservation with an unwritten extent. + */ + nimaps = 1; + error = xfs_bmapi_write(tp, ip, cmap->br_startoff, + cmap->br_blockcount, + XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC, 0, + cmap, &nimaps); + if (error) + goto out_trans_cancel; + + xfs_inode_set_cowblocks_tag(ip); + error = xfs_trans_commit(tp); + if (error) + return error; + + /* + * Allocation succeeded but the requested range was not even + * partially satisfied? Bail out! + */ + if (nimaps == 0) + return -ENOSPC; + } while (cmap->br_startoff + cmap->br_blockcount <= imap->br_startoff); + + return xfs_reflink_convert_unwritten(ip, imap, cmap, convert_now); out_trans_cancel: xfs_trans_cancel(tp); return error; } +/* Allocate all CoW reservations covering a range of blocks in a file. */ +int +xfs_reflink_allocate_cow( + struct xfs_inode *ip, + struct xfs_bmbt_irec *imap, + struct xfs_bmbt_irec *cmap, + bool *shared, + uint *lockmode, + bool convert_now) +{ + int error; + bool found; + + ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL)); + if (!ip->i_cowfp) { + ASSERT(!xfs_is_reflink_inode(ip)); + xfs_ifork_init_cow(ip); + } + + error = xfs_find_trim_cow_extent(ip, imap, cmap, shared, &found); + if (error || !*shared) + return error; + + /* CoW fork has a real extent */ + if (found) + return xfs_reflink_convert_unwritten(ip, imap, cmap, + convert_now); + + /* + * CoW fork does not have an extent and data extent is shared. + * Allocate a real extent in the CoW fork. + */ + if (cmap->br_startoff > imap->br_startoff) + return xfs_reflink_fill_cow_hole(ip, imap, cmap, shared, + lockmode, convert_now); + + /* + * CoW fork has a delalloc reservation. Replace it with a real extent. + * There may or may not be a data fork mapping. + */ + if (isnullstartblock(cmap->br_startblock) || + cmap->br_startblock == DELAYSTARTBLOCK) + return xfs_reflink_fill_delalloc(ip, imap, cmap, shared, + lockmode, convert_now); + + /* Shouldn't get here. */ + ASSERT(0); + return -EFSCORRUPTED; +} + /* * Cancel CoW reservations for some block range of an inode. *