From patchwork Thu Jan 17 19:20:00 2019
X-Patchwork-Submitter: Brian Foster
X-Patchwork-Id: 10768849
From: Brian Foster
To: linux-xfs@vger.kernel.org
Subject: [PATCH v2 1/5] xfs: eof trim writeback mapping as soon as it is cached
Date: Thu, 17 Jan 2019 14:20:00 -0500
Message-Id: <20190117192004.49346-2-bfoster@redhat.com>
In-Reply-To: <20190117192004.49346-1-bfoster@redhat.com>
References: <20190117192004.49346-1-bfoster@redhat.com>

The cached writeback mapping is EOF trimmed to try to avoid races between
post-eof block management and writeback that result in sending cached data
to a stale location. The cached mapping is currently trimmed on the
validation check, which leaves a race window between the time the mapping
is cached and when it is trimmed against the current inode size.

For example, if a new mapping is cached by delalloc conversion on a
blocksize == page size fs, we could cycle various locks, perform memory
allocations, etc. in the writeback codepath before the associated mapping
is eventually trimmed to i_size. This leaves enough time for a post-eof
truncate and file append before the cached mapping is trimmed. The former
event essentially invalidates a range of the cached mapping and the latter
bumps the inode size such that the trim on the next writepage event won't
trim all of the invalid blocks.
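For reference, the trim in question is done by xfs_trim_extent_eof(), which
simply clamps a mapping to the current in-core inode size. The helper body
below is reproduced (reformatted) from the removal hunk in patch 4 of this
series:

/* trim extent to within eof */
void
xfs_trim_extent_eof(
	struct xfs_bmbt_irec	*irec,
	struct xfs_inode	*ip)
{
	xfs_trim_extent(irec, 0, XFS_B_TO_FSB(ip->i_mount,
					      i_size_read(VFS_I(ip))));
}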
fstest generic/464 reproduces this scenario occasionally and causes a lost
writeback and stale delalloc blocks warning on inode inactivation. To work
around this problem, trim the cached writeback mapping as soon as it is
cached in addition to on subsequent validation checks. This is a minor
tweak to tighten the race window as much as possible until a proper
invalidation mechanism is available.

Fixes: 40214d128e07 ("xfs: trim writepage mapping to within eof")
Cc: # v4.14+
Signed-off-by: Brian Foster
Reviewed-by: Allison Henderson
Reviewed-by: Christoph Hellwig
---
 fs/xfs/xfs_aops.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 338b9d9984e0..d9048bcea49c 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -449,6 +449,7 @@ xfs_map_blocks(
 	}
 
 	wpc->imap = imap;
+	xfs_trim_extent_eof(&wpc->imap, ip);
 	trace_xfs_map_blocks_found(ip, offset, count, wpc->io_type, &imap);
 	return 0;
 allocate_blocks:
@@ -459,6 +460,7 @@ xfs_map_blocks(
 	ASSERT(whichfork == XFS_COW_FORK || cow_fsb == NULLFILEOFF ||
 	       imap.br_startoff + imap.br_blockcount <= cow_fsb);
 	wpc->imap = imap;
+	xfs_trim_extent_eof(&wpc->imap, ip);
 	trace_xfs_map_blocks_alloc(ip, offset, count, wpc->io_type, &imap);
 	return 0;
 }

From patchwork Thu Jan 17 19:20:01 2019
X-Patchwork-Submitter: Brian Foster
X-Patchwork-Id: 10768857
From: Brian Foster
To: linux-xfs@vger.kernel.org
Subject: [PATCH v2 2/5] xfs: update fork seq counter on data fork changes
Date: Thu, 17 Jan 2019 14:20:01 -0500
Message-Id: <20190117192004.49346-3-bfoster@redhat.com>
In-Reply-To: <20190117192004.49346-1-bfoster@redhat.com>
References: <20190117192004.49346-1-bfoster@redhat.com>
The sequence counter in the xfs_ifork structure is only updated on COW
forks. This is because the counter is currently only used to optimize out
repetitive COW fork checks at writeback time.

Tweak the extent code to update the seq counter regardless of the fork type
in preparation for using this counter on data forks as well.

Signed-off-by: Brian Foster
Reviewed-by: Christoph Hellwig
Reviewed-by: Allison Henderson
---
 fs/xfs/libxfs/xfs_iext_tree.c  | 13 ++++++-------
 fs/xfs/libxfs/xfs_inode_fork.h |  2 +-
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_iext_tree.c b/fs/xfs/libxfs/xfs_iext_tree.c
index 771dd072015d..bc690f2409fa 100644
--- a/fs/xfs/libxfs/xfs_iext_tree.c
+++ b/fs/xfs/libxfs/xfs_iext_tree.c
@@ -614,16 +614,15 @@ xfs_iext_realloc_root(
 }
 
 /*
- * Increment the sequence counter if we are on a COW fork.  This allows
- * the writeback code to skip looking for a COW extent if the COW fork
- * hasn't changed.  We use WRITE_ONCE here to ensure the update to the
- * sequence counter is seen before the modifications to the extent
- * tree itself take effect.
+ * Increment the sequence counter on extent tree changes. If we are on a COW
+ * fork, this allows the writeback code to skip looking for a COW extent if the
+ * COW fork hasn't changed. We use WRITE_ONCE here to ensure the update to the
+ * sequence counter is seen before the modifications to the extent tree itself
+ * take effect.
  */
 static inline void xfs_iext_inc_seq(struct xfs_ifork *ifp, int state)
 {
-	if (state & BMAP_COWFORK)
-		WRITE_ONCE(ifp->if_seq, READ_ONCE(ifp->if_seq) + 1);
+	WRITE_ONCE(ifp->if_seq, READ_ONCE(ifp->if_seq) + 1);
 }
 
 void
diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
index 60361d2d74a1..00c62ce170d0 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.h
+++ b/fs/xfs/libxfs/xfs_inode_fork.h
@@ -14,7 +14,7 @@ struct xfs_dinode;
  */
 struct xfs_ifork {
 	int			if_bytes;	/* bytes in if_u1 */
-	unsigned int		if_seq;		/* cow fork mod counter */
+	unsigned int		if_seq;		/* fork mod counter */
 	struct xfs_btree_block	*if_broot;	/* file's incore btree root */
 	short			if_broot_bytes;	/* bytes allocated for root */
 	unsigned char		if_flags;	/* per-fork flags */
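Bumping if_seq for every fork means any code that holds a cached view of a
fork can detect changes with a single lockless read. A minimal, hypothetical
illustration of that pattern (not code from this series; the real consumer is
added to xfs_map_blocks() in the next patch):

/*
 * Hypothetical helper for illustration only: has the fork been modified
 * since its sequence counter was last sampled? Pairs with the WRITE_ONCE()
 * in xfs_iext_inc_seq() above.
 */
static inline bool
xfs_ifork_changed(struct xfs_ifork *ifp, unsigned int cached_seq)
{
	return READ_ONCE(ifp->if_seq) != cached_seq;
}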
From patchwork Thu Jan 17 19:20:02 2019
X-Patchwork-Submitter: Brian Foster
X-Patchwork-Id: 10768851
From: Brian Foster
To: linux-xfs@vger.kernel.org
Subject: [PATCH v2 3/5] xfs: validate writeback mapping using data fork seq counter
Date: Thu, 17 Jan 2019 14:20:02 -0500
Message-Id: <20190117192004.49346-4-bfoster@redhat.com>
In-Reply-To: <20190117192004.49346-1-bfoster@redhat.com>
References: <20190117192004.49346-1-bfoster@redhat.com>

The writeback code caches the current extent mapping across multiple
xfs_do_writepage() calls to avoid repeated lookups for sequential pages
backed by the same extent. This is known to be slightly racy with extent
fork changes in certain difficult-to-reproduce scenarios. The cached extent
is trimmed to within EOF to help avoid the most common vector for this
problem via speculative preallocation management, but this is a band-aid
that does not address the fundamental problem.

Now that we have an xfs_ifork sequence counter mechanism used to facilitate
COW writeback, we can use the same mechanism to validate consistency
between the data fork and cached writeback mappings. On its face, this is
somewhat of a big hammer approach because any change to the data fork
invalidates any mapping currently cached by a writeback in progress,
regardless of whether the data fork change overlaps with the range under
writeback. In practice, however, the impact of this approach is minimal in
most cases.

First, data fork changes (delayed allocations) caused by sustained
sequential buffered writes are amortized across speculative preallocations.
This means that a cached mapping won't be invalidated by each buffered
write of a common file copy workload, but rather only on less frequent
allocation events. Second, the extent tree is always entirely in-core, so
an additional lookup of a usable extent mostly costs a shared ilock cycle
and an in-memory tree lookup. This means that a cached mapping revalidation
is relatively cheap compared to the I/O itself. Third, spurious
invalidations don't impact ioend construction. This means that even if the
same extent is revalidated multiple times across multiple writepage
instances, we still construct and submit the same size ioend (and bio) if
the blocks are physically contiguous.

Update struct xfs_writepage_ctx with a new field to hold the sequence
number of the data fork associated with the currently cached mapping.
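For quick reference, the context structure with the new field looks as
follows (assembled from the hunk below; the inline comments are added here
for illustration only):

struct xfs_writepage_ctx {
	struct xfs_bmbt_irec	imap;		/* currently cached mapping */
	unsigned int		io_type;
	unsigned int		data_seq;	/* data fork seqno of imap (new) */
	unsigned int		cow_seq;	/* COW fork seqno of imap */
	struct xfs_ioend	*ioend;
};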
Check the wpc seqno against the data fork when the mapping is validated and
reestablish the mapping whenever the fork has changed since the mapping was
cached. This ensures that writeback always uses a valid extent mapping and
thus prevents lost writebacks and stale delalloc block problems.

Signed-off-by: Brian Foster
Reviewed-by: Allison Henderson
Reviewed-by: Christoph Hellwig
---
 fs/xfs/xfs_aops.c  | 60 ++++++++++++++++++++++++++++++++++++----------
 fs/xfs/xfs_iomap.c |  5 ++--
 2 files changed, 49 insertions(+), 16 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index d9048bcea49c..649e4ad76add 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -29,6 +29,7 @@
 struct xfs_writepage_ctx {
 	struct xfs_bmbt_irec	imap;
 	unsigned int		io_type;
+	unsigned int		data_seq;
 	unsigned int		cow_seq;
 	struct xfs_ioend	*ioend;
 };
@@ -301,6 +302,42 @@ xfs_end_bio(
 	xfs_destroy_ioend(ioend, blk_status_to_errno(bio->bi_status));
 }
 
+/*
+ * Fast revalidation of the cached writeback mapping. Return true if the current
+ * mapping is valid, false otherwise.
+ */
+static bool
+xfs_imap_valid(
+	struct xfs_writepage_ctx	*wpc,
+	struct xfs_inode		*ip,
+	xfs_fileoff_t			offset_fsb)
+{
+	/*
+	 * If this is a COW mapping, it is sufficient to check that the mapping
+	 * covers the offset. Be careful to check this first because the caller
+	 * can revalidate a COW mapping without updating the data seqno.
+	 */
+	if (offset_fsb < wpc->imap.br_startoff ||
+	    offset_fsb >= wpc->imap.br_startoff + wpc->imap.br_blockcount)
+		return false;
+	if (wpc->io_type == XFS_IO_COW)
+		return true;
+
+	/*
+	 * This is not a COW mapping. Check the sequence number of the data fork
+	 * because concurrent changes could have invalidated the extent. Check
+	 * the COW fork because concurrent changes since the last time we
+	 * checked (and found nothing at this offset) could have added
+	 * overlapping blocks.
+	 */
+	if (wpc->data_seq != READ_ONCE(ip->i_df.if_seq))
+		return false;
+	if (xfs_inode_has_cow_data(ip) &&
+	    wpc->cow_seq != READ_ONCE(ip->i_cowfp->if_seq))
+		return false;
+	return true;
+}
+
 STATIC int
 xfs_map_blocks(
 	struct xfs_writepage_ctx *wpc,
@@ -315,9 +352,11 @@ xfs_map_blocks(
 	struct xfs_bmbt_irec	imap;
 	int			whichfork = XFS_DATA_FORK;
 	struct xfs_iext_cursor	icur;
-	bool			imap_valid;
 	int			error = 0;
 
+	if (XFS_FORCED_SHUTDOWN(mp))
+		return -EIO;
+
 	/*
 	 * We have to make sure the cached mapping is within EOF to protect
 	 * against eofblocks trimming on file release leaving us with a stale
@@ -346,17 +385,9 @@ xfs_map_blocks(
 	 * against concurrent updates and provides a memory barrier on the way
 	 * out that ensures that we always see the current value.
 	 */
-	imap_valid = offset_fsb >= wpc->imap.br_startoff &&
-		     offset_fsb < wpc->imap.br_startoff + wpc->imap.br_blockcount;
-	if (imap_valid &&
-	    (!xfs_inode_has_cow_data(ip) ||
-	     wpc->io_type == XFS_IO_COW ||
-	     wpc->cow_seq == READ_ONCE(ip->i_cowfp->if_seq)))
+	if (xfs_imap_valid(wpc, ip, offset_fsb))
 		return 0;
 
-	if (XFS_FORCED_SHUTDOWN(mp))
-		return -EIO;
-
 	/*
 	 * If we don't have a valid map, now it's time to get a new one for this
 	 * offset.  This will convert delayed allocations (including COW ones)
@@ -403,9 +434,10 @@ xfs_map_blocks(
 	}
 
 	/*
-	 * Map valid and no COW extent in the way?  We're done.
+	 * No COW extent overlap. Revalidate now that we may have updated
+	 * ->cow_seq. If the data mapping is still valid, we're done.
 	 */
-	if (imap_valid) {
+	if (xfs_imap_valid(wpc, ip, offset_fsb)) {
 		xfs_iunlock(ip, XFS_ILOCK_SHARED);
 		return 0;
 	}
@@ -417,6 +449,7 @@
 	 */
 	if (!xfs_iext_lookup_extent(ip, &ip->i_df, offset_fsb, &icur, &imap))
 		imap.br_startoff = end_fsb;	/* fake a hole past EOF */
+	wpc->data_seq = READ_ONCE(ip->i_df.if_seq);
 	xfs_iunlock(ip, XFS_ILOCK_SHARED);
 
 	if (imap.br_startoff > offset_fsb) {
@@ -454,7 +487,8 @@ xfs_map_blocks(
 	return 0;
 allocate_blocks:
 	error = xfs_iomap_write_allocate(ip, whichfork, offset, &imap,
-			&wpc->cow_seq);
+			whichfork == XFS_COW_FORK ?
+				     &wpc->cow_seq : &wpc->data_seq);
 	if (error)
 		return error;
 	ASSERT(whichfork == XFS_COW_FORK || cow_fsb == NULLFILEOFF ||
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 27c93b5f029d..ab69caa685b4 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -681,7 +681,7 @@ xfs_iomap_write_allocate(
 	int		whichfork,
 	xfs_off_t	offset,
 	xfs_bmbt_irec_t	*imap,
-	unsigned int	*cow_seq)
+	unsigned int	*seq)
 {
 	xfs_mount_t	*mp = ip->i_mount;
 	struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork);
@@ -797,8 +797,7 @@ xfs_iomap_write_allocate(
 			if (error)
 				goto error0;
 
-			if (whichfork == XFS_COW_FORK)
-				*cow_seq = READ_ONCE(ifp->if_seq);
+			*seq = READ_ONCE(ifp->if_seq);
 			xfs_iunlock(ip, XFS_ILOCK_EXCL);
 		}
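Taken together, the fast path of xfs_map_blocks() after this patch has
roughly the following shape. This is a condensed paraphrase of the diff
above for readability, not literal kernel code; the elided steps are marked
with comments:

	if (xfs_imap_valid(wpc, ip, offset_fsb))
		return 0;			/* cached mapping still good */

	xfs_ilock(ip, XFS_ILOCK_SHARED);
	/* ... check the COW fork first; this may update wpc->cow_seq ... */
	if (xfs_imap_valid(wpc, ip, offset_fsb)) {
		xfs_iunlock(ip, XFS_ILOCK_SHARED);
		return 0;			/* revalidated under the ilock */
	}
	/* ... look up the data fork extent covering offset_fsb ... */
	wpc->data_seq = READ_ONCE(ip->i_df.if_seq);	/* snapshot for later checks */
	xfs_iunlock(ip, XFS_ILOCK_SHARED);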
From patchwork Thu Jan 17 19:20:03 2019
X-Patchwork-Submitter: Brian Foster
X-Patchwork-Id: 10768855
From: Brian Foster
To: linux-xfs@vger.kernel.org
Subject: [PATCH v2 4/5] xfs: remove superfluous writeback mapping eof trimming
Date: Thu, 17 Jan 2019 14:20:03 -0500
Message-Id: <20190117192004.49346-5-bfoster@redhat.com>
In-Reply-To: <20190117192004.49346-1-bfoster@redhat.com>
References: <20190117192004.49346-1-bfoster@redhat.com>

Now that the cached writeback mapping is explicitly invalidated on data
fork changes, the EOF trimming band-aid is no longer necessary. Remove
xfs_trim_extent_eof() as well since it has no other users.

Signed-off-by: Brian Foster
Reviewed-by: Christoph Hellwig
---
 fs/xfs/libxfs/xfs_bmap.c | 11 -----------
 fs/xfs/libxfs/xfs_bmap.h |  1 -
 fs/xfs/xfs_aops.c        | 15 ---------------
 3 files changed, 27 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 332eefa2700b..4c73927819c2 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3685,17 +3685,6 @@ xfs_trim_extent(
 	}
 }
 
-/* trim extent to within eof */
-void
-xfs_trim_extent_eof(
-	struct xfs_bmbt_irec	*irec,
-	struct xfs_inode	*ip)
-
-{
-	xfs_trim_extent(irec, 0, XFS_B_TO_FSB(ip->i_mount,
-					      i_size_read(VFS_I(ip))));
-}
-
 /*
  * Trim the returned map to the required bounds
  */
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index 09d3ea97cc15..b4ff710d7250 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -181,7 +181,6 @@ static inline bool xfs_bmap_is_real_extent(struct xfs_bmbt_irec *irec)
 
 void	xfs_trim_extent(struct xfs_bmbt_irec *irec, xfs_fileoff_t bno,
 		xfs_filblks_t len);
-void	xfs_trim_extent_eof(struct xfs_bmbt_irec *, struct xfs_inode *);
 int	xfs_bmap_add_attrfork(struct xfs_inode *ip, int size, int rsvd);
 int	xfs_bmap_set_attrforkoff(struct xfs_inode *ip, int size, int *version);
 void	xfs_bmap_local_to_extents_empty(struct xfs_inode *ip, int whichfork);
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 649e4ad76add..09d8f7690e9e 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -357,19 +357,6 @@ xfs_map_blocks(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
-	/*
-	 * We have to make sure the cached mapping is within EOF to protect
-	 * against eofblocks trimming on file release leaving us with a stale
-	 * mapping. Otherwise, a page for a subsequent file extending buffered
-	 * write could get picked up by this writeback cycle and written to the
-	 * wrong blocks.
-	 *
-	 * Note that what we really want here is a generic mapping invalidation
-	 * mechanism to protect us from arbitrary extent modifying contexts, not
-	 * just eofblocks.
-	 */
-	xfs_trim_extent_eof(&wpc->imap, ip);
-
 	/*
 	 * COW fork blocks can overlap data fork blocks even if the blocks
 	 * aren't shared.  COW I/O always takes precedent, so we must always
@@ -482,7 +469,6 @@ xfs_map_blocks(
 	}
 
 	wpc->imap = imap;
-	xfs_trim_extent_eof(&wpc->imap, ip);
 	trace_xfs_map_blocks_found(ip, offset, count, wpc->io_type, &imap);
 	return 0;
 allocate_blocks:
@@ -494,7 +480,6 @@ xfs_map_blocks(
 	ASSERT(whichfork == XFS_COW_FORK || cow_fsb == NULLFILEOFF ||
 	       imap.br_startoff + imap.br_blockcount <= cow_fsb);
 	wpc->imap = imap;
-	xfs_trim_extent_eof(&wpc->imap, ip);
 	trace_xfs_map_blocks_alloc(ip, offset, count, wpc->io_type, &imap);
 	return 0;
 }

From patchwork Thu Jan 17 19:20:04 2019
X-Patchwork-Submitter: Brian Foster
X-Patchwork-Id: 10768853
From: Brian Foster
To: linux-xfs@vger.kernel.org
Subject: [PATCH v2 5/5] xfs: revalidate imap properly before writeback delalloc conversion
Date: Thu, 17 Jan 2019 14:20:04 -0500
Message-Id: <20190117192004.49346-6-bfoster@redhat.com>
In-Reply-To: <20190117192004.49346-1-bfoster@redhat.com>
References: <20190117192004.49346-1-bfoster@redhat.com>

The writeback delalloc conversion code has a couple of historical hacks to
help deal with racing with other operations on the file under writeback.
For example, xfs_iomap_write_allocate() checks i_size once under ilock to
determine whether a truncate may have invalidated some portion of the range
we're about to map.
This helps prevent the bmapi call from doing physical allocation where only
delalloc conversion was expected (which is a transaction reservation
overrun vector). XFS_BMAPI_DELALLOC similarly helps protect us from this
problem, as it ensures that the bmapi call will only convert existing
blocks. Neither of these actually ensures that we pass xfs_bmapi_write() a
valid range to convert. The bmapi flag turns the aforementioned
physical-allocation-over-a-hole problem into an explicit error, which means
that any particular instance of that problem changes from being a
transaction overrun error to a writeback error. The latter means page
discard, which is an ominous error even if due to the underlying block(s)
being punched out.

Now that we have a mechanism in place to track inode fork changes, use it
in xfs_iomap_write_allocate() to properly handle potential changes to the
cached writeback mapping and establish a valid range to map. Check the
already provided fork seqno once under ilock. If it has changed, look up
the extent again in the associated fork and trim the caller's mapping to
the latest version. We can expect to still find the blocks backing the
current file offset because the page is held locked during writeback and
page cache truncation acquires the page lock and waits on writeback to
complete before it can toss the page. This ensures that writeback never
attempts to map a range of a file that has been punched out and never
submits I/O to blocks that are no longer mapped to the file.

Signed-off-by: Brian Foster
Reviewed-by: Allison Henderson
---
 fs/xfs/xfs_iomap.c | 102 +++++++++++++++++++--------------------------
 1 file changed, 44 insertions(+), 58 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index ab69caa685b4..80da465ebf7a 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -677,22 +677,24 @@ xfs_file_iomap_begin_delay(
  */
 int
 xfs_iomap_write_allocate(
-	xfs_inode_t	*ip,
-	int		whichfork,
-	xfs_off_t	offset,
-	xfs_bmbt_irec_t	*imap,
-	unsigned int	*seq)
+	struct xfs_inode	*ip,
+	int			whichfork,
+	xfs_off_t		offset,
+	struct xfs_bmbt_irec	*imap,
+	unsigned int		*seq)
 {
-	xfs_mount_t	*mp = ip->i_mount;
-	struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork);
-	xfs_fileoff_t	offset_fsb, last_block;
-	xfs_fileoff_t	end_fsb, map_start_fsb;
-	xfs_filblks_t	count_fsb;
-	xfs_trans_t	*tp;
-	int		nimaps;
-	int		error = 0;
-	int		flags = XFS_BMAPI_DELALLOC;
-	int		nres;
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
+	struct xfs_trans	*tp;
+	struct xfs_iext_cursor	icur;
+	struct xfs_bmbt_irec	timap;
+	xfs_fileoff_t		offset_fsb;
+	xfs_fileoff_t		map_start_fsb;
+	xfs_filblks_t		count_fsb;
+	int			nimaps;
+	int			error = 0;
+	int			flags = XFS_BMAPI_DELALLOC;
+	int			nres;
 
 	if (whichfork == XFS_COW_FORK)
 		flags |= XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC;
@@ -737,56 +739,40 @@ xfs_iomap_write_allocate(
 		xfs_trans_ijoin(tp, ip, 0);
 
 		/*
-		 * it is possible that the extents have changed since
-		 * we did the read call as we dropped the ilock for a
-		 * while. We have to be careful about truncates or hole
-		 * punchs here - we are not allowed to allocate
-		 * non-delalloc blocks here.
-		 *
-		 * The only protection against truncation is the pages
-		 * for the range we are being asked to convert are
-		 * locked and hence a truncate will block on them
-		 * first.
-		 *
-		 * As a result, if we go beyond the range we really
-		 * need and hit an delalloc extent boundary followed by
-		 * a hole while we have excess blocks in the map, we
-		 * will fill the hole incorrectly and overrun the
-		 * transaction reservation.
+		 * Now that we have ILOCK we must account for the fact
+		 * that the fork (and thus our mapping) could have
+		 * changed while the inode was unlocked. If the fork
+		 * has changed, trim the caller's mapping to the
+		 * current extent in the fork.
 		 *
-		 * Using a single map prevents this as we are forced to
-		 * check each map we look for overlap with the desired
-		 * range and abort as soon as we find it. Also, given
-		 * that we only return a single map, having one beyond
-		 * what we can return is probably a bit silly.
+		 * If the external change did not modify the current
+		 * mapping (or just grew it) this will have no effect.
+		 * If the current mapping shrunk, we expect to at
+		 * minimum still have blocks backing the current page as
+		 * the page has remained locked since writeback first
+		 * located delalloc block(s) at the page offset. A
+		 * racing truncate, hole punch or even reflink must wait
+		 * on page writeback before it can modify our page and
+		 * underlying block(s).
 		 *
-		 * We also need to check that we don't go beyond EOF;
-		 * this is a truncate optimisation as a truncate sets
-		 * the new file size before block on the pages we
-		 * currently have locked under writeback. Because they
-		 * are about to be tossed, we don't need to write them
-		 * back....
+		 * We'll update *seq before we drop ilock for the next
+		 * iteration.
 		 */
-		nimaps = 1;
-		end_fsb = XFS_B_TO_FSB(mp, XFS_ISIZE(ip));
-		error = xfs_bmap_last_offset(ip, &last_block,
-						XFS_DATA_FORK);
-		if (error)
-			goto trans_cancel;
-
-		last_block = XFS_FILEOFF_MAX(last_block, end_fsb);
-		if ((map_start_fsb + count_fsb) > last_block) {
-			count_fsb = last_block - map_start_fsb;
-			if (count_fsb == 0) {
-				error = -EAGAIN;
+		if (*seq != READ_ONCE(ifp->if_seq)) {
+			if (!xfs_iext_lookup_extent(ip, ifp, offset_fsb,
+						    &icur, &timap) ||
+			    timap.br_startoff > offset_fsb) {
+				ASSERT(0);
+				error = -EFSCORRUPTED;
 				goto trans_cancel;
 			}
+			xfs_trim_extent(imap, timap.br_startoff,
+					timap.br_blockcount);
+			count_fsb = imap->br_blockcount;
+			map_start_fsb = imap->br_startoff;
 		}
 
-		/*
-		 * From this point onwards we overwrite the imap
-		 * pointer that the caller gave to us.
-		 */
+		nimaps = 1;
 		error = xfs_bmapi_write(tp, ip, map_start_fsb,
 				count_fsb, flags, nres, imap, &nimaps);