From patchwork Thu Oct 12 11:47:54 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Brian Foster X-Patchwork-Id: 10001703 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id D79BA6028A for ; Thu, 12 Oct 2017 11:47:58 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C828128CFB for ; Thu, 12 Oct 2017 11:47:58 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id BD10C28D7E; Thu, 12 Oct 2017 11:47:58 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1866328CFB for ; Thu, 12 Oct 2017 11:47:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752705AbdJLLr5 (ORCPT ); Thu, 12 Oct 2017 07:47:57 -0400 Received: from mx1.redhat.com ([209.132.183.28]:45210 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752581AbdJLLrz (ORCPT ); Thu, 12 Oct 2017 07:47:55 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 9FF8B5BED9 for ; Thu, 12 Oct 2017 11:47:55 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 9FF8B5BED9 Authentication-Results: ext-mx09.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx09.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=bfoster@redhat.com Received: from bfoster.bfoster (dhcp-41-20.bos.redhat.com [10.18.41.20]) by smtp.corp.redhat.com (Postfix) with ESMTP id 730AC18A68 for ; Thu, 12 Oct 2017 11:47:55 +0000 (UTC) Received: by bfoster.bfoster (Postfix, from userid 1000) id 644EF1213A7; Thu, 12 Oct 2017 07:47:54 -0400 (EDT) From: Brian Foster To: linux-xfs@vger.kernel.org Subject: [PATCH] xfs: trim writepage mapping to within eof Date: Thu, 12 Oct 2017 07:47:54 -0400 Message-Id: <20171012114754.39626-1-bfoster@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.38]); Thu, 12 Oct 2017 11:47:55 +0000 (UTC) Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The writeback rework in commit fbcc02561359 ("xfs: Introduce writeback context for writepages") introduced a subtle change in behavior with regard to the block mapping used across the ->writepages() sequence. The previous xfs_cluster_write() code would only flush pages up to EOF at the time of the writepage, thus ensuring that any pages due to file-extending writes would be handled on a separate cycle and with a new, updated block mapping. The updated code establishes a block mapping in xfs_writepage_map() that could extend beyond EOF if the file has post-eof preallocation. Because we now use the generic writeback infrastructure and pass the cached mapping to each writepage call, there is no implicit EOF limit in place. If eofblocks trimming occurs during ->writepages(), any post-eof portion of the cached mapping becomes invalid. The eofblocks code has no means to serialize against writeback because there are no pages associated with post-eof blocks. Therefore if an eofblocks trim occurs and is followed by a file-extending buffered write, not only has the mapping become invalid, but we could end up writing a page to disk based on the invalid mapping. Consider the following sequence of events: - A buffered write creates a delalloc extent and post-eof speculative preallocation. - Writeback starts and on the first writepage cycle, the delalloc extent is converted to real blocks (including the post-eof blocks) and the mapping is cached. - The file is closed and xfs_release() trims post-eof blocks. The cached writeback mapping is now invalid. - Another buffered write appends the file with a delalloc extent. - The concurrent writeback cycle picks up the just written page because the writeback range end is LLONG_MAX. xfs_writepage_map() attributes it to the (now invalid) cached mapping and writes the data to an incorrect location on disk (and where the file offset is still backed by a delalloc extent). This problem is reproduced by xfstests test generic/463, which triggers racing writes, appends, open/closes and writeback requests. To address this problem, trim the mapping used during writeback to within EOF when the mapping is created. This ensures the mapping is revalidated for any pages encountered beyond EOF as of the time the current mapping was cached. Reported-by: Eryu Guan Diagnosed-by: Eryu Guan Signed-off-by: Brian Foster Reviewed-by: Darrick J. Wong --- Hi all, This is a followup to the issue Eryu tracked down, described here[1]. Note that this patch will not deal with any writeback mapping validity issues not associated with eofblocks management. Dave is working on a more generic approach to deal with such problems. This patch is intended to be a targeted and backportable fix for the regression in the writeback code. Brian [1] https://marc.info/?l=linux-xfs&m=150406724427829&w=2 fs/xfs/libxfs/xfs_bmap.c | 11 +++++++++++ fs/xfs/libxfs/xfs_bmap.h | 1 + fs/xfs/xfs_aops.c | 6 ++++-- 3 files changed, 16 insertions(+), 2 deletions(-) diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 044a363..dd3fb7b 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -3852,6 +3852,17 @@ xfs_trim_extent( } } +/* trim extent to within eof */ +void +xfs_trim_extent_eof( + struct xfs_bmbt_irec *irec, + struct xfs_inode *ip) + +{ + xfs_trim_extent(irec, 0, XFS_B_TO_FSB(ip->i_mount, + i_size_read(VFS_I(ip)))); +} + /* * Trim the returned map to the required bounds */ diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h index 851982a..502e0d8 100644 --- a/fs/xfs/libxfs/xfs_bmap.h +++ b/fs/xfs/libxfs/xfs_bmap.h @@ -208,6 +208,7 @@ void xfs_bmap_trace_exlist(struct xfs_inode *ip, xfs_extnum_t cnt, void xfs_trim_extent(struct xfs_bmbt_irec *irec, xfs_fileoff_t bno, xfs_filblks_t len); +void xfs_trim_extent_eof(struct xfs_bmbt_irec *, struct xfs_inode *); int xfs_bmap_add_attrfork(struct xfs_inode *ip, int size, int rsvd); void xfs_bmap_local_to_extents_empty(struct xfs_inode *ip, int whichfork); void xfs_bmap_add_free(struct xfs_mount *mp, struct xfs_defer_ops *dfops, diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c index 1dbc5cf..3ab6d9d 100644 --- a/fs/xfs/xfs_aops.c +++ b/fs/xfs/xfs_aops.c @@ -423,7 +423,7 @@ xfs_map_blocks( imap); if (!error) trace_xfs_map_blocks_alloc(ip, offset, count, type, imap); - return error; + goto out_trim; } #ifdef DEBUG @@ -435,7 +435,9 @@ xfs_map_blocks( #endif if (nimaps) trace_xfs_map_blocks_found(ip, offset, count, type, imap); - return 0; +out_trim: + xfs_trim_extent_eof(imap, ip); + return error; } STATIC bool