From patchwork Tue Feb 28 17:36:26 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eryu Guan X-Patchwork-Id: 9596421 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 1BC9760453 for ; Tue, 28 Feb 2017 17:41:49 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 12B4D25223 for ; Tue, 28 Feb 2017 17:41:49 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 0751F28159; Tue, 28 Feb 2017 17:41:49 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.8 required=2.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED, DKIM_SIGNED, FREEMAIL_FROM, RCVD_IN_DNSWL_HI, T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8C13025223 for ; Tue, 28 Feb 2017 17:41:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751450AbdB1Rlr (ORCPT ); Tue, 28 Feb 2017 12:41:47 -0500 Received: from mail-pg0-f65.google.com ([74.125.83.65]:34274 "EHLO mail-pg0-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751243AbdB1Rl3 (ORCPT ); Tue, 28 Feb 2017 12:41:29 -0500 Received: by mail-pg0-f65.google.com with SMTP id s67so2358009pgb.1; Tue, 28 Feb 2017 09:40:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id; bh=W0KhEUwggI125mwRmZXx0+bDgoCnWACb+t+MrXX3nSI=; b=Wk5N7OgHMIo9/wTYccLD2JVlj+ivqARA11VGCkTJtXjzsMxXTS/kiYD0WfKRGeDLgJ oIxZAqVLK5KSHwC0a3mkTccw4GCsS0V3CE1wweMmlHVA81pibCSVurkt1V2oVuUNYDFj csd9oxD4hCWvt2G1+/wSM3JOJryUI2yyDeYEKQjBEsMu5oMqXEmm3eyGguHicv4XtRBp kCNvgRwArrhf7p0kaE+2jNoi6xg1V6A0KurjsiJES/P02pldU8NJHjGwMXyQ4wzsD+Je bcDoeTEsgueb36QsMZJCX2W1s+ZhsUzFSju0vmzJ3LQsBLHFhlqYPuxBwiemcwxv4wgV xA7Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id; bh=W0KhEUwggI125mwRmZXx0+bDgoCnWACb+t+MrXX3nSI=; b=szYOlbKdUvCIvGACPIQEuVdeyP98J1f+yO6Ep+cM3Ki2VdY61twd3KrV5bp70JTcTb zey9jv1QfHzzSvoq3afKwEPtNuxHw/nZLpxkzQS2unABuDjjsq3/MhH3WRftxZpU8+Sh 26j1Lrbtxjel1F0wFRtSjllEmqcAKWGK4IRc1M7WxSnPw95C6PT4myb5Iwf9270t3OvJ agfMLNofUWeG8xzEDwg0UgchQo8NQm6DTeKyrIq9uJLah+Ydru7ldntxHl6FGF/fM2yZ wlEAQxuyVoB9Mz/4PkiD4zrocXWM4XYeH2ahuwYEjzLNCd9aOU8FD45cvAnAKaZlOCVW 9AAg== X-Gm-Message-State: AMke39moHA8uLxIL/ynvhyzHu74fZ+Zf3NcSmCXuK0tOdiVYpy+szIEZn17870Ttcw0+4w== X-Received: by 10.98.87.142 with SMTP id i14mr3774144pfj.85.1488303655038; Tue, 28 Feb 2017 09:40:55 -0800 (PST) Received: from localhost ([128.199.137.77]) by smtp.gmail.com with ESMTPSA id w29sm5461102pfi.131.2017.02.28.09.40.54 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Tue, 28 Feb 2017 09:40:54 -0800 (PST) From: Eryu Guan To: linux-fsdevel@vger.kernel.org Cc: linux-xfs@vger.kernel.org, hch@lst.de, Eryu Guan Subject: [PATCH] iomap: invalidate page caches should be after iomap_dio_complete() in direct write Date: Wed, 1 Mar 2017 01:36:26 +0800 Message-Id: <20170228173626.27640-1-guaneryu@gmail.com> X-Mailer: git-send-email 2.9.3 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Eryu Guan After XFS switching to iomap based DIO (commit acdda3aae146 ("xfs: use iomap_dio_rw")), I started to notice dio29/dio30 tests failures from LTP run on ppc64 hosts, and they can be reproduced on x86_64 hosts with 512B/1k block size XFS too. dio29 diotest3 -b 65536 -n 100 -i 1000 -o 1024000 dio30 diotest6 -b 65536 -n 100 -i 1000 -o 1024000 The failure message is like: bufcmp: offset 0: Expected: 0x62, got 0x0 diotest03 1 TPASS : Read with Direct IO, Write without diotest03 2 TFAIL : diotest3.c:142: comparsion failed; child=98 offset=1425408 diotest03 3 TFAIL : diotest3.c:194: Write Direct-child 98 failed Direct write wrote 0x62 but buffer read got zero. This is because, when doing direct write to a hole or preallocated file, we invalidate the page caches before converting the extent from unwritten state to normal state, which is done by iomap_dio_complete(), thus leave a window for other buffer reader to cache the unwritten state extent. Consider this case, with sub-page blocksize XFS, two processes are direct writing to different blocksize-aligned regions (say 512B) of the same preallocated file, and reading the region back via buffered I/O to compare contents. process A, region [0,512] process B, region [512,1024] xfs_file_write_iter xfs_file_aio_dio_write iomap_dio_rw iomap_apply invalidate_inode_pages2_range xfs_file_write_iter xfs_file_aio_dio_write iomap_dio_rw iomap_apply invalidate_inode_pages2_range iomap_dio_complete xfs_file_read_iter xfs_file_buffered_aio_read generic_file_read_iter do_generic_file_read iomap_dio_complete xfs_file_read_iter Process A first invalidates page caches, at this point the underlying extent is still in unwritten state (iomap_dio_complete not called yet), and process B finishs direct write and populates page caches via readahead, which caches zeros in page for region A, then process A reads zeros from page cache, instead of the actual data. Fix it by invalidating page caches after converting unwritten extent to make sure we read content from disk after extent state changed, as what we did before switching to iomap based dio. Also introduce a new 'start' variable to save the original write offset (iomap_dio_complete() updates iocb->ki_pos), and a 'res' variable for invalidating caches result, cause we can't reuse 'ret' anymore. Signed-off-by: Eryu Guan --- fs/iomap.c | 23 +++++++++++++---------- 1 file changed, 13 insertions(+), 10 deletions(-) diff --git a/fs/iomap.c b/fs/iomap.c index 0f85f24..d5e4ea0 100644 --- a/fs/iomap.c +++ b/fs/iomap.c @@ -844,10 +844,12 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, struct address_space *mapping = iocb->ki_filp->f_mapping; struct inode *inode = file_inode(iocb->ki_filp); size_t count = iov_iter_count(iter); - loff_t pos = iocb->ki_pos, end = iocb->ki_pos + count - 1, ret = 0; + loff_t pos = iocb->ki_pos, start = pos; + loff_t end = iocb->ki_pos + count - 1, ret = 0; unsigned int flags = IOMAP_DIRECT; struct blk_plug plug; struct iomap_dio *dio; + int res; lockdep_assert_held(&inode->i_rwsem); @@ -885,14 +887,13 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, } if (mapping->nrpages) { - ret = filemap_write_and_wait_range(mapping, iocb->ki_pos, end); + ret = filemap_write_and_wait_range(mapping, start, end); if (ret) goto out_free_dio; - ret = invalidate_inode_pages2_range(mapping, - iocb->ki_pos >> PAGE_SHIFT, end >> PAGE_SHIFT); - WARN_ON_ONCE(ret); - ret = 0; + res = invalidate_inode_pages2_range(mapping, + start >> PAGE_SHIFT, end >> PAGE_SHIFT); + WARN_ON_ONCE(res); } inode_dio_begin(inode); @@ -939,6 +940,8 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, __set_current_state(TASK_RUNNING); } + ret = iomap_dio_complete(dio); + /* * Try again to invalidate clean pages which might have been cached by * non-direct readahead, or faulted in by get_user_pages() if the source @@ -947,12 +950,12 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, * this invalidation fails, tough, the write still worked... */ if (iov_iter_rw(iter) == WRITE && mapping->nrpages) { - ret = invalidate_inode_pages2_range(mapping, - iocb->ki_pos >> PAGE_SHIFT, end >> PAGE_SHIFT); - WARN_ON_ONCE(ret); + res = invalidate_inode_pages2_range(mapping, + start >> PAGE_SHIFT, end >> PAGE_SHIFT); + WARN_ON_ONCE(res); } - return iomap_dio_complete(dio); + return ret; out_free_dio: kfree(dio);