From patchwork Fri Jun 5 20:48:36 2020
X-Patchwork-Id: 11590469
From: Goldwyn Rodrigues
To: darrick.wong@oracle.com
Cc: linux-btrfs@vger.kernel.org, fdmanana@gmail.com,
    linux-fsdevel@vger.kernel.org, hch@lst.de, Goldwyn Rodrigues
Subject: [PATCH 1/3] iomap: dio: Return zero in case of unsuccessful pagecache invalidation
Date: Fri, 5 Jun 2020 15:48:36 -0500
Message-Id: <20200605204838.10765-2-rgoldwyn@suse.de>
In-Reply-To: <20200605204838.10765-1-rgoldwyn@suse.de>

From: Goldwyn Rodrigues

Filesystems such as btrfs cannot guarantee page invalidation because
extents may still be locked by ongoing I/O. This happens even though
filemap_write_and_wait() has been called, because btrfs keeps the
extents locked and releases them from a separate cleanup thread only
after all ordered extents in the range have performed their tree
changes.

Return zero when page cache invalidation is unsuccessful so that
filesystems can fall back to buffered I/O. This takes care of the
following invalidation warning during btrfs mixed buffered and direct
I/O using iomap_dio_rw():

  Page cache invalidation failure on direct I/O.  Possible data
  corruption due to collision with buffered I/O!

This is similar to the behavior of generic_file_direct_write().

Signed-off-by: Goldwyn Rodrigues
---
 fs/iomap/direct-io.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)
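
A filesystem that drives direct I/O through iomap_dio_rw() is expected to
notice the resulting zero (or short) return and complete the remainder
through its buffered path. As a rough illustration only (not part of this
patch), such a caller could look like the sketch below; my_fs_dio_write()
and my_fs_buffered_write() are hypothetical stand-ins for a filesystem's
own helpers.

#include <linux/fs.h>
#include <linux/uio.h>

/*
 * Hypothetical ->write_iter(): a zero or short return from the direct
 * I/O path (e.g. iomap_dio_rw() after a failed page cache invalidation)
 * is completed through buffered I/O.
 */
static ssize_t my_fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
{
	ssize_t dio = 0, buffered;

	if (iocb->ki_flags & IOCB_DIRECT) {
		dio = my_fs_dio_write(iocb, from);	/* wraps iomap_dio_rw() */
		if (dio < 0 || !iov_iter_count(from))
			return dio;
		/* Zero or short direct write: finish the rest buffered. */
		iocb->ki_flags &= ~IOCB_DIRECT;
	}

	buffered = my_fs_buffered_write(iocb, from);
	if (buffered < 0)
		return dio ? dio : buffered;

	return dio + buffered;
}
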
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index e4addfc58107..215315be6233 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -483,9 +483,15 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	 */
 	ret = invalidate_inode_pages2_range(mapping,
 			pos >> PAGE_SHIFT, end >> PAGE_SHIFT);
-	if (ret)
-		dio_warn_stale_pagecache(iocb->ki_filp);
-	ret = 0;
+	/*
+	 * If a page can not be invalidated, return 0 to fall back
+	 * to buffered write.
+	 */
+	if (ret) {
+		if (ret == -EBUSY)
+			ret = 0;
+		goto out_free_dio;
+	}
 
 	if (iov_iter_rw(iter) == WRITE && !wait_for_completion &&
 	    !inode->i_sb->s_dio_done_wq) {

From patchwork Fri Jun 5 20:48:37 2020
X-Patchwork-Id: 11590473
From: Goldwyn Rodrigues
To: darrick.wong@oracle.com
Cc: linux-btrfs@vger.kernel.org, fdmanana@gmail.com,
    linux-fsdevel@vger.kernel.org, hch@lst.de, Goldwyn Rodrigues,
    Johannes Thumshirn, Nikolay Borisov
Subject: [PATCH 2/3] btrfs: Wait for extent bits to release page
Date: Fri, 5 Jun 2020 15:48:37 -0500
Message-Id: <20200605204838.10765-3-rgoldwyn@suse.de>
In-Reply-To: <20200605204838.10765-1-rgoldwyn@suse.de>

From: Goldwyn Rodrigues

While trying to release a page, the extent containing the page may be
locked, which prevents the page from being released. If blocking is
allowed, wait for the extent lock to be cleared and then clear the
remaining bits. While at it, clean up try_release_extent_state() to
make it simpler.

Reviewed-by: Johannes Thumshirn
Reviewed-by: Nikolay Borisov
Signed-off-by: Goldwyn Rodrigues
---
 fs/btrfs/extent_io.c | 37 ++++++++++++++++---------------------
 fs/btrfs/extent_io.h |  2 +-
 fs/btrfs/inode.c     |  4 ++--
 3 files changed, 19 insertions(+), 24 deletions(-)
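
The heart of the change is the wait-if-allowed pattern shown in condensed
form below: a page-release path may only sleep when the caller's gfp mask
permits blocking, otherwise the release is simply refused. This is an
annotated sketch for illustration, using the same btrfs extent-io helpers
as the diff that follows; it is not additional code in the patch.

#include <linux/gfp.h>

/*
 * Sketch of the pattern applied to try_release_extent_state(): memory
 * reclaim may ask for page release with a gfp mask that forbids
 * sleeping, in which case we refuse instead of waiting for the extent
 * lock to be dropped.
 */
static bool sketch_try_release(struct extent_io_tree *tree,
			       u64 start, u64 end, gfp_t mask)
{
	if (test_range_bit(tree, start, end, EXTENT_LOCKED, 0, NULL)) {
		if (!gfpflags_allow_blocking(mask))
			return false;	/* caller cannot sleep; keep the page */
		/* Blocking is allowed: wait until the extent lock is released. */
		wait_extent_bit(tree, start, end, EXTENT_LOCKED);
	}

	/* Clearing of the remaining state bits follows, as in the diff. */
	return true;
}
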
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c59e07360083..0ab444d2028d 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4466,33 +4466,28 @@ int extent_invalidatepage(struct extent_io_tree *tree,
  * are locked or under IO and drops the related state bits if it is safe
  * to drop the page.
  */
-static int try_release_extent_state(struct extent_io_tree *tree,
+static bool try_release_extent_state(struct extent_io_tree *tree,
 				    struct page *page, gfp_t mask)
 {
 	u64 start = page_offset(page);
 	u64 end = start + PAGE_SIZE - 1;
-	int ret = 1;
 
 	if (test_range_bit(tree, start, end, EXTENT_LOCKED, 0, NULL)) {
-		ret = 0;
-	} else {
-		/*
-		 * at this point we can safely clear everything except the
-		 * locked bit and the nodatasum bit
-		 */
-		ret = __clear_extent_bit(tree, start, end,
-				 ~(EXTENT_LOCKED | EXTENT_NODATASUM),
-				 0, 0, NULL, mask, NULL);
-
-		/* if clear_extent_bit failed for enomem reasons,
-		 * we can't allow the release to continue.
-		 */
-		if (ret < 0)
-			ret = 0;
-		else
-			ret = 1;
+		if (!gfpflags_allow_blocking(mask))
+			return false;
+		wait_extent_bit(tree, start, end, EXTENT_LOCKED);
 	}
-	return ret;
+	/*
+	 * At this point we can safely clear everything except the locked and
+	 * nodatasum bits. If clear_extent_bit failed due to -ENOMEM,
+	 * don't allow release.
+	 */
+	if (__clear_extent_bit(tree, start, end,
+			~(EXTENT_LOCKED | EXTENT_NODATASUM), 0, 0,
+			NULL, mask, NULL) < 0)
+		return false;
+
+	return true;
 }
 
 /*
@@ -4500,7 +4495,7 @@ static int try_release_extent_state(struct extent_io_tree *tree,
  * in the range corresponding to the page, both state records and extent
  * map records are removed
  */
-int try_release_extent_mapping(struct page *page, gfp_t mask)
+bool try_release_extent_mapping(struct page *page, gfp_t mask)
 {
 	struct extent_map *em;
 	u64 start = page_offset(page);
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 9a10681b12bf..6cba4ad6ebc1 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -189,7 +189,7 @@ typedef struct extent_map *(get_extent_t)(struct btrfs_inode *inode,
 					  struct page *page, size_t pg_offset,
 					  u64 start, u64 len);
 
-int try_release_extent_mapping(struct page *page, gfp_t mask);
+bool try_release_extent_mapping(struct page *page, gfp_t mask);
 int try_release_extent_buffer(struct page *page);
 
 int extent_read_full_page(struct page *page, get_extent_t *get_extent,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1242d0aa108d..8cb44c49c1d2 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7887,8 +7887,8 @@ btrfs_readpages(struct file *file, struct address_space *mapping,
 
 static int __btrfs_releasepage(struct page *page, gfp_t gfp_flags)
 {
-	int ret = try_release_extent_mapping(page, gfp_flags);
-	if (ret == 1) {
+	bool ret = try_release_extent_mapping(page, gfp_flags);
+	if (ret) {
 		ClearPagePrivate(page);
 		set_page_private(page, 0);
 		put_page(page);

From patchwork Fri Jun 5 20:48:38 2020
X-Patchwork-Id: 11590477
From: Goldwyn Rodrigues
To: darrick.wong@oracle.com
Cc: linux-btrfs@vger.kernel.org, fdmanana@gmail.com,
    linux-fsdevel@vger.kernel.org, hch@lst.de, Goldwyn Rodrigues
Subject: [PATCH 3/3] xfs: fall back to buffered I/O if direct I/O is short
Date: Fri, 5 Jun 2020 15:48:38 -0500
Message-Id: <20200605204838.10765-4-rgoldwyn@suse.de>
In-Reply-To: <20200605204838.10765-1-rgoldwyn@suse.de>
From: Goldwyn Rodrigues

Most filesystems, such as ext4 and btrfs, fall back to buffered I/O when
a direct write fails. If direct I/O is short, fall back to a buffered
write to complete the I/O, and make sure the data reaches disk by
performing a filemap_write_and_wait_range() and invalidating the pages
in the range.

For reads, call xfs_file_buffered_aio_read() when the direct read is
short.

Signed-off-by: Goldwyn Rodrigues
---
 fs/xfs/xfs_file.c | 41 ++++++++++++++++++++++++++++++++++-------
 1 file changed, 34 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 4b8bdecc3863..786391228dea 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -31,6 +31,10 @@
 #include
 
 static const struct vm_operations_struct xfs_file_vm_ops;
+STATIC ssize_t xfs_file_buffered_aio_write(struct kiocb *iocb,
+		struct iov_iter *from);
+STATIC ssize_t xfs_file_buffered_aio_read(struct kiocb *iocb,
+		struct iov_iter *to);
 
 int
 xfs_update_prealloc_flags(
@@ -169,6 +173,7 @@ xfs_file_dio_aio_read(
 	struct xfs_inode	*ip = XFS_I(file_inode(iocb->ki_filp));
 	size_t			count = iov_iter_count(to);
 	ssize_t			ret;
+	ssize_t			buffered_read = 0;
 
 	trace_xfs_file_direct_read(ip, count, iocb->ki_pos);
 
@@ -187,7 +192,13 @@ xfs_file_dio_aio_read(
 			is_sync_kiocb(iocb));
 	xfs_iunlock(ip, XFS_IOLOCK_SHARED);
 
-	return ret;
+	if (ret < 0 || ret == count)
+		return ret;
+
+	iocb->ki_flags &= ~IOCB_DIRECT;
+	buffered_read = xfs_file_buffered_aio_read(iocb, to);
+
+	return ret + buffered_read;
 }
 
 static noinline ssize_t
@@ -483,6 +494,9 @@ xfs_file_dio_aio_write(
 	int			iolock;
 	size_t			count = iov_iter_count(from);
 	struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
+	loff_t			offset, end;
+	ssize_t			buffered_write = 0;
+	int			err;
 
 	/* DIO must be aligned to device logical sector size */
 	if ((iocb->ki_pos | count) & target->bt_logical_sectormask)
@@ -552,12 +566,25 @@ xfs_file_dio_aio_write(
 out:
 	xfs_iunlock(ip, iolock);
 
-	/*
-	 * No fallback to buffered IO on errors for XFS, direct IO will either
-	 * complete fully or fail.
-	 */
-	ASSERT(ret < 0 || ret == count);
-	return ret;
+	if (ret < 0 || ret == count)
+		return ret;
+
+	/* Fallback to buffered write */
+
+	offset = iocb->ki_pos;
+
+	buffered_write = xfs_file_buffered_aio_write(iocb, from);
+	if (buffered_write < 0)
+		return ret;
+
+	end = offset + buffered_write - 1;
+
+	err = filemap_write_and_wait_range(mapping, offset, end);
+	if (!err)
+		invalidate_mapping_pages(mapping, offset >> PAGE_SHIFT,
+				end >> PAGE_SHIFT);
+
+	return ret + buffered_write;
 }
 
 static noinline ssize_t
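
For completeness, here is a minimal userspace exerciser for the direct
write path changed above. It is not part of the patch; the file path,
I/O size and 4096-byte alignment are arbitrary example choices (the
alignment must match the device logical sector size). With the fallback
in place, a direct write on XFS that cannot complete fully is finished
through the buffered path instead of being returned short.

#define _GNU_SOURCE		/* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	const size_t len = 64 * 1024;
	void *buf = NULL;
	ssize_t ret;
	int fd;

	/* Example path only; point it at a file on an XFS mount. */
	fd = open("/mnt/test/dio-file", O_WRONLY | O_CREAT | O_DIRECT, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* O_DIRECT buffers must be aligned to the logical sector size. */
	if (posix_memalign(&buf, 4096, len)) {
		fprintf(stderr, "posix_memalign failed\n");
		close(fd);
		return 1;
	}
	memset(buf, 0xab, len);

	ret = pwrite(fd, buf, len, 0);
	if (ret < 0)
		perror("pwrite");
	else
		printf("direct write returned %zd of %zu bytes\n", ret, len);

	free(buf);
	close(fd);
	return ret == (ssize_t)len ? 0 : 1;
}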