From patchwork Thu Nov 8 22:19:08 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Chinner X-Patchwork-Id: 10675063 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5E86015A6 for ; Thu, 8 Nov 2018 22:19:18 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4F5622E42E for ; Thu, 8 Nov 2018 22:19:18 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 440272E461; Thu, 8 Nov 2018 22:19:18 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E9B562E42E for ; Thu, 8 Nov 2018 22:19:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731066AbeKIH4v (ORCPT ); Fri, 9 Nov 2018 02:56:51 -0500 Received: from ipmail03.adl2.internode.on.net ([150.101.137.141]:35615 "EHLO ipmail03.adl2.internode.on.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728311AbeKIH4u (ORCPT ); Fri, 9 Nov 2018 02:56:50 -0500 Received: from ppp59-167-129-252.static.internode.on.net (HELO dastard) ([59.167.129.252]) by ipmail03.adl2.internode.on.net with ESMTP; 09 Nov 2018 08:49:15 +1030 Received: from discord.disaster.area ([192.168.1.111]) by dastard with esmtp (Exim 4.80) (envelope-from ) id 1gKsdr-00023l-Nu; Fri, 09 Nov 2018 09:19:11 +1100 Received: from dave by discord.disaster.area with local (Exim 4.91) (envelope-from ) id 1gKsdr-0008IQ-Ms; Fri, 09 Nov 2018 09:19:11 +1100 From: Dave Chinner To: linux-xfs@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Subject: [PATCH 1/2] vfs: vfs_dedupe_file_range() doesn't return EOPNOTSUPP Date: Fri, 9 Nov 2018 09:19:08 +1100 Message-Id: <20181108221909.27602-2-david@fromorbit.com> X-Mailer: git-send-email 2.19.1 In-Reply-To: <20181108221909.27602-1-david@fromorbit.com> References: <20181108221909.27602-1-david@fromorbit.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Dave Chinner It returns EINVAL when the operation is not supported by the filesystem. Fix it to return EOPNOTSUPP to be consistent with the man page and clone_file_range(). Clean up the inconsistent error return handling while I'm there. (I know, lipstick on a pig, but every little bit helps...) Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong --- fs/read_write.c | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/fs/read_write.c b/fs/read_write.c index bfcb4ced5664..aa43224bcec6 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -2094,17 +2094,18 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same) off = same->src_offset; len = same->src_length; - ret = -EISDIR; + if (!file->f_op->remap_file_range) + return -EOPNOTSUPP; + if (S_ISDIR(src->i_mode)) - goto out; + return -EISDIR; - ret = -EINVAL; if (!S_ISREG(src->i_mode)) - goto out; + return -EINVAL; ret = remap_verify_area(file, off, len, false); if (ret < 0) - goto out; + return ret; ret = 0; if (off + len > i_size_read(src)) @@ -2147,10 +2148,8 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same) fdput(dst_fd); next_loop: if (fatal_signal_pending(current)) - goto out; + break; } - -out: return ret; } EXPORT_SYMBOL(vfs_dedupe_file_range); From patchwork Thu Nov 8 22:19:09 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Chinner X-Patchwork-Id: 10675067 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 368B414D6 for ; Thu, 8 Nov 2018 22:19:20 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 280C82E194 for ; Thu, 8 Nov 2018 22:19:20 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1C7302E42E; Thu, 8 Nov 2018 22:19:20 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 99DDD2E40F for ; Thu, 8 Nov 2018 22:19:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731329AbeKIH4w (ORCPT ); Fri, 9 Nov 2018 02:56:52 -0500 Received: from ipmail03.adl2.internode.on.net ([150.101.137.141]:35615 "EHLO ipmail03.adl2.internode.on.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728535AbeKIH4w (ORCPT ); Fri, 9 Nov 2018 02:56:52 -0500 Received: from ppp59-167-129-252.static.internode.on.net (HELO dastard) ([59.167.129.252]) by ipmail03.adl2.internode.on.net with ESMTP; 09 Nov 2018 08:49:15 +1030 Received: from discord.disaster.area ([192.168.1.111]) by dastard with esmtp (Exim 4.80) (envelope-from ) id 1gKsdr-00023m-Of; Fri, 09 Nov 2018 09:19:11 +1100 Received: from dave by discord.disaster.area with local (Exim 4.91) (envelope-from ) id 1gKsdr-0008IT-Nd; Fri, 09 Nov 2018 09:19:11 +1100 From: Dave Chinner To: linux-xfs@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Subject: [PATCH 2/2] iomap: dio data corruption and spurious errors when pipes fill Date: Fri, 9 Nov 2018 09:19:09 +1100 Message-Id: <20181108221909.27602-3-david@fromorbit.com> X-Mailer: git-send-email 2.19.1 In-Reply-To: <20181108221909.27602-1-david@fromorbit.com> References: <20181108221909.27602-1-david@fromorbit.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Dave Chinner When doing direct IO to a pipe for do_splice_direct(), then pipe is trivial to fill up and overflow as it can only hold 16 pages. At this point bio_iov_iter_get_pages() then returns -EFAULT, and we abort the IO submission process. Unfortunately, iomap_dio_rw() propagates the error back up the stack. The error is converted from the EFAULT to EAGAIN in generic_file_splice_read() to tell the splice layers that the pipe is full. do_splice_direct() completely fails to handle EAGAIN errors (it aborts on error) and returns EAGAIN to the caller. copy_file_write() then compeltely fails to handle EAGAIN as well, and so returns EAGAIN to userspace, having failed to copy the data it was asked to. Avoid this whole steaming pile of fail by having iomap_dio_rw() silently swallow EFAULT errors and so do short reads. To make matters worse, iomap_dio_actor() has a stale data exposure bug bio_iov_iter_get_pages() fails - it does not zero the tail block that it may have been left uncovered by partial IO. Fix the error handling case to drop to the sub-block zeroing rather than immmediately returning the -EFAULT error. Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- fs/iomap.c | 22 +++++++++++++++++++--- 1 file changed, 19 insertions(+), 3 deletions(-) diff --git a/fs/iomap.c b/fs/iomap.c index 7aacd48c593e..2fda13bc37d8 100644 --- a/fs/iomap.c +++ b/fs/iomap.c @@ -1775,7 +1775,7 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length, struct bio *bio; bool need_zeroout = false; bool use_fua = false; - int nr_pages, ret; + int nr_pages, ret = 0; size_t copied = 0; if ((pos | length | align) & ((1 << blkbits) - 1)) @@ -1839,8 +1839,14 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length, ret = bio_iov_iter_get_pages(bio, &iter); if (unlikely(ret)) { + /* + * We have to stop part way through an IO. We must fall through + * to the sub-block tail zeroing here, otherwise this + * short IO may expose stale data in the tail of the + * block we haven't written data to. + */ bio_put(bio); - return copied ? copied : ret; + goto zero_tail; } n = bio->bi_iter.bi_size; @@ -1877,6 +1883,7 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length, * the block tail in the latter case, we can expose stale data via mmap * reads of the EOF block. */ +zero_tail: if (need_zeroout || ((dio->flags & IOMAP_DIO_WRITE) && pos >= i_size_read(inode))) { /* zero out from the end of the write to the end of the block */ @@ -1884,7 +1891,7 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length, if (pad) iomap_dio_zero(dio, iomap, pos, fs_block_size - pad); } - return copied; + return copied ? copied : ret; } static loff_t @@ -2079,6 +2086,15 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, dio->wait_for_completion = true; ret = 0; } + + /* + * splicing to pipes can fail on a full pipe. We have to + * swallow this to make it look like a short IO + * otherwise the higher splice layers will completely + * mishandle the error and stop moving data. + */ + if (ret == -EFAULT) + ret = 0; break; } pos += ret; From patchwork Fri Nov 9 00:54:10 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Chinner X-Patchwork-Id: 10675189 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D42FC1751 for ; Fri, 9 Nov 2018 00:54:16 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C1EF22E95E for ; Fri, 9 Nov 2018 00:54:16 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id AAD762E971; Fri, 9 Nov 2018 00:54:16 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI,SUBJ_OBFU_PUNCT_FEW autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 547E22E966 for ; Fri, 9 Nov 2018 00:54:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727486AbeKIKcX (ORCPT ); Fri, 9 Nov 2018 05:32:23 -0500 Received: from ipmail06.adl2.internode.on.net ([150.101.137.129]:55847 "EHLO ipmail06.adl2.internode.on.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726599AbeKIKcX (ORCPT ); Fri, 9 Nov 2018 05:32:23 -0500 Received: from ppp59-167-129-252.static.internode.on.net (HELO dastard) ([59.167.129.252]) by ipmail06.adl2.internode.on.net with ESMTP; 09 Nov 2018 11:24:10 +1030 Received: from dave by dastard with local (Exim 4.80) (envelope-from ) id 1gKv3q-0002E1-5x; Fri, 09 Nov 2018 11:54:10 +1100 Date: Fri, 9 Nov 2018 11:54:10 +1100 From: Dave Chinner To: linux-xfs@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Subject: [PATCH 3/2] splice: increase pipe size in splice_direct_to_actor() Message-ID: <20181109005410.GG19305@dastard> References: <20181108221909.27602-1-david@fromorbit.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20181108221909.27602-1-david@fromorbit.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Dave Chinner When copy_file_range() is called on files that have been opened with O_DIRECT, do_splice_direct() does a manual copy of the range one pipe buffer at a time. The default is 16 pages, which means on x86_64 it is limited to 64kB IO. This is extremely slow - 64k synchrnous read/write will run at maybe 5-10MB/s on a spinning disk and be seek bound. It will be faster on SSDs, but still very inefficient. Increase the pipe size to the maximum allowed user size so that we can get decent throughput for this highly sub-optimal copy loop. Add a new function to the pipe code that lets us set the pipe size to the maximum allowed without root permissions to keep things really simple. We also don't care if changing the pipe size fails - that will just result in a slower copy. Signed-off-by: Dave Chinner --- fs/pipe.c | 10 ++++++++++ fs/splice.c | 7 +++++++ include/linux/pipe_fs_i.h | 1 + 3 files changed, 18 insertions(+) diff --git a/fs/pipe.c b/fs/pipe.c index bdc5d3c0977d..436bc0464569 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -1109,6 +1109,16 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg) return ret; } +/* + * Set the pipe to the maximum allowable user size. Advisory only, will + * swallow any errors and return the resultant pipe size. + */ +long pipe_set_max_safe_size(struct pipe_inode_info *pipe) +{ + pipe_set_size(pipe, pipe_max_size); + return pipe->buffers * PAGE_SIZE; +} + /* * After the inode slimming patch, i_pipe/i_bdev/i_cdev share the same * location, so checking ->i_pipe is not enough to verify that this is a diff --git a/fs/splice.c b/fs/splice.c index 3553f1956508..9749139da731 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -931,6 +931,13 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd, current->splice_pipe = pipe; } + /* + * Try to increase the data holding capacity of the pipe so we can do + * larger IOs. This may not increase the size at all because maximum + * user pipe size is administrator controlled, but we still should try. + */ + pipe_set_max_safe_size(pipe); + /* * Do the splice. */ diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h index 5a3bb3b7c9ad..962ba4cfcb74 100644 --- a/include/linux/pipe_fs_i.h +++ b/include/linux/pipe_fs_i.h @@ -191,5 +191,6 @@ struct pipe_inode_info *get_pipe_info(struct file *file); int create_pipe_files(struct file **, int); unsigned int round_pipe_size(unsigned long size); +long pipe_set_max_safe_size(struct pipe_inode_info *pipe); #endif