From patchwork Tue Feb 14 17:13:16 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 13140603
From: David Howells
To: Jens Axboe, Al Viro, Christoph Hellwig
Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
 David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe, Hillf Danton,
 linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 syzbot+a440341a59e3b7142895@syzkaller.appspotmail.com,
 Christoph Hellwig, John Hubbard
Subject: [PATCH v14 03/17] splice: Add a func to do a splice from an O_DIRECT
 file without ITER_PIPE
Date: Tue, 14 Feb 2023 17:13:16 +0000
Message-Id: <20230214171330.2722188-4-dhowells@redhat.com>
In-Reply-To: <20230214171330.2722188-1-dhowells@redhat.com>
References: <20230214171330.2722188-1-dhowells@redhat.com>

Implement a function, direct_splice_read(), that deals with this by using an
ITER_BVEC iterator instead of an ITER_PIPE iterator as the former won't free
its buffers when reverted. The function bulk allocates all the buffers it
thinks it is going to use in advance, does the read synchronously and only
then trims the buffer down. The pages we did use get pushed into the pipe.

This fixes a problem with the upcoming iov_iter_extract_pages() function,
whereby pages extracted from a non-user-backed iterator such as ITER_PIPE
aren't pinned. __iomap_dio_rw(), however, calls iov_iter_revert() to
shorten the iterator to just the bufferage it is going to use - which has
the side-effect of freeing the excess pipe buffers, even though they're
attached to a bio and may get written to by DMA (thanks to Hillf Danton for
spotting this[1]).

This then causes memory corruption that is particularly noticeable when the
syzbot test[2] is run. The test boils down to:

	out = creat(argv[1], 0666);
	ftruncate(out, 0x800);
	lseek(out, 0x200, SEEK_SET);
	in = open(argv[1], O_RDONLY | O_DIRECT | O_NOFOLLOW);
	sendfile(out, in, NULL, 0x1dd00);

run repeatedly in parallel. What I think is happening is that ftruncate()
occasionally shortens the DIO read that's about to be made by sendfile's
splice core by reducing i_size.

This should be more efficient for DIO read by virtue of doing a bulk page
allocation, but slightly less efficient by ignoring any partial page in the
pipe.

Reported-by: syzbot+a440341a59e3b7142895@syzkaller.appspotmail.com
Signed-off-by: David Howells
cc: Jens Axboe
cc: Christoph Hellwig
cc: Al Viro
cc: David Hildenbrand
cc: John Hubbard
cc: linux-mm@kvack.org
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20230207094731.1390-1-hdanton@sina.com/ [1]
Link: https://lore.kernel.org/r/000000000000b0b3c005f3a09383@google.com/ [2]
---

Notes:
    ver #14)
    - Use alloc_pages_bulk_array() rather than alloc_pages_bulk_list().
    - Use release_pages() rather than a loop calling __free_page().
    - Rename to direct_splice_read().
    - Don't call from generic_file_splice_read() yet.

    ver #13)
    - Don't completely replace generic_file_splice_read(), but rather only use
      this if we're splicing from an O_DIRECT file fd.

 fs/splice.c               | 92 +++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h        |  3 ++
 include/linux/pipe_fs_i.h | 20 +++++++++
 lib/iov_iter.c            |  6 ---
 4 files changed, 115 insertions(+), 6 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index 5969b7a1d353..4c6332854b63 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -282,6 +282,98 @@ void splice_shrink_spd(struct splice_pipe_desc *spd)
 	kfree(spd->partial);
 }
 
+/*
+ * Splice data from an O_DIRECT file into pages and then add them to the output
+ * pipe.
+ */
+ssize_t direct_splice_read(struct file *in, loff_t *ppos,
+			   struct pipe_inode_info *pipe,
+			   size_t len, unsigned int flags)
+{
+	struct iov_iter to;
+	struct bio_vec *bv;
+	struct kiocb kiocb;
+	struct page **pages;
+	ssize_t ret;
+	size_t used, npages, chunk, remain, reclaim;
+	int i;
+
+	/* Work out how much data we can actually add into the pipe */
+	used = pipe_occupancy(pipe->head, pipe->tail);
+	npages = max_t(ssize_t, pipe->max_usage - used, 0);
+	len = min_t(size_t, len, npages * PAGE_SIZE);
+	npages = DIV_ROUND_UP(len, PAGE_SIZE);
+
+	bv = kzalloc(array_size(npages, sizeof(bv[0])) +
+		     array_size(npages, sizeof(struct page *)), GFP_KERNEL);
+	if (!bv)
+		return -ENOMEM;
+
+	pages = (void *)(bv + npages);
+	npages = alloc_pages_bulk_array(GFP_USER, npages, pages);
+	if (!npages) {
+		kfree(bv);
+		return -ENOMEM;
+	}
+
+	remain = len = min_t(size_t, len, npages * PAGE_SIZE);
+
+	for (i = 0; i < npages; i++) {
+		chunk = min_t(size_t, PAGE_SIZE, remain);
+		bv[i].bv_page = pages[i];
+		bv[i].bv_offset = 0;
+		bv[i].bv_len = chunk;
+		remain -= chunk;
+	}
+
+	/* Do the I/O */
+	iov_iter_bvec(&to, ITER_DEST, bv, npages, len);
+	init_sync_kiocb(&kiocb, in);
+	kiocb.ki_pos = *ppos;
+	ret = call_read_iter(in, &kiocb, &to);
+
+	reclaim = npages * PAGE_SIZE;
+	remain = 0;
+	if (ret > 0) {
+		reclaim -= ret;
+		remain = ret;
+		*ppos = kiocb.ki_pos;
+		file_accessed(in);
+	} else if (ret < 0) {
+		/*
+		 * callers of ->splice_read() expect -EAGAIN on
+		 * "can't put anything in there", rather than -EFAULT.
+		 */
+		if (ret == -EFAULT)
+			ret = -EAGAIN;
+	}
+
+	/* Free any pages that didn't get touched at all. */
+	reclaim /= PAGE_SIZE;
+	if (reclaim) {
+		npages -= reclaim;
+		release_pages(pages + npages, reclaim);
+	}
+
+	/* Push the remaining pages into the pipe. */
+	for (i = 0; i < npages; i++) {
+		struct pipe_buffer *buf = pipe_head_buf(pipe);
+
+		chunk = min_t(size_t, remain, PAGE_SIZE);
+		*buf = (struct pipe_buffer) {
+			.ops	= &default_pipe_buf_ops,
+			.page	= bv[i].bv_page,
+			.offset	= 0,
+			.len	= chunk,
+		};
+		pipe->head++;
+		remain -= chunk;
+	}
+
+	kfree(bv);
+	return ret;
+}
+
 /**
  * generic_file_splice_read - splice data from file to a pipe
  * @in:		file to splice from
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 28743e38df91..551c9403f9b3 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3166,6 +3166,9 @@ ssize_t vfs_iocb_iter_write(struct file *file, struct kiocb *iocb,
 ssize_t filemap_splice_read(struct file *in, loff_t *ppos,
 			    struct pipe_inode_info *pipe,
 			    size_t len, unsigned int flags);
+ssize_t direct_splice_read(struct file *in, loff_t *ppos,
+			   struct pipe_inode_info *pipe,
+			   size_t len, unsigned int flags);
 extern ssize_t generic_file_splice_read(struct file *, loff_t *,
 		struct pipe_inode_info *, size_t, unsigned int);
 extern ssize_t iter_file_splice_write(struct pipe_inode_info *,
diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
index 6cb65df3e3ba..d2c3f16cf6b1 100644
--- a/include/linux/pipe_fs_i.h
+++ b/include/linux/pipe_fs_i.h
@@ -156,6 +156,26 @@ static inline bool pipe_full(unsigned int head, unsigned int tail,
 	return pipe_occupancy(head, tail) >= limit;
 }
 
+/**
+ * pipe_buf - Return the pipe buffer for the specified slot in the pipe ring
+ * @pipe: The pipe to access
+ * @slot: The slot of interest
+ */
+static inline struct pipe_buffer *pipe_buf(const struct pipe_inode_info *pipe,
+					   unsigned int slot)
+{
+	return &pipe->bufs[slot & (pipe->ring_size - 1)];
+}
+
+/**
+ * pipe_head_buf - Return the pipe buffer at the head of the pipe ring
+ * @pipe: The pipe to access
+ */
+static inline struct pipe_buffer *pipe_head_buf(const struct pipe_inode_info *pipe)
+{
+	return pipe_buf(pipe, pipe->head);
+}
+
 /**
  * pipe_buf_get - get a reference to a pipe_buffer
  * @pipe:	the pipe that the buffer belongs to
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index f9a3ff37ecd1..47c484551c59 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -186,12 +186,6 @@ static int copyin(void *to, const void __user *from, size_t n)
 	return res;
 }
 
-static inline struct pipe_buffer *pipe_buf(const struct pipe_inode_info *pipe,
-					   unsigned int slot)
-{
-	return &pipe->bufs[slot & (pipe->ring_size - 1)];
-}
-
 #ifdef PIPE_PARANOIA
 static bool sanity(const struct iov_iter *i)
 {