From patchwork Thu Jun 29 15:54:30 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 13297119
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, Matthew Wilcox, Dave Chinner, Matt Whitlock,
    Linus Torvalds, Jens Axboe, linux-fsdevel@kvack.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Christoph Hellwig, linux-fsdevel@vger.kernel.org
Subject: [RFC PATCH 1/4] splice: Fix corruption of spliced data after
 splice() returns
Date: Thu, 29 Jun 2023 16:54:30 +0100
Message-ID: <20230629155433.4170837-2-dhowells@redhat.com>
In-Reply-To: <20230629155433.4170837-1-dhowells@redhat.com>
References: <20230629155433.4170837-1-dhowells@redhat.com>

Splicing data from, say, a file into a pipe currently leaves the source
pages in the pipe after splice() returns - but this means that those pages
can be subsequently modified by shared-writable mmap(), write(),
fallocate(), etc. before they're consumed.

Fix this by stealing the pages in splice() before they're added to the pipe
if no one else is using them or has them mapped and copying them otherwise.

Reported-by: Matt Whitlock
Link: https://lore.kernel.org/r/ec804f26-fa76-4fbe-9b1c-8fbbd829b735@mattwhitlock.name/
Signed-off-by: David Howells
cc: Matthew Wilcox
cc: Dave Chinner
cc: Christoph Hellwig
cc: Jens Axboe
cc: linux-fsdevel@vger.kernel.org
---
 mm/filemap.c  | 92 ++++++++++++++++++++++++++++++++++++++++++++++++---
 mm/internal.h |  4 +--
 mm/shmem.c    |  8 +++--
 3 files changed, 95 insertions(+), 9 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 9e44a49bbd74..a002df515966 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2838,15 +2838,87 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 }
 EXPORT_SYMBOL(generic_file_read_iter);
 
+static inline void copy_folio_to_folio(struct folio *src, size_t src_offset,
+				       struct folio *dst, size_t dst_offset,
+				       size_t size)
+{
+	void *p, *q;
+
+	while (size > 0) {
+		size_t part = min3(PAGE_SIZE - src_offset % PAGE_SIZE,
+				   PAGE_SIZE - dst_offset % PAGE_SIZE,
+				   size);
+
+		p = kmap_local_folio(src, src_offset);
+		q = kmap_local_folio(dst, dst_offset);
+		memcpy(q, p, part);
+		kunmap_local(p);
+		kunmap_local(q);
+		src_offset += part;
+		dst_offset += part;
+		size -= part;
+	}
+}
+
 /*
- * Splice subpages from a folio into a pipe.
+ * Splice data from a folio into a pipe.  The folio is stolen if no one else is
+ * using it and copied otherwise.  We can't put the folio into the pipe still
+ * attached to the pagecache as that allows someone to modify it after the
+ * splice.
  */
-size_t splice_folio_into_pipe(struct pipe_inode_info *pipe,
-		struct folio *folio, loff_t fpos, size_t size)
+ssize_t splice_folio_into_pipe(struct pipe_inode_info *pipe,
+			       struct folio *folio, loff_t fpos, size_t size)
 {
+	struct address_space *mapping;
+	struct folio *copy = NULL;
 	struct page *page;
+	unsigned int flags = 0;
+	ssize_t ret;
 	size_t spliced = 0, offset = offset_in_folio(folio, fpos);
 
+	folio_lock(folio);
+
+	mapping = folio_mapping(folio);
+	ret = -ENODATA;
+	if (!folio->mapping)
+		goto err_unlock; /* Truncated */
+	ret = -EIO;
+	if (!folio_test_uptodate(folio))
+		goto err_unlock;
+
+	/*
+	 * At least for ext2 with nobh option, we need to wait on writeback
+	 * completing on this folio, since we'll remove it from the pagecache.
+	 * Otherwise truncate won't wait on the folio, allowing the disk blocks
+	 * to be reused by someone else before we actually wrote our data to
+	 * them. fs corruption ensues.
+	 */
+	folio_wait_writeback(folio);
+
+	if (folio_has_private(folio) &&
+	    !filemap_release_folio(folio, GFP_KERNEL))
+		goto need_copy;
+
+	/* If we succeed in removing the mapping, set LRU flag and add it. */
+	if (remove_mapping(mapping, folio)) {
+		folio_unlock(folio);
+		flags = PIPE_BUF_FLAG_LRU;
+		goto add_to_pipe;
+	}
+
+need_copy:
+	folio_unlock(folio);
+
+	copy = folio_alloc(GFP_KERNEL, 0);
+	if (!copy)
+		return -ENOMEM;
+
+	size = min(size, PAGE_SIZE - offset % PAGE_SIZE);
+	copy_folio_to_folio(folio, offset, copy, 0, size);
+	folio = copy;
+	offset = 0;
+
+add_to_pipe:
 	page = folio_page(folio, offset / PAGE_SIZE);
 	size = min(size, folio_size(folio) - offset);
 	offset %= PAGE_SIZE;
@@ -2861,6 +2933,7 @@ size_t splice_folio_into_pipe(struct pipe_inode_info *pipe,
 			.page	= page,
 			.offset	= offset,
 			.len	= part,
+			.flags	= flags,
 		};
 		folio_get(folio);
 		pipe->head++;
@@ -2869,7 +2942,13 @@ size_t splice_folio_into_pipe(struct pipe_inode_info *pipe,
 		offset = 0;
 	}
 
+	if (copy)
+		folio_put(copy);
 	return spliced;
+
+err_unlock:
+	folio_unlock(folio);
+	return ret;
 }
 
 /**
@@ -2947,7 +3026,7 @@ ssize_t filemap_splice_read(struct file *in, loff_t *ppos,
 
 		for (i = 0; i < folio_batch_count(&fbatch); i++) {
 			struct folio *folio = fbatch.folios[i];
-			size_t n;
+			ssize_t n;
 
 			if (folio_pos(folio) >= end_offset)
 				goto out;
@@ -2963,8 +3042,11 @@ ssize_t filemap_splice_read(struct file *in, loff_t *ppos,
 
 			n = min_t(loff_t, len, isize - *ppos);
 			n = splice_folio_into_pipe(pipe, folio, *ppos, n);
-			if (!n)
+			if (n <= 0) {
+				if (n < 0)
+					error = n;
 				goto out;
+			}
 			len -= n;
 			total_spliced += n;
 			*ppos += n;
diff --git a/mm/internal.h b/mm/internal.h
index a7d9e980429a..ae395e0f31d5 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -881,8 +881,8 @@ struct migration_target_control {
 /*
  * mm/filemap.c
  */
-size_t splice_folio_into_pipe(struct pipe_inode_info *pipe,
-			      struct folio *folio, loff_t fpos, size_t size);
+ssize_t splice_folio_into_pipe(struct pipe_inode_info *pipe,
+			       struct folio *folio, loff_t fpos, size_t size);
 
 /*
  * mm/vmalloc.c
diff --git a/mm/shmem.c b/mm/shmem.c
index 2f2e0e618072..969931b0f00e 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2783,7 +2783,8 @@ static ssize_t shmem_file_splice_read(struct file *in, loff_t *ppos,
 	struct inode *inode = file_inode(in);
 	struct address_space *mapping = inode->i_mapping;
 	struct folio *folio = NULL;
-	size_t total_spliced = 0, used, npages, n, part;
+	ssize_t n;
+	size_t total_spliced = 0, used, npages, part;
 	loff_t isize;
 	int error = 0;
 
@@ -2844,8 +2845,11 @@ static ssize_t shmem_file_splice_read(struct file *in, loff_t *ppos,
 			n = splice_zeropage_into_pipe(pipe, *ppos, len);
 		}
 
-		if (!n)
+		if (n <= 0) {
+			if (n < 0)
+				error = n;
 			break;
+		}
 		len -= n;
 		total_spliced += n;
 		*ppos += n;
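
The behaviour described in the commit message can be probed from userspace with a sketch like the one below. It is not the reporter's reproducer and not part of this patch: the helper name splice_then_overwrite() and the /tmp path are assumptions made for illustration. The helper fills a temporary file with 'A', splices it into a pipe, overwrites the file with 'B' after splice() has returned, and only then reads the pipe. On a kernel without a fix of this kind the pipe would be expected to yield 'B' (the page was modified after the splice); with the steal-or-copy approach it would be expected to yield 'A'. The result may also depend on the filesystem backing /tmp.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): returns the first byte read
 * back from the pipe ('A' or 'B'), or -1 if any step fails.
 */
int splice_then_overwrite(void)
{
	char path[] = "/tmp/splice-demo-XXXXXX";	/* assumed writable */
	char buf[4096];
	int pipefd[2] = { -1, -1 };
	int fd, result = -1;
	loff_t off = 0;
	ssize_t n;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);	/* the file stays alive through the open fd */

	/* Populate the file (and the pagecache) with 'A'. */
	memset(buf, 'A', sizeof(buf));
	if (pwrite(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf))
		goto out;
	if (pipe(pipefd) < 0)
		goto out;

	/* Splice the file data into the pipe. */
	n = splice(fd, &off, pipefd[1], NULL, sizeof(buf), 0);
	if (n <= 0)
		goto out;

	/* Modify the source only after splice() has returned. */
	memset(buf, 'B', sizeof(buf));
	if (pwrite(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf))
		goto out;

	/* Consume the pipe and report what the reader actually sees. */
	if (read(pipefd[0], buf, (size_t)n) <= 0)
		goto out;
	result = (unsigned char)buf[0];
out:
	if (pipefd[0] >= 0)
		close(pipefd[0]);
	if (pipefd[1] >= 0)
		close(pipefd[1]);
	close(fd);
	return result;
}
```

Calling splice_then_overwrite() from a small main() and printing the returned character is enough to compare kernels; the write() path is used here only because it is the simplest of the modification routes the commit message lists.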