From patchwork Fri May 19 07:49:18 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 13247846
From: David Howells
In-Reply-To: <20230519074047.1739879-1-dhowells@redhat.com>
References: <20230519074047.1739879-1-dhowells@redhat.com>
To: Jens Axboe, Al Viro, Christoph Hellwig
Cc: dhowells@redhat.com, Matthew Wilcox, Jan Kara, Jeff Layton, David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe, Hillf Danton, Christian Brauner, Linus Torvalds, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH] iov_iter: Add automatic-alloc for ITER_BVEC and use in direct_splice_read()
Date: Fri, 19 May 2023 08:49:18 +0100
Message-ID: <1740264.1684482558@warthog.procyon.org.uk>

If it's a problem that direct_splice_read() always allocates as much memory as
is asked for (and as will fit into the pipe) even when less could be used, say
because an O_DIRECT read hits a hole and does a short read, or because a socket
returns less than was asked for, then something like the attached modification
to ITER_BVEC could be made.  A brief usage sketch of the proposed interface is
appended after the patch.

David
---
iov_iter: Add automatic-alloc for ITER_BVEC and use in direct_splice_read()

Add a flag to the iov_iter struct that tells things that write to, or allow
writing to, a BVEC-type iterator that they should allocate pages to fill in
any slots in the bio_vec array that have null page pointers.  This allows the
bufferage in the bvec to be allocated on-demand.

Iterators of this type are initialised with iov_iter_bvec_autoalloc() instead
of iov_iter_bvec().  Only destination (ie. READ/ITER_DEST) iterators may be
used in this fashion.

An additional function, iov_iter_auto_alloc(), is provided to perform the
allocation in the case that the caller wishes to make use of the bio_vec
array directly, and the block layer is modified to use it.

direct_splice_read() is then modified to make use of this.  This is less
efficient if we know in advance that we want to allocate the full buffer, as
we can't so easily use bulk allocation, but it does mean that in cases where
we might not want the full buffer (say we hit a hole in DIO), we don't have
to allocate it.

Signed-off-by: David Howells
cc: Christoph Hellwig
cc: Jens Axboe
cc: Al Viro
cc: David Hildenbrand
cc: John Hubbard
cc: linux-mm@kvack.org
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
 block/bio.c         |    2 
 fs/splice.c         |   36 ++++++-----------
 include/linux/uio.h |   13 ++++--
 lib/iov_iter.c      |  110 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 133 insertions(+), 28 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 798cc4cf3bd2..72d5c1125df2 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1330,6 +1330,8 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	int ret = 0;
 
 	if (iov_iter_is_bvec(iter)) {
+		if (!iov_iter_auto_alloc(iter, iov_iter_count(iter)))
+			return -ENOMEM;
 		bio_iov_bvec_set(bio, iter);
 		iov_iter_advance(iter, bio->bi_iter.bi_size);
 		return 0;
diff --git a/fs/splice.c b/fs/splice.c
index 56d9802729d0..30e7a31c5ada 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -310,10 +310,8 @@ ssize_t direct_splice_read(struct file *in, loff_t *ppos,
 	struct iov_iter to;
 	struct bio_vec *bv;
 	struct kiocb kiocb;
-	struct page **pages;
 	ssize_t ret;
-	size_t used, npages, chunk, remain, keep = 0;
-	int i;
+	size_t used, npages, chunk, remain, keep = 0, i;
 
 	if (!len)
 		return 0;
@@ -334,30 +332,14 @@ ssize_t direct_splice_read(struct file *in, loff_t *ppos,
 	len = min_t(size_t, len, npages * PAGE_SIZE);
 	npages = DIV_ROUND_UP(len, PAGE_SIZE);
 
-	bv = kzalloc(array_size(npages, sizeof(bv[0])) +
-		     array_size(npages, sizeof(struct page *)), GFP_KERNEL);
+	bv = kzalloc(array_size(npages, sizeof(bv[0])), GFP_KERNEL);
 	if (!bv)
 		return -ENOMEM;
 
-	pages = (struct page **)(bv + npages);
-	npages = alloc_pages_bulk_array(GFP_USER, npages, pages);
-	if (!npages) {
-		kfree(bv);
-		return -ENOMEM;
-	}
-
-	remain = len = min_t(size_t, len, npages * PAGE_SIZE);
-
-	for (i = 0; i < npages; i++) {
-		chunk = min_t(size_t, PAGE_SIZE, remain);
-		bv[i].bv_page = pages[i];
-		bv[i].bv_offset = 0;
-		bv[i].bv_len = chunk;
-		remain -= chunk;
-	}
-
 	/* Do the I/O */
-	iov_iter_bvec(&to, ITER_DEST, bv, npages, len);
+	iov_iter_bvec_autoalloc(&to, ITER_DEST, bv, npages, len);
 	init_sync_kiocb(&kiocb, in);
 	kiocb.ki_pos = *ppos;
 	ret = call_read_iter(in, &kiocb, &to);
@@ -376,8 +358,16 @@ ssize_t direct_splice_read(struct file *in, loff_t *ppos,
 	}
 
 	/* Free any pages that didn't get touched at all. */
-	if (keep < npages)
-		release_pages(pages + keep, npages - keep);
+	if (keep < npages) {
+		struct page **pages = (struct page **)&bv[keep];
+		size_t j = 0;
+
+		for (i = keep; i < npages; i++)
+			if (bv[i].bv_page)
+				pages[j++] = bv[i].bv_page;
+		if (j)
+			release_pages(pages, j);
+	}
 
 	/* Push the remaining pages into the pipe. */
 	remain = ret;
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 60c342bb7ab8..6bc2287021d9 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -40,10 +40,11 @@ struct iov_iter_state {
 
 struct iov_iter {
 	u8 iter_type;
-	bool copy_mc;
-	bool nofault;
-	bool data_source;
-	bool user_backed;
+	bool copy_mc:1;
+	bool nofault:1;
+	bool data_source:1;
+	bool user_backed:1;
+	bool auto_alloc:1;	/* Automatically alloc pages into a bvec */
 	union {
 		size_t iov_offset;
 		int last_offset;
@@ -263,6 +264,7 @@ static inline bool iov_iter_is_copy_mc(const struct iov_iter *i)
 }
 #endif
 
+bool iov_iter_auto_alloc(struct iov_iter *iter, size_t count);
 size_t iov_iter_zero(size_t bytes, struct iov_iter *);
 bool iov_iter_is_aligned(const struct iov_iter *i, unsigned addr_mask,
 			unsigned len_mask);
@@ -274,6 +276,9 @@ void iov_iter_kvec(struct iov_iter *i, unsigned int direction, const struct kvec
 		unsigned long nr_segs, size_t count);
 void iov_iter_bvec(struct iov_iter *i, unsigned int direction, const struct bio_vec *bvec,
 		unsigned long nr_segs, size_t count);
+void iov_iter_bvec_autoalloc(struct iov_iter *i, unsigned int direction,
+		const struct bio_vec *bvec, unsigned long nr_segs,
+		size_t count);
 void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count);
 void iov_iter_xarray(struct iov_iter *i, unsigned int direction, struct xarray *xarray,
 		loff_t start, size_t count);
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index f18138e0292a..3643f9d80ecc 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -52,7 +52,11 @@
 	while (n) {						\
 		unsigned offset = p->bv_offset + skip;		\
 		unsigned left;					\
-		void *kaddr = kmap_local_page(p->bv_page +	\
+		void *kaddr;					\
+								\
+		if (!p->bv_page)				\
+			break;					\
+		kaddr = kmap_local_page(p->bv_page +		\
 					offset / PAGE_SIZE);	\
 		base = kaddr + offset % PAGE_SIZE;		\
 		len = min(min(n, (size_t)(p->bv_len - skip)),	\
@@ -159,6 +163,49 @@ __out:						\
 #define iterate_and_advance(i, n, base, len, off, I, K) \
 	__iterate_and_advance(i, n, base, len, off, I, ((void)(K),0))
 
+/*
+ * Preallocate pages into the bvec sufficient to store count bytes.
+ */
+static bool bvec_auto_alloc(struct iov_iter *iter, size_t count)
+{
+	struct bio_vec *bvec = (struct bio_vec *)iter->bvec;
+	int j;
+
+	if (!count)
+		return true;
+
+	count += iter->iov_offset;
+	for (j = 0; j < iter->nr_segs; j++) {
+		if (!bvec[j].bv_page) {
+			bvec[j].bv_page = alloc_page(GFP_KERNEL);
+			if (!bvec[j].bv_page)
+				return false;
+		}
+		if (bvec[j].bv_len >= count)
+			break;
+		count -= bvec[j].bv_len;
+	}
+
+	return true;
+}
+
+/**
+ * iov_iter_auto_alloc - Perform auto-alloc for an iterator
+ * @iter: The iterator to do the allocation for
+ * @count: The number of bytes we need to store
+ *
+ * Perform auto-allocation on an iterator.  This only works with ITER_BVEC-type
+ * iterators.  It will make sure that sufficient pages are allocated to store
+ * the specified number of bytes.  Returns true if sufficient pages are present
+ * in the bvec and false if an allocation failure occurs.
+ */
+bool iov_iter_auto_alloc(struct iov_iter *iter, size_t count)
+{
+	return !iov_iter_is_bvec(iter) || !iter->auto_alloc ||
+		bvec_auto_alloc(iter, count);
+}
+EXPORT_SYMBOL_GPL(iov_iter_auto_alloc);
+
 static int copyout(void __user *to, const void *from, size_t n)
 {
 	if (should_fail_usercopy())
@@ -313,6 +360,8 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
 		return 0;
 	if (user_backed_iter(i))
 		might_fault();
+	if (!iov_iter_auto_alloc(i, bytes))
+		return 0;
 	iterate_and_advance(i, bytes, base, len, off,
 		copyout(base, addr + off, len),
 		memcpy(base, addr + off, len)
@@ -362,6 +411,8 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
 		return 0;
 	if (user_backed_iter(i))
 		might_fault();
+	if (!iov_iter_auto_alloc(i, bytes))
+		return 0;
 	__iterate_and_advance(i, bytes, base, len, off,
 		copyout_mc(base, addr + off, len),
 		copy_mc_to_kernel(base, addr + off, len)
@@ -503,6 +554,8 @@ size_t copy_page_to_iter_nofault(struct page *page, unsigned offset, size_t byte
 		return 0;
 	if (WARN_ON_ONCE(i->data_source))
 		return 0;
+	if (!iov_iter_auto_alloc(i, bytes))
+		return 0;
 	page += offset / PAGE_SIZE; // first subpage
 	offset %= PAGE_SIZE;
 	while (1) {
@@ -557,6 +610,8 @@ EXPORT_SYMBOL(copy_page_from_iter);
 
 size_t iov_iter_zero(size_t bytes, struct iov_iter *i)
 {
+	if (!iov_iter_auto_alloc(i, bytes))
+		return -ENOMEM;
 	iterate_and_advance(i, bytes, base, len, count,
 		clear_user(base, len),
 		memset(base, 0, len)
@@ -598,6 +653,7 @@ static void iov_iter_bvec_advance(struct iov_iter *i, size_t size)
 
 	size += i->iov_offset;
 
 	for (bvec = i->bvec, end = bvec + i->nr_segs; bvec < end; bvec++) {
+		BUG_ON(!bvec->bv_page);
 		if (likely(size < bvec->bv_len))
 			break;
 		size -= bvec->bv_len;
@@ -740,6 +796,51 @@ void iov_iter_bvec(struct iov_iter *i, unsigned int direction,
 }
 EXPORT_SYMBOL(iov_iter_bvec);
 
+/**
+ * iov_iter_bvec_autoalloc - Initialise a BVEC-type I/O iterator with automatic alloc
+ * @i: The iterator to initialise.
+ * @direction: The direction of the transfer.
+ * @bvec: The array of bio_vecs listing the buffer segments
+ * @nr_segs: The number of segments in @bvec[].
+ * @count: The size of the I/O buffer in bytes.
+ *
+ * Set up an I/O iterator to insert pages into a bvec as data is written into
+ * it where NULL pointers exist in the bvec array (if a pointer isn't NULL, the
+ * page it points to will just be used).  No more than @nr_segs pages will be
+ * filled in.  Empty slots will have bv_offset set to 0 and bv_len to
+ * PAGE_SIZE.
+ *
+ * If the iterator is reverted, excess pages will be left for the caller to
+ * clean up.
+ */
+void iov_iter_bvec_autoalloc(struct iov_iter *i, unsigned int direction,
+			     const struct bio_vec *bvec, unsigned long nr_segs,
+			     size_t count)
+{
+	struct bio_vec *bv = (struct bio_vec *)bvec;
+	unsigned long j;
+
+	BUG_ON(direction != READ);
+	*i = (struct iov_iter){
+		.iter_type	= ITER_BVEC,
+		.copy_mc	= false,
+		.data_source	= direction,
+		.auto_alloc	= true,
+		.bvec		= bvec,
+		.nr_segs	= nr_segs,
+		.iov_offset	= 0,
+		.count		= count
+	};
+
+	for (j = 0; j < nr_segs; j++) {
+		if (!bv[j].bv_page) {
+			bv[j].bv_offset = 0;
+			bv[j].bv_len = PAGE_SIZE;
+		}
+	}
+}
+EXPORT_SYMBOL(iov_iter_bvec_autoalloc);
+
 /**
  * iov_iter_xarray - Initialise an I/O iterator to use the pages in an xarray
  * @i: The iterator to initialise.
@@ -1122,6 +1223,8 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 		struct page **p;
 		struct page *page;
 
+		if (!iov_iter_auto_alloc(i, maxsize))
+			return -ENOMEM;
 		page = first_bvec_segment(i, &maxsize, start);
 		n = want_pages_array(pages, maxsize, *start, maxpages);
 		if (!n)
@@ -1226,6 +1329,8 @@ size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *_csstate,
 		csstate->off += bytes;
 		return bytes;
 	}
+	if (!iov_iter_auto_alloc(i, bytes))
+		return -ENOMEM;
 	sum = csum_shift(csstate->csum, csstate->off);
 	iterate_and_advance(i, bytes, base, len, off, ({
@@ -1664,6 +1769,9 @@ static ssize_t iov_iter_extract_bvec_pages(struct iov_iter *i,
 	size_t skip = i->iov_offset, offset;
 	int k;
 
+	if (!iov_iter_auto_alloc(i, maxsize))
+		return -ENOMEM;
+
 	for (;;) {
 		if (i->nr_segs == 0)
 			return 0;
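
For illustration only, here is a minimal sketch (not part of the patch) of how
a caller might drive the proposed interface, assuming the patch above is
applied.  The function name autoalloc_read_example and its parameters are made
up; it passes a zeroed bio_vec array so every slot starts with a NULL page
pointer, lets the read populate only the slots it actually touches, and then
frees whatever pages were allocated:

/*
 * Hypothetical caller of the proposed interface: read up to @len bytes
 * from @file at @*ppos into pages that are only allocated as the read
 * progresses.
 */
static ssize_t autoalloc_read_example(struct file *file, loff_t *ppos,
				      size_t len)
{
	size_t npages = DIV_ROUND_UP(len, PAGE_SIZE), i;
	struct bio_vec *bv;
	struct iov_iter to;
	struct kiocb kiocb;
	ssize_t ret;

	/* Zeroed array => all bv_page pointers start out NULL. */
	bv = kzalloc(array_size(npages, sizeof(*bv)), GFP_KERNEL);
	if (!bv)
		return -ENOMEM;

	/* Empty slots get bv_len = PAGE_SIZE; pages appear on demand. */
	iov_iter_bvec_autoalloc(&to, ITER_DEST, bv, npages, len);

	init_sync_kiocb(&kiocb, file);
	kiocb.ki_pos = *ppos;
	ret = call_read_iter(file, &kiocb, &to);
	if (ret > 0)
		*ppos = kiocb.ki_pos;

	/*
	 * Only pages that were actually allocated need freeing.  A real
	 * caller would hand the filled pages off first (direct_splice_read()
	 * pushes them into the pipe); here they are simply released.
	 */
	for (i = 0; i < npages; i++)
		if (bv[i].bv_page)
			__free_page(bv[i].bv_page);
	kfree(bv);
	return ret;
}

The point of the shape above is that a short read (a hole in an O_DIRECT file,
or a socket returning early) leaves the tail of the bvec unpopulated, so those
pages are never allocated at all.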