From patchwork Mon Jan 16 23:08:09 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 13103853
Subject: [PATCH v6 01/34] vfs: Unconditionally set IOCB_WRITE in call_write_iter()
From: David Howells
To: Al Viro
Cc: Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
    Logan Gunthorpe, dhowells@redhat.com, linux-block@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Mon, 16 Jan 2023 23:08:09 +0000
Message-ID: <167391048988.2311931.1567396746365286847.stgit@warthog.procyon.org.uk>
In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>
References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>

IOCB_WRITE is set by aio, io_uring and cachefiles before submitting a
write operation to the VFS, but it isn't set by, say, the write() system
call.

Fix this by setting IOCB_WRITE unconditionally in call_write_iter().

This will allow drivers to use IOCB_WRITE instead of the iterator data
source to determine the I/O direction.

Signed-off-by: David Howells
cc: Alexander Viro
cc: Christoph Hellwig
cc: Jens Axboe
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---

 include/linux/fs.h |    1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 066555ad1bf8..649ff061440e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2183,6 +2183,7 @@ static inline ssize_t call_read_iter(struct file *file, struct kiocb *kio,
 static inline ssize_t call_write_iter(struct file *file, struct kiocb *kio,
 				      struct iov_iter *iter)
 {
+	kio->ki_flags |= IOCB_WRITE;
 	return file->f_op->write_iter(kio, iter);
 }
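
As an illustration of what this guarantee permits (a sketch only, not part
of the patch; mydev_write_iter() is a hypothetical ->write_iter() method),
driver code can now rely on the flag regardless of how the operation was
submitted:

	static ssize_t mydev_write_iter(struct kiocb *iocb, struct iov_iter *from)
	{
		/*
		 * call_write_iter() sets IOCB_WRITE unconditionally, so this
		 * holds for write(), aio, io_uring and cachefiles alike.
		 */
		if (WARN_ON_ONCE(!(iocb->ki_flags & IOCB_WRITE)))
			return -EINVAL;

		return generic_file_write_iter(iocb, from);
	}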
From patchwork Mon Jan 16 23:08:17 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 13103854
Subject: [PATCH v6 02/34] iov_iter: Use IOCB/IOMAP_WRITE/op_is_write rather than iterator direction
From: David Howells
To: Al Viro
Cc: dhowells@redhat.com, Christoph Hellwig, Matthew Wilcox, Jens Axboe,
    Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel@vger.kernel.org,
    linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Mon, 16 Jan 2023 23:08:17 +0000
Message-ID: <167391049698.2311931.13641162904441620555.stgit@warthog.procyon.org.uk>
In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>
References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>

Use information other than the iterator direction to determine the
direction of the I/O:

 (*) If a kiocb is available, use the IOCB_WRITE flag.

 (*) If an iomap_iter is available, use the IOMAP_WRITE flag.

 (*) If a request is available, use op_is_write().

Drop the check on the iterator in smbd_recv() and its warning.

This leaves __iov_iter_get_pages_alloc() as the only user of iov_iter_rw(),
so move it there and uninline it.

A brief usage sketch of the new helpers follows the changelog below.

Changes:
========
ver #6)
 - Move to the front of the patchset.
 - Added iocb_is_read() and iocb_is_write() to check IOCB_WRITE.
 - Use op_is_write() in bio_copy_user_iov().
 - Drop the checks from smbd_recv().
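
As a usage sketch of the new helpers (illustrative only; mydev_direct_IO(),
mydev_dio_read() and mydev_dio_write() are hypothetical), the conversion
replaces iterator sniffing with a check on the kiocb:

	static ssize_t mydev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
	{
		/* Previously: if (iov_iter_rw(iter) == WRITE) ... */
		if (iocb_is_write(iocb))
			return mydev_dio_write(iocb, iter);

		/* iocb_is_read() is defined as !iocb_is_write(). */
		return mydev_dio_read(iocb, iter);
	}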
Signed-off-by: David Howells cc: Al Viro Link: https://lore.kernel.org/r/167305163159.1521586.9460968250704377087.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/167344727810.2425628.4715663653893036683.stgit@warthog.procyon.org.uk/ # v5 --- block/blk-map.c | 2 +- block/fops.c | 8 ++++---- fs/9p/vfs_addr.c | 2 +- fs/affs/file.c | 4 ++-- fs/ceph/file.c | 2 +- fs/cifs/smbdirect.c | 9 --------- fs/dax.c | 6 +++--- fs/direct-io.c | 22 +++++++++++----------- fs/exfat/inode.c | 6 +++--- fs/ext2/inode.c | 2 +- fs/f2fs/file.c | 10 +++++----- fs/fat/inode.c | 4 ++-- fs/fuse/dax.c | 2 +- fs/fuse/file.c | 8 ++++---- fs/hfs/inode.c | 2 +- fs/hfsplus/inode.c | 2 +- fs/iomap/direct-io.c | 6 +++--- fs/jfs/inode.c | 2 +- fs/nfs/direct.c | 2 +- fs/nilfs2/inode.c | 2 +- fs/ntfs3/inode.c | 2 +- fs/ocfs2/aops.c | 2 +- fs/orangefs/inode.c | 2 +- fs/reiserfs/inode.c | 2 +- fs/udf/inode.c | 2 +- include/linux/fs.h | 10 ++++++++++ include/linux/uio.h | 5 ----- lib/iov_iter.c | 5 +++++ 28 files changed, 67 insertions(+), 66 deletions(-) diff --git a/block/blk-map.c b/block/blk-map.c index 19940c978c73..08cbb7ff3b19 100644 --- a/block/blk-map.c +++ b/block/blk-map.c @@ -203,7 +203,7 @@ static int bio_copy_user_iov(struct request *rq, struct rq_map_data *map_data, /* * success */ - if ((iov_iter_rw(iter) == WRITE && + if ((op_is_write(rq->cmd_flags) && (!map_data || !map_data->null_mapped)) || (map_data && map_data->from_user)) { ret = bio_copy_from_iter(bio, iter); diff --git a/block/fops.c b/block/fops.c index 50d245e8c913..5d376285edde 100644 --- a/block/fops.c +++ b/block/fops.c @@ -73,7 +73,7 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb, return -ENOMEM; } - if (iov_iter_rw(iter) == READ) { + if (iocb_is_read(iocb)) { bio_init(&bio, bdev, vecs, nr_pages, REQ_OP_READ); if (user_backed_iter(iter)) should_dirty = true; @@ -88,7 +88,7 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb, goto out; ret = bio.bi_iter.bi_size; - if (iov_iter_rw(iter) == WRITE) + if (iocb_is_write(iocb)) task_io_account_write(ret); if (iocb->ki_flags & IOCB_NOWAIT) @@ -174,7 +174,7 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, struct blk_plug plug; struct blkdev_dio *dio; struct bio *bio; - bool is_read = (iov_iter_rw(iter) == READ), is_sync; + bool is_read = iocb_is_read(iocb), is_sync; blk_opf_t opf = is_read ? REQ_OP_READ : dio_bio_write_op(iocb); loff_t pos = iocb->ki_pos; int ret = 0; @@ -296,7 +296,7 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb, unsigned int nr_pages) { struct block_device *bdev = iocb->ki_filp->private_data; - bool is_read = iov_iter_rw(iter) == READ; + bool is_read = iocb_is_read(iocb); blk_opf_t opf = is_read ? 
REQ_OP_READ : dio_bio_write_op(iocb); struct blkdev_dio *dio; struct bio *bio; diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c index 97599edbc300..080be076b7b6 100644 --- a/fs/9p/vfs_addr.c +++ b/fs/9p/vfs_addr.c @@ -254,7 +254,7 @@ v9fs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) ssize_t n; int err = 0; - if (iov_iter_rw(iter) == WRITE) { + if (iocb_is_write(iocb)) { n = p9_client_write(file->private_data, pos, iter, &err); if (n) { struct inode *inode = file_inode(file); diff --git a/fs/affs/file.c b/fs/affs/file.c index cefa222f7881..0dc67fc5d6cb 100644 --- a/fs/affs/file.c +++ b/fs/affs/file.c @@ -400,7 +400,7 @@ affs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) loff_t offset = iocb->ki_pos; ssize_t ret; - if (iov_iter_rw(iter) == WRITE) { + if (iocb_is_write(iocb)) { loff_t size = offset + count; if (AFFS_I(inode)->mmu_private < size) @@ -408,7 +408,7 @@ affs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) } ret = blockdev_direct_IO(iocb, inode, iter, affs_get_block); - if (ret < 0 && iov_iter_rw(iter) == WRITE) + if (ret < 0 && iocb_is_write(iocb)) affs_write_failed(mapping, offset + count); return ret; } diff --git a/fs/ceph/file.c b/fs/ceph/file.c index 764598e1efd9..27c72a2f6af5 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -1284,7 +1284,7 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter, struct timespec64 mtime = current_time(inode); size_t count = iov_iter_count(iter); loff_t pos = iocb->ki_pos; - bool write = iov_iter_rw(iter) == WRITE; + bool write = iocb_is_write(iocb); bool should_dirty = !write && user_backed_iter(iter); if (write && ceph_snap(file_inode(file)) != CEPH_NOSNAP) diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c index 90789aaa6567..3e693ffd0662 100644 --- a/fs/cifs/smbdirect.c +++ b/fs/cifs/smbdirect.c @@ -1938,14 +1938,6 @@ int smbd_recv(struct smbd_connection *info, struct msghdr *msg) unsigned int to_read, page_offset; int rc; - if (iov_iter_rw(&msg->msg_iter) == WRITE) { - /* It's a bug in upper layer to get there */ - cifs_dbg(VFS, "Invalid msg iter dir %u\n", - iov_iter_rw(&msg->msg_iter)); - rc = -EINVAL; - goto out; - } - switch (iov_iter_type(&msg->msg_iter)) { case ITER_KVEC: buf = msg->msg_iter.kvec->iov_base; @@ -1967,7 +1959,6 @@ int smbd_recv(struct smbd_connection *info, struct msghdr *msg) rc = -EINVAL; } -out: /* SMBDirect will read it all or nothing */ if (rc > 0) msg->msg_iter.count = 0; diff --git a/fs/dax.c b/fs/dax.c index c48a3a93ab29..b538a2ab7b66 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -1405,7 +1405,7 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi, loff_t pos = iomi->pos; struct dax_device *dax_dev = iomap->dax_dev; loff_t end = pos + length, done = 0; - bool write = iov_iter_rw(iter) == WRITE; + bool write = iomi->flags & IOMAP_WRITE; bool cow = write && iomap->flags & IOMAP_F_SHARED; ssize_t ret = 0; size_t xfer; @@ -1455,7 +1455,7 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi, map_len = dax_direct_access(dax_dev, pgoff, PHYS_PFN(size), DAX_ACCESS, &kaddr, NULL); - if (map_len == -EIO && iov_iter_rw(iter) == WRITE) { + if (map_len == -EIO && write) { map_len = dax_direct_access(dax_dev, pgoff, PHYS_PFN(size), DAX_RECOVERY_WRITE, &kaddr, NULL); @@ -1530,7 +1530,7 @@ dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter, if (!iomi.len) return 0; - if (iov_iter_rw(iter) == WRITE) { + if (iocb_is_write(iocb)) { lockdep_assert_held_write(&iomi.inode->i_rwsem); iomi.flags |= IOMAP_WRITE; } else { diff --git a/fs/direct-io.c b/fs/direct-io.c index 
03d381377ae1..cf196f2a211e 100644 --- a/fs/direct-io.c +++ b/fs/direct-io.c @@ -1143,7 +1143,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, */ /* watch out for a 0 len io from a tricksy fs */ - if (iov_iter_rw(iter) == READ && !count) + if (iocb_is_read(iocb) && !count) return 0; dio = kmem_cache_alloc(dio_cache, GFP_KERNEL); @@ -1157,14 +1157,14 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, memset(dio, 0, offsetof(struct dio, pages)); dio->flags = flags; - if (dio->flags & DIO_LOCKING && iov_iter_rw(iter) == READ) { + if (dio->flags & DIO_LOCKING && iocb_is_read(iocb)) { /* will be released by direct_io_worker */ inode_lock(inode); } /* Once we sampled i_size check for reads beyond EOF */ dio->i_size = i_size_read(inode); - if (iov_iter_rw(iter) == READ && offset >= dio->i_size) { + if (iocb_is_read(iocb) && offset >= dio->i_size) { retval = 0; goto fail_dio; } @@ -1177,7 +1177,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, goto fail_dio; } - if (dio->flags & DIO_LOCKING && iov_iter_rw(iter) == READ) { + if (dio->flags & DIO_LOCKING && iocb_is_read(iocb)) { struct address_space *mapping = iocb->ki_filp->f_mapping; retval = filemap_write_and_wait_range(mapping, offset, end - 1); @@ -1193,13 +1193,13 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, */ if (is_sync_kiocb(iocb)) dio->is_async = false; - else if (iov_iter_rw(iter) == WRITE && end > i_size_read(inode)) + else if (iocb_is_write(iocb) && end > i_size_read(inode)) dio->is_async = false; else dio->is_async = true; dio->inode = inode; - if (iov_iter_rw(iter) == WRITE) { + if (iocb_is_write(iocb)) { dio->opf = REQ_OP_WRITE | REQ_SYNC | REQ_IDLE; if (iocb->ki_flags & IOCB_NOWAIT) dio->opf |= REQ_NOWAIT; @@ -1211,7 +1211,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, * For AIO O_(D)SYNC writes we need to defer completions to a workqueue * so that we can call ->fsync. */ - if (dio->is_async && iov_iter_rw(iter) == WRITE) { + if (dio->is_async && iocb_is_write(iocb)) { retval = 0; if (iocb_is_dsync(iocb)) retval = dio_set_defer_completion(dio); @@ -1248,7 +1248,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, spin_lock_init(&dio->bio_lock); dio->refcount = 1; - dio->should_dirty = user_backed_iter(iter) && iov_iter_rw(iter) == READ; + dio->should_dirty = user_backed_iter(iter) && iocb_is_read(iocb); sdio.iter = iter; sdio.final_block_in_request = end >> blkbits; @@ -1305,7 +1305,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, * we can let i_mutex go now that its achieved its purpose * of protecting us from looking up uninitialized blocks. 
*/ - if (iov_iter_rw(iter) == READ && (dio->flags & DIO_LOCKING)) + if (iocb_is_read(iocb) && (dio->flags & DIO_LOCKING)) inode_unlock(dio->inode); /* @@ -1317,7 +1317,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, */ BUG_ON(retval == -EIOCBQUEUED); if (dio->is_async && retval == 0 && dio->result && - (iov_iter_rw(iter) == READ || dio->result == count)) + (iocb_is_read(iocb) || dio->result == count)) retval = -EIOCBQUEUED; else dio_await_completion(dio); @@ -1330,7 +1330,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, return retval; fail_dio: - if (dio->flags & DIO_LOCKING && iov_iter_rw(iter) == READ) + if (dio->flags & DIO_LOCKING && iocb_is_read(iocb)) inode_unlock(inode); kmem_cache_free(dio_cache, dio); diff --git a/fs/exfat/inode.c b/fs/exfat/inode.c index 5b644cb057fa..82554aaf4fd0 100644 --- a/fs/exfat/inode.c +++ b/fs/exfat/inode.c @@ -412,10 +412,10 @@ static ssize_t exfat_direct_IO(struct kiocb *iocb, struct iov_iter *iter) struct address_space *mapping = iocb->ki_filp->f_mapping; struct inode *inode = mapping->host; loff_t size = iocb->ki_pos + iov_iter_count(iter); - int rw = iov_iter_rw(iter); + bool writing = iocb_is_write(iocb); ssize_t ret; - if (rw == WRITE) { + if (writing) { /* * FIXME: blockdev_direct_IO() doesn't use ->write_begin(), * so we need to update the ->i_size_aligned to block boundary. @@ -434,7 +434,7 @@ static ssize_t exfat_direct_IO(struct kiocb *iocb, struct iov_iter *iter) * condition of exfat_get_block() and ->truncate(). */ ret = blockdev_direct_IO(iocb, inode, iter, exfat_get_block); - if (ret < 0 && (rw & WRITE)) + if (ret < 0 && writing) exfat_write_failed(mapping, size); return ret; } diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c index 69aed9e2359e..26a61f886844 100644 --- a/fs/ext2/inode.c +++ b/fs/ext2/inode.c @@ -919,7 +919,7 @@ ext2_direct_IO(struct kiocb *iocb, struct iov_iter *iter) ssize_t ret; ret = blockdev_direct_IO(iocb, inode, iter, ext2_get_block); - if (ret < 0 && iov_iter_rw(iter) == WRITE) + if (ret < 0 && iocb_is_write(iocb)) ext2_write_failed(mapping, offset + count); return ret; } diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index ecbc8c135b49..51a24580cfec 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -809,7 +809,7 @@ int f2fs_truncate(struct inode *inode) return 0; } -static bool f2fs_force_buffered_io(struct inode *inode, int rw) +static bool f2fs_force_buffered_io(struct inode *inode, bool writing) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); @@ -827,9 +827,9 @@ static bool f2fs_force_buffered_io(struct inode *inode, int rw) * for blkzoned device, fallback direct IO to buffered IO, so * all IOs can be serialized by log-structured write. 
*/ - if (f2fs_sb_has_blkzoned(sbi) && (rw == WRITE)) + if (f2fs_sb_has_blkzoned(sbi) && writing) return true; - if (f2fs_lfs_mode(sbi) && rw == WRITE && F2FS_IO_ALIGNED(sbi)) + if (f2fs_lfs_mode(sbi) && writing && F2FS_IO_ALIGNED(sbi)) return true; if (is_sbi_flag_set(sbi, SBI_CP_DISABLED)) return true; @@ -865,7 +865,7 @@ int f2fs_getattr(struct user_namespace *mnt_userns, const struct path *path, unsigned int bsize = i_blocksize(inode); stat->result_mask |= STATX_DIOALIGN; - if (!f2fs_force_buffered_io(inode, WRITE)) { + if (!f2fs_force_buffered_io(inode, true)) { stat->dio_mem_align = bsize; stat->dio_offset_align = bsize; } @@ -4254,7 +4254,7 @@ static bool f2fs_should_use_dio(struct inode *inode, struct kiocb *iocb, if (!(iocb->ki_flags & IOCB_DIRECT)) return false; - if (f2fs_force_buffered_io(inode, iov_iter_rw(iter))) + if (f2fs_force_buffered_io(inode, iocb_is_write(iocb))) return false; /* diff --git a/fs/fat/inode.c b/fs/fat/inode.c index d99b8549ec8f..237e20891df2 100644 --- a/fs/fat/inode.c +++ b/fs/fat/inode.c @@ -261,7 +261,7 @@ static ssize_t fat_direct_IO(struct kiocb *iocb, struct iov_iter *iter) loff_t offset = iocb->ki_pos; ssize_t ret; - if (iov_iter_rw(iter) == WRITE) { + if (iocb_is_write(iocb)) { /* * FIXME: blockdev_direct_IO() doesn't use ->write_begin(), * so we need to update the ->mmu_private to block boundary. @@ -281,7 +281,7 @@ static ssize_t fat_direct_IO(struct kiocb *iocb, struct iov_iter *iter) * condition of fat_get_block() and ->truncate(). */ ret = blockdev_direct_IO(iocb, inode, iter, fat_get_block); - if (ret < 0 && iov_iter_rw(iter) == WRITE) + if (ret < 0 && iocb_is_write(iocb)) fat_write_failed(mapping, offset + count); return ret; diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c index e23e802a8013..4351376db4a1 100644 --- a/fs/fuse/dax.c +++ b/fs/fuse/dax.c @@ -720,7 +720,7 @@ static bool file_extending_write(struct kiocb *iocb, struct iov_iter *from) { struct inode *inode = file_inode(iocb->ki_filp); - return (iov_iter_rw(from) == WRITE && + return (iocb_is_write(iocb) && ((iocb->ki_pos) >= i_size_read(inode) || (iocb->ki_pos + iov_iter_count(from) > i_size_read(inode)))); } diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 875314ee6f59..d68b45f8b3ae 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -2897,7 +2897,7 @@ fuse_direct_IO(struct kiocb *iocb, struct iov_iter *iter) inode = file->f_mapping->host; i_size = i_size_read(inode); - if ((iov_iter_rw(iter) == READ) && (offset >= i_size)) + if (iocb_is_read(iocb) && (offset >= i_size)) return 0; io = kmalloc(sizeof(struct fuse_io_priv), GFP_KERNEL); @@ -2909,7 +2909,7 @@ fuse_direct_IO(struct kiocb *iocb, struct iov_iter *iter) io->bytes = -1; io->size = 0; io->offset = offset; - io->write = (iov_iter_rw(iter) == WRITE); + io->write = iocb_is_write(iocb); io->err = 0; /* * By default, we want to optimize all I/Os with async request @@ -2942,7 +2942,7 @@ fuse_direct_IO(struct kiocb *iocb, struct iov_iter *iter) io->done = &wait; } - if (iov_iter_rw(iter) == WRITE) { + if (iocb_is_write(iocb)) { ret = fuse_direct_io(io, iter, &pos, FUSE_DIO_WRITE); fuse_invalidate_attr_mask(inode, FUSE_STATX_MODSIZE); } else { @@ -2965,7 +2965,7 @@ fuse_direct_IO(struct kiocb *iocb, struct iov_iter *iter) kref_put(&io->refcnt, fuse_io_release); - if (iov_iter_rw(iter) == WRITE) { + if (iocb_is_write(iocb)) { fuse_write_update_attr(inode, pos, ret); /* For extending writes we already hold exclusive lock */ if (ret < 0 && offset + count > i_size) diff --git a/fs/hfs/inode.c b/fs/hfs/inode.c index 
9c329a365e75..eec166e039d5 100644 --- a/fs/hfs/inode.c +++ b/fs/hfs/inode.c @@ -141,7 +141,7 @@ static ssize_t hfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) * In case of error extending write may have instantiated a few * blocks outside i_size. Trim these off again. */ - if (unlikely(iov_iter_rw(iter) == WRITE && ret < 0)) { + if (unlikely(iocb_is_write(iocb) && ret < 0)) { loff_t isize = i_size_read(inode); loff_t end = iocb->ki_pos + count; diff --git a/fs/hfsplus/inode.c b/fs/hfsplus/inode.c index 840577a0c1e7..2b4effb6ca3e 100644 --- a/fs/hfsplus/inode.c +++ b/fs/hfsplus/inode.c @@ -138,7 +138,7 @@ static ssize_t hfsplus_direct_IO(struct kiocb *iocb, struct iov_iter *iter) * In case of error extending write may have instantiated a few * blocks outside i_size. Trim these off again. */ - if (unlikely(iov_iter_rw(iter) == WRITE && ret < 0)) { + if (unlikely(iocb_is_write(iocb) && ret < 0)) { loff_t isize = i_size_read(inode); loff_t end = iocb->ki_pos + count; diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c index 9804714b1751..b03d87f116fc 100644 --- a/fs/iomap/direct-io.c +++ b/fs/iomap/direct-io.c @@ -519,7 +519,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, dio->submit.waiter = current; dio->submit.poll_bio = NULL; - if (iov_iter_rw(iter) == READ) { + if (iocb_is_read(iocb)) { if (iomi.pos >= dio->i_size) goto out_free_dio; @@ -573,7 +573,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, if (ret) goto out_free_dio; - if (iov_iter_rw(iter) == WRITE) { + if (iomi.flags & IOMAP_WRITE) { /* * Try to invalidate cache pages for the range we are writing. * If this invalidation fails, let the caller fall back to @@ -613,7 +613,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, * Revert iter to a state corresponding to that as some callers (such * as the splice code) rely on it. */ - if (iov_iter_rw(iter) == READ && iomi.pos >= dio->i_size) + if (!(iomi.flags & IOMAP_WRITE) && iomi.pos >= dio->i_size) iov_iter_revert(iter, iomi.pos - dio->i_size); if (ret == -EFAULT && dio->size && (dio_flags & IOMAP_DIO_PARTIAL)) { diff --git a/fs/jfs/inode.c b/fs/jfs/inode.c index 8ac10e396050..0d1f94ac9488 100644 --- a/fs/jfs/inode.c +++ b/fs/jfs/inode.c @@ -334,7 +334,7 @@ static ssize_t jfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) * In case of error extending write may have instantiated a few * blocks outside i_size. Trim these off again. 
*/ - if (unlikely(iov_iter_rw(iter) == WRITE && ret < 0)) { + if (unlikely(iocb_is_write(iocb) && ret < 0)) { loff_t isize = i_size_read(inode); loff_t end = iocb->ki_pos + count; diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index 1707f46b1335..d865945f2a63 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -133,7 +133,7 @@ int nfs_swap_rw(struct kiocb *iocb, struct iov_iter *iter) VM_BUG_ON(iov_iter_count(iter) != PAGE_SIZE); - if (iov_iter_rw(iter) == READ) + if (iocb_is_read(iocb)) ret = nfs_file_direct_read(iocb, iter, true); else ret = nfs_file_direct_write(iocb, iter, true); diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c index 232dd7b6cca1..496801507083 100644 --- a/fs/nilfs2/inode.c +++ b/fs/nilfs2/inode.c @@ -289,7 +289,7 @@ nilfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) { struct inode *inode = file_inode(iocb->ki_filp); - if (iov_iter_rw(iter) == WRITE) + if (iocb_is_write(iocb)) return 0; /* Needs synchronization with the cleaner */ diff --git a/fs/ntfs3/inode.c b/fs/ntfs3/inode.c index 20b953871574..675be8d629fc 100644 --- a/fs/ntfs3/inode.c +++ b/fs/ntfs3/inode.c @@ -761,7 +761,7 @@ static ssize_t ntfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) struct ntfs_inode *ni = ntfs_i(inode); loff_t vbo = iocb->ki_pos; loff_t end; - int wr = iov_iter_rw(iter) & WRITE; + bool wr = iocb_is_write(iocb); size_t iter_count = iov_iter_count(iter); loff_t valid; ssize_t ret; diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c index 1d65f6ef00ca..b741068a0a7e 100644 --- a/fs/ocfs2/aops.c +++ b/fs/ocfs2/aops.c @@ -2441,7 +2441,7 @@ static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter) !ocfs2_supports_append_dio(osb)) return 0; - if (iov_iter_rw(iter) == READ) + if (iocb_is_read(iocb)) get_block = ocfs2_lock_get_block; else get_block = ocfs2_dio_wr_get_block; diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c index 4df560894386..ece65907ff83 100644 --- a/fs/orangefs/inode.c +++ b/fs/orangefs/inode.c @@ -521,7 +521,7 @@ static ssize_t orangefs_direct_IO(struct kiocb *iocb, */ struct file *file = iocb->ki_filp; loff_t pos = iocb->ki_pos; - enum ORANGEFS_io_type type = iov_iter_rw(iter) == WRITE ? + enum ORANGEFS_io_type type = iocb_is_write(iocb) ? ORANGEFS_IO_WRITE : ORANGEFS_IO_READ; loff_t *offset = &pos; struct inode *inode = file->f_mapping->host; diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c index c7d1fa526dea..0ed65feda193 100644 --- a/fs/reiserfs/inode.c +++ b/fs/reiserfs/inode.c @@ -3249,7 +3249,7 @@ static ssize_t reiserfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) * In case of error extending write may have instantiated a few * blocks outside i_size. Trim these off again. 
*/ - if (unlikely(iov_iter_rw(iter) == WRITE && ret < 0)) { + if (unlikely(iocb_is_write(iocb) && ret < 0)) { loff_t isize = i_size_read(inode); loff_t end = iocb->ki_pos + count; diff --git a/fs/udf/inode.c b/fs/udf/inode.c index 1d7c2a812fc1..66a1b9e85cb2 100644 --- a/fs/udf/inode.c +++ b/fs/udf/inode.c @@ -219,7 +219,7 @@ static ssize_t udf_direct_IO(struct kiocb *iocb, struct iov_iter *iter) ssize_t ret; ret = blockdev_direct_IO(iocb, inode, iter, udf_get_block); - if (unlikely(ret < 0 && iov_iter_rw(iter) == WRITE)) + if (unlikely(ret < 0 && iocb_is_write(iocb))) udf_write_failed(mapping, iocb->ki_pos + count); return ret; } diff --git a/include/linux/fs.h b/include/linux/fs.h index 649ff061440e..6a488ae69f5d 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -353,6 +353,16 @@ static inline bool is_sync_kiocb(struct kiocb *kiocb) return kiocb->ki_complete == NULL; } +static inline bool iocb_is_write(const struct kiocb *kiocb) +{ + return kiocb->ki_flags & IOCB_WRITE; +} + +static inline bool iocb_is_read(const struct kiocb *kiocb) +{ + return !iocb_is_write(kiocb); +} + struct address_space_operations { int (*writepage)(struct page *page, struct writeback_control *wbc); int (*read_folio)(struct file *, struct folio *); diff --git a/include/linux/uio.h b/include/linux/uio.h index 9f158238edba..6f4dfa96324d 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -114,11 +114,6 @@ static inline bool iov_iter_is_xarray(const struct iov_iter *i) return iov_iter_type(i) == ITER_XARRAY; } -static inline unsigned char iov_iter_rw(const struct iov_iter *i) -{ - return i->data_source ? WRITE : READ; -} - static inline bool user_backed_iter(const struct iov_iter *i) { return i->user_backed; diff --git a/lib/iov_iter.c b/lib/iov_iter.c index f9a3ff37ecd1..68497d9c1452 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -1429,6 +1429,11 @@ static struct page *first_bvec_segment(const struct iov_iter *i, return page; } +static unsigned char iov_iter_rw(const struct iov_iter *i) +{ + return i->data_source ? 
WRITE : READ; +} + static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages, size_t maxsize, unsigned int maxpages, size_t *start,

From patchwork Mon Jan 16 23:08:24 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 13103855
Subject: [PATCH v6 03/34] iov_iter: Pass I/O direction into iov_iter_get_pages*()
From: David Howells
To: Al Viro
Cc: dhowells@redhat.com, Christoph Hellwig, Matthew Wilcox, Jens Axboe,
    Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel@vger.kernel.org,
    linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Mon, 16 Jan 2023 23:08:24 +0000
Message-ID: <167391050409.2311931.7103784292954267373.stgit@warthog.procyon.org.uk>
In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>
References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>

Define FOLL_SOURCE_BUF and FOLL_DEST_BUF to indicate to get_user_pages*()
and iov_iter_get_pages*() how the buffer is intended to be used in an I/O
operation.

Don't use READ and WRITE, as a read I/O writes to memory and vice versa,
which causes confusion.

The direction is checked against the iterator's data_source.

Signed-off-by: David Howells
---

 block/bio.c             |    6 ++++++
 block/blk-map.c         |    2 ++
 crypto/af_alg.c         |    9 ++++++---
 crypto/algif_hash.c     |    3 ++-
 drivers/vhost/scsi.c    |    9 ++++++---
 fs/ceph/addr.c          |    2 +-
 fs/ceph/file.c          |   14 ++++++++------
 fs/cifs/file.c          |    8 ++++----
 fs/cifs/misc.c          |    3 ++-
 fs/direct-io.c          |    6 ++++--
 fs/fuse/dev.c           |    3 ++-
 fs/fuse/file.c          |    8 ++++----
 fs/nfs/direct.c         |   10 ++++++----
 fs/splice.c             |    3 ++-
 include/crypto/if_alg.h |    3 ++-
 include/linux/bio.h     |   18 ++++++++++++++++--
 include/linux/mm.h      |   10 ++++++++++
 lib/iov_iter.c          |   14 +++++++-------
 net/9p/trans_virtio.c   |   12 ++++++++----
 net/core/datagram.c     |    5 +++--
 net/core/skmsg.c        |    4 ++--
 net/rds/message.c       |    4 ++--
 net/tls/tls_sw.c        |    5 ++---
 23 files changed, 107 insertions(+), 54 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 5f96fcae3f75..867cf4db87ea 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1242,6 +1242,8 @@ static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page,
  * pages will have to be released using put_page() when done.
  * For multi-segment *iter, this function only adds pages from the
  * next non-empty segment of the iov iterator.
+ *
+ * The I/O direction is determined from the bio operation type.
  */
 static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 {
@@ -1263,6 +1265,8 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
 	pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
 
+	gup_flags |= bio_is_write(bio) ? FOLL_SOURCE_BUF : FOLL_DEST_BUF;
+
 	if (bio->bi_bdev && blk_queue_pci_p2pdma(bio->bi_bdev->bd_disk->queue))
 		gup_flags |= FOLL_PCI_P2PDMA;
 
@@ -1332,6 +1336,8 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
  * fit into the bio, or are requested in @iter, whatever is smaller. If
  * MM encounters an error pinning the requested pages, it stops. Error
  * is returned only if 0 pages could be pinned.
+ *
+ * The bio operation indicates the data direction.
  */
 int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 {
diff --git a/block/blk-map.c b/block/blk-map.c
index 08cbb7ff3b19..c30be529fb55 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -279,6 +279,8 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 	if (bio == NULL)
 		return -ENOMEM;
 
+	gup_flags |= bio_is_write(bio) ?
FOLL_SOURCE_BUF : FOLL_DEST_BUF; + if (blk_queue_pci_p2pdma(rq->q)) gup_flags |= FOLL_PCI_P2PDMA; diff --git a/crypto/af_alg.c b/crypto/af_alg.c index 0a4fa2a429e2..7a68db157fae 100644 --- a/crypto/af_alg.c +++ b/crypto/af_alg.c @@ -531,13 +531,15 @@ static const struct net_proto_family alg_family = { .owner = THIS_MODULE, }; -int af_alg_make_sg(struct af_alg_sgl *sgl, struct iov_iter *iter, int len) +int af_alg_make_sg(struct af_alg_sgl *sgl, struct iov_iter *iter, int len, + unsigned int gup_flags) { size_t off; ssize_t n; int npages, i; - n = iov_iter_get_pages2(iter, sgl->pages, len, ALG_MAX_PAGES, &off); + n = iov_iter_get_pages(iter, sgl->pages, len, ALG_MAX_PAGES, &off, + gup_flags); if (n < 0) return n; @@ -1310,7 +1312,8 @@ int af_alg_get_rsgl(struct sock *sk, struct msghdr *msg, int flags, list_add_tail(&rsgl->list, &areq->rsgl_list); /* make one iovec available as scatterlist */ - err = af_alg_make_sg(&rsgl->sgl, &msg->msg_iter, seglen); + err = af_alg_make_sg(&rsgl->sgl, &msg->msg_iter, seglen, + FOLL_DEST_BUF); if (err < 0) { rsgl->sg_num_bytes = 0; return err; diff --git a/crypto/algif_hash.c b/crypto/algif_hash.c index 1d017ec5c63c..fe3d2258145f 100644 --- a/crypto/algif_hash.c +++ b/crypto/algif_hash.c @@ -91,7 +91,8 @@ static int hash_sendmsg(struct socket *sock, struct msghdr *msg, if (len > limit) len = limit; - len = af_alg_make_sg(&ctx->sgl, &msg->msg_iter, len); + len = af_alg_make_sg(&ctx->sgl, &msg->msg_iter, len, + FOLL_SOURCE_BUF); if (len < 0) { err = copied ? 0 : len; goto unlock; diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index dca6346d75b3..5d10837d19ec 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -646,10 +646,13 @@ vhost_scsi_map_to_sgl(struct vhost_scsi_cmd *cmd, struct scatterlist *sg = sgl; ssize_t bytes; size_t offset; - unsigned int npages = 0; + unsigned int npages = 0, gup_flags = 0; - bytes = iov_iter_get_pages2(iter, pages, LONG_MAX, - VHOST_SCSI_PREALLOC_UPAGES, &offset); + gup_flags |= write ? FOLL_SOURCE_BUF : FOLL_DEST_BUF; + + bytes = iov_iter_get_pages(iter, pages, LONG_MAX, + VHOST_SCSI_PREALLOC_UPAGES, &offset, + gup_flags); /* No pages were pinned */ if (bytes <= 0) return bytes < 0 ? bytes : -EFAULT; diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c index 8c74871e37c9..cfc3353e5604 100644 --- a/fs/ceph/addr.c +++ b/fs/ceph/addr.c @@ -328,7 +328,7 @@ static void ceph_netfs_issue_read(struct netfs_io_subrequest *subreq) dout("%s: pos=%llu orig_len=%zu len=%llu\n", __func__, subreq->start, subreq->len, len); iov_iter_xarray(&iter, ITER_DEST, &rreq->mapping->i_pages, subreq->start, len); - err = iov_iter_get_pages_alloc2(&iter, &pages, len, &page_off); + err = iov_iter_get_pages_alloc(&iter, &pages, len, &page_off, FOLL_DEST_BUF); if (err < 0) { dout("%s: iov_ter_get_pages_alloc returned %d\n", __func__, err); goto out; diff --git a/fs/ceph/file.c b/fs/ceph/file.c index 27c72a2f6af5..ffd36eeea186 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -81,7 +81,7 @@ static __le32 ceph_flags_sys2wire(u32 flags) #define ITER_GET_BVECS_PAGES 64 static ssize_t __iter_get_bvecs(struct iov_iter *iter, size_t maxsize, - struct bio_vec *bvecs) + struct bio_vec *bvecs, bool write) { size_t size = 0; int bvec_idx = 0; @@ -95,8 +95,9 @@ static ssize_t __iter_get_bvecs(struct iov_iter *iter, size_t maxsize, size_t start; int idx = 0; - bytes = iov_iter_get_pages2(iter, pages, maxsize - size, - ITER_GET_BVECS_PAGES, &start); + bytes = iov_iter_get_pages(iter, pages, maxsize - size, + ITER_GET_BVECS_PAGES, &start, + write ? 
FOLL_SOURCE_BUF : FOLL_DEST_BUF);
 		if (bytes < 0)
 			return size ?: bytes;
 
@@ -127,7 +128,8 @@ static ssize_t __iter_get_bvecs(struct iov_iter *iter, size_t maxsize,
  * Return the number of bytes in the created bio_vec array, or an error.
  */
 static ssize_t iter_get_bvecs_alloc(struct iov_iter *iter, size_t maxsize,
-				    struct bio_vec **bvecs, int *num_bvecs)
+				    struct bio_vec **bvecs, int *num_bvecs,
+				    bool write)
 {
 	struct bio_vec *bv;
 	size_t orig_count = iov_iter_count(iter);
@@ -146,7 +148,7 @@ static ssize_t iter_get_bvecs_alloc(struct iov_iter *iter, size_t maxsize,
 	if (!bv)
 		return -ENOMEM;
 
-	bytes = __iter_get_bvecs(iter, maxsize, bv);
+	bytes = __iter_get_bvecs(iter, maxsize, bv, write);
 	if (bytes < 0) {
 		/*
 		 * No pages were pinned -- just free the array.
@@ -1334,7 +1336,7 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter,
 			break;
 		}
 
-		len = iter_get_bvecs_alloc(iter, size, &bvecs, &num_pages);
+		len = iter_get_bvecs_alloc(iter, size, &bvecs, &num_pages, write);
 		if (len < 0) {
 			ceph_osdc_put_request(req);
 			ret = len;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 22dfc1f8b4f1..d100b9cb8682 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -3290,8 +3290,8 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from,
 		if (ctx->direct_io) {
 			ssize_t result;
 
-			result = iov_iter_get_pages_alloc2(
-				from, &pagevec, cur_len, &start);
+			result = iov_iter_get_pages_alloc(
+				from, &pagevec, cur_len, &start, FOLL_SOURCE_BUF);
 			if (result < 0) {
 				cifs_dbg(VFS,
 					 "direct_writev couldn't get user pages (rc=%zd) iter type %d iov_offset %zd count %zd\n",
@@ -4031,9 +4031,9 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
 		if (ctx->direct_io) {
 			ssize_t result;
 
-			result = iov_iter_get_pages_alloc2(
+			result = iov_iter_get_pages_alloc(
 					&direct_iov, &pagevec,
-					cur_len, &start);
+					cur_len, &start, FOLL_DEST_BUF);
 			if (result < 0) {
 				cifs_dbg(VFS,
 					 "Couldn't get user pages (rc=%zd) iter type %d iov_offset %zd count %zd\n",
diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c
index 4d3c586785a5..9655cf359ab9 100644
--- a/fs/cifs/misc.c
+++ b/fs/cifs/misc.c
@@ -1030,7 +1030,8 @@ setup_aio_ctx_iter(struct cifs_aio_ctx *ctx, struct iov_iter *iter, int rw)
 	saved_len = count;
 
 	while (count && npages < max_pages) {
-		rc = iov_iter_get_pages2(iter, pages, count, max_pages, &start);
+		rc = iov_iter_get_pages(iter, pages, count, max_pages, &start,
+					rw == WRITE ? FOLL_SOURCE_BUF : FOLL_DEST_BUF);
 		if (rc < 0) {
 			cifs_dbg(VFS, "Couldn't get user pages (rc=%zd)\n", rc);
 			break;
diff --git a/fs/direct-io.c b/fs/direct-io.c
index cf196f2a211e..b1e26a706e31 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -169,8 +169,10 @@ static inline int dio_refill_pages(struct dio *dio, struct dio_submit *sdio)
 	const enum req_op dio_op = dio->opf & REQ_OP_MASK;
 	ssize_t ret;
 
-	ret = iov_iter_get_pages2(sdio->iter, dio->pages, LONG_MAX, DIO_PAGES,
-				&sdio->from);
+	ret = iov_iter_get_pages(sdio->iter, dio->pages, LONG_MAX, DIO_PAGES,
+				 &sdio->from,
+				 op_is_write(dio_op) ?
+				 FOLL_SOURCE_BUF : FOLL_DEST_BUF);
 
 	if (ret < 0 && sdio->blocks_available && dio_op == REQ_OP_WRITE) {
 		struct page *page = ZERO_PAGE(0);
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index e8b60ce72c9a..e3d8443e24a6 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -730,7 +730,8 @@ static int fuse_copy_fill(struct fuse_copy_state *cs)
 		}
 	} else {
 		size_t off;
-		err = iov_iter_get_pages2(cs->iter, &page, PAGE_SIZE, 1, &off);
+		err = iov_iter_get_pages(cs->iter, &page, PAGE_SIZE, 1, &off,
+					 cs->write ? FOLL_DEST_BUF : FOLL_SOURCE_BUF);
 		if (err < 0)
 			return err;
 		BUG_ON(!err);
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index d68b45f8b3ae..68c196437306 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1414,10 +1414,10 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
 	while (nbytes < *nbytesp && ap->num_pages < max_pages) {
 		unsigned npages;
 		size_t start;
-		ret = iov_iter_get_pages2(ii, &ap->pages[ap->num_pages],
-					  *nbytesp - nbytes,
-					  max_pages - ap->num_pages,
-					  &start);
+		ret = iov_iter_get_pages(ii, &ap->pages[ap->num_pages],
+					 *nbytesp - nbytes,
+					 max_pages - ap->num_pages,
+					 &start, write ? FOLL_SOURCE_BUF : FOLL_DEST_BUF);
 		if (ret < 0)
 			break;
 
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index d865945f2a63..42af84685f20 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -332,8 +332,9 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
 		size_t pgbase;
 		unsigned npages, i;
 
-		result = iov_iter_get_pages_alloc2(iter, &pagevec,
-						   rsize, &pgbase);
+		result = iov_iter_get_pages_alloc(iter, &pagevec,
+						  rsize, &pgbase,
+						  FOLL_DEST_BUF);
 		if (result < 0)
 			break;
 
@@ -791,8 +792,9 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 		size_t pgbase;
 		unsigned npages, i;
 
-		result = iov_iter_get_pages_alloc2(iter, &pagevec,
-						   wsize, &pgbase);
+		result = iov_iter_get_pages_alloc(iter, &pagevec,
+						  wsize, &pgbase,
+						  FOLL_SOURCE_BUF);
 		if (result < 0)
 			break;
 
diff --git a/fs/splice.c b/fs/splice.c
index 5969b7a1d353..19c5b5adc548 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1165,7 +1165,8 @@ static int iter_to_pipe(struct iov_iter *from,
 		size_t start;
 		int i, n;
 
-		left = iov_iter_get_pages2(from, pages, ~0UL, 16, &start);
+		left = iov_iter_get_pages(from, pages, ~0UL, 16, &start,
+					  FOLL_SOURCE_BUF);
 		if (left <= 0) {
 			ret = left;
 			break;
diff --git a/include/crypto/if_alg.h b/include/crypto/if_alg.h
index a5db86670bdf..12058ab6cad9 100644
--- a/include/crypto/if_alg.h
+++ b/include/crypto/if_alg.h
@@ -165,7 +165,8 @@ int af_alg_release(struct socket *sock);
 void af_alg_release_parent(struct sock *sk);
 int af_alg_accept(struct sock *sk, struct socket *newsock, bool kern);
 
-int af_alg_make_sg(struct af_alg_sgl *sgl, struct iov_iter *iter, int len);
+int af_alg_make_sg(struct af_alg_sgl *sgl, struct iov_iter *iter, int len,
+		   unsigned int gup_flags);
 void af_alg_free_sg(struct af_alg_sgl *sgl);
 
 static inline struct alg_sock *alg_sk(struct sock *sk)
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 22078a28d7cb..3f7ba7fe48ac 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -40,11 +40,25 @@ static inline unsigned int bio_max_segs(unsigned int nr_segs)
 #define bio_sectors(bio)	bvec_iter_sectors((bio)->bi_iter)
 #define bio_end_sector(bio)	bvec_iter_end_sector((bio)->bi_iter)
 
+/**
+ * bio_is_write - Query if the I/O direction is towards the disk
+ * @bio: The bio to query
+ *
+ * Return true if this is some sort of write operation - ie. the data is going
+ * towards the disk.
+ */
+static inline bool bio_is_write(const struct bio *bio)
+{
+	return op_is_write(bio_op(bio));
+}
+
 /*
  * Return the data direction, READ or WRITE.
  */
-#define bio_data_dir(bio) \
-	(op_is_write(bio_op(bio)) ? WRITE : READ)
+static inline int bio_data_dir(const struct bio *bio)
+{
+	return bio_is_write(bio) ? WRITE : READ;
+}
 
 /*
  * Check whether this bio carries any data or not. A NULL bio is allowed.
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f3f196e4d66d..3af4ca8b1fe7 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3090,6 +3090,10 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 #define FOLL_PCI_P2PDMA	0x100000 /* allow returning PCI P2PDMA pages */
 #define FOLL_INTERRUPTIBLE  0x200000 /* allow interrupts from generic signals */
 
+#define FOLL_SOURCE_BUF	0		/* Memory will be read from by I/O */
+#define FOLL_DEST_BUF	FOLL_WRITE	/* Memory will be written to by I/O */
+#define FOLL_BUF_MASK	FOLL_WRITE
+
 /*
  * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each
  * other. Here is what they mean, and how to use them:
@@ -3143,6 +3147,12 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
  * releasing pages: get_user_pages*() pages must be released via put_page(),
  * while pin_user_pages*() pages must be released via unpin_user_page().
  *
+ * FOLL_SOURCE_BUF and FOLL_DEST_BUF are indicators to get_user_pages*() and
+ * iov_iter_*_pages*() as to how the pages obtained are going to be used.
+ * FOLL_SOURCE_BUF indicates that I/O op is going to transfer from memory to
+ * device; FOLL_DEST_BUF that the op is going to transfer from device to
+ * memory.
+ *
  * Please see Documentation/core-api/pin_user_pages.rst for more information.
  */
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 68497d9c1452..f53583836009 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1429,11 +1429,6 @@ static struct page *first_bvec_segment(const struct iov_iter *i,
 	return page;
 }
 
-static unsigned char iov_iter_rw(const struct iov_iter *i)
-{
-	return i->data_source ? WRITE : READ;
-}
-
 static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 		   struct page ***pages, size_t maxsize,
 		   unsigned int maxpages, size_t *start,
@@ -1448,12 +1443,17 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 	if (maxsize > MAX_RW_COUNT)
 		maxsize = MAX_RW_COUNT;
 
+	if (WARN_ON_ONCE((gup_flags & FOLL_BUF_MASK) == FOLL_SOURCE_BUF &&
+			 i->data_source == ITER_DEST))
+		return -EIO;
+	if (WARN_ON_ONCE((gup_flags & FOLL_BUF_MASK) == FOLL_DEST_BUF &&
+			 i->data_source == ITER_SOURCE))
+		return -EIO;
+
 	if (likely(user_backed_iter(i))) {
 		unsigned long addr;
 		int res;
 
-		if (iov_iter_rw(i) != WRITE)
-			gup_flags |= FOLL_WRITE;
 		if (i->nofault)
 			gup_flags |= FOLL_NOFAULT;
 
diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index 3c27ffb781e3..eb28b54fe5f6 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -310,7 +310,8 @@ static int p9_get_mapped_pages(struct virtio_chan *chan,
 			       struct iov_iter *data,
 			       int count,
 			       size_t *offs,
-			       int *need_drop)
+			       int *need_drop,
+			       unsigned int gup_flags)
 {
 	int nr_pages;
 	int err;
@@ -330,7 +331,8 @@ static int p9_get_mapped_pages(struct virtio_chan *chan,
 			if (err == -ERESTARTSYS)
 				return err;
 		}
-		n = iov_iter_get_pages_alloc2(data, pages, count, offs);
+		n = iov_iter_get_pages_alloc(data, pages, count, offs,
+					     gup_flags);
 		if (n < 0)
 			return n;
 		*need_drop = 1;
@@ -437,7 +439,8 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
 	if (uodata) {
 		__le32 sz;
 		int n = p9_get_mapped_pages(chan, &out_pages, uodata,
-					    outlen, &offs, &need_drop);
+					    outlen, &offs, &need_drop,
+					    FOLL_SOURCE_BUF);
 		if (n < 0) {
 			err = n;
 			goto err_out;
@@ -456,7 +459,8 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
 		memcpy(&req->tc.sdata[0], &sz, sizeof(sz));
 	} else if (uidata) {
 		int n = p9_get_mapped_pages(chan, &in_pages, uidata,
-					    inlen, &offs, &need_drop);
+					    inlen, &offs, &need_drop,
+					    FOLL_DEST_BUF);
 		if (n < 0) {
 			err = n;
 			goto err_out;
diff --git a/net/core/datagram.c b/net/core/datagram.c
index e4ff2db40c98..9f0914b781ad 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -632,8 +632,9 @@ int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk,
 		if (frag == MAX_SKB_FRAGS)
 			return -EMSGSIZE;
 
-		copied = iov_iter_get_pages2(from, pages, length,
-					     MAX_SKB_FRAGS - frag, &start);
+		copied = iov_iter_get_pages(from, pages, length,
+					    MAX_SKB_FRAGS - frag, &start,
+					    FOLL_SOURCE_BUF);
 		if (copied < 0)
 			return -EFAULT;
 
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 53d0251788aa..f63a13690712 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -324,8 +324,8 @@ int sk_msg_zerocopy_from_iter(struct sock *sk, struct iov_iter *from,
 			goto out;
 		}
 
-		copied = iov_iter_get_pages2(from, pages, bytes, maxpages,
-					     &offset);
+		copied = iov_iter_get_pages(from, pages, bytes, maxpages,
+					    &offset, FOLL_SOURCE_BUF);
 		if (copied <= 0) {
 			ret = -EFAULT;
 			goto out;
diff --git a/net/rds/message.c b/net/rds/message.c
index b47e4f0a1639..fcfd406b97af 100644
--- a/net/rds/message.c
+++ b/net/rds/message.c
@@ -390,8 +390,8 @@ static int rds_message_zcopy_from_user(struct rds_message *rm, struct iov_iter *
 		size_t start;
 		ssize_t copied;
 
-		copied = iov_iter_get_pages2(from, &pages, PAGE_SIZE,
-					     1, &start);
+		copied = iov_iter_get_pages(from, &pages, PAGE_SIZE,
+					    1, &start, FOLL_SOURCE_BUF);
 		if (copied < 0) {
 			struct mmpin *mmp;
 			int i;
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 9ed978634125..59acaeb24f54 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1354,9 +1354,8 @@ static int tls_setup_from_iter(struct iov_iter *from,
 			rc = -EFAULT;
 			goto out;
 		}
-		copied = iov_iter_get_pages2(from, pages,
-					     length,
-					     maxpages, &offset);
+		copied = iov_iter_get_pages(from, pages, length,
+					    maxpages, &offset, FOLL_DEST_BUF);
 		if (copied <= 0) {
 			rc = -EFAULT;
 			goto out;
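
As a usage sketch of the new flags (illustrative only; mydrv_pin_buffer()
and its parameters are hypothetical), a caller now states which way the
data will flow instead of leaving it implied by the iterator:

	static ssize_t mydrv_pin_buffer(struct iov_iter *iter, struct page **pages,
					size_t *off, bool to_device)
	{
		/*
		 * FOLL_SOURCE_BUF: the I/O will read from these pages, so no
		 * FOLL_WRITE is needed; FOLL_DEST_BUF: the I/O will write
		 * into them (implies FOLL_WRITE).  The choice is cross-checked
		 * against the iterator's data_source.
		 */
		return iov_iter_get_pages(iter, pages, SZ_64K, 16, off,
					  to_device ? FOLL_SOURCE_BUF
						    : FOLL_DEST_BUF);
	}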
From patchwork Mon Jan 16 23:08:31 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 13103856
Subject: [PATCH v6 04/34] iov_iter: Remove iov_iter_get_pages2/pages_alloc2()
From: David Howells
To: Al Viro
Cc: dhowells@redhat.com, Christoph Hellwig, Matthew Wilcox, Jens Axboe,
    Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel@vger.kernel.org,
    linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Mon, 16 Jan 2023 23:08:31 +0000
Message-ID: <167391051122.2311931.14824492646435673046.stgit@warthog.procyon.org.uk>
In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>
References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>

There are now no users of iov_iter_get_pages2() and
iov_iter_get_pages_alloc2(), so remove them.
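
For any remaining callers the conversion is mechanical (a sketch; the
variables shown are hypothetical): pass an explicit direction flag where
the "2" variants passed none.

	/* Before: direction inferred from the iterator. */
	n = iov_iter_get_pages2(iter, pages, maxsize, maxpages, &start);

	/* After: the caller states the direction of the I/O. */
	n = iov_iter_get_pages(iter, pages, maxsize, maxpages, &start,
			       FOLL_DEST_BUF);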
Signed-off-by: David Howells
---

 include/linux/uio.h |    4 ----
 lib/iov_iter.c      |   14 --------------
 2 files changed, 18 deletions(-)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index 6f4dfa96324d..365e26c405f2 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -248,13 +248,9 @@ void iov_iter_xarray(struct iov_iter *i, unsigned int direction, struct xarray *
 ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages,
 		size_t maxsize, unsigned maxpages, size_t *start,
 		unsigned gup_flags);
-ssize_t iov_iter_get_pages2(struct iov_iter *i, struct page **pages,
-		size_t maxsize, unsigned maxpages, size_t *start);
 ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
 		struct page ***pages, size_t maxsize, size_t *start,
 		unsigned gup_flags);
-ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i, struct page ***pages,
-		size_t maxsize, size_t *start);
 int iov_iter_npages(const struct iov_iter *i, int maxpages);
 void iov_iter_restore(struct iov_iter *i, struct iov_iter_state *state);
 
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index f53583836009..ca89ffa9d6e1 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1511,13 +1511,6 @@ ssize_t iov_iter_get_pages(struct iov_iter *i,
 }
 EXPORT_SYMBOL_GPL(iov_iter_get_pages);
 
-ssize_t iov_iter_get_pages2(struct iov_iter *i, struct page **pages,
-		size_t maxsize, unsigned maxpages, size_t *start)
-{
-	return iov_iter_get_pages(i, pages, maxsize, maxpages, start, 0);
-}
-EXPORT_SYMBOL(iov_iter_get_pages2);
-
 ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
 		struct page ***pages, size_t maxsize, size_t *start,
 		unsigned gup_flags)
@@ -1536,13 +1529,6 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
 }
 EXPORT_SYMBOL_GPL(iov_iter_get_pages_alloc);
 
-ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i,
-		struct page ***pages, size_t maxsize, size_t *start)
-{
-	return iov_iter_get_pages_alloc(i, pages, maxsize, start, 0);
-}
-EXPORT_SYMBOL(iov_iter_get_pages_alloc2);
-
 size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
 			       struct iov_iter *i)
 {
From patchwork Mon Jan 16 23:08:38 2023

Subject: [PATCH v6 05/34] iov_iter: Change the direction macros into an enum
From: David Howells
To: Al Viro
Cc: dhowells@redhat.com, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Mon, 16 Jan 2023 23:08:38 +0000
Message-ID: <167391051810.2311931.8545361041888737395.stgit@warthog.procyon.org.uk>

Change the ITER_SOURCE and ITER_DEST direction macros into an enum and provide three new helper functions:

 iov_iter_dir() - returns the iterator direction
 iov_iter_is_dest() - returns true if it's an ITER_DEST iterator
 iov_iter_is_source() - returns true if it's an ITER_SOURCE iterator

Signed-off-by: David Howells
cc: Al Viro
Link: https://lore.kernel.org/r/167305161763.1521586.6593798818336440133.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/167344726413.2425628.317218805692680763.stgit@warthog.procyon.org.uk/ # v5
---
 include/linux/uio.h | 30 ++++++++++++++++++++++++++----
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/include/linux/uio.h b/include/linux/uio.h index 365e26c405f2..8d0dabfcb2fe 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -29,8 +29,10 @@ enum iter_type { ITER_UBUF, }; -#define ITER_SOURCE 1 // == WRITE -#define ITER_DEST 0 // == READ +enum iter_dir { + ITER_DEST = 0, // == READ + ITER_SOURCE = 1, // == WRITE +} __mode(byte); struct iov_iter_state { size_t iov_offset; @@ -39,9 +41,9 @@ struct iov_iter_state { }; struct iov_iter { - u8 iter_type; + enum iter_type iter_type __mode(byte); bool nofault; - bool data_source; + enum iter_dir data_source; bool user_backed; union { size_t iov_offset; @@ -114,6 +116,26 @@ static inline bool iov_iter_is_xarray(const struct iov_iter *i) return iov_iter_type(i) == ITER_XARRAY; } +static inline enum iter_dir iov_iter_dir(const struct iov_iter *i) +{ + return i->data_source; +} + +static inline bool iov_iter_is_source(const struct iov_iter *i) +{ + return iov_iter_dir(i) == ITER_SOURCE; /* ie. WRITE */
+} + +static inline bool iov_iter_is_dest(const struct iov_iter *i) +{ + return iov_iter_dir(i) == ITER_DEST; /* ie. READ */ +} + +static inline bool iov_iter_dir_valid(enum iter_dir direction) +{ + return direction == ITER_DEST || direction == ITER_SOURCE; +} + static inline bool user_backed_iter(const struct iov_iter *i) { return i->user_backed;
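To illustrate what the new helpers buy a consumer (a hypothetical one; the function below and its name are not from the patch), a driver handling both directions can derive things like the DMA direction from the iterator itself instead of carrying a separate READ/WRITE parameter:

	/*
	 * Hypothetical consumer of the new helpers: derive the DMA
	 * direction from the iterator's own direction field.
	 */
	static enum dma_data_direction example_dma_dir(const struct iov_iter *iter)
	{
		/* ITER_SOURCE: data flows out of the buffer (a write). */
		return iov_iter_is_source(iter) ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
	}
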
From patchwork Mon Jan 16 23:08:44 2023

Subject: [PATCH v6 06/34] iov_iter: Use the direction in the iterator functions
From: David Howells
To: Al Viro
Cc: dhowells@redhat.com, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Mon, 16 Jan 2023 23:08:44 +0000
Message-ID: <167391052497.2311931.9463379582932734164.stgit@warthog.procyon.org.uk>

Use the direction in the iterator functions rather than READ/WRITE. Add a check into __iov_iter_get_pages_alloc() that the supplied FOLL_SOURCE/DEST_BUF gup_flag matches the ITER_SOURCE/DEST flag on the iterator.

Changes
=======
ver #6)
 - Add a check on FOLL_SOURCE/DEST_BUF into __iov_iter_get_pages_alloc()

Signed-off-by: David Howells
cc: Al Viro
Link: https://lore.kernel.org/r/167305162465.1521586.18077838937455153675.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/167344727112.2425628.995771894170560721.stgit@warthog.procyon.org.uk/ # v5
---
 include/linux/uio.h | 22 +-
 lib/iov_iter.c | 409 ++++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 396 insertions(+), 35 deletions(-)

diff --git a/include/linux/uio.h b/include/linux/uio.h index 8d0dabfcb2fe..18b64068cc6d 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -256,16 +256,16 @@ bool iov_iter_is_aligned(const struct iov_iter *i, unsigned addr_mask, unsigned len_mask); unsigned long iov_iter_alignment(const struct iov_iter *i); unsigned long iov_iter_gap_alignment(const struct iov_iter *i); -void iov_iter_init(struct iov_iter *i, unsigned int direction, const struct iovec *iov, +void iov_iter_init(struct iov_iter *i, enum iter_dir direction, const struct iovec *iov, unsigned long nr_segs, size_t count); -void iov_iter_kvec(struct iov_iter *i, unsigned int direction, const struct kvec *kvec, +void iov_iter_kvec(struct iov_iter *i, enum iter_dir direction, const struct kvec *kvec, unsigned long nr_segs, size_t count); -void iov_iter_bvec(struct iov_iter *i, unsigned int direction, const struct bio_vec *bvec, +void iov_iter_bvec(struct iov_iter *i, enum iter_dir direction, const struct bio_vec *bvec, unsigned long nr_segs, size_t count); -void iov_iter_pipe(struct iov_iter *i, unsigned int direction, struct pipe_inode_info *pipe, +void iov_iter_pipe(struct iov_iter *i, enum iter_dir direction, struct pipe_inode_info *pipe, size_t count); -void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count); -void iov_iter_xarray(struct iov_iter *i, unsigned int direction, struct xarray *xarray, +void iov_iter_discard(struct iov_iter *i, enum iter_dir direction, size_t count); +void iov_iter_xarray(struct iov_iter *i, enum iter_dir direction, struct xarray *xarray, loff_t start, size_t count); ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages, size_t maxsize, unsigned maxpages, size_t *start, @@ -351,19 +351,19 @@ size_t hash_and_copy_to_iter(const void *addr, size_t bytes, void *hashp, struct iovec *iovec_from_user(const struct iovec __user *uvector, unsigned long nr_segs, unsigned long fast_segs, struct iovec *fast_iov, bool compat); -ssize_t import_iovec(int type, const struct iovec __user *uvec, +ssize_t 
import_iovec(enum iter_dir direction, const struct iovec __user *uvec, unsigned nr_segs, unsigned fast_segs, struct iovec **iovp, struct iov_iter *i); -ssize_t __import_iovec(int type, const struct iovec __user *uvec, +ssize_t __import_iovec(enum iter_dir direction, const struct iovec __user *uvec, unsigned nr_segs, unsigned fast_segs, struct iovec **iovp, struct iov_iter *i, bool compat); -int import_single_range(int type, void __user *buf, size_t len, +int import_single_range(enum iter_dir direction, void __user *buf, size_t len, struct iovec *iov, struct iov_iter *i); -static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction, +static inline void iov_iter_ubuf(struct iov_iter *i, enum iter_dir direction, void __user *buf, size_t count) { - WARN_ON(direction & ~(READ | WRITE)); + WARN_ON(!iov_iter_dir_valid(direction)); *i = (struct iov_iter) { .iter_type = ITER_UBUF, .user_backed = true, diff --git a/lib/iov_iter.c b/lib/iov_iter.c index ca89ffa9d6e1..6436438bf46b 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -421,11 +421,11 @@ size_t fault_in_iov_iter_writeable(const struct iov_iter *i, size_t size) } EXPORT_SYMBOL(fault_in_iov_iter_writeable); -void iov_iter_init(struct iov_iter *i, unsigned int direction, +void iov_iter_init(struct iov_iter *i, enum iter_dir direction, const struct iovec *iov, unsigned long nr_segs, size_t count) { - WARN_ON(direction & ~(READ | WRITE)); + WARN_ON(!iov_iter_dir_valid(direction)); *i = (struct iov_iter) { .iter_type = ITER_IOVEC, .nofault = false, @@ -994,11 +994,11 @@ size_t iov_iter_single_seg_count(const struct iov_iter *i) } EXPORT_SYMBOL(iov_iter_single_seg_count); -void iov_iter_kvec(struct iov_iter *i, unsigned int direction, +void iov_iter_kvec(struct iov_iter *i, enum iter_dir direction, const struct kvec *kvec, unsigned long nr_segs, size_t count) { - WARN_ON(direction & ~(READ | WRITE)); + WARN_ON(!iov_iter_dir_valid(direction)); *i = (struct iov_iter){ .iter_type = ITER_KVEC, .data_source = direction, @@ -1010,11 +1010,11 @@ void iov_iter_kvec(struct iov_iter *i, unsigned int direction, } EXPORT_SYMBOL(iov_iter_kvec); -void iov_iter_bvec(struct iov_iter *i, unsigned int direction, +void iov_iter_bvec(struct iov_iter *i, enum iter_dir direction, const struct bio_vec *bvec, unsigned long nr_segs, size_t count) { - WARN_ON(direction & ~(READ | WRITE)); + WARN_ON(!iov_iter_dir_valid(direction)); *i = (struct iov_iter){ .iter_type = ITER_BVEC, .data_source = direction, @@ -1026,15 +1026,15 @@ void iov_iter_bvec(struct iov_iter *i, unsigned int direction, } EXPORT_SYMBOL(iov_iter_bvec); -void iov_iter_pipe(struct iov_iter *i, unsigned int direction, +void iov_iter_pipe(struct iov_iter *i, enum iter_dir direction, struct pipe_inode_info *pipe, size_t count) { - BUG_ON(direction != READ); + BUG_ON(direction != ITER_DEST); WARN_ON(pipe_full(pipe->head, pipe->tail, pipe->ring_size)); *i = (struct iov_iter){ .iter_type = ITER_PIPE, - .data_source = false, + .data_source = ITER_DEST, .pipe = pipe, .head = pipe->head, .start_head = pipe->head, @@ -1057,10 +1057,10 @@ EXPORT_SYMBOL(iov_iter_pipe); * from evaporation, either by taking a ref on them or locking them by the * caller. 
*/ -void iov_iter_xarray(struct iov_iter *i, unsigned int direction, +void iov_iter_xarray(struct iov_iter *i, enum iter_dir direction, struct xarray *xarray, loff_t start, size_t count) { - BUG_ON(direction & ~1); + WARN_ON(!iov_iter_dir_valid(direction)); *i = (struct iov_iter) { .iter_type = ITER_XARRAY, .data_source = direction, @@ -1079,14 +1079,14 @@ EXPORT_SYMBOL(iov_iter_xarray); * @count: The size of the I/O buffer in bytes. * * Set up an I/O iterator that just discards everything that's written to it. - * It's only available as a READ iterator. + * It's only available as a destination iterator. */ -void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count) +void iov_iter_discard(struct iov_iter *i, enum iter_dir direction, size_t count) { - BUG_ON(direction != READ); + BUG_ON(direction != ITER_DEST); *i = (struct iov_iter){ .iter_type = ITER_DISCARD, - .data_source = false, + .data_source = ITER_DEST, .count = count, .iov_offset = 0 }; @@ -1444,10 +1444,10 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i, maxsize = MAX_RW_COUNT; if (WARN_ON_ONCE((gup_flags & FOLL_BUF_MASK) == FOLL_SOURCE_BUF && - i->data_source == ITER_DEST)) + iov_iter_is_dest(i))) return -EIO; if (WARN_ON_ONCE((gup_flags & FOLL_BUF_MASK) == FOLL_DEST_BUF && - i->data_source == ITER_SOURCE)) + iov_iter_is_source(i))) return -EIO; if (likely(user_backed_iter(i))) { @@ -1775,7 +1775,7 @@ struct iovec *iovec_from_user(const struct iovec __user *uvec, return iov; } -ssize_t __import_iovec(int type, const struct iovec __user *uvec, +ssize_t __import_iovec(enum iter_dir direction, const struct iovec __user *uvec, unsigned nr_segs, unsigned fast_segs, struct iovec **iovp, struct iov_iter *i, bool compat) { @@ -1814,7 +1814,7 @@ ssize_t __import_iovec(int type, const struct iovec __user *uvec, total_len += len; } - iov_iter_init(i, type, iov, nr_segs, total_len); + iov_iter_init(i, direction, iov, nr_segs, total_len); if (iov == *iovp) *iovp = NULL; else @@ -1827,7 +1827,7 @@ ssize_t __import_iovec(int type, const struct iovec __user *uvec, * into the kernel, check that it is valid, and initialize a new * &struct iov_iter iterator to access it. * - * @type: One of %READ or %WRITE. + * @direction: One of %ITER_SOURCE or %ITER_DEST. * @uvec: Pointer to the userspace array. * @nr_segs: Number of elements in userspace array. * @fast_segs: Number of elements in @iov. 
@@ -1844,16 +1844,16 @@ ssize_t __import_iovec(int type, const struct iovec __user *uvec, * * Return: Negative error code on error, bytes imported on success */ -ssize_t import_iovec(int type, const struct iovec __user *uvec, +ssize_t import_iovec(enum iter_dir direction, const struct iovec __user *uvec, unsigned nr_segs, unsigned fast_segs, struct iovec **iovp, struct iov_iter *i) { - return __import_iovec(type, uvec, nr_segs, fast_segs, iovp, i, + return __import_iovec(direction, uvec, nr_segs, fast_segs, iovp, i, in_compat_syscall()); } EXPORT_SYMBOL(import_iovec); -int import_single_range(int rw, void __user *buf, size_t len, +int import_single_range(enum iter_dir direction, void __user *buf, size_t len, struct iovec *iov, struct iov_iter *i) { if (len > MAX_RW_COUNT) @@ -1863,7 +1863,7 @@ int import_single_range(int rw, void __user *buf, size_t len, iov->iov_base = buf; iov->iov_len = len; - iov_iter_init(i, rw, iov, 1, len); + iov_iter_init(i, direction, iov, 1, len); return 0; } EXPORT_SYMBOL(import_single_range); @@ -1905,3 +1905,364 @@ void iov_iter_restore(struct iov_iter *i, struct iov_iter_state *state) i->iov -= state->nr_segs - i->nr_segs; i->nr_segs = state->nr_segs; } + +/* + * Extract a list of contiguous pages from an ITER_PIPE iterator. This does + * not get references of its own on the pages, nor does it get a pin on them. + * If there's a partial page, it adds that first and will then allocate and add + * pages into the pipe to make up the buffer space to the amount required. + * + * The caller must hold the pipe locked and only transferring into a pipe is + * supported. + */ +static ssize_t iov_iter_extract_pipe_pages(struct iov_iter *i, + struct page ***pages, size_t maxsize, + unsigned int maxpages, + unsigned int gup_flags, + size_t *offset0) +{ + unsigned int nr, offset, chunk, j; + struct page **p; + size_t left; + + if (!sanity(i)) + return -EFAULT; + + offset = pipe_npages(i, &nr); + if (!nr) + return -EFAULT; + *offset0 = offset; + + maxpages = min_t(size_t, nr, maxpages); + maxpages = want_pages_array(pages, maxsize, offset, maxpages); + if (!maxpages) + return -ENOMEM; + p = *pages; + + left = maxsize; + for (j = 0; j < maxpages; j++) { + struct page *page = append_pipe(i, left, &offset); + if (!page) + break; + chunk = min_t(size_t, left, PAGE_SIZE - offset); + left -= chunk; + *p++ = page; + } + if (!j) + return -EFAULT; + return maxsize - left; +} + +/* + * Extract a list of contiguous pages from an ITER_XARRAY iterator. This does not + * get references on the pages, nor does it get a pin on them. + */ +static ssize_t iov_iter_extract_xarray_pages(struct iov_iter *i, + struct page ***pages, size_t maxsize, + unsigned int maxpages, + unsigned int gup_flags, + size_t *offset0) +{ + struct page *page, **p; + unsigned int nr = 0, offset; + loff_t pos = i->xarray_start + i->iov_offset; + pgoff_t index = pos >> PAGE_SHIFT; + XA_STATE(xas, i->xarray, index); + + offset = pos & ~PAGE_MASK; + *offset0 = offset; + + maxpages = want_pages_array(pages, maxsize, offset, maxpages); + if (!maxpages) + return -ENOMEM; + p = *pages; + + rcu_read_lock(); + for (page = xas_load(&xas); page; page = xas_next(&xas)) { + if (xas_retry(&xas, page)) + continue; + + /* Has the page moved or been split? 
*/ + if (unlikely(page != xas_reload(&xas))) { + xas_reset(&xas); + continue; + } + + p[nr++] = find_subpage(page, xas.xa_index); + if (nr == maxpages) + break; + } + rcu_read_unlock(); + + maxsize = min_t(size_t, nr * PAGE_SIZE - offset, maxsize); + i->iov_offset += maxsize; + i->count -= maxsize; + return maxsize; +} + +/* + * Extract a list of contiguous pages from an ITER_BVEC iterator. This does + * not get references on the pages, nor does it get a pin on them. + */ +static ssize_t iov_iter_extract_bvec_pages(struct iov_iter *i, + struct page ***pages, size_t maxsize, + unsigned int maxpages, + unsigned int gup_flags, + size_t *offset0) +{ + struct page **p, *page; + size_t skip = i->iov_offset, offset; + int k; + + maxsize = min(maxsize, i->bvec->bv_len - skip); + skip += i->bvec->bv_offset; + page = i->bvec->bv_page + skip / PAGE_SIZE; + offset = skip % PAGE_SIZE; + *offset0 = offset; + + maxpages = want_pages_array(pages, maxsize, offset, maxpages); + if (!maxpages) + return -ENOMEM; + p = *pages; + for (k = 0; k < maxpages; k++) + p[k] = page + k; + + maxsize = min_t(size_t, maxsize, maxpages * PAGE_SIZE - offset); + i->count -= maxsize; + i->iov_offset += maxsize; + if (i->iov_offset == i->bvec->bv_len) { + i->iov_offset = 0; + i->bvec++; + i->nr_segs--; + } + return maxsize; +} + +/* + * Get the first segment from an ITER_UBUF or ITER_IOVEC iterator. The + * iterator must not be empty. + */ +static unsigned long iov_iter_extract_first_user_segment(const struct iov_iter *i, + size_t *size) +{ + size_t skip; + long k; + + if (iter_is_ubuf(i)) + return (unsigned long)i->ubuf + i->iov_offset; + + for (k = 0, skip = i->iov_offset; k < i->nr_segs; k++, skip = 0) { + size_t len = i->iov[k].iov_len - skip; + + if (unlikely(!len)) + continue; + if (*size > len) + *size = len; + return (unsigned long)i->iov[k].iov_base + skip; + } + BUG(); // if it had been empty, we wouldn't get called +} + +/* + * Extract a list of contiguous pages from a user iterator and get references + * on them. This should only be used iff the iterator is user-backed + * (IOBUF/UBUF) and data is being transferred out of the buffer described by + * the iterator (ie. this is the source). + * + * The pages are returned with incremented refcounts that the caller must undo + * once the transfer is complete, but no additional pins are obtained. + * + * This is only safe to be used where background IO/DMA is not going to be + * modifying the buffer, and so won't cause a problem with CoW on fork. + */ +static ssize_t iov_iter_extract_user_pages_and_get(struct iov_iter *i, + struct page ***pages, + size_t maxsize, + unsigned int maxpages, + unsigned int gup_flags, + size_t *offset0) +{ + unsigned long addr; + size_t offset; + int res; + + if (WARN_ON_ONCE(!iov_iter_is_source(i))) + return -EFAULT; + + gup_flags |= FOLL_GET; + if (i->nofault) + gup_flags |= FOLL_NOFAULT; + + addr = iov_iter_extract_first_user_segment(i, &maxsize); + *offset0 = offset = addr % PAGE_SIZE; + addr &= PAGE_MASK; + maxpages = want_pages_array(pages, maxsize, offset, maxpages); + if (!maxpages) + return -ENOMEM; + res = get_user_pages_fast(addr, maxpages, gup_flags, *pages); + if (unlikely(res <= 0)) + return res; + maxsize = min_t(size_t, maxsize, res * PAGE_SIZE - offset); + iov_iter_advance(i, maxsize); + return maxsize; +} + +/* + * Extract a list of contiguous pages from a user iterator and get a pin on + * each of them. 
This should only be used iff the iterator is user-backed + * (IOBUF/UBUF) and data is being transferred into the buffer described by the + * iterator (ie. this is the destination). + * + * It does not get refs on the pages, but the pages must be unpinned by the + * caller once the transfer is complete. + * + * This is safe to be used where background IO/DMA *is* going to be modifying + * the buffer; using a pin rather than a ref makes sure that CoW happens + * correctly in the parent during fork. + */ +static ssize_t iov_iter_extract_user_pages_and_pin(struct iov_iter *i, + struct page ***pages, + size_t maxsize, + unsigned int maxpages, + unsigned int gup_flags, + size_t *offset0) +{ + unsigned long addr; + size_t offset; + int res; + + if (WARN_ON_ONCE(!iov_iter_is_dest(i))) + return -EFAULT; + + gup_flags |= FOLL_PIN | FOLL_WRITE; + if (i->nofault) + gup_flags |= FOLL_NOFAULT; + + addr = first_iovec_segment(i, &maxsize); + *offset0 = offset = addr % PAGE_SIZE; + addr &= PAGE_MASK; + maxpages = want_pages_array(pages, maxsize, offset, maxpages); + if (!maxpages) + return -ENOMEM; + res = pin_user_pages_fast(addr, maxpages, gup_flags, *pages); + if (unlikely(res <= 0)) + return res; + maxsize = min_t(size_t, maxsize, res * PAGE_SIZE - offset); + iov_iter_advance(i, maxsize); + return maxsize; +} + +static ssize_t iov_iter_extract_user_pages(struct iov_iter *i, + struct page ***pages, size_t maxsize, + unsigned int maxpages, + unsigned int gup_flags, + size_t *offset0) +{ + if (iov_iter_extract_mode(i, gup_flags) == FOLL_GET) + return iov_iter_extract_user_pages_and_get(i, pages, maxsize, + maxpages, gup_flags, + offset0); + else + return iov_iter_extract_user_pages_and_pin(i, pages, maxsize, + maxpages, gup_flags, + offset0); +} + +/** + * iov_iter_extract_pages - Extract a list of contiguous pages from an iterator + * @i: The iterator to extract from + * @pages: Where to return the list of pages + * @maxsize: The maximum amount of iterator to extract + * @maxpages: The maximum size of the list of pages + * @gup_flags: Direction indicator and additional flags + * @offset0: Where to return the starting offset into (*@pages)[0] + * + * Extract a list of contiguous pages from the current point of the iterator, + * advancing the iterator. The maximum number of pages and the maximum amount + * of page contents can be set. + * + * If *@pages is NULL, a page list will be allocated to the required size and + * *@pages will be set to its base. If *@pages is not NULL, it will be assumed + * that the caller allocated a page list at least @maxpages in size and this + * will be filled in. + * + * @gup_flags can be set to either FOLL_SOURCE_BUF or FOLL_DEST_BUF, indicating + * how the buffer is to be used, and can have FOLL_PCI_P2PDMA OR'd with that. + * + * The iov_iter_extract_mode() function can be used to query how cleanup should + * be performed. + * + * Extra refs or pins on the pages may be obtained as follows: + * + * (*) If the iterator is user-backed (ITER_IOVEC/ITER_UBUF) and data is to be + * transferred /OUT OF/ the buffer (@gup_flags |= FOLL_SOURCE_BUF), refs + * will be taken on the pages, but pins will not be added. This can be + * used for DMA from a page; it cannot be used for DMA to a page, as it + * may cause page-COW problems in fork. iov_iter_extract_mode() will + * return FOLL_GET. 
+ * + * (*) If the iterator is user-backed (ITER_IOVEC/ITER_UBUF) and data is to be + * transferred /INTO/ the described buffer (@gup_flags |= FOLL_DEST_BUF), + * pins will be added to the pages, but refs will not be taken. This must + * be used for DMA to a page. iov_iter_extract_mode() will return + * FOLL_PIN. + * + * (*) If the iterator is ITER_PIPE, this must describe a destination for the + * data. Additional pages may be allocated and added to the pipe (which + * will hold the refs), but neither refs nor pins will be obtained for the + * caller. The caller must hold the pipe lock. iov_iter_extract_mode() + * will return 0. + * + * (*) If the iterator is ITER_BVEC or ITER_XARRAY, the pages are merely + * listed; no extra refs or pins are obtained. iov_iter_extract_mode() + * will return 0. + * + * Note also: + * + * (*) Use with ITER_KVEC is not supported as that may refer to memory that + * doesn't have associated page structs. + * + * (*) Use with ITER_DISCARD is not supported as that has no content. + * + * On success, the function sets *@pages to the new pagelist, if allocated, and + * sets *offset0 to the offset into the first page. + * + * It may also return -ENOMEM and -EFAULT. + */ +ssize_t iov_iter_extract_pages(struct iov_iter *i, + struct page ***pages, + size_t maxsize, + unsigned int maxpages, + unsigned int gup_flags, + size_t *offset0) +{ + if (WARN_ON_ONCE((gup_flags & FOLL_BUF_MASK) == FOLL_SOURCE_BUF && + iov_iter_is_dest(i))) + return -EIO; + if (WARN_ON_ONCE((gup_flags & FOLL_BUF_MASK) == FOLL_DEST_BUF && + iov_iter_is_source(i))) + return -EIO; + + maxsize = min_t(size_t, min_t(size_t, maxsize, i->count), MAX_RW_COUNT); + if (!maxsize) + return 0; + + if (likely(user_backed_iter(i))) + return iov_iter_extract_user_pages(i, pages, maxsize, + maxpages, gup_flags, + offset0); + if (iov_iter_is_bvec(i)) + return iov_iter_extract_bvec_pages(i, pages, maxsize, + maxpages, gup_flags, + offset0); + if (iov_iter_is_pipe(i)) + return iov_iter_extract_pipe_pages(i, pages, maxsize, + maxpages, gup_flags, + offset0); + if (iov_iter_is_xarray(i)) + return iov_iter_extract_xarray_pages(i, pages, maxsize, + maxpages, gup_flags, + offset0); + return -EFAULT; +} +EXPORT_SYMBOL_GPL(iov_iter_extract_pages);
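To make the calling convention above concrete, here is a minimal, hypothetical consumer of iov_iter_extract_pages() for a transfer into the buffer (so FOLL_DEST_BUF); the transfer itself is elided, the function name is an assumption, and iov_iter_extract_mode()/page_put_unpin() are the cleanup helpers introduced by the following patches in this series:

	/*
	 * Hypothetical sketch only: extract up to 256 pages for a transfer
	 * INTO the buffer described by the iterator, then release them in
	 * whatever way the extraction mode dictates (pin, ref or no-op).
	 */
	static ssize_t example_extract(struct iov_iter *iter)
	{
		struct page **pages = NULL;	/* let the API allocate the list */
		size_t offset;
		unsigned int cleanup, i, npages;
		ssize_t len;

		len = iov_iter_extract_pages(iter, &pages, 256 * PAGE_SIZE, 256,
					     FOLL_DEST_BUF, &offset);
		if (len <= 0)
			return len;

		/* ... transfer len bytes into the pages, starting at offset ... */

		cleanup = iov_iter_extract_mode(iter, FOLL_DEST_BUF);
		npages = DIV_ROUND_UP(offset + len, PAGE_SIZE);
		for (i = 0; i < npages; i++)
			page_put_unpin(pages[i], cleanup);
		kvfree(pages);	/* assuming the API kvmalloc'd the list */
		return len;
	}
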
From patchwork Mon Jan 16 23:08:52 2023

Subject: [PATCH v6 07/34] iov_iter: Add a function to extract a page list from an iterator
From: David Howells
To: Al Viro
Cc: Christoph Hellwig, John Hubbard, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, dhowells@redhat.com, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Mon, 16 Jan 2023 23:08:52 +0000
Message-ID: <167391053207.2311931.16398133457201442907.stgit@warthog.procyon.org.uk>

Add a function, iov_iter_extract_pages(), to extract a list of pages from an iterator. The pages may be returned with a reference added or a pin added or neither, depending on the type of iterator and the direction of transfer. The caller should pass FOLL_SOURCE_BUF or FOLL_DEST_BUF as part of gup_flags to indicate how the iterator contents are to be used.

Add a second function, iov_iter_extract_mode(), to determine how the cleanup should be done. There are three cases:

 (1) Transfer *into* an ITER_IOVEC or ITER_UBUF iterator. Extracted pages will have pins obtained on them (but not references) so that fork() doesn't CoW the pages incorrectly whilst the I/O is in progress. iov_iter_extract_mode() will return FOLL_PIN for this case. The caller should use something like unpin_user_page() to dispose of the page.

 (2) Transfer is *out of* an ITER_IOVEC or ITER_UBUF iterator. Extracted pages will have references obtained on them, but not pins. iov_iter_extract_mode() will return FOLL_GET. The caller should use something like put_page() for page disposal.

 (3) Any other sort of iterator. No refs or pins are obtained on the page, the assumption is made that the caller will manage page retention. iov_iter_extract_mode() will return 0. The pages don't need additional disposal.

Changes:
========
ver #6)
 - Add back the function to indicate the cleanup mode.
 - Drop the cleanup_mode return arg to iov_iter_extract_pages().
 - Pass FOLL_SOURCE/DEST_BUF in gup_flags. Check this against the iter data_source.
ver #4)
 - Use ITER_SOURCE/DEST instead of WRITE/READ.
 - Allow additional FOLL_* flags, such as FOLL_PCI_P2PDMA, to be passed in.
ver #3)
 - Switch to using EXPORT_SYMBOL_GPL to prevent indirect 3rd-party access to get/pin_user_pages_fast()[1].

Signed-off-by: David Howells
cc: Al Viro
cc: Christoph Hellwig
cc: John Hubbard
cc: Matthew Wilcox
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/Y3zFzdWnWlEJ8X8/@infradead.org/ [1]
Link: https://lore.kernel.org/r/166722777971.2555743.12953624861046741424.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166732025748.3186319.8314014902727092626.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166869689451.3723671.18242195992447653092.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166920903885.1461876.692029808682876184.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/166997421646.9475.14837976344157464997.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/167305163883.1521586.10777155475378874823.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/167344728530.2425628.9613910866466387722.stgit@warthog.procyon.org.uk/ # v5
---
 include/linux/uio.h | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/include/linux/uio.h b/include/linux/uio.h index 18b64068cc6d..38607c82e0cc 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -373,4 +373,32 @@ static inline void iov_iter_ubuf(struct iov_iter *i, enum iter_dir direction, }; } +ssize_t iov_iter_extract_pages(struct iov_iter *i, struct page ***pages, + size_t maxsize, unsigned int maxpages, + unsigned int gup_flags, size_t *offset0); + +/** + * iov_iter_extract_mode - Indicate how pages from the iterator will be retained + * @iter: The iterator + * @gup_flags: How the iterator is to be used (FOLL_SOURCE/DEST_BUF) + * + * Examine the iterator and the gup_flags and indicate by returning FOLL_PIN, + * FOLL_GET or 0 as to how, if at all, pages extracted from the iterator will + * be retained by the extraction function. + * + * FOLL_GET indicates that the pages will have a reference taken on them that + * the caller must put. This can be done for DMA/async DIO write from a page. + * + * FOLL_PIN indicates that the pages will have a pin placed in them that the + * caller must unpin. This must be done for DMA/async DIO read to a page to + * avoid CoW problems in fork. + * + * 0 indicates that no measures are taken and that it's up to the caller to + * retain the pages. + */ +#define iov_iter_extract_mode(iter, gup_flags) \ + (user_backed_iter(iter) ? \ + (gup_flags & FOLL_BUF_MASK) == FOLL_SOURCE_BUF ? \ + FOLL_GET : FOLL_PIN : 0) + #endif
From patchwork Mon Jan 16 23:08:59 2023

Subject: [PATCH v6 08/34] mm: Provide a helper to drop a pin/ref on a page
From: David Howells
To: Al Viro
Cc: dhowells@redhat.com, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Mon, 16 Jan 2023 23:08:59 +0000
Message-ID: <167391053934.2311931.17229969100836070492.stgit@warthog.procyon.org.uk>

Provide a helper in the get_user_pages code to drop a pin or a ref on a page, based on whether FOLL_GET or FOLL_PIN is given in its flags argument, or to do nothing if neither is set.
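The intended pairing, sketched below with a hypothetical page list (the function name is not from the patch), is that whatever iov_iter_extract_mode() reported at extraction time is later fed back verbatim to release the pages; if neither FOLL_GET nor FOLL_PIN was set, the call is a no-op:

	/*
	 * Hypothetical cleanup loop: 'how' is the value that
	 * iov_iter_extract_mode() returned when the pages were extracted,
	 * i.e. FOLL_GET, FOLL_PIN or 0.
	 */
	static void example_release(struct page **pages, unsigned int npages,
				    unsigned int how)
	{
		unsigned int i;

		for (i = 0; i < npages; i++)
			page_put_unpin(pages[i], how);	/* put, unpin or no-op */
	}
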
Signed-off-by: David Howells --- include/linux/mm.h | 3 +++ mm/gup.c | 22 ++++++++++++++++++++++ 2 files changed, 25 insertions(+) diff --git a/include/linux/mm.h b/include/linux/mm.h index 3af4ca8b1fe7..8e746a930945 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1367,6 +1367,9 @@ static inline bool is_cow_mapping(vm_flags_t flags) #define SECTION_IN_PAGE_FLAGS #endif +void folio_put_unpin(struct folio *folio, unsigned int flags); +void page_put_unpin(struct page *page, unsigned int flags); + /* * The identification function is mainly used by the buddy allocator for * determining if two pages could be buddies. We are not really identifying diff --git a/mm/gup.c b/mm/gup.c index f45a3a5be53a..3ee4b4c7e0cb 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -191,6 +191,28 @@ static void gup_put_folio(struct folio *folio, int refs, unsigned int flags) folio_put_refs(folio, refs); } +/** + * folio_put_unpin - Unpin/put a folio as appropriate + * @folio: The folio to release + * @flags: gup flags indicating the mode of release (FOLL_*) + * + * Release a folio according to the flags. If FOLL_GET is set, the folio has a + * ref dropped; if FOLL_PIN is set, it is unpinned; otherwise it is left + * unaltered. + */ +void folio_put_unpin(struct folio *folio, unsigned int flags) +{ + if (flags & (FOLL_GET | FOLL_PIN)) + gup_put_folio(folio, 1, flags); +} +EXPORT_SYMBOL_GPL(folio_put_unpin); + +void page_put_unpin(struct page *page, unsigned int flags) +{ + folio_put_unpin(page_folio(page), flags); +} +EXPORT_SYMBOL_GPL(page_put_unpin); + /** * try_grab_page() - elevate a page's refcount by a flag-dependent amount * @page: pointer to page to be grabbed
From patchwork Mon Jan 16 23:09:06 2023

Subject: [PATCH v6 09/34] bio: Rename BIO_NO_PAGE_REF to BIO_PAGE_REFFED and invert the meaning
From: David Howells
To: Al Viro
Cc: Jens Axboe, Jan Kara, Christoph Hellwig, Matthew Wilcox, Logan Gunthorpe, Jeff Layton, dhowells@redhat.com, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Mon, 16 Jan 2023 23:09:06 +0000
Message-ID: <167391054631.2311931.7588488803802952158.stgit@warthog.procyon.org.uk>

Rename BIO_NO_PAGE_REF to BIO_PAGE_REFFED and invert the meaning. In a following patch I intend to add a BIO_PAGE_PINNED flag to indicate that the page needs unpinning; this way both flags have the same logic.

Changes
=======
ver #5)
 - Split from the patch that uses iov_iter_extract_pages().

Signed-off-by: David Howells
cc: Al Viro
cc: Jens Axboe
cc: Jan Kara
cc: Christoph Hellwig
cc: Matthew Wilcox
cc: Logan Gunthorpe
cc: linux-block@vger.kernel.org
Link: https://lore.kernel.org/r/167305166150.1521586.10220949115402059720.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/167344730802.2425628.14034153595667416149.stgit@warthog.procyon.org.uk/ # v5
Reviewed-by: Christoph Hellwig
---
 block/bio.c | 9 ++++++++-
 include/linux/bio.h | 2 +-
 include/linux/blk_types.h | 2 +-
 3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/block/bio.c b/block/bio.c index 867cf4db87ea..5b6a76c3e620 100644 --- a/block/bio.c +++ b/block/bio.c @@ -243,6 +243,10 @@ static void bio_free(struct bio *bio) * Users of this function have their own bio allocation. Subsequently, * they must remember to pair any call to bio_init() with bio_uninit() * when IO has completed, or when the bio is released. + * + * We set the initial assumption that pages attached to the bio will be + * released with put_page() by setting BIO_PAGE_REFFED; if the pages + * should not be put, this flag should be cleared. 
*/ void bio_init(struct bio *bio, struct block_device *bdev, struct bio_vec *table, unsigned short max_vecs, blk_opf_t opf) @@ -274,6 +278,7 @@ void bio_init(struct bio *bio, struct block_device *bdev, struct bio_vec *table, #ifdef CONFIG_BLK_DEV_INTEGRITY bio->bi_integrity = NULL; #endif + bio_set_flag(bio, BIO_PAGE_REFFED); bio->bi_vcnt = 0; atomic_set(&bio->__bi_remaining, 1); @@ -302,6 +307,7 @@ void bio_reset(struct bio *bio, struct block_device *bdev, blk_opf_t opf) { bio_uninit(bio); memset(bio, 0, BIO_RESET_BYTES); + bio_set_flag(bio, BIO_PAGE_REFFED); atomic_set(&bio->__bi_remaining, 1); bio->bi_bdev = bdev; if (bio->bi_bdev) @@ -812,6 +818,7 @@ EXPORT_SYMBOL(bio_put); static int __bio_clone(struct bio *bio, struct bio *bio_src, gfp_t gfp) { bio_set_flag(bio, BIO_CLONED); + bio_clear_flag(bio, BIO_PAGE_REFFED); bio->bi_ioprio = bio_src->bi_ioprio; bio->bi_iter = bio_src->bi_iter; @@ -1198,7 +1205,7 @@ void bio_iov_bvec_set(struct bio *bio, struct iov_iter *iter) bio->bi_io_vec = (struct bio_vec *)iter->bvec; bio->bi_iter.bi_bvec_done = iter->iov_offset; bio->bi_iter.bi_size = size; - bio_set_flag(bio, BIO_NO_PAGE_REF); + bio_clear_flag(bio, BIO_PAGE_REFFED); bio_set_flag(bio, BIO_CLONED); } diff --git a/include/linux/bio.h b/include/linux/bio.h index 3f7ba7fe48ac..69b32c5532f6 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -496,7 +496,7 @@ void zero_fill_bio(struct bio *bio); static inline void bio_release_pages(struct bio *bio, bool mark_dirty) { - if (!bio_flagged(bio, BIO_NO_PAGE_REF)) + if (bio_flagged(bio, BIO_PAGE_REFFED)) __bio_release_pages(bio, mark_dirty); } diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index 99be590f952f..86711fb0534a 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -318,7 +318,7 @@ struct bio { * bio flags */ enum { - BIO_NO_PAGE_REF, /* don't put release vec pages */ + BIO_PAGE_REFFED, /* Pages need refs putting (equivalent to FOLL_GET) */ BIO_CLONED, /* doesn't own data */ BIO_BOUNCED, /* bio is a bounce bio */ BIO_QUIET, /* Make BIO Quiet */
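As an illustration of the inverted flag (a hypothetical submitter, not from the patch): a caller that manages page lifetimes itself now clears BIO_PAGE_REFFED, where it previously had to set BIO_NO_PAGE_REF:

	/*
	 * Hypothetical submitter that owns its pages: stop bio completion
	 * from dropping references it never took.
	 */
	static void example_attach_page(struct bio *bio, struct page *page,
					unsigned int len, unsigned int off)
	{
		/* Before this patch: bio_set_flag(bio, BIO_NO_PAGE_REF); */
		bio_clear_flag(bio, BIO_PAGE_REFFED);
		__bio_add_page(bio, page, len, off);
	}
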
From patchwork Mon Jan 16 23:09:13 2023

Subject: [PATCH v6 10/34] mm, block: Make BIO_PAGE_REFFED/PINNED the same as FOLL_GET/PIN numerically
From: David Howells
To: Al Viro
Cc: Jens Axboe, Jan Kara, Christoph Hellwig, Matthew Wilcox, Logan Gunthorpe, Jeff Layton, dhowells@redhat.com, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Mon, 16 Jan 2023 23:09:13 +0000
Message-ID: <167391055339.2311931.11902422289425837725.stgit@warthog.procyon.org.uk>

Make BIO_PAGE_REFFED the same as FOLL_GET and BIO_PAGE_PINNED the same as FOLL_PIN numerically so that the BIO_* flags can be passed directly to page_put_unpin(). Provide a build-time assertion to check this.
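The equivalence being relied on can be restated as a sketch (illustrative only; the BUILD_BUG_ON lines in the diff below are the authoritative check, and the helper name here is an assumption, not the patch's own bio_release_page()): with BIO_PAGE_REFFED as bit 0 and BIO_PAGE_PINNED as bit 1, the low bits of a bio's flag word line up with FOLL_GET (0x01) and FOLL_PIN (0x02), so a release helper could simply mask and forward them:

	/*
	 * Illustrative sketch: because 1 << BIO_PAGE_REFFED == FOLL_GET and
	 * 1 << BIO_PAGE_PINNED == FOLL_PIN, the bio flag word can be used
	 * directly as the release-mode argument.
	 */
	static void example_bio_release_page(struct bio *bio, struct page *page)
	{
		page_put_unpin(page, bio->bi_flags & (FOLL_GET | FOLL_PIN));
	}
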
Signed-off-by: David Howells
cc: Al Viro
cc: Jens Axboe
cc: Jan Kara
cc: Christoph Hellwig
cc: Matthew Wilcox
cc: Logan Gunthorpe
cc: linux-block@vger.kernel.org
---
 block/bio.c | 3 +++
 include/linux/blk_types.h | 1 +
 include/linux/mm.h | 17 ++++++++++-------
 3 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/block/bio.c b/block/bio.c index 5b6a76c3e620..d8c636cefcdd 100644 --- a/block/bio.c +++ b/block/bio.c @@ -1798,6 +1798,9 @@ static int __init init_bio(void) { int i; + BUILD_BUG_ON((1 << BIO_PAGE_REFFED) != FOLL_GET); + BUILD_BUG_ON((1 << BIO_PAGE_PINNED) != FOLL_PIN); + bio_integrity_init(); for (i = 0; i < ARRAY_SIZE(bvec_slabs); i++) { diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index 86711fb0534a..42b40156c517 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -319,6 +319,7 @@ struct bio { */ enum { BIO_PAGE_REFFED, /* Pages need refs putting (equivalent to FOLL_GET) */ + BIO_PAGE_PINNED, /* Pages need unpinning (equivalent to FOLL_PIN) */ BIO_CLONED, /* doesn't own data */ BIO_BOUNCED, /* bio is a bounce bio */ BIO_QUIET, /* Make BIO Quiet */ diff --git a/include/linux/mm.h b/include/linux/mm.h index 8e746a930945..f14edb192394 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3074,12 +3074,13 @@ static inline vm_fault_t vmf_error(int err) struct page *follow_page(struct vm_area_struct *vma, unsigned long address, unsigned int foll_flags); -#define FOLL_WRITE 0x01 /* check pte is writable */ -#define FOLL_TOUCH 0x02 /* mark page accessed */ -#define FOLL_GET 0x04 /* do get_page on page */ -#define FOLL_DUMP 0x08 /* give error on hole if it would be zero */ -#define FOLL_FORCE 0x10 /* get_user_pages read/write w/o permission */ -#define FOLL_NOWAIT 0x20 /* if a disk transfer is needed, start the IO +#define FOLL_GET 0x01 /* do get_page on page (equivalent to BIO_PAGE_REFFED) */ +#define FOLL_PIN 0x02 /* pages must be released via unpin_user_page */ +#define FOLL_WRITE 0x04 /* check pte is writable */ +#define FOLL_TOUCH 0x08 /* mark page accessed */ +#define FOLL_DUMP 0x10 /* give error on hole if it would be zero */ +#define FOLL_FORCE 0x20 /* get_user_pages read/write w/o permission */ +#define FOLL_NOWAIT 0x40 /* if a disk transfer is needed, start the IO * and return without waiting upon it */ #define FOLL_NOFAULT 0x80 /* do not fault in pages */ #define FOLL_HWPOISON 0x100 /* check page is hwpoisoned */ @@ -3088,7 +3089,6 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address, #define FOLL_ANON 0x8000 /* don't do file mappings */ #define FOLL_LONGTERM 0x10000 /* mapping lifetime is indefinite: see below */ #define FOLL_SPLIT_PMD 0x20000 /* split huge pmd before returning */ -#define FOLL_PIN 0x40000 /* pages must be released via unpin_user_page */ #define FOLL_FAST_ONLY 0x80000 /* gup_fast: prevent fall-back to slow gup */ #define FOLL_PCI_P2PDMA 0x100000 /* allow returning PCI P2PDMA pages */ #define FOLL_INTERRUPTIBLE 0x200000 /* allow interrupts from generic signals */ @@ -3098,6 +3098,9 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address, #define FOLL_BUF_MASK FOLL_WRITE /* + * FOLL_GET must be the same bit as BIO_PAGE_REFFED and FOLL_PIN must be the + * same bit as BIO_PAGE_PINNED. + * * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each + * other.
Here is what they mean, and how to use them: * From patchwork Mon Jan 16 23:09:20 2023
3798903 Subject: [PATCH v6 11/34] iov_iter, block: Make bio structs pin pages rather than ref'ing if appropriate From: David Howells To: Al Viro Cc: Jens Axboe , Jan Kara , Christoph Hellwig , Matthew Wilcox , Logan Gunthorpe , linux-block@vger.kernel.org, dhowells@redhat.com, Christoph Hellwig , Matthew Wilcox , Jens Axboe , Jan Kara , Jeff Layton , Logan Gunthorpe , linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 16 Jan 2023 23:09:20 +0000 Message-ID: <167391056047.2311931.6772604381276147664.stgit@warthog.procyon.org.uk> In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> User-Agent: StGit/1.5 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.5 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Convert the block layer's bio code to use iov_iter_extract_pages() instead of iov_iter_get_pages(). This will pin pages or leave them unaltered rather than getting a ref on them as appropriate to the source iterator. The pages need to be pinned for DIO-read rather than having refs taken on them to prevent VM copy-on-write from malfunctioning during a concurrent fork() (the result of the I/O would otherwise end up only visible to the child process and not the parent). To implement this: (1) If the BIO_PAGE_REFFED flag is set, this causes attached pages to be passed to put_page() during cleanup. (2) A BIO_PAGE_PINNED flag is provided. If set, this causes attached pages to be passed to unpin_user_page() during cleanup. (3) BIO_PAGE_REFFED is set by default and BIO_PAGE_PINNED is cleared by default when the bio is (re-)initialised. (4) If iov_iter_extract_pages() indicates FOLL_GET, this causes BIO_PAGE_REFFED to be set and if FOLL_PIN is indicated, this causes BIO_PAGE_PINNED to be set. If it returns neither FOLL_* flag, then both BIO_PAGE_* flags will be cleared. Mixing sets of pages with different clean-up modes is not supported. (5) Cloned bio structs have both flags cleared. (6) bio_release_pages() will do the release if either BIO_PAGE_* flag is set. [!] Note that this has been tested a bit with ext4, but nothing else. Changes ======= ver #5) - Transcribed the FOLL_* flags returned by iov_iter_extract_pages() to BIO_* flags and got rid of bi_cleanup_mode. - Replaced BIO_NO_PAGE_REF with BIO_PAGE_REFFED in the preceding patch. Signed-off-by: David Howells cc: Al Viro cc: Jens Axboe cc: Jan Kara cc: Christoph Hellwig cc: Matthew Wilcox cc: Logan Gunthorpe cc: linux-block@vger.kernel.org Link: https://lore.kernel.org/r/167305166150.1521586.10220949115402059720.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/167344731521.2425628.5403113335062567245.stgit@warthog.procyon.org.uk/ # v5 --- block/bio.c | 34 +++++++++++++++++++--------------- block/blk-map.c | 22 +++++++++++----------- block/blk.h | 25 +++++++++++++++++++++++++ include/linux/bio.h | 3 ++- 4 files changed, 57 insertions(+), 27 deletions(-) diff --git a/block/bio.c b/block/bio.c index d8c636cefcdd..f9ee3625d65c 100644 --- a/block/bio.c +++ b/block/bio.c @@ -245,8 +245,9 @@ static void bio_free(struct bio *bio) * when IO has completed, or when the bio is released. * * We set the initial assumption that pages attached to the bio will be - * released with put_page() by setting BIO_PAGE_REFFED; if the pages - * should not be put, this flag should be cleared.
+ * released with put_page() by setting BIO_PAGE_REFFED, but this should be set + * to BIO_PAGE_PINNED if the page should be unpinned instead; if the pages + * should not be put or unpinned, these flags should be cleared. */ void bio_init(struct bio *bio, struct block_device *bdev, struct bio_vec *table, unsigned short max_vecs, blk_opf_t opf) @@ -819,6 +820,7 @@ static int __bio_clone(struct bio *bio, struct bio *bio_src, gfp_t gfp) { bio_set_flag(bio, BIO_CLONED); bio_clear_flag(bio, BIO_PAGE_REFFED); + bio_clear_flag(bio, BIO_PAGE_PINNED); bio->bi_ioprio = bio_src->bi_ioprio; bio->bi_iter = bio_src->bi_iter; @@ -1183,7 +1185,7 @@ void __bio_release_pages(struct bio *bio, bool mark_dirty) bio_for_each_segment_all(bvec, bio, iter_all) { if (mark_dirty && !PageCompound(bvec->bv_page)) set_page_dirty_lock(bvec->bv_page); - put_page(bvec->bv_page); + bio_release_page(bio, bvec->bv_page); } } EXPORT_SYMBOL_GPL(__bio_release_pages); @@ -1220,7 +1222,7 @@ static int bio_iov_add_page(struct bio *bio, struct page *page, } if (same_page) - put_page(page); + bio_release_page(bio, page); return 0; } @@ -1234,7 +1236,7 @@ static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page, queue_max_zone_append_sectors(q), &same_page) != len) return -EINVAL; if (same_page) - put_page(page); + bio_release_page(bio, page); return 0; } @@ -1245,10 +1247,10 @@ static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page, * @bio: bio to add pages to * @iter: iov iterator describing the region to be mapped * - * Pins pages from *iter and appends them to @bio's bvec array. The - * pages will have to be released using put_page() when done. - * For multi-segment *iter, this function only adds pages from the - * next non-empty segment of the iov iterator. + * Extracts pages from *iter and appends them to @bio's bvec array. The pages + * will have to be cleaned up in the way indicated by the BIO_PAGE_REFFED and + * BIO_PAGE_PINNED flags. For a multi-segment *iter, this function only adds + * pages from the next non-empty segment of the iov iterator. * * The I/O direction is determined from the bio operation type. */ @@ -1284,12 +1286,14 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) * result to ensure the bio's total size is correct. The remainder of * the iov data will be picked up in the next bio iteration. */ - size = iov_iter_get_pages(iter, pages, - UINT_MAX - bio->bi_iter.bi_size, - nr_pages, &offset, gup_flags); + size = iov_iter_extract_pages(iter, &pages, + UINT_MAX - bio->bi_iter.bi_size, + nr_pages, gup_flags, &offset); if (unlikely(size <= 0)) return size ? size : -EFAULT; + bio_set_cleanup_mode(bio, iter, gup_flags); + nr_pages = DIV_ROUND_UP(offset + size, PAGE_SIZE); trim = size & (bdev_logical_block_size(bio->bi_bdev) - 1); @@ -1319,7 +1323,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) iov_iter_revert(iter, left); out: while (i < nr_pages) - put_page(pages[i++]); + bio_release_page(bio, pages[i++]); return ret; } @@ -1502,8 +1506,8 @@ void bio_set_pages_dirty(struct bio *bio) * the BIO and re-dirty the pages in process context. * * It is expected that bio_check_pages_dirty() will wholly own the BIO from - * here on. It will run one put_page() against each page and will run one - * bio_put() against the BIO. + * here on. It will run one put_page() or unpin_user_page() against each page + * and will run one bio_put() against the BIO. 
*/ static void bio_dirty_fn(struct work_struct *work); diff --git a/block/blk-map.c b/block/blk-map.c index c30be529fb55..be769f889eca 100644 --- a/block/blk-map.c +++ b/block/blk-map.c @@ -285,24 +285,24 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter, gup_flags |= FOLL_PCI_P2PDMA; while (iov_iter_count(iter)) { - struct page **pages, *stack_pages[UIO_FASTIOV]; + struct page *stack_pages[UIO_FASTIOV]; + struct page **pages = stack_pages; ssize_t bytes; size_t offs; int npages; - if (nr_vecs <= ARRAY_SIZE(stack_pages)) { - pages = stack_pages; - bytes = iov_iter_get_pages(iter, pages, LONG_MAX, - nr_vecs, &offs, gup_flags); - } else { - bytes = iov_iter_get_pages_alloc(iter, &pages, - LONG_MAX, &offs, gup_flags); - } + if (nr_vecs > ARRAY_SIZE(stack_pages)) + pages = NULL; + + bytes = iov_iter_extract_pages(iter, &pages, LONG_MAX, + nr_vecs, gup_flags, &offs); if (unlikely(bytes <= 0)) { ret = bytes ? bytes : -EFAULT; goto out_unmap; } + bio_set_cleanup_mode(bio, iter, gup_flags); + npages = DIV_ROUND_UP(offs + bytes, PAGE_SIZE); if (unlikely(offs & queue_dma_alignment(rq->q))) @@ -319,7 +319,7 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter, if (!bio_add_hw_page(rq->q, bio, page, n, offs, max_sectors, &same_page)) { if (same_page) - put_page(page); + bio_release_page(bio, page); break; } @@ -331,7 +331,7 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter, * release the pages we didn't map into the bio, if any */ while (j < npages) - put_page(pages[j++]); + bio_release_page(bio, pages[j++]); if (pages != stack_pages) kvfree(pages); /* couldn't stuff something into bio? */ diff --git a/block/blk.h b/block/blk.h index 4c3b3325219a..29f12f758915 100644 --- a/block/blk.h +++ b/block/blk.h @@ -425,6 +425,31 @@ int bio_add_hw_page(struct request_queue *q, struct bio *bio, struct page *page, unsigned int len, unsigned int offset, unsigned int max_sectors, bool *same_page); +/* + * Set the cleanup mode for a bio from an iterator and the GUP flags. + */ +static inline void bio_set_cleanup_mode(struct bio *bio, struct iov_iter *iter, + unsigned int gup_flags) +{ + unsigned int cleanup_mode; + + bio_clear_flag(bio, BIO_PAGE_REFFED); + cleanup_mode = iov_iter_extract_mode(iter, gup_flags); + if (cleanup_mode & FOLL_GET) + bio_set_flag(bio, BIO_PAGE_REFFED); + if (cleanup_mode & FOLL_PIN) + bio_set_flag(bio, BIO_PAGE_PINNED); +} + +/* + * Clean up a page appropriately, where the page may be pinned, may have a + * ref taken on it or neither. 
+ */ +static inline void bio_release_page(struct bio *bio, struct page *page) +{ + page_put_unpin(page, bio->bi_flags & (FOLL_GET | FOLL_PIN)); +} + struct request_queue *blk_alloc_queue(int node_id); int disk_scan_partitions(struct gendisk *disk, fmode_t mode, void *owner); diff --git a/include/linux/bio.h b/include/linux/bio.h index 69b32c5532f6..856b28e41d24 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -496,7 +496,8 @@ void zero_fill_bio(struct bio *bio); static inline void bio_release_pages(struct bio *bio, bool mark_dirty) { - if (bio_flagged(bio, BIO_PAGE_REFFED)) + if (bio_flagged(bio, BIO_PAGE_REFFED) || + bio_flagged(bio, BIO_PAGE_PINNED)) __bio_release_pages(bio, mark_dirty); } From patchwork Mon Jan 16 23:09:27 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 13103864 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AC4B6C67871 for ; Mon, 16 Jan 2023 23:12:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235357AbjAPXMT (ORCPT ); Mon, 16 Jan 2023 18:12:19 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48278 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235273AbjAPXLE (ORCPT ); Mon, 16 Jan 2023 18:11:04 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AC99914EAC for ; Mon, 16 Jan 2023 15:09:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1673910575; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=YVa1ryPTQBH/SP1C4Yj88p68hB8CmUr5EMTQUgrIC1o=; b=EWR3CiUhcdkvi+fknZXWf/Mg4akdWLv4/Wpq7jOXnRFIfpXj6E68O/A21F96zT++XcX+6v ZYbHrSQzExAZY6l36N0uZh5ZCBVdzzTedSPdnM0KMKXXovAZDCaX8PeHt8LMUNSbljjIwY a4bJdwxz0nzUh+QNBBrATm/sGBAVQPY= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-20-qc22pDstOuC3dJ8UShOP6g-1; Mon, 16 Jan 2023 18:09:29 -0500 X-MC-Unique: qc22pDstOuC3dJ8UShOP6g-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.rdu2.redhat.com [10.11.54.7]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 4B38B2806046; Mon, 16 Jan 2023 23:09:29 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.23]) by smtp.corp.redhat.com (Postfix) with ESMTP id 1BE9A14171B8; Mon, 16 Jan 2023 23:09:28 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 
3798903 Subject: [PATCH v6 12/34] bio: Fix bio_flagged() so that gcc can better optimise it From: David Howells To: Al Viro Cc: Jens Axboe , linux-block@vger.kernel.org, dhowells@redhat.com, Christoph Hellwig , Matthew Wilcox , Jens Axboe , Jan Kara , Jeff Layton , Logan Gunthorpe , linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 16 Jan 2023 23:09:27 +0000 Message-ID: <167391056756.2311931.356007731815807265.stgit@warthog.procyon.org.uk> In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> User-Agent: StGit/1.5 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.7 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Fix bio_flagged() so that multiple instances of it, such as: if (bio_flagged(bio, BIO_PAGE_REFFED) || bio_flagged(bio, BIO_PAGE_PINNED)) can be combined by the gcc optimiser into a single test in assembly (arguably, this is a compiler optimisation issue[1]). The missed optimisation stems from bio_flagged() comparing the result of the bitwise-AND to zero. This results in an out-of-line bio_release_page() being compiled to something like: <+0>: mov 0x14(%rdi),%eax <+3>: test $0x1,%al <+5>: jne 0xffffffff816dac53 <+7>: test $0x2,%al <+9>: je 0xffffffff816dac5c <+11>: movzbl %sil,%esi <+15>: jmp 0xffffffff816daba1 <__bio_release_pages> <+20>: jmp 0xffffffff81d0b800 <__x86_return_thunk> However, the test is superfluous as the return type is bool. Removing it results in: <+0>: testb $0x3,0x14(%rdi) <+4>: je 0xffffffff816e4af4 <+6>: movzbl %sil,%esi <+10>: jmp 0xffffffff816dab7c <__bio_release_pages> <+15>: jmp 0xffffffff81d0b7c0 <__x86_return_thunk> instead. Also, the MOVZBL instruction looks unnecessary[2] - I think it's just 're-booling' the mark_dirty parameter. 
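To make the codegen difference easy to reproduce outside the kernel tree, here is a minimal stand-alone sketch (hypothetical names, not kernel code) of the two forms of the helper:

	#include <stdbool.h>

	/* Old form: compare the masked value against zero explicitly. */
	static inline bool flag_test_cmp(unsigned int flags, unsigned int bit)
	{
		return (flags & (1U << bit)) != 0;
	}

	/* New form: rely on the implicit conversion to bool. */
	static inline bool flag_test_bool(unsigned int flags, unsigned int bit)
	{
		return flags & (1U << bit);
	}

	bool either_cmp(unsigned int flags)
	{
		return flag_test_cmp(flags, 0) || flag_test_cmp(flags, 1);
	}

	bool either_bool(unsigned int flags)
	{
		/* Per the disassembly above, gcc folds this pair into a
		 * single "testb $0x3" where the explicit-comparison form
		 * produces two separate bit tests.
		 */
		return flag_test_bool(flags, 0) || flag_test_bool(flags, 1);
	}

The two helpers are semantically identical - the conversion to bool implies a test against zero anyway - which is why the unmerged tests look like a compiler issue, per [1].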
Fixes: b7c44ed9d2fc ("block: manipulate bio->bi_flags through helpers") Signed-off-by: David Howells cc: Jens Axboe cc: linux-block@vger.kernel.org Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108370 [1] Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108371 [2] Reviewed-by: Christoph Hellwig --- include/linux/bio.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/bio.h b/include/linux/bio.h index 856b28e41d24..5e34bcfcfa2c 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -241,7 +241,7 @@ static inline void bio_cnt_set(struct bio *bio, unsigned int count) static inline bool bio_flagged(struct bio *bio, unsigned int bit) { - return (bio->bi_flags & (1U << bit)) != 0; + return bio->bi_flags & (1U << bit); } static inline void bio_set_flag(struct bio *bio, unsigned int bit) From patchwork Mon Jan 16 23:09:34 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 13103865 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E89C2C46467 for ; Mon, 16 Jan 2023 23:12:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235368AbjAPXM2 (ORCPT ); Mon, 16 Jan 2023 18:12:28 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48134 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234239AbjAPXLV (ORCPT ); Mon, 16 Jan 2023 18:11:21 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0B57B2A17B for ; Mon, 16 Jan 2023 15:09:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1673910585; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=SyFdojZHqYZAjivktOFvj0WETw0xJd5nC3JvqHo/mQc=; b=cQPCX5gAzB3Ta3o6cfOhlOlvXBMsziGJS0vjixSH3J7dFnCgHHyvsW/GvGPlQJRH8Cbbl3 4pTJLfZ2Oe6WvApFgp16+w2UgXwOKedBT717pgPI9HZluLKT2JX0g/QMQlhl94RBWs8HUv /IJ3iSQHpCehOpXNEHck/TRLAb7U9cI= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-199-wUUqsBjMOO6GsYwbVltGmw-1; Mon, 16 Jan 2023 18:09:37 -0500 X-MC-Unique: wUUqsBjMOO6GsYwbVltGmw-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id C5F14811E6E; Mon, 16 Jan 2023 23:09:36 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.23]) by smtp.corp.redhat.com (Postfix) with ESMTP id F3F3140C2064; Mon, 16 Jan 2023 23:09:34 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 
3798903 Subject: [PATCH v6 13/34] netfs: Add a function to extract a UBUF or IOVEC into a BVEC iterator From: David Howells To: Al Viro Cc: Jeff Layton , Steve French , Shyam Prasad N , Rohith Surabattula , linux-cachefs@redhat.com, linux-cifs@vger.kernel.org, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Christoph Hellwig , Matthew Wilcox , Jens Axboe , Jan Kara , Jeff Layton , Logan Gunthorpe , linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 16 Jan 2023 23:09:34 +0000 Message-ID: <167391057444.2311931.12321968641492694017.stgit@warthog.procyon.org.uk> In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> User-Agent: StGit/1.5 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.1 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Add a function to extract the pages from a user-space supplied iterator (UBUF- or IOVEC-type) into a BVEC-type iterator, retaining the pages by getting a ref on them (if FOLL_SOURCE_BUF is indicated) or pinning them (if FOLL_DEST_BUF is indicated) as we go. This is useful in three situations: (1) A userspace thread may have a sibling that unmaps or remaps the process's VM during the operation, changing the assignment of the pages and potentially causing an error. Retaining the pages keeps some pages around, even if this occurs; further, we find out at the point of extraction if EFAULT is going to be incurred. (2) Pages might get swapped out/discarded if not retained, so we want to retain them to avoid the reload causing a deadlock due to a DIO from/to an mmapped region on the same file. (3) The iterator may get passed to sendmsg() by the filesystem. If a fault occurs, we may get a short write to a TCP stream that's then tricky to recover from. We don't deal with other types of iterator here, leaving it to other mechanisms to retain the pages (eg. PG_locked, PG_writeback and the pipe lock). Changes: ======== ver #6) - Pass in a gup_flags argument to allow FOLL_SOURCE_BUF and FOLL_DEST_BUF and other FOLL_* flags to be passed in. - Don't pass back the cleanup mode - iov_iter_extract_mode() can be used to determine that. ver #3) - Switch to using EXPORT_SYMBOL_GPL to prevent indirect 3rd-party access to get/pin_user_pages_fast()[1].
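As a rough usage sketch (hypothetical caller; names other than netfs_extract_user_iter() and iov_iter_extract_mode() are stand-ins, and error handling is trimmed):

	struct iov_iter bvec_iter;
	ssize_t n;

	/* Decant the user buffer into a bvec iterator, getting refs on
	 * the pages (FOLL_SOURCE_BUF) so that a later sendmsg() can't
	 * take a fault partway through the stream.
	 */
	n = netfs_extract_user_iter(user_iter, count, &bvec_iter,
				    FOLL_SOURCE_BUF);
	if (n < 0)
		return n;

	/* ... perform the I/O against bvec_iter ... */

	/* Afterwards, release the pages as directed by
	 * iov_iter_extract_mode(user_iter, FOLL_SOURCE_BUF) and free the
	 * bvec array backing bvec_iter.
	 */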
Signed-off-by: David Howells cc: Jeff Layton cc: Steve French cc: Shyam Prasad N cc: Rohith Surabattula cc: linux-cachefs@redhat.com cc: linux-cifs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/Y3zFzdWnWlEJ8X8/@infradead.org/ [1] Link: https://lore.kernel.org/r/166697255265.61150.6289490555867717077.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/166732026503.3186319.12020462741051772825.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/166869690376.3723671.8813331570219190705.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/166920904810.1461876.11603559311247187100.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/166997422579.9475.12101700945635692496.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/167305164634.1521586.12199658904363317567.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/167344729278.2425628.3277966637577509831.stgit@warthog.procyon.org.uk/ # v5 --- fs/netfs/Makefile | 1 fs/netfs/iterator.c | 102 +++++++++++++++++++++++++++++++++++++++++++++++++ include/linux/netfs.h | 2 + 3 files changed, 105 insertions(+) create mode 100644 fs/netfs/iterator.c diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile index f684c0cd1ec5..386d6fb92793 100644 --- a/fs/netfs/Makefile +++ b/fs/netfs/Makefile @@ -3,6 +3,7 @@ netfs-y := \ buffered_read.o \ io.o \ + iterator.o \ main.o \ objects.o diff --git a/fs/netfs/iterator.c b/fs/netfs/iterator.c new file mode 100644 index 000000000000..f7f26de1a247 --- /dev/null +++ b/fs/netfs/iterator.c @@ -0,0 +1,102 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* Iterator helpers. + * + * Copyright (C) 2022 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#include +#include +#include +#include +#include "internal.h" + +/** + * netfs_extract_user_iter - Extract the pages from a user iterator into a bvec + * @orig: The original iterator + * @orig_len: The amount of iterator to copy + * @new: The iterator to be set up + * @gup_flags: Direction indicator and additional flags + * + * Extract the page fragments from the given amount of the source iterator and + * build up a second iterator that refers to all of those bits. This allows + * the original iterator to be disposed of. + * + * @gup_flags should indicate FOLL_SOURCE_BUF or FOLL_DEST_BUF plus any + * additional flags needed. + * + * On success, the number of elements in the bvec is returned and the original + * iterator will have been advanced by the amount extracted. + * + * The iov_iter_extract_mode() function should be used to query how cleanup + * should be performed. + */ +ssize_t netfs_extract_user_iter(struct iov_iter *orig, size_t orig_len, + struct iov_iter *new, unsigned int gup_flags) +{ + struct bio_vec *bv = NULL; + struct page **pages; + unsigned int cur_npages; + unsigned int max_pages; + unsigned int npages = 0; + unsigned int i; + ssize_t ret; + size_t count = orig_len, offset, len; + size_t bv_size, pg_size; + + if (WARN_ON_ONCE(!iter_is_ubuf(orig) && !iter_is_iovec(orig))) + return -EIO; + + max_pages = iov_iter_npages(orig, INT_MAX); + bv_size = array_size(max_pages, sizeof(*bv)); + bv = kvmalloc(bv_size, GFP_KERNEL); + if (!bv) + return -ENOMEM; + + /* Put the page list at the end of the bvec list storage. bvec + * elements are larger than page pointers, so as long as we work + * 0->last, we should be fine.
+ */ + pg_size = array_size(max_pages, sizeof(*pages)); + pages = (void *)bv + bv_size - pg_size; + + while (count && npages < max_pages) { + ret = iov_iter_extract_pages(orig, &pages, count, + max_pages - npages, gup_flags, + &offset); + if (ret < 0) { + pr_err("Couldn't get user pages (rc=%zd)\n", ret); + break; + } + + if (ret > count) { + pr_err("get_pages rc=%zd more than %zu\n", ret, count); + break; + } + + count -= ret; + ret += offset; + cur_npages = DIV_ROUND_UP(ret, PAGE_SIZE); + + if (npages + cur_npages > max_pages) { + pr_err("Out of bvec array capacity (%u vs %u)\n", + npages + cur_npages, max_pages); + break; + } + + for (i = 0; i < cur_npages; i++) { + len = ret > PAGE_SIZE ? PAGE_SIZE : ret; + bv[npages + i].bv_page = *pages++; + bv[npages + i].bv_offset = offset; + bv[npages + i].bv_len = len - offset; + ret -= len; + offset = 0; + } + + npages += cur_npages; + } + + iov_iter_bvec(new, orig->data_source, bv, npages, orig_len - count); + return npages; +} +EXPORT_SYMBOL_GPL(netfs_extract_user_iter); diff --git a/include/linux/netfs.h b/include/linux/netfs.h index 4c76ddfb6a67..a45757dd382d 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -296,6 +296,8 @@ void netfs_get_subrequest(struct netfs_io_subrequest *subreq, void netfs_put_subrequest(struct netfs_io_subrequest *subreq, bool was_async, enum netfs_sreq_ref_trace what); void netfs_stats_show(struct seq_file *); +ssize_t netfs_extract_user_iter(struct iov_iter *orig, size_t orig_len, + struct iov_iter *new, unsigned int gup_flags); /** * netfs_inode - Get the netfs inode context from the inode From patchwork Mon Jan 16 23:09:41 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 13103866 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 16F8AC46467 for ; Mon, 16 Jan 2023 23:12:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235267AbjAPXMv (ORCPT ); Mon, 16 Jan 2023 18:12:51 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48018 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235348AbjAPXLl (ORCPT ); Mon, 16 Jan 2023 18:11:41 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 57B182CFFC for ; Mon, 16 Jan 2023 15:09:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1673910588; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5rRwJZJuSkZ5sxTa2OXIsMauckbTTOA2+6xHNDzx1XY=; b=W5tiUeRnePryyenyV5Oawqgw4UkjOetVE2QCMK6doeMJdUtzBIh0jcD/3EBI1OZ9u/9Asz 0JiIc6fCa5y+ub3+EljKfc9ceHQ9MFtD8jL9KZNeMROLmEAeYEM4fl67ZWOnEyeiCwLd+r 0gIxxY8B0o4PtZlXYvgs6dpRMQitWx0= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-147-xbF-EhUjPymfyGtJUq3XLA-1; Mon, 16 Jan 2023 18:09:45 -0500 X-MC-Unique: xbF-EhUjPymfyGtJUq3XLA-1 Received: from smtp.corp.redhat.com 
(int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 6544980234E; Mon, 16 Jan 2023 23:09:44 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.23]) by smtp.corp.redhat.com (Postfix) with ESMTP id 7944E2026D4B; Mon, 16 Jan 2023 23:09:42 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v6 14/34] netfs: Add a function to extract an iterator into a scatterlist From: David Howells To: Al Viro Cc: Jeff Layton , Steve French , Shyam Prasad N , Rohith Surabattula , linux-cachefs@redhat.com, linux-cifs@vger.kernel.org, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Christoph Hellwig , Matthew Wilcox , Jens Axboe , Jan Kara , Jeff Layton , Logan Gunthorpe , linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 16 Jan 2023 23:09:41 +0000 Message-ID: <167391058194.2311931.1725331547727885666.stgit@warthog.procyon.org.uk> In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> User-Agent: StGit/1.5 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Provide a function for filling in a scatterlist from the list of pages contained in an iterator. The function is passed FOLL_SOURCE_BUF or FOLL_DEST_BUF to indicate how the extracted pages are to be used. If the iterator is UBUF- or IOVEC-type, the pages have a ref (FOLL_SOURCE_BUF) or a pin (FOLL_DEST_BUF) taken on them. If the iterator is BVEC-, KVEC- or XARRAY-type, no ref is taken on the pages and it is left to the caller to manage their lifetime. It cannot be assumed that a ref can be validly taken, particularly in the case of a KVEC iterator. Changes: ======== ver #6) - Pass in a gup_flags argument to allow FOLL_SOURCE_BUF and FOLL_DEST_BUF and other FOLL_* flags to be passed in. - Don't pass back the cleanup mode - iov_iter_extract_mode() can be used to determine that. ver #3) - Switch to using EXPORT_SYMBOL_GPL to prevent indirect 3rd-party access to get/pin_user_pages_fast()[1].
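As a rough usage sketch, mirroring the vhost/scsi conversion later in this series (my_sgl, my_sg_max and cleanup_mode are stand-ins for the caller's storage; error handling trimmed):

	struct sg_table sgt = { .sgl = my_sgl };
	unsigned int gup_flags = write ? FOLL_SOURCE_BUF : FOLL_DEST_BUF;
	ssize_t ret;

	ret = netfs_extract_iter_to_sg(iter, LONG_MAX, &sgt, my_sg_max,
				       gup_flags);
	if (ret < 0)
		return ret;
	if (ret > 0)
		sg_mark_end(&sgt.sgl[sgt.nents - 1]); /* no end mark is set for us */

	/* Record how the pages must be cleaned up later: put, unpin or
	 * leave alone.
	 */
	cleanup_mode = iov_iter_extract_mode(iter, gup_flags);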
Signed-off-by: David Howells cc: Jeff Layton cc: Steve French cc: Shyam Prasad N cc: Rohith Surabattula cc: linux-cachefs@redhat.com cc: linux-cifs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/Y3zFzdWnWlEJ8X8/@infradead.org/ [1] Link: https://lore.kernel.org/r/166697255985.61150.16489950598033809487.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/166732027275.3186319.5186488812166611598.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/166869691313.3723671.10714823767342163891.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/166920905749.1461876.12079195122363691498.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/166997423514.9475.11145024341505464337.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/167305165398.1521586.12353215176136705725.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/167344730041.2425628.14391053364759792950.stgit@warthog.procyon.org.uk/ # v5 --- fs/netfs/iterator.c | 269 +++++++++++++++++++++++++++++++++++++++++++++++++ include/linux/netfs.h | 4 + mm/vmalloc.c | 1 3 files changed, 274 insertions(+) diff --git a/fs/netfs/iterator.c b/fs/netfs/iterator.c index f7f26de1a247..1d20ad2123b5 100644 --- a/fs/netfs/iterator.c +++ b/fs/netfs/iterator.c @@ -7,7 +7,9 @@ #include #include +#include #include +#include #include #include "internal.h" @@ -100,3 +102,270 @@ ssize_t netfs_extract_user_iter(struct iov_iter *orig, size_t orig_len, return npages; } EXPORT_SYMBOL_GPL(netfs_extract_user_iter); + +/* + * Extract a list of up to sg_max pages from UBUF- or IOVEC-class iterators, + * pin or get refs on them as appropriate and add them to the scatterlist. + */ +static ssize_t netfs_extract_user_to_sg(struct iov_iter *iter, + ssize_t maxsize, + struct sg_table *sgtable, + unsigned int sg_max, + unsigned int gup_flags) +{ + struct scatterlist *sg = sgtable->sgl + sgtable->nents; + struct page **pages; + unsigned int npages; + ssize_t ret = 0, res; + size_t len, off; + + /* We decant the page list into the tail of the scatterlist */ + pages = (void *)sgtable->sgl + array_size(sg_max, sizeof(struct scatterlist)); + pages -= sg_max; + + do { + res = iov_iter_extract_pages(iter, &pages, maxsize, sg_max, + gup_flags, &off); + if (res < 0) + goto failed; + + len = res; + maxsize -= len; + ret += len; + npages = DIV_ROUND_UP(off + len, PAGE_SIZE); + sg_max -= npages; + + for (; npages > 0; npages--) { + struct page *page = *pages; + size_t seg = min_t(size_t, PAGE_SIZE - off, len); + + *pages++ = NULL; + sg_set_page(sg, page, seg, off); + sgtable->nents++; + sg++; + len -= seg; + off = 0; + } + } while (maxsize > 0 && sg_max > 0); + + return ret; + +failed: + while (sgtable->nents > sgtable->orig_nents) + put_page(sg_page(&sgtable->sgl[--sgtable->nents])); + return res; +} + +/* + * Extract up to sg_max pages from a BVEC-type iterator and add them to the + * scatterlist. The pages are not pinned.
+ */ +static ssize_t netfs_extract_bvec_to_sg(struct iov_iter *iter, + ssize_t maxsize, + struct sg_table *sgtable, + unsigned int sg_max, + unsigned int gup_flags) +{ + const struct bio_vec *bv = iter->bvec; + struct scatterlist *sg = sgtable->sgl + sgtable->nents; + unsigned long start = iter->iov_offset; + unsigned int i; + ssize_t ret = 0; + + for (i = 0; i < iter->nr_segs; i++) { + size_t off, len; + + len = bv[i].bv_len; + if (start >= len) { + start -= len; + continue; + } + + len = min_t(size_t, maxsize, len - start); + off = bv[i].bv_offset + start; + + sg_set_page(sg, bv[i].bv_page, len, off); + sgtable->nents++; + sg++; + sg_max--; + + ret += len; + maxsize -= len; + if (maxsize <= 0 || sg_max == 0) + break; + start = 0; + } + + if (ret > 0) + iov_iter_advance(iter, ret); + return ret; +} + +/* + * Extract up to sg_max pages from a KVEC-type iterator and add them to the + * scatterlist. This can deal with vmalloc'd buffers as well as kmalloc'd or + * static buffers. The pages are not pinned. + */ +static ssize_t netfs_extract_kvec_to_sg(struct iov_iter *iter, + ssize_t maxsize, + struct sg_table *sgtable, + unsigned int sg_max, + unsigned int gup_flags) +{ + const struct kvec *kv = iter->kvec; + struct scatterlist *sg = sgtable->sgl + sgtable->nents; + unsigned long start = iter->iov_offset; + unsigned int i; + ssize_t ret = 0; + + for (i = 0; i < iter->nr_segs; i++) { + struct page *page; + unsigned long kaddr; + size_t off, len, seg; + + len = kv[i].iov_len; + if (start >= len) { + start -= len; + continue; + } + + kaddr = (unsigned long)kv[i].iov_base + start; + off = kaddr & ~PAGE_MASK; + len = min_t(size_t, maxsize, len - start); + kaddr &= PAGE_MASK; + + maxsize -= len; + ret += len; + do { + seg = min_t(size_t, len, PAGE_SIZE - off); + if (is_vmalloc_or_module_addr((void *)kaddr)) + page = vmalloc_to_page((void *)kaddr); + else + page = virt_to_page(kaddr); + + sg_set_page(sg, page, seg, off); + sgtable->nents++; + sg++; + sg_max--; + + len -= seg; + kaddr += PAGE_SIZE; + off = 0; + } while (len > 0 && sg_max > 0); + + if (maxsize <= 0 || sg_max == 0) + break; + start = 0; + } + + if (ret > 0) + iov_iter_advance(iter, ret); + return ret; +} + +/* + * Extract up to sg_max folios from an XARRAY-type iterator and add them to + * the scatterlist. The pages are not pinned.
+ */ +static ssize_t netfs_extract_xarray_to_sg(struct iov_iter *iter, + ssize_t maxsize, + struct sg_table *sgtable, + unsigned int sg_max, + unsigned int gup_flags) +{ + struct scatterlist *sg = sgtable->sgl + sgtable->nents; + struct xarray *xa = iter->xarray; + struct folio *folio; + loff_t start = iter->xarray_start + iter->iov_offset; + pgoff_t index = start / PAGE_SIZE; + ssize_t ret = 0; + size_t offset, len; + XA_STATE(xas, xa, index); + + rcu_read_lock(); + + xas_for_each(&xas, folio, ULONG_MAX) { + if (xas_retry(&xas, folio)) + continue; + if (WARN_ON(xa_is_value(folio))) + break; + if (WARN_ON(folio_test_hugetlb(folio))) + break; + + offset = offset_in_folio(folio, start); + len = min_t(size_t, maxsize, folio_size(folio) - offset); + + sg_set_page(sg, folio_page(folio, 0), len, offset); + sgtable->nents++; + sg++; + sg_max--; + + maxsize -= len; + ret += len; + if (maxsize <= 0 || sg_max == 0) + break; + } + + rcu_read_unlock(); + if (ret > 0) + iov_iter_advance(iter, ret); + return ret; +} + +/** + * netfs_extract_iter_to_sg - Extract pages from an iterator and add to an sglist + * @iter: The iterator to extract from + * @maxsize: The amount of iterator to copy + * @sgtable: The scatterlist table to fill in + * @sg_max: Maximum number of elements in @sgtable that may be filled + * @gup_flags: Direction indicator and additional flags + * + * Extract the page fragments from the given amount of the source iterator and + * add them to a scatterlist that refers to all of those bits, to a maximum + * addition of @sg_max elements. + * + * The pages referred to by UBUF- and IOVEC-type iterators are extracted and + * pinned; BVEC-, KVEC- and XARRAY-type are extracted but aren't pinned; PIPE- + * and DISCARD-type are not supported. + * + * No end mark is placed on the scatterlist; that's left to the caller. + * + * @gup_flags should indicate FOLL_SOURCE_BUF or FOLL_DEST_BUF plus any + * additional flags needed. + * + * If successful, @sgtable->nents is updated to include the number of elements + * added and the number of bytes added is returned. @sgtable->orig_nents is + * left unaltered. + * + * The iov_iter_extract_mode() function should be used to query how cleanup + * should be performed.
+ */ +ssize_t netfs_extract_iter_to_sg(struct iov_iter *iter, size_t maxsize, + struct sg_table *sgtable, unsigned int sg_max, + unsigned int gup_flags) +{ + if (maxsize == 0) + return 0; + + switch (iov_iter_type(iter)) { + case ITER_UBUF: + case ITER_IOVEC: + return netfs_extract_user_to_sg(iter, maxsize, sgtable, sg_max, + gup_flags); + case ITER_BVEC: + return netfs_extract_bvec_to_sg(iter, maxsize, sgtable, sg_max, + gup_flags); + case ITER_KVEC: + return netfs_extract_kvec_to_sg(iter, maxsize, sgtable, sg_max, + gup_flags); + case ITER_XARRAY: + return netfs_extract_xarray_to_sg(iter, maxsize, sgtable, sg_max, + gup_flags); + default: + pr_err("netfs_extract_iter_to_sg(%u) unsupported\n", + iov_iter_type(iter)); + WARN_ON_ONCE(1); + return -EIO; + } +} +EXPORT_SYMBOL_GPL(netfs_extract_iter_to_sg); diff --git a/include/linux/netfs.h b/include/linux/netfs.h index a45757dd382d..2493df855f05 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -298,6 +298,10 @@ void netfs_put_subrequest(struct netfs_io_subrequest *subreq, void netfs_stats_show(struct seq_file *); ssize_t netfs_extract_user_iter(struct iov_iter *orig, size_t orig_len, struct iov_iter *new, unsigned int gup_flags); +struct sg_table; +ssize_t netfs_extract_iter_to_sg(struct iov_iter *iter, size_t len, + struct sg_table *sgtable, unsigned int sg_max, + unsigned int gup_flags); /** * netfs_inode - Get the netfs inode context from the inode diff --git a/mm/vmalloc.c b/mm/vmalloc.c index ca71de7c9d77..61f5bec0f2b6 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -656,6 +656,7 @@ int is_vmalloc_or_module_addr(const void *x) #endif return is_vmalloc_addr(x); } +EXPORT_SYMBOL_GPL(is_vmalloc_or_module_addr); /* * Walk a vmap address to the struct page it maps. Huge vmap mappings will From patchwork Mon Jan 16 23:09:49 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 13103867 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 80355C46467 for ; Mon, 16 Jan 2023 23:13:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235386AbjAPXNG (ORCPT ); Mon, 16 Jan 2023 18:13:06 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48816 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235355AbjAPXMX (ORCPT ); Mon, 16 Jan 2023 18:12:23 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 494A62D163 for ; Mon, 16 Jan 2023 15:09:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1673910597; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=kKmnjesqbVcK16NFYaezZHmG2fXFwkY2y+msB04GC+4=; b=X5pJx92RUSqw7kr5bZAYndiyfuO1A1PwqxrwqJlvSAvM9JfFNDvICYXbrFYlpB/r4IiYs/ BkdRWiUNOmhMD04HZoenxdE2n7Zbgrh5At2nrbEW80VUcCUGiUv2V13/4ImhO3bHNHJFvm sFrYo/4UXb5UFbnIBmSg55rW+2hrKoo= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, 
cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-523-Dg-7CkFxNXKEqeYPM6L_Fg-1; Mon, 16 Jan 2023 18:09:52 -0500 X-MC-Unique: Dg-7CkFxNXKEqeYPM6L_Fg-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 7ADA62A59560; Mon, 16 Jan 2023 23:09:51 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.23]) by smtp.corp.redhat.com (Postfix) with ESMTP id 181692026D4B; Mon, 16 Jan 2023 23:09:50 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v6 15/34] af_alg: Pin pages rather than ref'ing if appropriate From: David Howells To: Al Viro Cc: Herbert Xu , linux-crypto@vger.kernel.org, dhowells@redhat.com, Christoph Hellwig , Matthew Wilcox , Jens Axboe , Jan Kara , Jeff Layton , Logan Gunthorpe , linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 16 Jan 2023 23:09:49 +0000 Message-ID: <167391058954.2311931.2012230616335750882.stgit@warthog.procyon.org.uk> In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> User-Agent: StGit/1.5 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Convert AF_ALG to use iov_iter_extract_pages() instead of iov_iter_get_pages(). This will pin pages or leave them unaltered rather than getting a ref on them as appropriate to the iterator. The pages need to be pinned for DIO-read rather than having refs taken on them to prevent VM copy-on-write from malfunctioning during a concurrent fork() (the result of the I/O would otherwise end up only visible to the child process and not the parent). 
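Schematically, the pattern the diff below applies (a condensed sketch with FOLL_DEST_BUF as the example direction; error handling trimmed):

	struct page *pages[ALG_MAX_PAGES], **pp = pages;
	unsigned int cleanup_mode;
	size_t off;
	ssize_t n;
	int i, npages;

	n = iov_iter_extract_pages(iter, &pp, len, ALG_MAX_PAGES,
				   FOLL_DEST_BUF, &off);
	if (n < 0)
		return n;
	/* Remember whether the pages were pinned, ref'd or neither. */
	cleanup_mode = iov_iter_extract_mode(iter, FOLL_DEST_BUF);

	/* ... build the scatterlist and do the crypto ... */

	npages = DIV_ROUND_UP(off + n, PAGE_SIZE);
	for (i = 0; i < npages; i++)
		page_put_unpin(pages[i], cleanup_mode); /* put, unpin or no-op */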
Signed-off-by: David Howells cc: Herbert Xu cc: linux-crypto@vger.kernel.org --- crypto/af_alg.c | 9 ++++++--- include/crypto/if_alg.h | 1 + 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/crypto/af_alg.c b/crypto/af_alg.c index 7a68db157fae..c99e09fce71f 100644 --- a/crypto/af_alg.c +++ b/crypto/af_alg.c @@ -534,15 +534,18 @@ static const struct net_proto_family alg_family = { int af_alg_make_sg(struct af_alg_sgl *sgl, struct iov_iter *iter, int len, unsigned int gup_flags) { + struct page **pages = sgl->pages; size_t off; ssize_t n; int npages, i; - n = iov_iter_get_pages(iter, sgl->pages, len, ALG_MAX_PAGES, &off, - gup_flags); + n = iov_iter_extract_pages(iter, &pages, len, ALG_MAX_PAGES, + gup_flags, &off); if (n < 0) return n; + sgl->cleanup_mode = iov_iter_extract_mode(iter, gup_flags); + npages = DIV_ROUND_UP(off + n, PAGE_SIZE); if (WARN_ON(npages == 0)) return -EINVAL; @@ -576,7 +579,7 @@ void af_alg_free_sg(struct af_alg_sgl *sgl) int i; for (i = 0; i < sgl->npages; i++) - put_page(sgl->pages[i]); + page_put_unpin(sgl->pages[i], sgl->cleanup_mode); } EXPORT_SYMBOL_GPL(af_alg_free_sg); diff --git a/include/crypto/if_alg.h b/include/crypto/if_alg.h index 12058ab6cad9..95b3b7517d3f 100644 --- a/include/crypto/if_alg.h +++ b/include/crypto/if_alg.h @@ -61,6 +61,7 @@ struct af_alg_sgl { struct scatterlist sg[ALG_MAX_PAGES + 1]; struct page *pages[ALG_MAX_PAGES]; unsigned int npages; + unsigned int cleanup_mode; }; /* TX SGL entry */ From patchwork Mon Jan 16 23:09:56 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 13103888 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CF0BCC54EBE for ; Mon, 16 Jan 2023 23:14:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235343AbjAPXOE (ORCPT ); Mon, 16 Jan 2023 18:14:04 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48300 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235224AbjAPXNF (ORCPT ); Mon, 16 Jan 2023 18:13:05 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 45C7F2B605 for ; Mon, 16 Jan 2023 15:10:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1673910604; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=PjIAiip7N5L8hcYwP5P+y/gD9B8I5Bbt/wX0DBUObE8=; b=c6guEuPVRuhwAhXvYOuq7F4vyCOAAE9kQuCeHyYNCIAeKvq0V6ZOAXtwJczK7u4u3XZlWl iXLkeeW+BDGzIABjJCZut4z+ByjdJRfPyt6qZv0lh89a8N73BbOOMgyRQEHpvtOSUXfjkZ 4cIH/0pGdp40eGgt8co3dKyIGMDf6fs= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-464-BpfiqjMTNrS7NCWxGctuPg-1; Mon, 16 Jan 2023 18:09:59 -0500 X-MC-Unique: BpfiqjMTNrS7NCWxGctuPg-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate 
requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id A3942101AA78; Mon, 16 Jan 2023 23:09:58 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.23]) by smtp.corp.redhat.com (Postfix) with ESMTP id 2E19E2166B26; Mon, 16 Jan 2023 23:09:57 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v6 16/34] af_alg: [RFC] Use netfs_extract_iter_to_sg() to create scatterlists From: David Howells To: Al Viro Cc: Herbert Xu , linux-crypto@vger.kernel.org, dhowells@redhat.com, Christoph Hellwig , Matthew Wilcox , Jens Axboe , Jan Kara , Jeff Layton , Logan Gunthorpe , linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 16 Jan 2023 23:09:56 +0000 Message-ID: <167391059663.2311931.12037449511418464282.stgit@warthog.procyon.org.uk> In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> User-Agent: StGit/1.5 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.6 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Use netfs_extract_iter_to_sg() to decant the destination iterator into a scatterlist in af_alg_get_rsgl(). af_alg_make_sg() can then be removed. Note that if this fits, netfs_extract_iter_to_sg() should move to core code. Signed-off-by: David Howells cc: Herbert Xu cc: linux-crypto@vger.kernel.org --- crypto/af_alg.c | 63 +++++++++++++---------------------------------- crypto/algif_hash.c | 21 +++++++++++----- include/crypto/if_alg.h | 7 +---- 3 files changed, 35 insertions(+), 56 deletions(-) diff --git a/crypto/af_alg.c b/crypto/af_alg.c index c99e09fce71f..c5fbe39366ff 100644 --- a/crypto/af_alg.c +++ b/crypto/af_alg.c @@ -22,6 +22,7 @@ #include #include #include +#include #include #include #include @@ -531,55 +532,22 @@ static const struct net_proto_family alg_family = { .owner = THIS_MODULE, }; -int af_alg_make_sg(struct af_alg_sgl *sgl, struct iov_iter *iter, int len, - unsigned int gup_flags) -{ - struct page **pages = sgl->pages; - size_t off; - ssize_t n; - int npages, i; - - n = iov_iter_extract_pages(iter, &pages, len, ALG_MAX_PAGES, - gup_flags, &off); - if (n < 0) - return n; - - sgl->cleanup_mode = iov_iter_extract_mode(iter, gup_flags); - - npages = DIV_ROUND_UP(off + n, PAGE_SIZE); - if (WARN_ON(npages == 0)) - return -EINVAL; - /* Add one extra for linking */ - sg_init_table(sgl->sg, npages + 1); - - for (i = 0, len = n; i < npages; i++) { - int plen = min_t(int, len, PAGE_SIZE - off); - - sg_set_page(sgl->sg + i, sgl->pages[i], plen, off); - - off = 0; - len -= plen; - } - sg_mark_end(sgl->sg + npages - 1); - sgl->npages = npages; - - return n; -} -EXPORT_SYMBOL_GPL(af_alg_make_sg); - static void af_alg_link_sg(struct af_alg_sgl *sgl_prev, struct af_alg_sgl *sgl_new) { - sg_unmark_end(sgl_prev->sg + sgl_prev->npages - 1); - sg_chain(sgl_prev->sg, sgl_prev->npages + 1, sgl_new->sg); + sg_unmark_end(sgl_prev->sgt.sgl + sgl_prev->sgt.nents - 1); + sg_chain(sgl_prev->sgt.sgl, sgl_prev->sgt.nents + 1, sgl_new->sgt.sgl); } void af_alg_free_sg(struct af_alg_sgl *sgl) { int i; - for (i = 0; i < sgl->npages; i++) - page_put_unpin(sgl->pages[i], sgl->cleanup_mode); + if (!(sgl->cleanup_mode & (FOLL_PIN | FOLL_GET))) + return; + + for (i = 0; i < sgl->sgt.nents; i++) + 
page_put_unpin(sg_page(&sgl->sgt.sgl[i]), sgl->cleanup_mode); } EXPORT_SYMBOL_GPL(af_alg_free_sg); @@ -1293,8 +1261,8 @@ int af_alg_get_rsgl(struct sock *sk, struct msghdr *msg, int flags, while (maxsize > len && msg_data_left(msg)) { struct af_alg_rsgl *rsgl; + ssize_t err; size_t seglen; - int err; /* limit the amount of readable buffers */ if (!af_alg_readable(sk)) @@ -1311,17 +1279,22 @@ int af_alg_get_rsgl(struct sock *sk, struct msghdr *msg, int flags, return -ENOMEM; } - rsgl->sgl.npages = 0; + rsgl->sgl.sgt.sgl = rsgl->sgl.sgl; + rsgl->sgl.sgt.nents = 0; + rsgl->sgl.sgt.orig_nents = 0; list_add_tail(&rsgl->list, &areq->rsgl_list); - /* make one iovec available as scatterlist */ - err = af_alg_make_sg(&rsgl->sgl, &msg->msg_iter, seglen, - FOLL_DEST_BUF); + err = netfs_extract_iter_to_sg(&msg->msg_iter, seglen, + &rsgl->sgl.sgt, ALG_MAX_PAGES, + FOLL_DEST_BUF); if (err < 0) { rsgl->sg_num_bytes = 0; return err; } + rsgl->sgl.cleanup_mode = iov_iter_extract_mode(&msg->msg_iter, + FOLL_DEST_BUF); + /* chain the new scatterlist with previous one */ if (areq->last_rsgl) af_alg_link_sg(&areq->last_rsgl->sgl, &rsgl->sgl); diff --git a/crypto/algif_hash.c b/crypto/algif_hash.c index fe3d2258145f..5aef6818a9ff 100644 --- a/crypto/algif_hash.c +++ b/crypto/algif_hash.c @@ -14,6 +14,7 @@ #include #include #include +#include #include struct hash_ctx { @@ -91,14 +92,22 @@ static int hash_sendmsg(struct socket *sock, struct msghdr *msg, if (len > limit) len = limit; - len = af_alg_make_sg(&ctx->sgl, &msg->msg_iter, len, - FOLL_SOURCE_BUF); + ctx->sgl.sgt.sgl = ctx->sgl.sgl; + ctx->sgl.sgt.nents = 0; + ctx->sgl.sgt.orig_nents = 0; + + len = netfs_extract_iter_to_sg(&msg->msg_iter, len, + &ctx->sgl.sgt, ALG_MAX_PAGES, + FOLL_SOURCE_BUF); if (len < 0) { err = copied ? 
0 : len; goto unlock; } - ahash_request_set_crypt(&ctx->req, ctx->sgl.sg, NULL, len); + ctx->sgl.cleanup_mode = iov_iter_extract_mode(&msg->msg_iter, + FOLL_SOURCE_BUF); + + ahash_request_set_crypt(&ctx->req, ctx->sgl.sgt.sgl, NULL, len); err = crypto_wait_req(crypto_ahash_update(&ctx->req), &ctx->wait); @@ -142,8 +151,8 @@ static ssize_t hash_sendpage(struct socket *sock, struct page *page, flags |= MSG_MORE; lock_sock(sk); - sg_init_table(ctx->sgl.sg, 1); - sg_set_page(ctx->sgl.sg, page, size, offset); + sg_init_table(ctx->sgl.sgl, 1); + sg_set_page(ctx->sgl.sgl, page, size, offset); if (!(flags & MSG_MORE)) { err = hash_alloc_result(sk, ctx); @@ -152,7 +161,7 @@ static ssize_t hash_sendpage(struct socket *sock, struct page *page, } else if (!ctx->more) hash_free_result(sk, ctx); - ahash_request_set_crypt(&ctx->req, ctx->sgl.sg, ctx->result, size); + ahash_request_set_crypt(&ctx->req, ctx->sgl.sgl, ctx->result, size); if (!(flags & MSG_MORE)) { if (ctx->more) diff --git a/include/crypto/if_alg.h b/include/crypto/if_alg.h index 95b3b7517d3f..424a2071705d 100644 --- a/include/crypto/if_alg.h +++ b/include/crypto/if_alg.h @@ -58,9 +58,8 @@ struct af_alg_type { }; struct af_alg_sgl { - struct scatterlist sg[ALG_MAX_PAGES + 1]; - struct page *pages[ALG_MAX_PAGES]; - unsigned int npages; + struct sg_table sgt; + struct scatterlist sgl[ALG_MAX_PAGES + 1]; unsigned int cleanup_mode; }; @@ -166,8 +165,6 @@ int af_alg_release(struct socket *sock); void af_alg_release_parent(struct sock *sk); int af_alg_accept(struct sock *sk, struct socket *newsock, bool kern); -int af_alg_make_sg(struct af_alg_sgl *sgl, struct iov_iter *iter, int len, - unsigned int gup_flags); void af_alg_free_sg(struct af_alg_sgl *sgl); static inline struct alg_sock *alg_sk(struct sock *sk) From patchwork Mon Jan 16 23:10:03 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 13103889 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 75B3BC46467 for ; Mon, 16 Jan 2023 23:14:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235394AbjAPXON (ORCPT ); Mon, 16 Jan 2023 18:14:13 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48626 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235424AbjAPXNl (ORCPT ); Mon, 16 Jan 2023 18:13:41 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EFC662B60C for ; Mon, 16 Jan 2023 15:10:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1673910612; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=EU0pOV6WBLH3AThSCCtWqw1z4qNF39IxpGe04ts73SI=; b=XdRv4DEXFrqGvqhWwAn30UJi3or9rwiD6o4vjlcozaTcNv5BxlR4a8onAWQmKNw1ru0rF/ Fz5wsCjLuaIYfhJkZX47DRkZIQeO1DgeshD1VN+2yGYn1orXwwtTF2pLVdUl86JINTInkc ziOfC52IGIWlBOr+zoCovCbBROei9iw= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, 
cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-185-RByxuKzUOKioXzZbR3mJUw-1; Mon, 16 Jan 2023 18:10:06 -0500 X-MC-Unique: RByxuKzUOKioXzZbR3mJUw-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 0ACC71C0432B; Mon, 16 Jan 2023 23:10:06 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.23]) by smtp.corp.redhat.com (Postfix) with ESMTP id 57C3C1121319; Mon, 16 Jan 2023 23:10:04 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v6 17/34] scsi: [RFC] Use netfs_extract_iter_to_sg() From: David Howells To: Al Viro Cc: "James E.J. Bottomley" , "Martin K. Petersen" , Christoph Hellwig , linux-scsi@vger.kernel.org, dhowells@redhat.com, Christoph Hellwig , Matthew Wilcox , Jens Axboe , Jan Kara , Jeff Layton , Logan Gunthorpe , linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 16 Jan 2023 23:10:03 +0000 Message-ID: <167391060380.2311931.5962669831677025433.stgit@warthog.procyon.org.uk> In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> User-Agent: StGit/1.5 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.3 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Use netfs_extract_iter_to_sg() to build a scatterlist from an iterator. Note that if this fits, netfs_extract_iter_to_sg() should move to core code. Signed-off-by: David Howells cc: James E.J. Bottomley cc: Martin K. Petersen cc: Christoph Hellwig cc: linux-scsi@vger.kernel.org --- drivers/vhost/scsi.c | 78 +++++++++++++++----------------------------------- 1 file changed, 23 insertions(+), 55 deletions(-) diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index 5d10837d19ec..af897cc4036d 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -34,6 +34,7 @@ #include #include #include +#include #include "vhost.h" @@ -75,6 +76,9 @@ struct vhost_scsi_cmd { u32 tvc_prot_sgl_count; /* Saved unpacked SCSI LUN for vhost_scsi_target_queue_cmd() */ u32 tvc_lun; + /* Cleanup modes for scatterlists */ + unsigned int tvc_cleanup_mode; + unsigned int tvc_prot_cleanup_mode; /* Pointer to the SGL formatted memory from virtio-scsi */ struct scatterlist *tvc_sgl; struct scatterlist *tvc_prot_sgl; @@ -339,11 +343,13 @@ static void vhost_scsi_release_cmd_res(struct se_cmd *se_cmd) if (tv_cmd->tvc_sgl_count) { for (i = 0; i < tv_cmd->tvc_sgl_count; i++) - put_page(sg_page(&tv_cmd->tvc_sgl[i])); + page_put_unpin(sg_page(&tv_cmd->tvc_sgl[i]), + tv_cmd->tvc_cleanup_mode); } if (tv_cmd->tvc_prot_sgl_count) { for (i = 0; i < tv_cmd->tvc_prot_sgl_count; i++) - put_page(sg_page(&tv_cmd->tvc_prot_sgl[i])); + page_put_unpin(sg_page(&tv_cmd->tvc_prot_sgl[i]), + tv_cmd->tvc_prot_cleanup_mode); } sbitmap_clear_bit(&svq->scsi_tags, se_cmd->map_tag); @@ -631,41 +637,6 @@ vhost_scsi_get_cmd(struct vhost_virtqueue *vq, struct vhost_scsi_tpg *tpg, return cmd; } -/* - * Map a user memory range into a scatterlist - * - * Returns the number of scatterlist entries used or -errno on error. 
- */ -static int -vhost_scsi_map_to_sgl(struct vhost_scsi_cmd *cmd, - struct iov_iter *iter, - struct scatterlist *sgl, - bool write) -{ - struct page **pages = cmd->tvc_upages; - struct scatterlist *sg = sgl; - ssize_t bytes; - size_t offset; - unsigned int npages = 0, gup_flags = 0; - - gup_flags |= write ? FOLL_SOURCE_BUF : FOLL_DEST_BUF; - - bytes = iov_iter_get_pages(iter, pages, LONG_MAX, - VHOST_SCSI_PREALLOC_UPAGES, &offset, - gup_flags); - /* No pages were pinned */ - if (bytes <= 0) - return bytes < 0 ? bytes : -EFAULT; - - while (bytes) { - unsigned n = min_t(unsigned, PAGE_SIZE - offset, bytes); - sg_set_page(sg++, pages[npages++], n, offset); - bytes -= n; - offset = 0; - } - return npages; -} - static int vhost_scsi_calc_sgls(struct iov_iter *iter, size_t bytes, int max_sgls) { @@ -689,24 +660,19 @@ vhost_scsi_calc_sgls(struct iov_iter *iter, size_t bytes, int max_sgls) static int vhost_scsi_iov_to_sgl(struct vhost_scsi_cmd *cmd, bool write, struct iov_iter *iter, - struct scatterlist *sg, int sg_count) + struct scatterlist *sg, int sg_count, + unsigned int *cleanup_mode) { - struct scatterlist *p = sg; - int ret; + struct sg_table sgt = { .sgl = sg }; + unsigned int gup_flags = write ? FOLL_SOURCE_BUF : FOLL_DEST_BUF; + ssize_t ret; - while (iov_iter_count(iter)) { - ret = vhost_scsi_map_to_sgl(cmd, iter, sg, write); - if (ret < 0) { - while (p < sg) { - struct page *page = sg_page(p++); - if (page) - put_page(page); - } - return ret; - } - sg += ret; - } - return 0; + ret = netfs_extract_iter_to_sg(iter, LONG_MAX, &sgt, sg_count, gup_flags); + if (ret > 0) + sg_mark_end(sg + sgt.nents - 1); + + *cleanup_mode = iov_iter_extract_mode(iter, gup_flags); + return ret; } static int @@ -730,7 +696,8 @@ vhost_scsi_mapal(struct vhost_scsi_cmd *cmd, ret = vhost_scsi_iov_to_sgl(cmd, write, prot_iter, cmd->tvc_prot_sgl, - cmd->tvc_prot_sgl_count); + cmd->tvc_prot_sgl_count, + &cmd->tvc_prot_cleanup_mode); if (ret < 0) { cmd->tvc_prot_sgl_count = 0; return ret; @@ -747,7 +714,8 @@ vhost_scsi_mapal(struct vhost_scsi_cmd *cmd, cmd->tvc_sgl, cmd->tvc_sgl_count); ret = vhost_scsi_iov_to_sgl(cmd, write, data_iter, - cmd->tvc_sgl, cmd->tvc_sgl_count, + cmd->tvc_sgl, cmd->tvc_sgl_count, + &cmd->tvc_cleanup_mode); if (ret < 0) { cmd->tvc_sgl_count = 0; return ret;
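Worth noting: this changes vhost_scsi_iov_to_sgl()'s contract - it now performs the whole extraction in one call and returns the number of scatterlist entries used (or -errno) rather than 0 on success. A condensed caller-side sketch of the new contract, reusing the names from the patch:

        struct sg_table sgt = { .sgl = cmd->tvc_sgl };
        ssize_t nents;

        /* One call extracts everything; no per-batch loop is needed. */
        nents = netfs_extract_iter_to_sg(iter, LONG_MAX, &sgt,
                                         cmd->tvc_sgl_count, FOLL_DEST_BUF);
        if (nents < 0)
                return nents;
        if (nents > 0)
                sg_mark_end(&cmd->tvc_sgl[sgt.nents - 1]);
        /* The release path keys page_put_unpin() off this recorded mode. */
        cmd->tvc_cleanup_mode = iov_iter_extract_mode(iter, FOLL_DEST_BUF);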
From patchwork Mon Jan 16 23:10:11 2023
Subject: [PATCH v6 18/34] dio: Pin pages rather than ref'ing if appropriate
From: David Howells
Date: Mon, 16 Jan 2023 23:10:11 +0000
Message-ID: <167391061117.2311931.16807283804788007499.stgit@warthog.procyon.org.uk>

Convert the generic direct-I/O code to use iov_iter_extract_pages() instead of iov_iter_get_pages(). This will pin pages or leave them unaltered rather than getting a ref on them as appropriate to the iterator. The pages need to be pinned for DIO-read rather than having refs taken on them to prevent VM copy-on-write from malfunctioning during a concurrent fork() (the result of the I/O would otherwise end up only visible to the child process and not the parent). Signed-off-by: David Howells cc: Al Viro cc: Jens Axboe cc: Jan Kara cc: Christoph Hellwig cc: Matthew Wilcox cc: Logan Gunthorpe cc: linux-fsdevel@vger.kernel.org cc: linux-block@vger.kernel.org --- fs/direct-io.c | 57 ++++++++++++++++++++++++++++++++++++-------------------- 1 file changed, 37 insertions(+), 20 deletions(-) diff --git a/fs/direct-io.c b/fs/direct-io.c index b1e26a706e31..b4d2c9f85a5b 100644 --- a/fs/direct-io.c +++ b/fs/direct-io.c @@ -142,9 +142,11 @@ struct dio { /* * pages[] (and any fields placed after it) are not zeroed out at - * allocation time. Don't add new fields after pages[] unless you - * wish that they not be zeroed. + * allocation time. Don't add new fields after pages[] unless you wish + * that they not be zeroed.
Pages may have a ref taken, a pin emplaced + * or no retention measures. */ + unsigned int cleanup_mode; /* How pages should be cleaned up (0/FOLL_GET/PIN) */ union { struct page *pages[DIO_PAGES]; /* page buffer */ struct work_struct complete_work;/* deferred AIO completion */ @@ -167,12 +169,13 @@ static inline unsigned dio_pages_present(struct dio_submit *sdio) static inline int dio_refill_pages(struct dio *dio, struct dio_submit *sdio) { const enum req_op dio_op = dio->opf & REQ_OP_MASK; + unsigned int gup_flags = + op_is_write(dio_op) ? FOLL_SOURCE_BUF : FOLL_DEST_BUF; + struct page **pages = dio->pages; ssize_t ret; - ret = iov_iter_get_pages(sdio->iter, dio->pages, LONG_MAX, DIO_PAGES, - &sdio->from, - op_is_write(dio_op) ? - FOLL_SOURCE_BUF : FOLL_DEST_BUF); + ret = iov_iter_extract_pages(sdio->iter, &pages, LONG_MAX, DIO_PAGES, + gup_flags, &sdio->from); if (ret < 0 && sdio->blocks_available && dio_op == REQ_OP_WRITE) { struct page *page = ZERO_PAGE(0); @@ -183,7 +186,7 @@ static inline int dio_refill_pages(struct dio *dio, struct dio_submit *sdio) */ if (dio->page_errors == 0) dio->page_errors = ret; - get_page(page); + dio->cleanup_mode = 0; dio->pages[0] = page; sdio->head = 0; sdio->tail = 1; @@ -197,6 +200,8 @@ static inline int dio_refill_pages(struct dio *dio, struct dio_submit *sdio) sdio->head = 0; sdio->tail = (ret + PAGE_SIZE - 1) / PAGE_SIZE; sdio->to = ((ret - 1) & (PAGE_SIZE - 1)) + 1; + dio->cleanup_mode = + iov_iter_extract_mode(sdio->iter, gup_flags); return 0; } return ret; @@ -400,6 +405,10 @@ dio_bio_alloc(struct dio *dio, struct dio_submit *sdio, * we request a valid number of vectors. */ bio = bio_alloc(bdev, nr_vecs, dio->opf, GFP_KERNEL); + if (!(dio->cleanup_mode & FOLL_GET)) + bio_clear_flag(bio, BIO_PAGE_REFFED); + if (dio->cleanup_mode & FOLL_PIN) + bio_set_flag(bio, BIO_PAGE_PINNED); bio->bi_iter.bi_sector = first_sector; if (dio->is_async) bio->bi_end_io = dio_bio_end_aio; @@ -443,13 +452,18 @@ static inline void dio_bio_submit(struct dio *dio, struct dio_submit *sdio) sdio->logical_offset_in_bio = 0; } +static void dio_cleanup_page(struct dio *dio, struct page *page) +{ + page_put_unpin(page, dio->cleanup_mode); +} + /* * Release any resources in case of a failure */ static inline void dio_cleanup(struct dio *dio, struct dio_submit *sdio) { while (sdio->head < sdio->tail) - put_page(dio->pages[sdio->head++]); + dio_cleanup_page(dio, dio->pages[sdio->head++]); } /* @@ -704,7 +718,7 @@ static inline int dio_new_bio(struct dio *dio, struct dio_submit *sdio, * * Return zero on success. Non-zero means the caller needs to start a new BIO. 
*/ -static inline int dio_bio_add_page(struct dio_submit *sdio) +static inline int dio_bio_add_page(struct dio *dio, struct dio_submit *sdio) { int ret; @@ -771,11 +785,11 @@ static inline int dio_send_cur_page(struct dio *dio, struct dio_submit *sdio, goto out; } - if (dio_bio_add_page(sdio) != 0) { + if (dio_bio_add_page(dio, sdio) != 0) { dio_bio_submit(dio, sdio); ret = dio_new_bio(dio, sdio, sdio->cur_page_block, map_bh); if (ret == 0) { - ret = dio_bio_add_page(sdio); + ret = dio_bio_add_page(dio, sdio); BUG_ON(ret != 0); } } @@ -832,13 +846,16 @@ submit_page_section(struct dio *dio, struct dio_submit *sdio, struct page *page, */ if (sdio->cur_page) { ret = dio_send_cur_page(dio, sdio, map_bh); - put_page(sdio->cur_page); + dio_cleanup_page(dio, sdio->cur_page); sdio->cur_page = NULL; if (ret) return ret; } - get_page(page); /* It is in dio */ + ret = try_grab_page(page, dio->cleanup_mode); /* It is in dio */ + if (ret < 0) + return ret; + sdio->cur_page = page; sdio->cur_page_offset = offset; sdio->cur_page_len = len; @@ -853,7 +870,7 @@ submit_page_section(struct dio *dio, struct dio_submit *sdio, struct page *page, ret = dio_send_cur_page(dio, sdio, map_bh); if (sdio->bio) dio_bio_submit(dio, sdio); - put_page(sdio->cur_page); + dio_cleanup_page(dio, sdio->cur_page); sdio->cur_page = NULL; } return ret; @@ -954,7 +971,7 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio, ret = get_more_blocks(dio, sdio, map_bh); if (ret) { - put_page(page); + dio_cleanup_page(dio, page); goto out; } if (!buffer_mapped(map_bh)) @@ -999,7 +1016,7 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio, /* AKPM: eargh, -ENOTBLK is a hack */ if (dio_op == REQ_OP_WRITE) { - put_page(page); + dio_cleanup_page(dio, page); return -ENOTBLK; } @@ -1012,7 +1029,7 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio, if (sdio->block_in_file >= i_size_aligned >> blkbits) { /* We hit eof */ - put_page(page); + dio_cleanup_page(dio, page); goto out; } zero_user(page, from, 1 << blkbits); @@ -1052,7 +1069,7 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio, sdio->next_block_for_io, map_bh); if (ret) { - put_page(page); + dio_cleanup_page(dio, page); goto out; } sdio->next_block_for_io += this_chunk_blocks; @@ -1068,7 +1085,7 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio, } /* Drop the ref which was taken in get_user_pages() */ - put_page(page); + dio_cleanup_page(dio, page); } out: return ret; @@ -1288,7 +1305,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, ret2 = dio_send_cur_page(dio, &sdio, &map_bh); if (retval == 0) retval = ret2; - put_page(sdio.cur_page); + dio_cleanup_page(dio, sdio.cur_page); sdio.cur_page = NULL; } if (sdio.bio)
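The completion side is told how to clean up via two bio flags, and the mapping from the recorded extraction mode is mechanical. A sketch of that step, assuming the BIO_PAGE_REFFED/BIO_PAGE_PINNED semantics this series sets up:

        /* cleanup_mode is 0, FOLL_GET or FOLL_PIN from iov_iter_extract_mode(). */
        static void dio_set_bio_cleanup(struct bio *bio, unsigned int cleanup_mode)
        {
                if (!(cleanup_mode & FOLL_GET))
                        bio_clear_flag(bio, BIO_PAGE_REFFED);   /* no put_page() at completion */
                if (cleanup_mode & FOLL_PIN)
                        bio_set_flag(bio, BIO_PAGE_PINNED);     /* unpin at completion */
        }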
From patchwork Mon Jan 16 23:10:18 2023
Subject: [PATCH v6 19/34] fuse: Pin pages rather than ref'ing if appropriate
From: David Howells
Date: Mon, 16 Jan 2023 23:10:18 +0000
Message-ID: <167391061826.2311931.4301280201217181104.stgit@warthog.procyon.org.uk>

Convert the fuse code to use iov_iter_extract_pages() instead of iov_iter_get_pages(). This will pin pages or leave them unaltered rather than getting a ref on them as appropriate to the iterator. The pages need to be pinned for DIO-read rather than having refs taken on them to prevent VM copy-on-write from malfunctioning during a concurrent fork() (the result of the I/O would otherwise end up only visible to the child process and not the parent).
Signed-off-by: David Howells cc: Miklos Szeredi cc: Al Viro cc: Christoph Hellwig cc: linux-fsdevel@vger.kernel.org --- fs/fuse/dev.c | 25 +++++++++++++++++++------ fs/fuse/file.c | 26 ++++++++++++++++++-------- fs/fuse/fuse_i.h | 1 + 3 files changed, 38 insertions(+), 14 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index e3d8443e24a6..107497e68726 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -641,6 +641,7 @@ static int unlock_request(struct fuse_req *req) struct fuse_copy_state { int write; + unsigned int cleanup_mode; /* Page cleanup mode (0/FOLL_GET/PIN) */ struct fuse_req *req; struct iov_iter *iter; struct pipe_buffer *pipebufs; @@ -661,6 +662,11 @@ static void fuse_copy_init(struct fuse_copy_state *cs, int write, cs->iter = iter; } +static void fuse_release_copy_page(struct fuse_copy_state *cs, struct page *page) +{ + page_put_unpin(page, cs->cleanup_mode); +} + /* Unmap and put previous page of userspace buffer */ static void fuse_copy_finish(struct fuse_copy_state *cs) { @@ -675,7 +681,7 @@ static void fuse_copy_finish(struct fuse_copy_state *cs) flush_dcache_page(cs->pg); set_page_dirty_lock(cs->pg); } - put_page(cs->pg); + fuse_release_copy_page(cs, cs->pg); } cs->pg = NULL; } @@ -704,6 +710,7 @@ static int fuse_copy_fill(struct fuse_copy_state *cs) BUG_ON(!cs->nr_segs); cs->currbuf = buf; + cs->cleanup_mode = FOLL_GET; cs->pg = buf->page; cs->offset = buf->offset; cs->len = buf->len; @@ -722,6 +729,7 @@ static int fuse_copy_fill(struct fuse_copy_state *cs) buf->len = 0; cs->currbuf = buf; + cs->cleanup_mode = FOLL_GET; cs->pg = page; cs->offset = 0; cs->len = PAGE_SIZE; @@ -729,15 +737,18 @@ static int fuse_copy_fill(struct fuse_copy_state *cs) cs->nr_segs++; } } else { + unsigned int gup_flags = cs->write ? FOLL_SOURCE_BUF : FOLL_DEST_BUF; + struct page **pages = &cs->pg; size_t off; - err = iov_iter_get_pages(cs->iter, &page, PAGE_SIZE, 1, &off, - cs->write ? 
FOLL_SOURCE_BUF : FOLL_DEST_BUF); + + err = iov_iter_extract_pages(cs->iter, &pages, PAGE_SIZE, 1, + gup_flags, &off); if (err < 0) return err; BUG_ON(!err); cs->len = err; cs->offset = off; - cs->pg = page; + cs->cleanup_mode = iov_iter_extract_mode(cs->iter, gup_flags); } return lock_request(cs->req); @@ -899,10 +910,12 @@ static int fuse_ref_page(struct fuse_copy_state *cs, struct page *page, if (cs->nr_segs >= cs->pipe->max_usage) return -EIO; - get_page(page); + err = try_grab_page(page, cs->cleanup_mode); + if (err < 0) + return err; err = unlock_request(cs->req); if (err) { - put_page(page); + fuse_release_copy_page(cs, page); return err; } diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 68c196437306..c317300e757a 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -624,6 +624,11 @@ void fuse_read_args_fill(struct fuse_io_args *ia, struct file *file, loff_t pos, args->out_args[0].size = count; } +static void fuse_release_page(struct fuse_args_pages *ap, struct page *page) +{ + page_put_unpin(page, ap->cleanup_mode); +} + static void fuse_release_user_pages(struct fuse_args_pages *ap, bool should_dirty) { @@ -632,7 +637,7 @@ static void fuse_release_user_pages(struct fuse_args_pages *ap, for (i = 0; i < ap->num_pages; i++) { if (should_dirty) set_page_dirty_lock(ap->pages[i]); - put_page(ap->pages[i]); + fuse_release_page(ap, ap->pages[i]); } } @@ -920,7 +925,7 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args, else SetPageError(page); unlock_page(page); - put_page(page); + fuse_release_page(ap, page); } if (ia->ff) fuse_file_put(ia->ff, false, false); @@ -1153,7 +1158,7 @@ static ssize_t fuse_send_write_pages(struct fuse_io_args *ia, } if (ia->write.page_locked && (i == ap->num_pages - 1)) unlock_page(page); - put_page(page); + fuse_release_page(ap, page); } return err; @@ -1172,6 +1177,7 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia, ap->args.in_pages = true; ap->descs[0].offset = offset; + ap->cleanup_mode = FOLL_GET; do { size_t tmp; @@ -1200,7 +1206,7 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia, if (!tmp) { unlock_page(page); - put_page(page); + fuse_release_page(ap, page); goto again; } @@ -1393,9 +1399,12 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii, size_t *nbytesp, int write, unsigned int max_pages) { + unsigned int gup_flags = write ? FOLL_SOURCE_BUF : FOLL_DEST_BUF; size_t nbytes = 0; /* # bytes already packed in req */ ssize_t ret = 0; + ap->cleanup_mode = iov_iter_extract_mode(ii, gup_flags); + /* Special case for kernel I/O: can copy directly into the buffer */ if (iov_iter_is_kvec(ii)) { unsigned long user_addr = fuse_get_user_addr(ii); @@ -1412,12 +1421,13 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii, } while (nbytes < *nbytesp && ap->num_pages < max_pages) { + struct page **pages = &ap->pages[ap->num_pages]; unsigned npages; size_t start; - ret = iov_iter_get_pages(ii, &ap->pages[ap->num_pages], - *nbytesp - nbytes, - max_pages - ap->num_pages, - &start, write ? 
FOLL_SOURCE_BUF : FOLL_DEST_BUF); + ret = iov_iter_extract_pages(ii, &pages, + *nbytesp - nbytes, + max_pages - ap->num_pages, + gup_flags, &start); if (ret < 0) break; diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index c673faefdcb9..7b6be1dd7593 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -271,6 +271,7 @@ struct fuse_args_pages { struct page **pages; struct fuse_page_desc *descs; unsigned int num_pages; + unsigned int cleanup_mode; }; #define FUSE_ARGS(args) struct fuse_args args = {}
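The read-side release in fuse_release_user_pages() illustrates the one extra rule on the cleanup path: a page that received data must be dirtied before whatever retention was taken on it is dropped. A condensed sketch of that pattern (page_put_unpin() as defined in this series; the function name here is illustrative):

        static void release_user_pages(struct page **pages, unsigned int npages,
                                       unsigned int cleanup_mode, bool should_dirty)
        {
                unsigned int i;

                for (i = 0; i < npages; i++) {
                        if (should_dirty)
                                set_page_dirty_lock(pages[i]);  /* data landed here */
                        page_put_unpin(pages[i], cleanup_mode);
                }
        }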
From patchwork Mon Jan 16 23:10:25 2023
Subject: [PATCH v6 20/34] vfs: Make splice use iov_iter_extract_pages()
From: David Howells
Date: Mon, 16 Jan 2023 23:10:25 +0000
Message-ID: <167391062544.2311931.15195962488932892568.stgit@warthog.procyon.org.uk>

Make splice's iter_to_pipe() use iov_iter_extract_pages(). Splice requests will be rejected if the cleanup mode is going to be anything other than put_page() (ie. FOLL_GET), since we're going to be attaching pages from the iterator to a pipe and then returning to the caller, leaving the spliced pages to their fates at some unknown time in the future. Note that this will cause some requests to fail that could work before - such as splicing from an XARRAY-type iterator - as extraction doesn't take refs or pins on non-user-backed iterators.

Signed-off-by: David Howells cc: Al Viro cc: Christoph Hellwig cc: Matthew Wilcox cc: linux-fsdevel@vger.kernel.org --- fs/splice.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/fs/splice.c b/fs/splice.c index 19c5b5adc548..c3433266ba1b 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -1159,14 +1159,18 @@ static int iter_to_pipe(struct iov_iter *from, size_t total = 0; int ret = 0; + /* For the moment, all pages attached to a pipe must have refs, not pins.
*/ + if (WARN_ON(iov_iter_extract_mode(from, FOLL_SOURCE_BUF) != FOLL_GET)) + return -EIO; + while (iov_iter_count(from)) { - struct page *pages[16]; + struct page *pages[16], **ppages = pages; ssize_t left; size_t start; int i, n; - left = iov_iter_get_pages(from, pages, ~0UL, 16, &start, - FOLL_SOURCE_BUF); + left = iov_iter_extract_pages(from, &ppages, ~0UL, 16, + FOLL_SOURCE_BUF, &start); if (left <= 0) { ret = left; break; }
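The guard above generalises: a caller that hands extracted pages to a structure that outlives it (here, the pipe) can only do so if extraction yielded plain page refs. A sketch of that gate as a helper - extract_refs_for_pipe() is an invented name:

        static ssize_t extract_refs_for_pipe(struct iov_iter *from,
                                             struct page **pages,
                                             unsigned int max_pages,
                                             size_t *offset0)
        {
                struct page **ppages = pages;

                /* Pins can't be handed over; the pipe only knows put_page(). */
                if (iov_iter_extract_mode(from, FOLL_SOURCE_BUF) != FOLL_GET)
                        return -EIO;
                return iov_iter_extract_pages(from, &ppages, ~0UL, max_pages,
                                              FOLL_SOURCE_BUF, offset0);
        }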
From patchwork Mon Jan 16 23:10:32 2023
Subject: [PATCH v6 21/34] 9p: Pin pages rather than ref'ing if appropriate
From: David Howells
Date: Mon, 16 Jan 2023 23:10:32 +0000
Message-ID: <167391063242.2311931.3275290816918213423.stgit@warthog.procyon.org.uk>

Convert the 9p filesystem to use iov_iter_extract_pages() instead of iov_iter_get_pages(). This will pin pages or leave them unaltered rather than getting a ref on them as appropriate to the iterator. The pages need to be pinned for DIO-read rather than having refs taken on them to prevent VM copy-on-write from malfunctioning during a concurrent fork() (the result of the I/O would otherwise end up only visible to the child process and not the parent). Signed-off-by: David Howells cc: Dominique Martinet cc: Eric Van Hensbergen cc: Latchesar Ionkov cc: Christian Schoenebeck cc: v9fs-developer@lists.sourceforge.net --- net/9p/trans_common.c | 6 ++- net/9p/trans_common.h | 3 +- net/9p/trans_virtio.c | 89 ++++++++++++++----------------------------------- 3 files changed, 31 insertions(+), 67 deletions(-) diff --git a/net/9p/trans_common.c b/net/9p/trans_common.c index c827f694551c..31d133412677 100644 --- a/net/9p/trans_common.c +++ b/net/9p/trans_common.c @@ -12,13 +12,15 @@ * p9_release_pages - Release pages after the transaction. * @pages: array of pages to be put * @nr_pages: size of array + * @cleanup_mode: How to clean up the pages. */ -void p9_release_pages(struct page **pages, int nr_pages) +void p9_release_pages(struct page **pages, int nr_pages, + unsigned int cleanup_mode) { int i; for (i = 0; i < nr_pages; i++) if (pages[i]) - put_page(pages[i]); + page_put_unpin(pages[i], cleanup_mode); } EXPORT_SYMBOL(p9_release_pages); diff --git a/net/9p/trans_common.h b/net/9p/trans_common.h index 32134db6abf3..9b20eb4f2359 100644 --- a/net/9p/trans_common.h +++ b/net/9p/trans_common.h @@ -4,4 +4,5 @@ * Author Venkateswararao Jujjuri */ -void p9_release_pages(struct page **pages, int nr_pages); +void p9_release_pages(struct page **pages, int nr_pages, + unsigned int cleanup_mode); diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c index eb28b54fe5f6..561f7cbd79da 100644 --- a/net/9p/trans_virtio.c +++ b/net/9p/trans_virtio.c @@ -310,73 +310,34 @@ static int p9_get_mapped_pages(struct virtio_chan *chan, struct iov_iter *data, int count, size_t *offs, - int *need_drop, + int *cleanup_mode, unsigned int gup_flags) { int nr_pages; int err; + int n; if (!iov_iter_count(data)) return 0; - if (!iov_iter_is_kvec(data)) { - int n; - /* - * We allow only p9_max_pages pinned.
We wait for the - * Other zc request to finish here - */ - if (atomic_read(&vp_pinned) >= chan->p9_max_pages) { - err = wait_event_killable(vp_wq, - (atomic_read(&vp_pinned) < chan->p9_max_pages)); - if (err == -ERESTARTSYS) - return err; - } - n = iov_iter_get_pages_alloc(data, pages, count, offs, - gup_flags); - if (n < 0) - return n; - *need_drop = 1; - nr_pages = DIV_ROUND_UP(n + *offs, PAGE_SIZE); - atomic_add(nr_pages, &vp_pinned); - return n; - } else { - /* kernel buffer, no need to pin pages */ - int index; - size_t len; - void *p; - - /* we'd already checked that it's non-empty */ - while (1) { - len = iov_iter_single_seg_count(data); - if (likely(len)) { - p = data->kvec->iov_base + data->iov_offset; - break; - } - iov_iter_advance(data, 0); - } - if (len > count) - len = count; - - nr_pages = DIV_ROUND_UP((unsigned long)p + len, PAGE_SIZE) - - (unsigned long)p / PAGE_SIZE; - - *pages = kmalloc_array(nr_pages, sizeof(struct page *), - GFP_NOFS); - if (!*pages) - return -ENOMEM; - - *need_drop = 0; - p -= (*offs = offset_in_page(p)); - for (index = 0; index < nr_pages; index++) { - if (is_vmalloc_addr(p)) - (*pages)[index] = vmalloc_to_page(p); - else - (*pages)[index] = kmap_to_page(p); - p += PAGE_SIZE; - } - iov_iter_advance(data, len); - return len; + /* + * We allow only p9_max_pages pinned. We wait for the + * Other zc request to finish here + */ + if (atomic_read(&vp_pinned) >= chan->p9_max_pages) { + err = wait_event_killable(vp_wq, + (atomic_read(&vp_pinned) < chan->p9_max_pages)); + if (err == -ERESTARTSYS) + return err; } + + n = iov_iter_extract_pages(data, pages, count, INT_MAX, gup_flags, offs); + if (n < 0) + return n; + *cleanup_mode = iov_iter_extract_mode(data, gup_flags); + nr_pages = DIV_ROUND_UP(n + *offs, PAGE_SIZE); + atomic_add(nr_pages, &vp_pinned); + return n; } static void handle_rerror(struct p9_req_t *req, int in_hdr_len, @@ -431,7 +392,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req, struct virtio_chan *chan = client->trans; struct scatterlist *sgs[4]; size_t offs; - int need_drop = 0; + int cleanup_mode = 0; int kicked = 0; p9_debug(P9_DEBUG_TRANS, "virtio request\n"); @@ -439,7 +400,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req, if (uodata) { __le32 sz; int n = p9_get_mapped_pages(chan, &out_pages, uodata, - outlen, &offs, &need_drop, + outlen, &offs, &cleanup_mode, FOLL_DEST_BUF); if (n < 0) { err = n; @@ -459,7 +420,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req, memcpy(&req->tc.sdata[0], &sz, sizeof(sz)); } else if (uidata) { int n = p9_get_mapped_pages(chan, &in_pages, uidata, - inlen, &offs, &need_drop, + inlen, &offs, &cleanup_mode, FOLL_SOURCE_BUF); if (n < 0) { err = n; @@ -546,14 +507,14 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req, * Non kernel buffers are pinned, unpin them */ err_out: - if (need_drop) { + if (cleanup_mode) { if (in_pages) { - p9_release_pages(in_pages, in_nr_pages); + p9_release_pages(in_pages, in_nr_pages, cleanup_mode); atomic_sub(in_nr_pages, &vp_pinned); } if (out_pages) { - p9_release_pages(out_pages, out_nr_pages); + p9_release_pages(out_pages, out_nr_pages, cleanup_mode); atomic_sub(out_nr_pages, &vp_pinned); } /* wakeup anybody waiting for slots to pin pages */ wake_up(&vp_wq);
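One consequence of dropping the kvec special case above: for kernel-backed iterators the extraction helpers take no ref and no pin (per the note in the splice patch that extraction doesn't retain non-user-backed iterators), so the recorded cleanup mode is 0 and the release path must tolerate that. A sketch of a mode-agnostic release wrapper - p9_put_mapped_pages() is an invented name:

        static void p9_put_mapped_pages(struct page **pages, int nr_pages,
                                        unsigned int cleanup_mode)
        {
                /* cleanup_mode == 0 (kvec/bvec): nothing was taken, nothing to drop. */
                if (cleanup_mode)
                        p9_release_pages(pages, nr_pages, cleanup_mode);
        }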
From patchwork Mon Jan 16 23:10:39 2023
Subject: [PATCH v6 22/34] nfs: Pin pages rather than ref'ing if appropriate
From: David Howells
Date: Mon, 16 Jan 2023 23:10:39 +0000
Message-ID: <167391063989.2311931.13252453380684759087.stgit@warthog.procyon.org.uk>

Convert the NFS direct I/O code to use iov_iter_extract_pages() instead of iov_iter_get_pages(). This will pin pages or leave them unaltered rather than getting a ref on them as appropriate to the iterator.
The pages need to be pinned for DIO-read rather than having refs taken on them to prevent VM copy-on-write from malfunctioning during a concurrent fork() (the result of the I/O would otherwise end up only visible to the child process and not the parent). Signed-off-by: David Howells cc: Trond Myklebust cc: Anna Schumaker cc: Jeff Layton cc: linux-nfs@vger.kernel.org --- fs/nfs/direct.c | 32 ++++++++++++++++++-------------- 1 file changed, 18 insertions(+), 14 deletions(-) diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index 42af84685f20..4a3108db2cb6 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -142,11 +142,15 @@ int nfs_swap_rw(struct kiocb *iocb, struct iov_iter *iter) return 0; } -static void nfs_direct_release_pages(struct page **pages, unsigned int npages) +static void nfs_direct_release_pages(struct page **pages, unsigned int npages, + unsigned int cleanup_mode) { unsigned int i; - for (i = 0; i < npages; i++) - put_page(pages[i]); + + if (cleanup_mode) { + for (i = 0; i < npages; i++) + page_put_unpin(pages[i], cleanup_mode); + } } void nfs_init_cinfo_from_dreq(struct nfs_commit_info *cinfo, @@ -327,17 +331,16 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq, inode_dio_begin(inode); while (iov_iter_count(iter)) { - struct page **pagevec; + struct page **pagevec = NULL; size_t bytes; size_t pgbase; unsigned npages, i; - result = iov_iter_get_pages_alloc(iter, &pagevec, - rsize, &pgbase, - FOLL_DEST_BUF); + result = iov_iter_extract_pages(iter, &pagevec, rsize, INT_MAX, + FOLL_DEST_BUF, &pgbase); if (result < 0) break; - + bytes = result; npages = (result + pgbase + PAGE_SIZE - 1) / PAGE_SIZE; for (i = 0; i < npages; i++) { @@ -363,7 +366,8 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq, pos += req_len; dreq->bytes_left -= req_len; } - nfs_direct_release_pages(pagevec, npages); + nfs_direct_release_pages(pagevec, npages, + iov_iter_extract_mode(iter, FOLL_DEST_BUF)); kvfree(pagevec); if (result < 0) break; @@ -787,14 +791,13 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, NFS_I(inode)->write_io += iov_iter_count(iter); while (iov_iter_count(iter)) { - struct page **pagevec; + struct page **pagevec = NULL; size_t bytes; size_t pgbase; unsigned npages, i; - result = iov_iter_get_pages_alloc(iter, &pagevec, - wsize, &pgbase, - FOLL_SOURCE_BUF); + result = iov_iter_extract_pages(iter, &pagevec, wsize, INT_MAX, + FOLL_SOURCE_BUF, &pgbase); if (result < 0) break; @@ -831,7 +834,8 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, pos += req_len; dreq->bytes_left -= req_len; } - nfs_direct_release_pages(pagevec, npages); + nfs_direct_release_pages(pagevec, npages, + iov_iter_extract_mode(iter, FOLL_SOURCE_BUF)); kvfree(pagevec); if (result < 0) break;
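Note how the two scheduling loops differ only in direction: reads extract with FOLL_DEST_BUF, writes with FOLL_SOURCE_BUF, and the matching mode is re-queried at release time. That pairing could equally be captured once - a sketch, with nfs_direct_extract() an invented wrapper:

        static ssize_t nfs_direct_extract(struct iov_iter *iter,
                                          struct page ***pagevec, size_t maxsize,
                                          bool write, size_t *pgbase,
                                          unsigned int *cleanup_mode)
        {
                unsigned int gup_flags = write ? FOLL_SOURCE_BUF : FOLL_DEST_BUF;
                ssize_t result;

                result = iov_iter_extract_pages(iter, pagevec, maxsize, INT_MAX,
                                                gup_flags, pgbase);
                if (result >= 0)
                        *cleanup_mode = iov_iter_extract_mode(iter, gup_flags);
                return result;
        }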
From patchwork Mon Jan 16 23:10:47 2023
Subject: [PATCH v6 23/34] cifs: Implement splice_read to pass down ITER_BVEC not ITER_PIPE
From: David Howells
Date: Mon, 16 Jan 2023 23:10:47 +0000
Message-ID: <167391064717.2311931.7504820268968962092.stgit@warthog.procyon.org.uk>

Provide cifs_splice_read() to use a bvec rather than a pipe iterator as the latter cannot so easily be split and advanced, which is necessary to pass an iterator down to the bottom levels. Upstream cifs gets around this problem by using iov_iter_get_pages() to prefill the pipe and then passing the list of pages down. This is done by: (1) Bulk-allocating a bunch of pages to carry as much of the requested amount of data as possible, but without overrunning the available slots in the pipe, and adding them to an ITER_BVEC. (2) Synchronously calling ->read_iter() to read into the buffer. (3) Discarding any unused pages. (4) Loading the remaining pages into the pipe in order and advancing the head pointer.
Signed-off-by: David Howells cc: Steve French cc: Shyam Prasad N cc: Rohith Surabattula cc: Jeff Layton cc: Al Viro cc: linux-cifs@vger.kernel.org Link: https://lore.kernel.org/r/166732028113.3186319.1793644937097301358.stgit@warthog.procyon.org.uk/ # rfc --- fs/cifs/cifsfs.c | 12 ++++--- fs/cifs/cifsfs.h | 3 ++ fs/cifs/file.c | 92 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ fs/splice.c | 1 + 4 files changed, 102 insertions(+), 6 deletions(-) diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c index 10e00c624922..3c57e8b11692 100644 --- a/fs/cifs/cifsfs.c +++ b/fs/cifs/cifsfs.c @@ -1358,7 +1358,7 @@ const struct file_operations cifs_file_ops = { .fsync = cifs_fsync, .flush = cifs_flush, .mmap = cifs_file_mmap, - .splice_read = generic_file_splice_read, + .splice_read = cifs_splice_read, .splice_write = iter_file_splice_write, .llseek = cifs_llseek, .unlocked_ioctl = cifs_ioctl, @@ -1378,7 +1378,7 @@ const struct file_operations cifs_file_strict_ops = { .fsync = cifs_strict_fsync, .flush = cifs_flush, .mmap = cifs_file_strict_mmap, - .splice_read = generic_file_splice_read, + .splice_read = cifs_splice_read, .splice_write = iter_file_splice_write, .llseek = cifs_llseek, .unlocked_ioctl = cifs_ioctl, @@ -1398,7 +1398,7 @@ const struct file_operations cifs_file_direct_ops = { .fsync = cifs_fsync, .flush = cifs_flush, .mmap = cifs_file_mmap, - .splice_read = generic_file_splice_read, + .splice_read = cifs_splice_read, .splice_write = iter_file_splice_write, .unlocked_ioctl = cifs_ioctl, .copy_file_range = cifs_copy_file_range, @@ -1416,7 +1416,7 @@ const struct file_operations cifs_file_nobrl_ops = { .fsync = cifs_fsync, .flush = cifs_flush, .mmap = cifs_file_mmap, - .splice_read = generic_file_splice_read, + .splice_read = cifs_splice_read, .splice_write = iter_file_splice_write, .llseek = cifs_llseek, .unlocked_ioctl = cifs_ioctl, @@ -1434,7 +1434,7 @@ const struct file_operations cifs_file_strict_nobrl_ops = { .fsync = cifs_strict_fsync, .flush = cifs_flush, .mmap = cifs_file_strict_mmap, - .splice_read = generic_file_splice_read, + .splice_read = cifs_splice_read, .splice_write = iter_file_splice_write, .llseek = cifs_llseek, .unlocked_ioctl = cifs_ioctl, @@ -1452,7 +1452,7 @@ const struct file_operations cifs_file_direct_nobrl_ops = { .fsync = cifs_fsync, .flush = cifs_flush, .mmap = cifs_file_mmap, - .splice_read = generic_file_splice_read, + .splice_read = cifs_splice_read, .splice_write = iter_file_splice_write, .unlocked_ioctl = cifs_ioctl, .copy_file_range = cifs_copy_file_range, diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h index 63a0ac2b9355..25decebbc478 100644 --- a/fs/cifs/cifsfs.h +++ b/fs/cifs/cifsfs.h @@ -100,6 +100,9 @@ extern ssize_t cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to); extern ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from); extern ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter *from); extern ssize_t cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from); +extern ssize_t cifs_splice_read(struct file *in, loff_t *ppos, + struct pipe_inode_info *pipe, size_t len, + unsigned int flags); extern int cifs_flock(struct file *pfile, int cmd, struct file_lock *plock); extern int cifs_lock(struct file *, int, struct file_lock *); extern int cifs_fsync(struct file *, loff_t, loff_t, int); diff --git a/fs/cifs/file.c b/fs/cifs/file.c index d100b9cb8682..f1297386a185 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -5273,3 +5273,95 @@ const struct address_space_operations cifs_addr_ops_smallbuf = { 
.launder_folio = cifs_launder_folio, .migrate_folio = filemap_migrate_folio, }; + +/* + * Splice data from a file into a pipe. + */ +ssize_t cifs_splice_read(struct file *file, loff_t *ppos, + struct pipe_inode_info *pipe, size_t len, + unsigned int flags) +{ + LIST_HEAD(pages); + struct iov_iter to; + struct bio_vec *bv; + struct kiocb kiocb; + struct page *page; + unsigned int head; + ssize_t ret; + size_t used, npages, chunk, remain, reclaim; + int i; + + /* Work out how much data we can actually add into the pipe */ + used = pipe_occupancy(pipe->head, pipe->tail); + npages = max_t(ssize_t, pipe->max_usage - used, 0); + len = min_t(size_t, len, npages * PAGE_SIZE); + npages = DIV_ROUND_UP(len, PAGE_SIZE); + + bv = kmalloc(array_size(npages, sizeof(bv[0])), GFP_KERNEL); + if (!bv) + return -ENOMEM; + + npages = alloc_pages_bulk_list(GFP_USER, npages, &pages); + if (!npages) { + kfree(bv); + return -ENOMEM; + } + + remain = len = min_t(size_t, len, npages * PAGE_SIZE); + + for (i = 0; i < npages; i++) { + chunk = min_t(size_t, PAGE_SIZE, remain); + page = list_first_entry(&pages, struct page, lru); + list_del_init(&page->lru); + bv[i].bv_page = page; + bv[i].bv_offset = 0; + bv[i].bv_len = chunk; + remain -= chunk; + } + + /* Do the I/O */ + iov_iter_bvec(&to, READ, bv, npages, len); + init_sync_kiocb(&kiocb, file); + kiocb.ki_pos = *ppos; + ret = call_read_iter(file, &kiocb, &to); + + reclaim = npages * PAGE_SIZE; + remain = 0; + if (ret > 0) { + reclaim -= ret; + remain = ret; + *ppos = kiocb.ki_pos; + file_accessed(file); + } else if (ret < 0) { + /* + * callers of ->splice_read() expect -EAGAIN on + * "can't put anything in there", rather than -EFAULT. + */ + if (ret == -EFAULT) + ret = -EAGAIN; + } + + /* Free any pages that didn't get touched at all. */ + for (; reclaim >= PAGE_SIZE; reclaim -= PAGE_SIZE) + __free_page(bv[--npages].bv_page); + + /* Push the remaining pages into the pipe. */ + head = pipe->head; + for (i = 0; i < npages; i++) { + struct pipe_buffer *buf = &pipe->bufs[head & (pipe->ring_size - 1)]; + + chunk = min_t(size_t, remain, PAGE_SIZE); + *buf = (struct pipe_buffer) { + .ops = &default_pipe_buf_ops, + .page = bv[i].bv_page, + .offset = 0, + .len = chunk, + }; + head++; + remain -= chunk; + } + pipe->head = head; + + kfree(bv); + return ret; +} diff --git a/fs/splice.c b/fs/splice.c index c3433266ba1b..1245ffb64414 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -330,6 +330,7 @@ const struct pipe_buf_operations default_pipe_buf_ops = { .try_steal = generic_pipe_buf_try_steal, .get = generic_pipe_buf_get, }; +EXPORT_SYMBOL(default_pipe_buf_ops); /* Pipe buffer operations for a socket and similar. 
*/ const struct pipe_buf_operations nosteal_pipe_buf_ops = {
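To make the sizing step in cifs_splice_read() concrete: the read is capped by the free slots in the pipe ring, so with max_usage 16 and 3 slots already occupied, at most 13 pages are allocated. The calculation as a standalone sketch - pipe_payload_cap() is an invented name:

        static size_t pipe_payload_cap(const struct pipe_inode_info *pipe, size_t len)
        {
                unsigned int used = pipe_occupancy(pipe->head, pipe->tail);
                size_t slack = pipe->max_usage > used ? pipe->max_usage - used : 0;

                /* Never queue more pages than the ring can accept right now. */
                return min_t(size_t, len, slack * PAGE_SIZE);
        }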
From patchwork Mon Jan 16 23:10:54 2023
Subject: [PATCH v6 24/34] cifs: Add a function to build an RDMA SGE list from an iterator
From: David Howells
Date: Mon, 16 Jan 2023 23:10:54 +0000
Message-ID: <167391065455.2311931.6594946160942957670.stgit@warthog.procyon.org.uk>

Add a function to add elements onto an RDMA SGE list representing page fragments extracted from a BVEC-, KVEC- or XARRAY-type iterator and DMA mapped until the maximum number of elements is reached. Nothing is done to make sure the pages remain present - that must be done by the caller. Signed-off-by: David Howells cc: Steve French cc: Shyam Prasad N cc: Rohith Surabattula cc: Tom Talpey cc: Jeff Layton cc: linux-cifs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: linux-rdma@vger.kernel.org Link: https://lore.kernel.org/r/166697256704.61150.17388516338310645808.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/166732028840.3186319.8512284239779728860.stgit@warthog.procyon.org.uk/ # rfc --- fs/cifs/smbdirect.c | 224 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 224 insertions(+) diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c index 3e693ffd0662..78a76752fafd 100644 --- a/fs/cifs/smbdirect.c +++ b/fs/cifs/smbdirect.c @@ -44,6 +44,17 @@ static int smbd_post_send_page(struct smbd_connection *info, static void destroy_mr_list(struct smbd_connection *info); static int allocate_mr_list(struct smbd_connection *info); +struct smb_extract_to_rdma { + struct ib_sge *sge; + unsigned int nr_sge; + unsigned int max_sge; + struct ib_device *device; + u32 local_dma_lkey; + enum dma_data_direction direction; +}; +static ssize_t smb_extract_iter_to_rdma(struct iov_iter *iter, size_t len, + struct smb_extract_to_rdma *rdma); + /* SMBD version number */ #define SMBD_V1 0x0100 @@ -2480,3 +2491,216 @@ int smbd_deregister_mr(struct smbd_mr *smbdirect_mr) return rc; } + +static bool smb_set_sge(struct smb_extract_to_rdma *rdma, + struct page *lowest_page, size_t off, size_t len) +{ + struct ib_sge *sge = &rdma->sge[rdma->nr_sge]; + u64 addr; + + addr = ib_dma_map_page(rdma->device, lowest_page, + off, len, rdma->direction); + if (ib_dma_mapping_error(rdma->device, addr)) + return false; + + sge->addr = addr; + sge->length = len; + sge->lkey = rdma->local_dma_lkey; + rdma->nr_sge++; + return true; +} + +/* + * Extract page fragments from a BVEC-class iterator and add them to an RDMA + * element list. The pages are not pinned.
+ */
+static ssize_t smb_extract_bvec_to_rdma(struct iov_iter *iter,
+					struct smb_extract_to_rdma *rdma,
+					ssize_t maxsize)
+{
+	const struct bio_vec *bv = iter->bvec;
+	unsigned long start = iter->iov_offset;
+	unsigned int i, sge_max = rdma->max_sge - rdma->nr_sge;
+	ssize_t ret = 0;
+
+	for (i = 0; i < iter->nr_segs; i++) {
+		size_t off, len;
+
+		len = bv[i].bv_len;
+		if (start >= len) {
+			start -= len;
+			continue;
+		}
+
+		len = min_t(size_t, maxsize, len - start);
+		off = bv[i].bv_offset + start;
+
+		if (!smb_set_sge(rdma, bv[i].bv_page, off, len))
+			return -EIO;
+		sge_max--;
+
+		ret += len;
+		maxsize -= len;
+		if (maxsize <= 0 || sge_max == 0)
+			break;
+		start = 0;
+	}
+
+	return ret;
+}
+
+/*
+ * Extract fragments from a KVEC-class iterator and add them to an RDMA list.
+ * This can deal with vmalloc'd buffers as well as kmalloc'd or static buffers.
+ * The pages are not pinned.
+ */
+static ssize_t smb_extract_kvec_to_rdma(struct iov_iter *iter,
+					struct smb_extract_to_rdma *rdma,
+					ssize_t maxsize)
+{
+	const struct kvec *kv = iter->kvec;
+	unsigned long start = iter->iov_offset;
+	unsigned int i, sge_max = rdma->max_sge - rdma->nr_sge;
+	ssize_t ret = 0;
+
+	for (i = 0; i < iter->nr_segs; i++) {
+		struct page *page;
+		unsigned long kaddr;
+		size_t off, len, seg;
+
+		len = kv[i].iov_len;
+		if (start >= len) {
+			start -= len;
+			continue;
+		}
+
+		kaddr = (unsigned long)kv[i].iov_base + start;
+		off = kaddr & ~PAGE_MASK;
+		len = min_t(size_t, maxsize, len - start);
+		kaddr &= PAGE_MASK;
+
+		maxsize -= len;
+		ret += len;
+		do {
+			seg = min_t(size_t, len, PAGE_SIZE - off);
+
+			if (is_vmalloc_or_module_addr((void *)kaddr))
+				page = vmalloc_to_page((void *)kaddr);
+			else
+				page = virt_to_page(kaddr);
+
+			/* Map each page-sized segment separately: the pages
+			 * backing a vmalloc'd buffer needn't be physically
+			 * contiguous.
+			 */
+			if (!smb_set_sge(rdma, page, off, seg))
+				return -EIO;
+			sge_max--;
+
+			len -= seg;
+			kaddr += PAGE_SIZE;
+			off = 0;
+		} while (len > 0 && sge_max > 0);
+
+		if (maxsize <= 0 || sge_max == 0)
+			break;
+		start = 0;
+	}
+
+	return ret;
+}
+
+/*
+ * Extract folio fragments from an XARRAY-class iterator and add them to an
+ * RDMA list.  The folios are not pinned.
+ */
+static ssize_t smb_extract_xarray_to_rdma(struct iov_iter *iter,
+					  struct smb_extract_to_rdma *rdma,
+					  ssize_t maxsize)
+{
+	struct xarray *xa = iter->xarray;
+	struct folio *folio;
+	unsigned int sge_max = rdma->max_sge - rdma->nr_sge;
+	loff_t start = iter->xarray_start + iter->iov_offset;
+	pgoff_t index = start / PAGE_SIZE;
+	ssize_t ret = 0;
+	size_t off, len;
+	XA_STATE(xas, xa, index);
+
+	rcu_read_lock();
+
+	xas_for_each(&xas, folio, ULONG_MAX) {
+		if (xas_retry(&xas, folio))
+			continue;
+		if (WARN_ON(xa_is_value(folio)))
+			break;
+		if (WARN_ON(folio_test_hugetlb(folio)))
+			break;
+
+		off = offset_in_folio(folio, start);
+		len = min_t(size_t, maxsize, folio_size(folio) - off);
+
+		if (!smb_set_sge(rdma, folio_page(folio, 0), off, len)) {
+			rcu_read_unlock();
+			return -EIO;
+		}
+		sge_max--;
+
+		start += len;
+		maxsize -= len;
+		ret += len;
+		if (maxsize <= 0 || sge_max == 0)
+			break;
+	}
+
+	rcu_read_unlock();
+	return ret;
+}
+
+/*
+ * Extract page fragments from up to the given amount of the source iterator
+ * and build up an RDMA list that refers to all of those bits.  The RDMA list
+ * is appended to, up to the maximum number of elements set in the parameter
+ * block.
+ *
+ * The extracted page fragments are not pinned or ref'd in any way; if an
+ * IOVEC/UBUF-type iterator is to be used, it should be converted to a
+ * BVEC-type iterator and the pages pinned, ref'd or otherwise held in some
+ * way.
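+ *
+ * A minimal caller sketch (illustrative only; the local SGE array and its
+ * size are hypothetical, though the device/lkey values match the later
+ * caller in smbd_post_send_iter()):
+ *
+ *	struct ib_sge sges[8];
+ *	struct smb_extract_to_rdma rdma = {
+ *		.sge		= sges,
+ *		.nr_sge		= 0,
+ *		.max_sge	= ARRAY_SIZE(sges),
+ *		.device		= info->id->device,
+ *		.local_dma_lkey	= info->pd->local_dma_lkey,
+ *		.direction	= DMA_TO_DEVICE,
+ *	};
+ *	ssize_t n = smb_extract_iter_to_rdma(iter, count, &rdma);
+ *
+ * On success, n bytes of the iterator have been DMA-mapped into
+ * sges[0..rdma.nr_sge - 1] and the iterator advanced; on failure, any SGEs
+ * added by the call are unmapped again.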
+ */
+static ssize_t smb_extract_iter_to_rdma(struct iov_iter *iter, size_t len,
+					struct smb_extract_to_rdma *rdma)
+{
+	ssize_t ret;
+	int before = rdma->nr_sge;
+
+	if (iov_iter_is_discard(iter) ||
+	    iov_iter_is_pipe(iter) ||
+	    user_backed_iter(iter)) {
+		WARN_ON_ONCE(1);
+		return -EIO;
+	}
+
+	switch (iov_iter_type(iter)) {
+	case ITER_BVEC:
+		ret = smb_extract_bvec_to_rdma(iter, rdma, len);
+		break;
+	case ITER_KVEC:
+		ret = smb_extract_kvec_to_rdma(iter, rdma, len);
+		break;
+	case ITER_XARRAY:
+		ret = smb_extract_xarray_to_rdma(iter, rdma, len);
+		break;
+	default:
+		BUG();
+	}
+
+	if (ret > 0) {
+		iov_iter_advance(iter, ret);
+	} else if (ret < 0) {
+		/* Undo just the SGEs that this call added, pairing the unmap
+		 * with the ib_dma_map_page() done in smb_set_sge().
+		 */
+		while (rdma->nr_sge > before) {
+			struct ib_sge *sge = &rdma->sge[--rdma->nr_sge];
+
+			ib_dma_unmap_page(rdma->device, sge->addr, sge->length,
+					  rdma->direction);
+			sge->addr = 0;
+		}
+	}
+
+	return ret;
+}
From patchwork Mon Jan 16 23:11:02 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 13103897 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6A40DC46467 for ; Mon, 16 Jan 2023 23:17:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235430AbjAPXRa (ORCPT ); Mon, 16 Jan 2023 18:17:30 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55890 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233684AbjAPXQj (ORCPT ); Mon, 16 Jan 2023 18:16:39 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 563D72798B for ; Mon, 16 Jan 2023 15:11:09 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1673910666; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=mkjFMRxDIKJKWLRwMUOLvuSiSTpWYy2LHznevDgStcI=; b=H0yiEkW12tMxVFtWyC69FXPzAlho9BOPYF3PPjPqKqH/fTcvszmeZ8DLE8WojBojlND+5T r0i2IALV9LzjRA7K9oxlDIWdO7jqzazgZ6aQjAF4HJcmvpQONOmVVVUHYD7OxTOcAVjCeR qejQJzmG9YhqTUR555Um61SmheJoiWY= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-203-enKb_4ZtORmXcmB3l_IJKw-1; Mon, 16 Jan 2023 18:11:05 -0500 X-MC-Unique: enKb_4ZtORmXcmB3l_IJKw-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 710612A59560; Mon, 16 Jan 2023 23:11:04 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.23]) by smtp.corp.redhat.com (Postfix) with ESMTP id A58A94010D46; Mon, 16 Jan 2023 23:11:02 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No.
3798903 Subject: [PATCH v6 25/34] cifs: Add a function to hash the contents of an iterator From: David Howells To: Al Viro Cc: Steve French , Shyam Prasad N , Rohith Surabattula , Jeff Layton , linux-cifs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-crypto@vger.kernel.org, dhowells@redhat.com, Christoph Hellwig , Matthew Wilcox , Jens Axboe , Jan Kara , Jeff Layton , Logan Gunthorpe , linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 16 Jan 2023 23:11:02 +0000 Message-ID: <167391066212.2311931.16097548940184155209.stgit@warthog.procyon.org.uk> In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> User-Agent: StGit/1.5 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org

Add a function to push the contents of a BVEC-, KVEC- or XARRAY-type iterator into a synchronous hash (shash) algorithm. UBUF- and IOVEC-type iterators are not supported on the assumption that either we're doing buffered I/O, in which case we won't see them, or we're doing direct I/O, in which case the iterator will have been extracted into a BVEC-type iterator higher up.

Signed-off-by: David Howells cc: Steve French cc: Shyam Prasad N cc: Rohith Surabattula cc: Jeff Layton cc: linux-cifs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: linux-crypto@vger.kernel.org Link: https://lore.kernel.org/r/166697257423.61150.12070648579830206483.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/166732029577.3186319.17162612653237909961.stgit@warthog.procyon.org.uk/ # rfc
---
 fs/cifs/cifsencrypt.c | 144 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 144 insertions(+)

diff --git a/fs/cifs/cifsencrypt.c b/fs/cifs/cifsencrypt.c
index 5db73c0f792a..e13f26371540 100644
--- a/fs/cifs/cifsencrypt.c
+++ b/fs/cifs/cifsencrypt.c
@@ -24,6 +24,150 @@
 #include "../smbfs_common/arc4.h"
 #include

+/*
+ * Hash data from a BVEC-type iterator.
+ */
+static int cifs_shash_bvec(const struct iov_iter *iter, ssize_t maxsize,
+			   struct shash_desc *shash)
+{
+	const struct bio_vec *bv = iter->bvec;
+	unsigned long start = iter->iov_offset;
+	unsigned int i;
+	void *p;
+	int ret;
+
+	for (i = 0; i < iter->nr_segs; i++) {
+		size_t off, len;
+
+		len = bv[i].bv_len;
+		if (start >= len) {
+			start -= len;
+			continue;
+		}
+
+		len = min_t(size_t, maxsize, len - start);
+		off = bv[i].bv_offset + start;
+
+		p = kmap_local_page(bv[i].bv_page);
+		ret = crypto_shash_update(shash, p + off, len);
+		kunmap_local(p);
+		if (ret < 0)
+			return ret;
+
+		maxsize -= len;
+		if (maxsize <= 0)
+			break;
+		start = 0;
+	}
+
+	return 0;
+}
+
+/*
+ * Hash data from a KVEC-type iterator.
+ */
+static int cifs_shash_kvec(const struct iov_iter *iter, ssize_t maxsize,
+			   struct shash_desc *shash)
+{
+	const struct kvec *kv = iter->kvec;
+	unsigned long start = iter->iov_offset;
+	unsigned int i;
+	int ret;
+
+	for (i = 0; i < iter->nr_segs; i++) {
+		size_t len;
+
+		len = kv[i].iov_len;
+		if (start >= len) {
+			start -= len;
+			continue;
+		}
+
+		len = min_t(size_t, maxsize, len - start);
+		ret = crypto_shash_update(shash, kv[i].iov_base + start, len);
+		if (ret < 0)
+			return ret;
+		maxsize -= len;
+
+		if (maxsize <= 0)
+			break;
+		start = 0;
+	}
+
+	return 0;
+}
+
+/*
+ * Hash data from an XARRAY-type iterator.
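+ *
+ * Folios are pulled from the xarray in batches of up to 16 via xa_extract()
+ * and each page of each folio is kmapped and fed to crypto_shash_update()
+ * until maxsize is exhausted.  For example (assuming 4KiB pages), hashing
+ * 5000 bytes starting 100 bytes into the first page consumes 3996 bytes
+ * from page 0 and the remaining 1004 bytes from page 1.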
+ */
+static ssize_t cifs_shash_xarray(const struct iov_iter *iter, ssize_t maxsize,
+				 struct shash_desc *shash)
+{
+	struct folio *folios[16], *folio;
+	unsigned int nr, i, j, npages;
+	loff_t start = iter->xarray_start + iter->iov_offset;
+	pgoff_t last, index = start / PAGE_SIZE;
+	ssize_t ret = 0;
+	size_t len, offset, foffset;
+	void *p;
+
+	if (maxsize == 0)
+		return 0;
+
+	last = (start + maxsize - 1) / PAGE_SIZE;
+	do {
+		nr = xa_extract(iter->xarray, (void **)folios, index, last,
+				ARRAY_SIZE(folios), XA_PRESENT);
+		if (nr == 0)
+			return -EIO;
+
+		for (i = 0; i < nr; i++) {
+			folio = folios[i];
+			npages = folio_nr_pages(folio);
+			foffset = start - folio_pos(folio);
+			offset = foffset % PAGE_SIZE;
+			for (j = foffset / PAGE_SIZE; j < npages; j++) {
+				len = min_t(size_t, maxsize, PAGE_SIZE - offset);
+				/* Hash from the in-page offset; only the first
+				 * page of the run can start part-way in.
+				 */
+				p = kmap_local_page(folio_page(folio, j));
+				ret = crypto_shash_update(shash, p + offset, len);
+				kunmap_local(p);
+				if (ret < 0)
+					return ret;
+				maxsize -= len;
+				if (maxsize <= 0)
+					return 0;
+				start += len;
+				offset = 0;
+				index++;
+			}
+		}
+	} while (nr == ARRAY_SIZE(folios));
+	return 0;
+}
+
+/*
+ * Pass the data from an iterator into a hash.
+ */
+static int cifs_shash_iter(const struct iov_iter *iter, size_t maxsize,
+			   struct shash_desc *shash)
+{
+	if (maxsize == 0)
+		return 0;
+
+	switch (iov_iter_type(iter)) {
+	case ITER_BVEC:
+		return cifs_shash_bvec(iter, maxsize, shash);
+	case ITER_KVEC:
+		return cifs_shash_kvec(iter, maxsize, shash);
+	case ITER_XARRAY:
+		return cifs_shash_xarray(iter, maxsize, shash);
+	default:
+		pr_err("cifs_shash_iter(%u) unsupported\n", iov_iter_type(iter));
+		WARN_ON_ONCE(1);
+		return -EIO;
+	}
+}
+
 int __cifs_calc_signature(struct smb_rqst *rqst, struct TCP_Server_Info *server, char *signature, struct shash_desc *shash)
From patchwork Mon Jan 16 23:11:09 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 13103898 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A74E1C678D4 for ; Mon, 16 Jan 2023 23:17:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232712AbjAPXRy (ORCPT ); Mon, 16 Jan 2023 18:17:54 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57972 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235375AbjAPXRE (ORCPT ); Mon, 16 Jan 2023 18:17:04 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AD392303DF for ; Mon, 16 Jan 2023 15:11:20 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1673910678; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=fiBnMGx+baxgSoLj5goLQrtXR0Ipe5sXcwIz+552lHw=; b=TQc0RdNCELI+F6dyZAMg4sYiQISGzTsQxNUxOfUUAmNRA7yYHSOLNuvgMN3pXtLn1up11G 8upP+/N4u/JdTBRmY/HIEt8cjOkABQJ2EAhFXquPjDeo79eVRdLPSLEikYqRX/+uID7x3W lhVIKHhg/xW7rFWokOJ/V2fw0ytyki0= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2,
cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-587-htQYbudlN-yAtPigeJ0tJw-1; Mon, 16 Jan 2023 18:11:12 -0500 X-MC-Unique: htQYbudlN-yAtPigeJ0tJw-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id CA17485CCE3; Mon, 16 Jan 2023 23:11:11 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.23]) by smtp.corp.redhat.com (Postfix) with ESMTP id 247C02166B26; Mon, 16 Jan 2023 23:11:10 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v6 26/34] cifs: Add some helper functions From: David Howells To: Al Viro Cc: Steve French , Shyam Prasad N , Rohith Surabattula , Jeff Layton , linux-cifs@vger.kernel.org, dhowells@redhat.com, Christoph Hellwig , Matthew Wilcox , Jens Axboe , Jan Kara , Jeff Layton , Logan Gunthorpe , linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 16 Jan 2023 23:11:09 +0000 Message-ID: <167391066959.2311931.17352170557719525141.stgit@warthog.procyon.org.uk> In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> User-Agent: StGit/1.5 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.6 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Add some helper functions to manipulate the folio marks by iterating through a list of folios held in an xarray rather than using a page list. 
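All three helpers added below share the same shape: walk the folios covering the range [start, start + len) under the RCU read lock and update each folio's writeback state, differing only in the per-folio action taken. A condensed sketch of that common skeleton (illustrative only, not part of the patch):

	struct address_space *mapping = inode->i_mapping;
	struct folio *folio;
	pgoff_t end = (start + len - 1) / PAGE_SIZE;
	XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE);

	rcu_read_lock();
	xas_for_each(&xas, folio, end) {
		if (!folio_test_writeback(folio))
			continue;	/* the real helpers WARN_ONCE() here */
		/* Per-folio action: detach the private data, set the error
		 * flag, or redirty via filemap_dirty_folio(), depending on
		 * the helper.
		 */
		folio_end_writeback(folio);
	}
	rcu_read_unlock();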
Signed-off-by: David Howells cc: Steve French cc: Shyam Prasad N cc: Rohith Surabattula cc: Jeff Layton cc: linux-cifs@vger.kernel.org Link: https://lore.kernel.org/r/164928616583.457102.15157033997163988344.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165211418840.3154751.3090684430628501879.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165348878940.2106726.204291614267188735.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165364825674.3334034.3356201708659748648.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/166126394799.708021.10637797063862600488.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/166697258147.61150.9940790486999562110.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/166732030314.3186319.9209944805565413627.stgit@warthog.procyon.org.uk/ # rfc --- fs/cifs/cifsfs.h | 3 ++ fs/cifs/file.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 96 insertions(+) diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h index 25decebbc478..ea628da503c6 100644 --- a/fs/cifs/cifsfs.h +++ b/fs/cifs/cifsfs.h @@ -113,6 +113,9 @@ extern int cifs_file_strict_mmap(struct file *file, struct vm_area_struct *vma); extern const struct file_operations cifs_dir_ops; extern int cifs_dir_open(struct inode *inode, struct file *file); extern int cifs_readdir(struct file *file, struct dir_context *ctx); +extern void cifs_pages_written_back(struct inode *inode, loff_t start, unsigned int len); +extern void cifs_pages_write_failed(struct inode *inode, loff_t start, unsigned int len); +extern void cifs_pages_write_redirty(struct inode *inode, loff_t start, unsigned int len); /* Functions related to dir entries */ extern const struct dentry_operations cifs_dentry_ops; diff --git a/fs/cifs/file.c b/fs/cifs/file.c index f1297386a185..2873f28bf388 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -36,6 +36,99 @@ #include "cifs_ioctl.h" #include "cached_dir.h" +/* + * Completion of write to server. + */ +void cifs_pages_written_back(struct inode *inode, loff_t start, unsigned int len) +{ + struct address_space *mapping = inode->i_mapping; + struct folio *folio; + pgoff_t end; + + XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE); + + if (!len) + return; + + rcu_read_lock(); + + end = (start + len - 1) / PAGE_SIZE; + xas_for_each(&xas, folio, end) { + if (!folio_test_writeback(folio)) { + WARN_ONCE(1, "bad %x @%llx page %lx %lx\n", + len, start, folio_index(folio), end); + continue; + } + + folio_detach_private(folio); + folio_end_writeback(folio); + } + + rcu_read_unlock(); +} + +/* + * Failure of write to server. + */ +void cifs_pages_write_failed(struct inode *inode, loff_t start, unsigned int len) +{ + struct address_space *mapping = inode->i_mapping; + struct folio *folio; + pgoff_t end; + + XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE); + + if (!len) + return; + + rcu_read_lock(); + + end = (start + len - 1) / PAGE_SIZE; + xas_for_each(&xas, folio, end) { + if (!folio_test_writeback(folio)) { + WARN_ONCE(1, "bad %x @%llx page %lx %lx\n", + len, start, folio_index(folio), end); + continue; + } + + folio_set_error(folio); + folio_end_writeback(folio); + } + + rcu_read_unlock(); +} + +/* + * Redirty pages after a temporary failure. 
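+ *
+ * This is meant for retryable failures (cf. the is_retryable_error()
+ * checks elsewhere in this file): the folios are marked dirty again with
+ * filemap_dirty_folio() so that a later writeback pass retries the write,
+ * rather than being marked as having failed.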
+ */ +void cifs_pages_write_redirty(struct inode *inode, loff_t start, unsigned int len) +{ + struct address_space *mapping = inode->i_mapping; + struct folio *folio; + pgoff_t end; + + XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE); + + if (!len) + return; + + rcu_read_lock(); + + end = (start + len - 1) / PAGE_SIZE; + xas_for_each(&xas, folio, end) { + if (!folio_test_writeback(folio)) { + WARN_ONCE(1, "bad %x @%llx page %lx %lx\n", + len, start, folio_index(folio), end); + continue; + } + + filemap_dirty_folio(folio->mapping, folio); + folio_end_writeback(folio); + } + + rcu_read_unlock(); +} + /* * Mark as invalid, all open files on tree connections since they * were closed when session to server was lost. From patchwork Mon Jan 16 23:11:16 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 13103899 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 09CA8C678D4 for ; Mon, 16 Jan 2023 23:18:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229842AbjAPXSB (ORCPT ); Mon, 16 Jan 2023 18:18:01 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55962 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235362AbjAPXRJ (ORCPT ); Mon, 16 Jan 2023 18:17:09 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 197E0303F5 for ; Mon, 16 Jan 2023 15:11:25 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1673910683; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=js62ZRaWnqbjmPpizk27IottYyKMHFmsKuCH/pU3Ec8=; b=Pa6o7nXNr0B2F0MWL02AluH//FROB2DT3BZtG8cvX5h+bl+UCOPlthykv5xYBENLUqUMLu am6qS6clq4eK/K+oh41fPTSmGw2fK04hqNv3OjdpFWI2/0wsQBYD7GBXJwTOSJdyRacQSZ eo+ThdFDjEEL2SlpB4g08S9uXGKN1S0= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-637-NULuP-QOMeChjslrrhpo2A-1; Mon, 16 Jan 2023 18:11:19 -0500 X-MC-Unique: NULuP-QOMeChjslrrhpo2A-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 1BEB2802C1C; Mon, 16 Jan 2023 23:11:19 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.23]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8030D2026D4B; Mon, 16 Jan 2023 23:11:17 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 
3798903 Subject: [PATCH v6 27/34] cifs: Add a function to read into an iter from a socket From: David Howells To: Al Viro Cc: Steve French , Shyam Prasad N , Rohith Surabattula , Jeff Layton , linux-cifs@vger.kernel.org, dhowells@redhat.com, Christoph Hellwig , Matthew Wilcox , Jens Axboe , Jan Kara , Jeff Layton , Logan Gunthorpe , linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 16 Jan 2023 23:11:16 +0000 Message-ID: <167391067696.2311931.12784274342375267019.stgit@warthog.procyon.org.uk> In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> User-Agent: StGit/1.5 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Add a helper function to read data from a socket into the given iterator. Signed-off-by: David Howells cc: Steve French cc: Shyam Prasad N cc: Rohith Surabattula cc: Jeff Layton cc: linux-cifs@vger.kernel.org Link: https://lore.kernel.org/r/164928617874.457102.10021662143234315566.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165211419563.3154751.18431990381145195050.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165348879662.2106726.16881134187242702351.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165364826398.3334034.12541600783145647319.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/166126395495.708021.12328677373159554478.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/166697258876.61150.3530237818849429372.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/166732031039.3186319.10691316510079412635.stgit@warthog.procyon.org.uk/ # rfc --- fs/cifs/cifsproto.h | 3 +++ fs/cifs/connect.c | 16 ++++++++++++++++ 2 files changed, 19 insertions(+) diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h index 1207b39686fb..cb7a3fe89278 100644 --- a/fs/cifs/cifsproto.h +++ b/fs/cifs/cifsproto.h @@ -244,6 +244,9 @@ extern int cifs_read_page_from_socket(struct TCP_Server_Info *server, struct page *page, unsigned int page_offset, unsigned int to_read); +int cifs_read_iter_from_socket(struct TCP_Server_Info *server, + struct iov_iter *iter, + unsigned int to_read); extern int cifs_setup_cifs_sb(struct cifs_sb_info *cifs_sb); void cifs_mount_put_conns(struct cifs_mount_ctx *mnt_ctx); int cifs_mount_get_session(struct cifs_mount_ctx *mnt_ctx); diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c index d371259d6808..68d6d74c2f4e 100644 --- a/fs/cifs/connect.c +++ b/fs/cifs/connect.c @@ -765,6 +765,22 @@ cifs_read_page_from_socket(struct TCP_Server_Info *server, struct page *page, return cifs_readv_from_socket(server, &smb_msg); } +int +cifs_read_iter_from_socket(struct TCP_Server_Info *server, struct iov_iter *iter, + unsigned int to_read) +{ + struct msghdr smb_msg; + int ret; + + smb_msg.msg_iter = *iter; + if (smb_msg.msg_iter.count > to_read) + smb_msg.msg_iter.count = to_read; + ret = cifs_readv_from_socket(server, &smb_msg); + if (ret > 0) + iov_iter_advance(iter, ret); + return ret; +} + static bool is_smb_response(struct TCP_Server_Info *server, unsigned char type) { From patchwork Mon Jan 16 23:11:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 13103918 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on 
aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9650EC678D8 for ; Mon, 16 Jan 2023 23:20:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235447AbjAPXUl (ORCPT ); Mon, 16 Jan 2023 18:20:41 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56928 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235526AbjAPXSh (ORCPT ); Mon, 16 Jan 2023 18:18:37 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4346D31E00 for ; Mon, 16 Jan 2023 15:11:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1673910699; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=74/QgH+XJrzH5EAulnEvBRSGrOR7PfUP7fIYYmuq8L8=; b=OxjjhlFFbfbmx48Jh2pQFaxhuFUxR993IRCFfuSDu6y/mBB/LW58bmQwWrpTZTmNcDe4TV iua0SCYoebely+wmI9uO9KZFUym36BxILeogd638Ol465nGGhw9E9F46u/QkxqvQzZx4xj QaxQCtikZ5+lNdp3dt+0ul333flZu3E= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-671-uQyuEPLfMXSz_xVa0tooCA-1; Mon, 16 Jan 2023 18:11:35 -0500 X-MC-Unique: uQyuEPLfMXSz_xVa0tooCA-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 777EA1869B60; Mon, 16 Jan 2023 23:11:34 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.23]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9911B2166B26; Mon, 16 Jan 2023 23:11:32 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v6 29/34] cifs: Build the RDMA SGE list directly from an iterator From: David Howells To: Al Viro Cc: Steve French , Shyam Prasad N , Rohith Surabattula , Tom Talpey , Jeff Layton , linux-cifs@vger.kernel.org, linux-rdma@vger.kernel.org, dhowells@redhat.com, Christoph Hellwig , Matthew Wilcox , Jens Axboe , Jan Kara , Jeff Layton , Logan Gunthorpe , linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 16 Jan 2023 23:11:32 +0000 Message-ID: <167391069208.2311931.17037009522123506578.stgit@warthog.procyon.org.uk> In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> User-Agent: StGit/1.5 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.6 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org In the depths of the cifs RDMA code, extract part of an iov iterator directly into an SGE list without going through an intermediate scatterlist. Note that this doesn't support extraction from an IOBUF- or UBUF-type iterator (ie. user-supplied buffer). 
The assumption is that the higher layers will extract those to a BVEC-type iterator first and do whatever is required to stop the pages from going away. Signed-off-by: David Howells cc: Steve French cc: Shyam Prasad N cc: Rohith Surabattula cc: Tom Talpey cc: Jeff Layton cc: linux-cifs@vger.kernel.org cc: linux-rdma@vger.kernel.org Link: https://lore.kernel.org/r/166697260361.61150.5064013393408112197.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/166732032518.3186319.1859601819981624629.stgit@warthog.procyon.org.uk/ # rfc --- fs/cifs/smbdirect.c | 111 ++++++++++++++++++--------------------------------- 1 file changed, 39 insertions(+), 72 deletions(-) diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c index 8bd320f0156e..4691b5a8e1ff 100644 --- a/fs/cifs/smbdirect.c +++ b/fs/cifs/smbdirect.c @@ -828,16 +828,16 @@ static int smbd_post_send(struct smbd_connection *info, return rc; } -static int smbd_post_send_sgl(struct smbd_connection *info, - struct scatterlist *sgl, int data_length, int remaining_data_length) +static int smbd_post_send_iter(struct smbd_connection *info, + struct iov_iter *iter, + int *_remaining_data_length) { - int num_sgs; int i, rc; int header_length; + int data_length; struct smbd_request *request; struct smbd_data_transfer *packet; int new_credits; - struct scatterlist *sg; wait_credit: /* Wait for send credits. A SMBD packet needs one credit */ @@ -881,6 +881,30 @@ static int smbd_post_send_sgl(struct smbd_connection *info, } request->info = info; + memset(request->sge, 0, sizeof(request->sge)); + + /* Fill in the data payload to find out how much data we can add */ + if (iter) { + struct smb_extract_to_rdma extract = { + .nr_sge = 1, + .max_sge = SMBDIRECT_MAX_SEND_SGE, + .sge = request->sge, + .device = info->id->device, + .local_dma_lkey = info->pd->local_dma_lkey, + .direction = DMA_TO_DEVICE, + }; + + rc = smb_extract_iter_to_rdma(iter, *_remaining_data_length, + &extract); + if (rc < 0) + goto err_dma; + data_length = rc; + request->num_sge = extract.nr_sge; + *_remaining_data_length -= data_length; + } else { + data_length = 0; + request->num_sge = 1; + } /* Fill in the packet header */ packet = smbd_request_payload(request); @@ -902,7 +926,7 @@ static int smbd_post_send_sgl(struct smbd_connection *info, else packet->data_offset = cpu_to_le32(24); packet->data_length = cpu_to_le32(data_length); - packet->remaining_data_length = cpu_to_le32(remaining_data_length); + packet->remaining_data_length = cpu_to_le32(*_remaining_data_length); packet->padding = 0; log_outgoing(INFO, "credits_requested=%d credits_granted=%d data_offset=%d data_length=%d remaining_data_length=%d\n", @@ -918,7 +942,6 @@ static int smbd_post_send_sgl(struct smbd_connection *info, if (!data_length) header_length = offsetof(struct smbd_data_transfer, padding); - request->num_sge = 1; request->sge[0].addr = ib_dma_map_single(info->id->device, (void *)packet, header_length, @@ -932,23 +955,6 @@ static int smbd_post_send_sgl(struct smbd_connection *info, request->sge[0].length = header_length; request->sge[0].lkey = info->pd->local_dma_lkey; - /* Fill in the packet data payload */ - num_sgs = sgl ? 
sg_nents(sgl) : 0; - for_each_sg(sgl, sg, num_sgs, i) { - request->sge[i+1].addr = - ib_dma_map_page(info->id->device, sg_page(sg), - sg->offset, sg->length, DMA_TO_DEVICE); - if (ib_dma_mapping_error( - info->id->device, request->sge[i+1].addr)) { - rc = -EIO; - request->sge[i+1].addr = 0; - goto err_dma; - } - request->sge[i+1].length = sg->length; - request->sge[i+1].lkey = info->pd->local_dma_lkey; - request->num_sge++; - } - rc = smbd_post_send(info, request); if (!rc) return 0; @@ -987,8 +993,10 @@ static int smbd_post_send_sgl(struct smbd_connection *info, */ static int smbd_post_send_empty(struct smbd_connection *info) { + int remaining_data_length = 0; + info->count_send_empty++; - return smbd_post_send_sgl(info, NULL, 0, 0); + return smbd_post_send_iter(info, NULL, &remaining_data_length); } /* @@ -1923,42 +1931,6 @@ int smbd_recv(struct smbd_connection *info, struct msghdr *msg) return rc; } -/* - * Send the contents of an iterator - * @iter: The iterator to send - * @_remaining_data_length: remaining data to send in this payload - */ -static int smbd_post_send_iter(struct smbd_connection *info, - struct iov_iter *iter, - int *_remaining_data_length) -{ - struct scatterlist sgl[SMBDIRECT_MAX_SEND_SGE - 1]; - unsigned int max_payload = info->max_send_size - sizeof(struct smbd_data_transfer); - unsigned int cleanup_mode; - ssize_t rc; - - do { - struct sg_table sgtable = { .sgl = sgl }; - size_t maxlen = min_t(size_t, *_remaining_data_length, max_payload); - - sg_init_table(sgtable.sgl, ARRAY_SIZE(sgl)); - rc = netfs_extract_iter_to_sg(iter, maxlen, - &sgtable, ARRAY_SIZE(sgl), - &cleanup_mode); - if (rc < 0) - break; - if (WARN_ON_ONCE(sgtable.nents == 0)) - return -EIO; - WARN_ON(cleanup_mode != 0); - - sg_mark_end(&sgl[sgtable.nents - 1]); - *_remaining_data_length -= rc; - rc = smbd_post_send_sgl(info, sgl, rc, *_remaining_data_length); - } while (rc == 0 && iov_iter_count(iter) > 0); - - return rc; -} - /* * Send data to transport * Each rqst is transported as a SMBDirect payload @@ -2240,16 +2212,17 @@ static struct smbd_mr *get_mr(struct smbd_connection *info) static int smbd_iter_to_mr(struct smbd_connection *info, struct iov_iter *iter, struct scatterlist *sgl, - unsigned int num_pages) + unsigned int num_pages, + bool writing) { struct sg_table sgtable = { .sgl = sgl }; - unsigned int cleanup_mode; int ret; sg_init_table(sgl, num_pages); ret = netfs_extract_iter_to_sg(iter, iov_iter_count(iter), - &sgtable, num_pages, &cleanup_mode); + &sgtable, num_pages, + writing ? 
FOLL_SOURCE_BUF : FOLL_DEST_BUF); WARN_ON(ret < 0); return ret; } @@ -2291,7 +2264,7 @@ struct smbd_mr *smbd_register_mr(struct smbd_connection *info, log_rdma_mr(INFO, "num_pages=0x%x count=0x%zx\n", num_pages, iov_iter_count(iter)); - smbd_iter_to_mr(info, iter, smbdirect_mr->sgl, num_pages); + smbd_iter_to_mr(info, iter, smbdirect_mr->sgl, num_pages, writing); rc = ib_dma_map_sg(info->id->device, smbdirect_mr->sgl, num_pages, dir); if (!rc) { @@ -2602,13 +2575,6 @@ static ssize_t smb_extract_iter_to_rdma(struct iov_iter *iter, size_t len, ssize_t ret; int before = rdma->nr_sge; - if (iov_iter_is_discard(iter) || - iov_iter_is_pipe(iter) || - user_backed_iter(iter)) { - WARN_ON_ONCE(1); - return -EIO; - } - switch (iov_iter_type(iter)) { case ITER_BVEC: ret = smb_extract_bvec_to_rdma(iter, rdma, len); @@ -2620,7 +2586,8 @@ static ssize_t smb_extract_iter_to_rdma(struct iov_iter *iter, size_t len, ret = smb_extract_xarray_to_rdma(iter, rdma, len); break; default: - BUG(); + WARN_ON_ONCE(1); + return -EIO; } if (ret > 0) { From patchwork Mon Jan 16 23:11:39 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 13103915 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C442DC46467 for ; Mon, 16 Jan 2023 23:20:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235344AbjAPXUc (ORCPT ); Mon, 16 Jan 2023 18:20:32 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57250 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235664AbjAPXTO (ORCPT ); Mon, 16 Jan 2023 18:19:14 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A53532CFF7 for ; Mon, 16 Jan 2023 15:12:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1673910708; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=M+1LABBnnU+RnWR128vQInrdJjzioo+01iOiO20pla8=; b=MrC450d5kMWEw2bwgCCuL1cXY9OszdNua1cNp/VEbM4te3P6I6SwGYkXEnqP1f3Vpz8tUW eMl1+D2M/QjmZyBIqeIZ/23+uDswwpKnZNO0+MOuoUUHPuvQp2zP6Gb8U6XcYhf5VENYJg xA70plNz65GUeFBgSTQ2877o4r4ugi4= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-145-h2ESI8cHOX60ntS-S9i6nQ-1; Mon, 16 Jan 2023 18:11:42 -0500 X-MC-Unique: h2ESI8cHOX60ntS-S9i6nQ-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id F0EBD380390A; Mon, 16 Jan 2023 23:11:41 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.23]) by smtp.corp.redhat.com (Postfix) with ESMTP id 2B970492B10; Mon, 16 Jan 2023 23:11:40 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. 
Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v6 30/34] cifs: Remove unused code From: David Howells To: Al Viro Cc: Steve French , Shyam Prasad N , Rohith Surabattula , Jeff Layton , linux-cifs@vger.kernel.org, dhowells@redhat.com, Christoph Hellwig , Matthew Wilcox , Jens Axboe , Jan Kara , Jeff Layton , Logan Gunthorpe , linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 16 Jan 2023 23:11:39 +0000 Message-ID: <167391069962.2311931.2392376351847891810.stgit@warthog.procyon.org.uk> In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> User-Agent: StGit/1.5 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.9 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Remove a bunch of functions that are no longer used and are commented out after the conversion to use iterators throughout the I/O path. Signed-off-by: David Howells cc: Steve French cc: Shyam Prasad N cc: Rohith Surabattula cc: Jeff Layton cc: linux-cifs@vger.kernel.org Link: https://lore.kernel.org/r/164928621823.457102.8777804402615654773.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165211421039.3154751.15199634443157779005.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165348881165.2106726.2993852968344861224.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165364827876.3334034.9331465096417303889.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/166126396915.708021.2010212654244139442.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/166697261080.61150.17513116912567922274.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/166732033255.3186319.5527423437137895940.stgit@warthog.procyon.org.uk/ # rfc --- fs/cifs/file.c | 606 -------------------------------------------------------- 1 file changed, 606 deletions(-) diff --git a/fs/cifs/file.c b/fs/cifs/file.c index cfa8ad8a59c4..6baf591f63a3 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -2603,314 +2603,6 @@ static int cifs_partialpagewrite(struct page *page, unsigned from, unsigned to) return rc; } -#if 0 // TODO: Remove for iov_iter support -static struct cifs_writedata * -wdata_alloc_and_fillpages(pgoff_t tofind, struct address_space *mapping, - pgoff_t end, pgoff_t *index, - unsigned int *found_pages) -{ - struct cifs_writedata *wdata; - - wdata = cifs_writedata_alloc((unsigned int)tofind, - cifs_writev_complete); - if (!wdata) - return NULL; - - *found_pages = find_get_pages_range_tag(mapping, index, end, - PAGECACHE_TAG_DIRTY, tofind, wdata->pages); - return wdata; -} - -static unsigned int -wdata_prepare_pages(struct cifs_writedata *wdata, unsigned int found_pages, - struct address_space *mapping, - struct writeback_control *wbc, - pgoff_t end, pgoff_t *index, pgoff_t *next, bool *done) -{ - unsigned int nr_pages = 0, i; - struct page *page; - - for (i = 0; i < found_pages; i++) { - page = wdata->pages[i]; - /* - * At this point we hold neither the i_pages lock nor the - * page lock: the page may be truncated or invalidated - * (changing page->mapping to NULL), or even swizzled - * back from swapper_space to tmpfs file mapping - */ - - if (nr_pages == 0) - lock_page(page); - else if (!trylock_page(page)) - break; - - if (unlikely(page->mapping != mapping)) { - unlock_page(page); - break; - } - - if (!wbc->range_cyclic && page->index > 
end) { - *done = true; - unlock_page(page); - break; - } - - if (*next && (page->index != *next)) { - /* Not next consecutive page */ - unlock_page(page); - break; - } - - if (wbc->sync_mode != WB_SYNC_NONE) - wait_on_page_writeback(page); - - if (PageWriteback(page) || - !clear_page_dirty_for_io(page)) { - unlock_page(page); - break; - } - - /* - * This actually clears the dirty bit in the radix tree. - * See cifs_writepage() for more commentary. - */ - set_page_writeback(page); - if (page_offset(page) >= i_size_read(mapping->host)) { - *done = true; - unlock_page(page); - end_page_writeback(page); - break; - } - - wdata->pages[i] = page; - *next = page->index + 1; - ++nr_pages; - } - - /* reset index to refind any pages skipped */ - if (nr_pages == 0) - *index = wdata->pages[0]->index + 1; - - /* put any pages we aren't going to use */ - for (i = nr_pages; i < found_pages; i++) { - put_page(wdata->pages[i]); - wdata->pages[i] = NULL; - } - - return nr_pages; -} - -static int -wdata_send_pages(struct cifs_writedata *wdata, unsigned int nr_pages, - struct address_space *mapping, struct writeback_control *wbc) -{ - int rc; - - wdata->sync_mode = wbc->sync_mode; - wdata->nr_pages = nr_pages; - wdata->offset = page_offset(wdata->pages[0]); - wdata->pagesz = PAGE_SIZE; - wdata->tailsz = min(i_size_read(mapping->host) - - page_offset(wdata->pages[nr_pages - 1]), - (loff_t)PAGE_SIZE); - wdata->bytes = ((nr_pages - 1) * PAGE_SIZE) + wdata->tailsz; - wdata->pid = wdata->cfile->pid; - - rc = adjust_credits(wdata->server, &wdata->credits, wdata->bytes); - if (rc) - return rc; - - if (wdata->cfile->invalidHandle) - rc = -EAGAIN; - else - rc = wdata->server->ops->async_writev(wdata, - cifs_writedata_release); - - return rc; -} - -static int -cifs_writepage_locked(struct page *page, struct writeback_control *wbc); - -static int cifs_write_one_page(struct page *page, struct writeback_control *wbc, - void *data) -{ - struct address_space *mapping = data; - int ret; - - ret = cifs_writepage_locked(page, wbc); - unlock_page(page); - mapping_set_error(mapping, ret); - return ret; -} - -static int cifs_writepages(struct address_space *mapping, - struct writeback_control *wbc) -{ - struct inode *inode = mapping->host; - struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb); - struct TCP_Server_Info *server; - bool done = false, scanned = false, range_whole = false; - pgoff_t end, index; - struct cifs_writedata *wdata; - struct cifsFileInfo *cfile = NULL; - int rc = 0; - int saved_rc = 0; - unsigned int xid; - - /* - * If wsize is smaller than the page cache size, default to writing - * one page at a time. 
- */ - if (cifs_sb->ctx->wsize < PAGE_SIZE) - return write_cache_pages(mapping, wbc, cifs_write_one_page, - mapping); - - xid = get_xid(); - if (wbc->range_cyclic) { - index = mapping->writeback_index; /* Start from prev offset */ - end = -1; - } else { - index = wbc->range_start >> PAGE_SHIFT; - end = wbc->range_end >> PAGE_SHIFT; - if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX) - range_whole = true; - scanned = true; - } - server = cifs_pick_channel(cifs_sb_master_tcon(cifs_sb)->ses); - -retry: - while (!done && index <= end) { - unsigned int i, nr_pages, found_pages, wsize; - pgoff_t next = 0, tofind, saved_index = index; - struct cifs_credits credits_on_stack; - struct cifs_credits *credits = &credits_on_stack; - int get_file_rc = 0; - - if (cfile) - cifsFileInfo_put(cfile); - - rc = cifs_get_writable_file(CIFS_I(inode), FIND_WR_ANY, &cfile); - - /* in case of an error store it to return later */ - if (rc) - get_file_rc = rc; - - rc = server->ops->wait_mtu_credits(server, cifs_sb->ctx->wsize, - &wsize, credits); - if (rc != 0) { - done = true; - break; - } - - tofind = min((wsize / PAGE_SIZE) - 1, end - index) + 1; - - wdata = wdata_alloc_and_fillpages(tofind, mapping, end, &index, - &found_pages); - if (!wdata) { - rc = -ENOMEM; - done = true; - add_credits_and_wake_if(server, credits, 0); - break; - } - - if (found_pages == 0) { - kref_put(&wdata->refcount, cifs_writedata_release); - add_credits_and_wake_if(server, credits, 0); - break; - } - - nr_pages = wdata_prepare_pages(wdata, found_pages, mapping, wbc, - end, &index, &next, &done); - - /* nothing to write? */ - if (nr_pages == 0) { - kref_put(&wdata->refcount, cifs_writedata_release); - add_credits_and_wake_if(server, credits, 0); - continue; - } - - wdata->credits = credits_on_stack; - wdata->cfile = cfile; - wdata->server = server; - cfile = NULL; - - if (!wdata->cfile) { - cifs_dbg(VFS, "No writable handle in writepages rc=%d\n", - get_file_rc); - if (is_retryable_error(get_file_rc)) - rc = get_file_rc; - else - rc = -EBADF; - } else - rc = wdata_send_pages(wdata, nr_pages, mapping, wbc); - - for (i = 0; i < nr_pages; ++i) - unlock_page(wdata->pages[i]); - - /* send failure -- clean up the mess */ - if (rc != 0) { - add_credits_and_wake_if(server, &wdata->credits, 0); - for (i = 0; i < nr_pages; ++i) { - if (is_retryable_error(rc)) - redirty_page_for_writepage(wbc, - wdata->pages[i]); - else - SetPageError(wdata->pages[i]); - end_page_writeback(wdata->pages[i]); - put_page(wdata->pages[i]); - } - if (!is_retryable_error(rc)) - mapping_set_error(mapping, rc); - } - kref_put(&wdata->refcount, cifs_writedata_release); - - if (wbc->sync_mode == WB_SYNC_ALL && rc == -EAGAIN) { - index = saved_index; - continue; - } - - /* Return immediately if we received a signal during writing */ - if (is_interrupt_error(rc)) { - done = true; - break; - } - - if (rc != 0 && saved_rc == 0) - saved_rc = rc; - - wbc->nr_to_write -= nr_pages; - if (wbc->nr_to_write <= 0) - done = true; - - index = next; - } - - if (!scanned && !done) { - /* - * We hit the last page and there is more work to be done: wrap - * back to the start of the file - */ - scanned = true; - index = 0; - goto retry; - } - - if (saved_rc != 0) - rc = saved_rc; - - if (wbc->range_cyclic || (range_whole && wbc->nr_to_write > 0)) - mapping->writeback_index = index; - - if (cfile) - cifsFileInfo_put(cfile); - free_xid(xid); - /* Indication to update ctime and mtime as close is deferred */ - set_bit(CIFS_INO_MODIFIED_ATTR, &CIFS_I(inode)->flags); - return rc; -} -#endif - 
/* * Extend the region to be written back to include subsequent contiguously * dirty pages if possible, but don't sleep while doing so. @@ -3505,49 +3197,6 @@ int cifs_flush(struct file *file, fl_owner_t id) return rc; } -#if 0 // TODO: Remove for iov_iter support -static int -cifs_write_allocate_pages(struct page **pages, unsigned long num_pages) -{ - int rc = 0; - unsigned long i; - - for (i = 0; i < num_pages; i++) { - pages[i] = alloc_page(GFP_KERNEL|__GFP_HIGHMEM); - if (!pages[i]) { - /* - * save number of pages we have already allocated and - * return with ENOMEM error - */ - num_pages = i; - rc = -ENOMEM; - break; - } - } - - if (rc) { - for (i = 0; i < num_pages; i++) - put_page(pages[i]); - } - return rc; -} - -static inline -size_t get_numpages(const size_t wsize, const size_t len, size_t *cur_len) -{ - size_t num_pages; - size_t clen; - - clen = min_t(const size_t, len, wsize); - num_pages = DIV_ROUND_UP(clen, PAGE_SIZE); - - if (cur_len) - *cur_len = clen; - - return num_pages; -} -#endif - static void cifs_uncached_writedata_release(struct kref *refcount) { @@ -3580,50 +3229,6 @@ cifs_uncached_writev_complete(struct work_struct *work) kref_put(&wdata->refcount, cifs_uncached_writedata_release); } -#if 0 // TODO: Remove for iov_iter support -static int -wdata_fill_from_iovec(struct cifs_writedata *wdata, struct iov_iter *from, - size_t *len, unsigned long *num_pages) -{ - size_t save_len, copied, bytes, cur_len = *len; - unsigned long i, nr_pages = *num_pages; - - save_len = cur_len; - for (i = 0; i < nr_pages; i++) { - bytes = min_t(const size_t, cur_len, PAGE_SIZE); - copied = copy_page_from_iter(wdata->pages[i], 0, bytes, from); - cur_len -= copied; - /* - * If we didn't copy as much as we expected, then that - * may mean we trod into an unmapped area. Stop copying - * at that point. On the next pass through the big - * loop, we'll likely end up getting a zero-length - * write and bailing out of it. - */ - if (copied < bytes) - break; - } - cur_len = save_len - cur_len; - *len = cur_len; - - /* - * If we have no data to send, then that probably means that - * the copy above failed altogether. That's most likely because - * the address in the iovec was bogus. Return -EFAULT and let - * the caller free anything we allocated and bail out. - */ - if (!cur_len) - return -EFAULT; - - /* - * i + 1 now represents the number of pages we actually used in - * the copy phase above. 
- */ - *num_pages = i + 1; - return 0; -} -#endif - static int cifs_resend_wdata(struct cifs_writedata *wdata, struct list_head *wdata_list, struct cifs_aio_ctx *ctx) @@ -4212,83 +3817,6 @@ cifs_uncached_readv_complete(struct work_struct *work) kref_put(&rdata->refcount, cifs_readdata_release); } -#if 0 // TODO: Remove for iov_iter support - -static int -uncached_fill_pages(struct TCP_Server_Info *server, - struct cifs_readdata *rdata, struct iov_iter *iter, - unsigned int len) -{ - int result = 0; - unsigned int i; - unsigned int nr_pages = rdata->nr_pages; - unsigned int page_offset = rdata->page_offset; - - rdata->got_bytes = 0; - rdata->tailsz = PAGE_SIZE; - for (i = 0; i < nr_pages; i++) { - struct page *page = rdata->pages[i]; - size_t n; - unsigned int segment_size = rdata->pagesz; - - if (i == 0) - segment_size -= page_offset; - else - page_offset = 0; - - - if (len <= 0) { - /* no need to hold page hostage */ - rdata->pages[i] = NULL; - rdata->nr_pages--; - put_page(page); - continue; - } - - n = len; - if (len >= segment_size) - /* enough data to fill the page */ - n = segment_size; - else - rdata->tailsz = len; - len -= n; - - if (iter) - result = copy_page_from_iter( - page, page_offset, n, iter); -#ifdef CONFIG_CIFS_SMB_DIRECT - else if (rdata->mr) - result = n; -#endif - else - result = cifs_read_page_from_socket( - server, page, page_offset, n); - if (result < 0) - break; - - rdata->got_bytes += result; - } - - return rdata->got_bytes > 0 && result != -ECONNABORTED ? - rdata->got_bytes : result; -} - -static int -cifs_uncached_read_into_pages(struct TCP_Server_Info *server, - struct cifs_readdata *rdata, unsigned int len) -{ - return uncached_fill_pages(server, rdata, NULL, len); -} - -static int -cifs_uncached_copy_into_pages(struct TCP_Server_Info *server, - struct cifs_readdata *rdata, - struct iov_iter *iter) -{ - return uncached_fill_pages(server, rdata, iter, iter->count); -} -#endif - static int cifs_resend_rdata(struct cifs_readdata *rdata, struct list_head *rdata_list, struct cifs_aio_ctx *ctx) @@ -4901,140 +4429,6 @@ int cifs_file_mmap(struct file *file, struct vm_area_struct *vma) return rc; } -#if 0 // TODO: Remove for iov_iter support - -static void -cifs_readv_complete(struct work_struct *work) -{ - unsigned int i, got_bytes; - struct cifs_readdata *rdata = container_of(work, - struct cifs_readdata, work); - - got_bytes = rdata->got_bytes; - for (i = 0; i < rdata->nr_pages; i++) { - struct page *page = rdata->pages[i]; - - if (rdata->result == 0 || - (rdata->result == -EAGAIN && got_bytes)) { - flush_dcache_page(page); - SetPageUptodate(page); - } else - SetPageError(page); - - if (rdata->result == 0 || - (rdata->result == -EAGAIN && got_bytes)) - cifs_readpage_to_fscache(rdata->mapping->host, page); - - unlock_page(page); - - got_bytes -= min_t(unsigned int, PAGE_SIZE, got_bytes); - - put_page(page); - rdata->pages[i] = NULL; - } - kref_put(&rdata->refcount, cifs_readdata_release); -} - -static int -readpages_fill_pages(struct TCP_Server_Info *server, - struct cifs_readdata *rdata, struct iov_iter *iter, - unsigned int len) -{ - int result = 0; - unsigned int i; - u64 eof; - pgoff_t eof_index; - unsigned int nr_pages = rdata->nr_pages; - unsigned int page_offset = rdata->page_offset; - - /* determine the eof that the server (probably) has */ - eof = CIFS_I(rdata->mapping->host)->server_eof; - eof_index = eof ? 
(eof - 1) >> PAGE_SHIFT : 0; - cifs_dbg(FYI, "eof=%llu eof_index=%lu\n", eof, eof_index); - - rdata->got_bytes = 0; - rdata->tailsz = PAGE_SIZE; - for (i = 0; i < nr_pages; i++) { - struct page *page = rdata->pages[i]; - unsigned int to_read = rdata->pagesz; - size_t n; - - if (i == 0) - to_read -= page_offset; - else - page_offset = 0; - - n = to_read; - - if (len >= to_read) { - len -= to_read; - } else if (len > 0) { - /* enough for partial page, fill and zero the rest */ - zero_user(page, len + page_offset, to_read - len); - n = rdata->tailsz = len; - len = 0; - } else if (page->index > eof_index) { - /* - * The VFS will not try to do readahead past the - * i_size, but it's possible that we have outstanding - * writes with gaps in the middle and the i_size hasn't - * caught up yet. Populate those with zeroed out pages - * to prevent the VFS from repeatedly attempting to - * fill them until the writes are flushed. - */ - zero_user(page, 0, PAGE_SIZE); - flush_dcache_page(page); - SetPageUptodate(page); - unlock_page(page); - put_page(page); - rdata->pages[i] = NULL; - rdata->nr_pages--; - continue; - } else { - /* no need to hold page hostage */ - unlock_page(page); - put_page(page); - rdata->pages[i] = NULL; - rdata->nr_pages--; - continue; - } - - if (iter) - result = copy_page_from_iter( - page, page_offset, n, iter); -#ifdef CONFIG_CIFS_SMB_DIRECT - else if (rdata->mr) - result = n; -#endif - else - result = cifs_read_page_from_socket( - server, page, page_offset, n); - if (result < 0) - break; - - rdata->got_bytes += result; - } - - return rdata->got_bytes > 0 && result != -ECONNABORTED ? - rdata->got_bytes : result; -} - -static int -cifs_readpages_read_into_pages(struct TCP_Server_Info *server, - struct cifs_readdata *rdata, unsigned int len) -{ - return readpages_fill_pages(server, rdata, NULL, len); -} - -static int -cifs_readpages_copy_into_pages(struct TCP_Server_Info *server, - struct cifs_readdata *rdata, - struct iov_iter *iter) -{ - return readpages_fill_pages(server, rdata, iter, iter->count); -} -#endif - /* * Unlock a bunch of folios in the pagecache. 
*/ From patchwork Mon Jan 16 23:11:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 13103916 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3151BC678D4 for ; Mon, 16 Jan 2023 23:20:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235435AbjAPXUe (ORCPT ); Mon, 16 Jan 2023 18:20:34 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55892 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235772AbjAPXTf (ORCPT ); Mon, 16 Jan 2023 18:19:35 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7D85632E44 for ; Mon, 16 Jan 2023 15:12:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1673910713; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Nw4FQbdWbVj4MBsaopY4M6MScTLsVpfgXMad66KYZwU=; b=RjpdJDD3sLbe7gnvYV1G4r09VyO3HtvoRyFwJftJlu/7dYHqrHegEX9QrbeHEp/7Y5JrHL yH1Mzt6gLeiPH9rsTiyqKGiHFE+yJC/RXpuxXTjDPCMGAtU6CLdcUFusq1YHwYQSp6Pc0p srP+nU6Ap7+DvxNqKNhm8EpnzSWMq8Y= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-14-yziFx7y5NACL1vK30mvFNA-1; Mon, 16 Jan 2023 18:11:50 -0500 X-MC-Unique: yziFx7y5NACL1vK30mvFNA-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 7BA33380390A; Mon, 16 Jan 2023 23:11:49 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.23]) by smtp.corp.redhat.com (Postfix) with ESMTP id AEA5240C6EC4; Mon, 16 Jan 2023 23:11:47 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 
3798903 Subject: [PATCH v6 31/34] cifs: Fix problem with encrypted RDMA data read From: David Howells To: Al Viro Cc: Steve French , Tom Talpey , Long Li , Namjae Jeon , Stefan Metzmacher , linux-cifs@vger.kernel.org, dhowells@redhat.com, Christoph Hellwig , Matthew Wilcox , Jens Axboe , Jan Kara , Jeff Layton , Logan Gunthorpe , linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 16 Jan 2023 23:11:47 +0000 Message-ID: <167391070712.2311931.8909671251130425914.stgit@warthog.procyon.org.uk> In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> User-Agent: StGit/1.5 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org When the cifs client is talking to the ksmbd server by RDMA and the ksmbd server has "smb3 encryption = yes" in its config file, the normal PDU stream is encrypted, but the directly-delivered data isn't in the stream (and isn't encrypted), but is rather delivered by DDP/RDMA packets (at least with IWarp). Currently, the direct delivery fails with: buf can not contain only a part of read data WARNING: CPU: 0 PID: 4619 at fs/cifs/smb2ops.c:4731 handle_read_data+0x393/0x405 ... RIP: 0010:handle_read_data+0x393/0x405 ... smb3_handle_read_data+0x30/0x37 receive_encrypted_standard+0x141/0x224 cifs_demultiplex_thread+0x21a/0x63b kthread+0xe7/0xef ret_from_fork+0x22/0x30 The problem apparently stems from the fact that the code is trying to manage the decryption, but the data isn't in the smallbuf, the bigbuf or the page array. This can be fixed simply by inserting an extra case into handle_read_data() that checks to see if use_rdma_mr is true, and if it is, just setting rdata->got_bytes to the length of data delivered and allowing normal continuation. This can be seen in an IWarp packet trace. With the upstream code, the data is delivered in a DDP/RDMA packet, which produces the warning above; the client then retries, retrieving the data inline, spread across several SMBDirect messages that get glued together into a single PDU. With the patch applied, only the DDP/RDMA packet is seen. Note that this doesn't happen if the server isn't told to encrypt traffic, and that it also happens with softRoCE. Signed-off-by: David Howells cc: Steve French cc: Tom Talpey cc: Long Li cc: Namjae Jeon cc: Stefan Metzmacher cc: linux-cifs@vger.kernel.org Link: https://lore.kernel.org/r/166855224228.1998592.2212551359609792175.stgit@warthog.procyon.org.uk/ # v1 --- fs/cifs/smb2ops.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c index 387effcb905d..fabb1e135faa 100644 --- a/fs/cifs/smb2ops.c +++ b/fs/cifs/smb2ops.c @@ -4720,6 +4720,9 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid, if (length < 0) return length; rdata->got_bytes = data_len; + } else if (use_rdma_mr) { + /* The data was delivered directly by RDMA.
*/ + rdata->got_bytes = data_len; } else { /* read response payload cannot be in both buf and pages */ WARN_ONCE(1, "buf can not contain only a part of read data"); From patchwork Mon Jan 16 23:11:54 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 13103917 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id ED280C678D9 for ; Mon, 16 Jan 2023 23:20:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235445AbjAPXUj (ORCPT ); Mon, 16 Jan 2023 18:20:39 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33940 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235845AbjAPXT6 (ORCPT ); Mon, 16 Jan 2023 18:19:58 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CA8D733459 for ; Mon, 16 Jan 2023 15:12:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1673910720; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ntGo6rTYFCYIHyffDu5aqHQltoEFdTWJMIOs9Lz75sM=; b=ZWGDLyE961YOSOWdqQXYXaJu49QwaTUEfSPvhL5UtsxWgWUht41UmhhRmEcwJLfc3J4zSF maWU2VvbkHAtrGzZz7njkWaaSRg7UexssEFtwRDg+Rk8b7bDNXF6uMUkdJea5Q6r0IXbcp ZMxWfoufCHH7mOnK1OSJogcBIWbG6Xw= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-259-ab7qPHtTPKW5GI6TkEaaVA-1; Mon, 16 Jan 2023 18:11:57 -0500 X-MC-Unique: ab7qPHtTPKW5GI6TkEaaVA-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.rdu2.redhat.com [10.11.54.7]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id D6A10811E6E; Mon, 16 Jan 2023 23:11:56 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.23]) by smtp.corp.redhat.com (Postfix) with ESMTP id 2EF9E1415108; Mon, 16 Jan 2023 23:11:55 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 
3798903 Subject: [PATCH v6 32/34] cifs: DIO to/from KVEC-type iterators should now work From: David Howells To: Al Viro Cc: Steve French , Shyam Prasad N , Rohith Surabattula , Tom Talpey , Jeff Layton , linux-cifs@vger.kernel.org, dhowells@redhat.com, Christoph Hellwig , Matthew Wilcox , Jens Axboe , Jan Kara , Jeff Layton , Logan Gunthorpe , linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 16 Jan 2023 23:11:54 +0000 Message-ID: <167391071464.2311931.14722270915404689054.stgit@warthog.procyon.org.uk> In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> User-Agent: StGit/1.5 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.7 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org DIO to/from KVEC-type iterators should now work: the iterator is passed down to the socket in non-RDMA/non-crypto mode, and in RDMA or crypto mode care is taken to handle vmap/vmalloc correctly and not to take page refs when building a scatterlist. Signed-off-by: David Howells cc: Steve French cc: Shyam Prasad N cc: Rohith Surabattula cc: Tom Talpey cc: Jeff Layton cc: linux-cifs@vger.kernel.org --- fs/cifs/file.c | 20 -------------------- 1 file changed, 20 deletions(-) diff --git a/fs/cifs/file.c b/fs/cifs/file.c index 6baf591f63a3..7f1e01cee83d 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -3545,16 +3545,6 @@ static ssize_t __cifs_writev( struct cifs_aio_ctx *ctx; int rc; - /* - * iov_iter_get_pages_alloc doesn't work with ITER_KVEC. - * In this case, fall back to non-direct write function. - * this could be improved by getting pages directly in ITER_KVEC - */ - if (direct && iov_iter_is_kvec(from)) { - cifs_dbg(FYI, "use non-direct cifs_writev for kvec I/O\n"); - direct = false; - } - rc = generic_write_checks(iocb, from); if (rc <= 0) return rc; @@ -4090,16 +4080,6 @@ static ssize_t __cifs_readv( loff_t offset = iocb->ki_pos; struct cifs_aio_ctx *ctx; - /* - * iov_iter_get_pages_alloc() doesn't work with ITER_KVEC, - * fall back to data copy read path - * this could be improved by getting pages directly in ITER_KVEC - */ - if (direct && iov_iter_is_kvec(to)) { - cifs_dbg(FYI, "use non-direct cifs_user_readv for kvec I/O\n"); - direct = false; - } - len = iov_iter_count(to); if (!len) return 0; From patchwork Mon Jan 16 23:12:02 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 13103920 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 04505C54EBE for ; Mon, 16 Jan 2023 23:23:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235448AbjAPXWz (ORCPT ); Mon, 16 Jan 2023 18:22:55 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33708 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235473AbjAPXUu (ORCPT ); Mon, 16 Jan 2023 18:20:50 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A27832B2B0 for ; Mon, 16 Jan 2023 15:13:07 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1673910727;
h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=kr0mxDWcfk97y+SD+0NWqgRc5pNq3uh0Z5vIhOFCaC0=; b=X+erdiog96e+mLHxcwXrqo6iQyZEmkF5ythWPoc3OuJFq5uqum7f4v5/98GTgNHWHLEi53 PNb1hjK+94R/S1tfkXLQLQDUrxZadYqqXw1pwQpU/YmYOVXDERQM61AUzGwlvOF5CUzj8H k4QrofBIcsQYqn/z8PrOwBFvXkW2LDw= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-70-WiODrIuvPam5N8R9hwNvnA-1; Mon, 16 Jan 2023 18:12:05 -0500 X-MC-Unique: WiODrIuvPam5N8R9hwNvnA-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.rdu2.redhat.com [10.11.54.7]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 0F5C03C0F660; Mon, 16 Jan 2023 23:12:05 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.23]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8DEF914171B8; Mon, 16 Jan 2023 23:12:02 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v6 33/34] net: [RFC][WIP] Mark each skb_frags as to how they should be cleaned up From: David Howells To: Al Viro Cc: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , netdev@vger.kernel.org, dhowells@redhat.com, Christoph Hellwig , Matthew Wilcox , Jens Axboe , Jan Kara , Jeff Layton , Logan Gunthorpe , linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 16 Jan 2023 23:12:02 +0000 Message-ID: <167391072201.2311931.4013360052592980054.stgit@warthog.procyon.org.uk> In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> User-Agent: StGit/1.5 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.7 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org [!] NOTE: This patch is mostly for illustrative/discussion purposes: it makes an incomplete change, and the networking code may not compile thereafter. There are a couple of problems with pasting foreign pages into sk_buffs with zerocopy that are analogous to the problems with direct I/O: (1) Pages derived from kernel buffers, such as KVEC iterators, should not have refs taken on them. Rather, the caller should do whatever it needs to do to retain the memory. (2) Pages derived from userspace buffers must not have refs taken on them if they're going to be written to (analogous to a direct I/O read), as this may cause a malfunction of the VM CoW mechanism with a concurrent fork. Rather, they should have pins taken on them (FOLL_PIN). This will affect zerocopy-recvmsg where that exists (e.g. TLS, I think, though that might be decrypt-offload). This is further complicated by the possibility of a sk_buff containing data from mixed sources - for example a network filesystem might generate a message consisting of some metadata from a kernel buffer (which should not be pinned) and some data from userspace (which should have a ref taken).
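To make the mixed-source case concrete before the design is laid out, here is a minimal, self-contained C sketch of per-fragment cleanup dispatch. It is illustrative only: the enum and struct below are invented stand-ins for the FOLL_* bits and skb_frag_t rather than real kernel API, and it builds as an ordinary user-space program.

#include <stdio.h>

/* Stand-ins for the cleanup modes; not the kernel's FOLL_* values. */
enum cleanup_mode {
	CLEANUP_NONE,	/* kernel-buffer source: the owner keeps it alive */
	CLEANUP_PUT,	/* a ref was taken: drop it on teardown */
	CLEANUP_PIN,	/* a pin was taken: unpin on teardown */
};

/* Stand-in for skb_frag_t: each fragment records its own cleanup mode. */
struct frag {
	unsigned long page;	/* stand-in for a struct page pointer */
	enum cleanup_mode mode;
};

static void frag_release(const struct frag *f)
{
	switch (f->mode) {
	case CLEANUP_NONE:
		break;	/* do nothing: the source retains the memory */
	case CLEANUP_PUT:
		printf("put ref on page %#lx\n", f->page);
		break;
	case CLEANUP_PIN:
		printf("unpin page %#lx\n", f->page);
		break;
	}
}

int main(void)
{
	/* One message built from mixed sources: metadata from a kernel
	 * buffer (nothing to drop) and payload from userspace (ref'd).
	 */
	struct frag frags[] = {
		{ .page = 0x1000, .mode = CLEANUP_NONE },
		{ .page = 0x2000, .mode = CLEANUP_PUT },
	};

	for (unsigned int i = 0; i < sizeof(frags) / sizeof(frags[0]); i++)
		frag_release(&frags[i]);
	return 0;
}

The point is simply that teardown must consult a per-fragment label rather than assuming every page was ref'd, which is what the changes below arrange.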
To this end, each page fragment attached to a sk_buff needs labelling with the appropriate cleanup to be applied. Do this by: (1) Replace struct bio_vec as the basis of skb_frag_t with a new struct skb_frag. This has an offset and a length, as before, plus a 'page_and_mode' member that contains the cleanup mode in the bottom two bits and the page pointer in the remaining bits. (FOLL_GET and FOLL_PIN got renumbered to bits 0 and 1 in an earlier patch). (2) The cleanup mode can be one of FOLL_GET (put a ref on the page), FOLL_PIN (unpin the page) or 0 (do nothing). (3) skb_frag_page() is used to access the page pointer as before. (4) __skb_frag_set_page() and skb_frag_set_page() acquire an extra argument to indicate the cleanup mode. (5) The cleanup mode is set to FOLL_GET on everything for the moment. (6) __skb_frag_ref() will call try_grab_page(), passing the cleanup mode to indicate whether an extra ref, an extra pin or nothing is required. [!] NOTE: If the cleanup mode was 0, this skbuff will also not pin the page and the caller needs to be aware of that. (7) __skb_frag_unref() will call page_put_unpin() to do the appropriate cleanup, based on the mode. Signed-off-by: David Howells cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: netdev@vger.kernel.org --- drivers/net/tun.c | 2 - include/linux/skbuff.h | 124 ++++++++++++++++++++++++++++++------------------ io_uring/net.c | 2 - net/bpf/test_run.c | 2 - net/core/datagram.c | 3 + net/core/gro.c | 2 - net/core/skbuff.c | 16 +++--- net/ipv4/ip_output.c | 2 - net/ipv4/tcp.c | 4 +- net/ipv6/esp6.c | 5 +- net/ipv6/ip6_output.c | 2 - net/packet/af_packet.c | 2 - net/xfrm/xfrm_ipcomp.c | 2 - 13 files changed, 101 insertions(+), 67 deletions(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index a7d17c680f4a..6c467c5163b2 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1496,7 +1496,7 @@ static struct sk_buff *tun_napi_alloc_frags(struct tun_file *tfile, } page = virt_to_head_page(frag); skb_fill_page_desc(skb, i - 1, page, - frag - page_address(page), fragsz); + frag - page_address(page), fragsz, FOLL_GET); } return skb; diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 4c8492401a10..a1a77909509b 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -357,7 +357,51 @@ extern int sysctl_max_skb_frags; */ #define GSO_BY_FRAGS 0xFFFF -typedef struct bio_vec skb_frag_t; +struct skb_frag { + unsigned long page_and_mode; /* page pointer | cleanup_mode (0/FOLL_GET/PIN) */ + unsigned int len; + unsigned int offset; +}; +typedef struct skb_frag skb_frag_t; + +/** + * skb_frag_cleanup() - Returns the cleanup mode for an skb fragment + * @frag: skb fragment + * + * Returns the cleanup mode associated with @frag. It will be FOLL_GET, + * FOLL_PIN or 0. + */ +static inline unsigned int skb_frag_cleanup(const skb_frag_t *frag) +{ + return frag->page_and_mode & 3; +} + +/** + * skb_frag_page() - Returns the page in an skb fragment + * @frag: skb fragment + * + * Returns the &struct page associated with @frag. + */ +static inline struct page *skb_frag_page(const skb_frag_t *frag) +{ + return (struct page *)(frag->page_and_mode & ~3); +} + +/** + * __skb_frag_set_page() - Sets the page in an skb fragment + * @frag: skb fragment + * @page: The page to set + * @cleanup_mode: The cleanup mode to set (0, FOLL_GET, FOLL_PIN) + * + * Sets the fragment @frag to contain @page with the specified method of + * cleaning it up.
+ */ +static inline void __skb_frag_set_page(skb_frag_t *frag, struct page *page, + unsigned int cleanup_mode) +{ + cleanup_mode &= FOLL_GET | FOLL_PIN; + frag->page_and_mode = (unsigned long)page | cleanup_mode; +} /** * skb_frag_size() - Returns the size of a skb fragment @@ -365,7 +409,7 @@ typedef struct bio_vec skb_frag_t; */ static inline unsigned int skb_frag_size(const skb_frag_t *frag) { - return frag->bv_len; + return frag->len; } /** @@ -375,7 +419,7 @@ static inline unsigned int skb_frag_size(const skb_frag_t *frag) */ static inline void skb_frag_size_set(skb_frag_t *frag, unsigned int size) { - frag->bv_len = size; + frag->len = size; } /** @@ -385,7 +429,7 @@ static inline void skb_frag_size_set(skb_frag_t *frag, unsigned int size) */ static inline void skb_frag_size_add(skb_frag_t *frag, int delta) { - frag->bv_len += delta; + frag->len += delta; } /** @@ -395,7 +439,7 @@ static inline void skb_frag_size_add(skb_frag_t *frag, int delta) */ static inline void skb_frag_size_sub(skb_frag_t *frag, int delta) { - frag->bv_len -= delta; + frag->len -= delta; } /** @@ -2388,7 +2432,8 @@ static inline unsigned int skb_pagelen(const struct sk_buff *skb) static inline void __skb_fill_page_desc_noacc(struct skb_shared_info *shinfo, int i, struct page *page, - int off, int size) + int off, int size, + unsigned int cleanup_mode) { skb_frag_t *frag = &shinfo->frags[i]; @@ -2397,9 +2442,9 @@ static inline void __skb_fill_page_desc_noacc(struct skb_shared_info *shinfo, * that not all callers have unique ownership of the page but rely * on page_is_pfmemalloc doing the right thing(tm). */ - frag->bv_page = page; - frag->bv_offset = off; + __skb_frag_set_page(frag, page, cleanup_mode); skb_frag_size_set(frag, size); + frag->offset = off; } /** @@ -2421,6 +2466,7 @@ static inline void skb_len_add(struct sk_buff *skb, int delta) * @page: the page to use for this fragment * @off: the offset to the data with @page * @size: the length of the data + * @cleanup_mode: The cleanup mode to set (0, FOLL_GET, FOLL_PIN) * * Initialises the @i'th fragment of @skb to point to &size bytes at * offset @off within @page. @@ -2428,9 +2474,11 @@ static inline void skb_len_add(struct sk_buff *skb, int delta) * Does not take any additional reference on the fragment. */ static inline void __skb_fill_page_desc(struct sk_buff *skb, int i, - struct page *page, int off, int size) + struct page *page, int off, int size, + unsigned int cleanup_mode) { - __skb_fill_page_desc_noacc(skb_shinfo(skb), i, page, off, size); + __skb_fill_page_desc_noacc(skb_shinfo(skb), i, page, off, size, + cleanup_mode); page = compound_head(page); if (page_is_pfmemalloc(page)) skb->pfmemalloc = true; @@ -2443,6 +2491,7 @@ static inline void __skb_fill_page_desc(struct sk_buff *skb, int i, * @page: the page to use for this fragment * @off: the offset to the data with @page * @size: the length of the data + * @cleanup_mode: The cleanup mode to set (0, FOLL_GET, FOLL_PIN) * * As per __skb_fill_page_desc() -- initialises the @i'th fragment of * @skb to point to @size bytes at offset @off within @page. In @@ -2451,9 +2500,10 @@ static inline void __skb_fill_page_desc(struct sk_buff *skb, int i, * Does not take any additional reference on the fragment. 
*/ static inline void skb_fill_page_desc(struct sk_buff *skb, int i, - struct page *page, int off, int size) + struct page *page, int off, int size, + unsigned int cleanup_mode) { - __skb_fill_page_desc(skb, i, page, off, size); + __skb_fill_page_desc(skb, i, page, off, size, cleanup_mode); skb_shinfo(skb)->nr_frags = i + 1; } @@ -2464,17 +2514,18 @@ static inline void skb_fill_page_desc(struct sk_buff *skb, int i, * @page: the page to use for this fragment * @off: the offset to the data with @page * @size: the length of the data + * @cleanup_mode: The cleanup mode to set (0, FOLL_GET, FOLL_PIN) * * Variant of skb_fill_page_desc() which does not deal with * pfmemalloc, if page is not owned by us. */ static inline void skb_fill_page_desc_noacc(struct sk_buff *skb, int i, struct page *page, int off, - int size) + int size, unsigned int cleanup_mode) { struct skb_shared_info *shinfo = skb_shinfo(skb); - __skb_fill_page_desc_noacc(shinfo, i, page, off, size); + __skb_fill_page_desc_noacc(shinfo, i, page, off, size, cleanup_mode); shinfo->nr_frags = i + 1; } @@ -3301,7 +3352,7 @@ static inline void skb_propagate_pfmemalloc(const struct page *page, */ static inline unsigned int skb_frag_off(const skb_frag_t *frag) { - return frag->bv_offset; + return frag->offset; } /** @@ -3311,7 +3362,7 @@ static inline unsigned int skb_frag_off(const skb_frag_t *frag) */ static inline void skb_frag_off_add(skb_frag_t *frag, int delta) { - frag->bv_offset += delta; + frag->offset += delta; } /** @@ -3321,7 +3372,7 @@ static inline void skb_frag_off_add(skb_frag_t *frag, int delta) */ static inline void skb_frag_off_set(skb_frag_t *frag, unsigned int offset) { - frag->bv_offset = offset; + frag->offset = offset; } /** @@ -3332,18 +3383,7 @@ static inline void skb_frag_off_set(skb_frag_t *frag, unsigned int offset) static inline void skb_frag_off_copy(skb_frag_t *fragto, const skb_frag_t *fragfrom) { - fragto->bv_offset = fragfrom->bv_offset; -} - -/** - * skb_frag_page - retrieve the page referred to by a paged fragment - * @frag: the paged fragment - * - * Returns the &struct page associated with @frag. - */ -static inline struct page *skb_frag_page(const skb_frag_t *frag) -{ - return frag->bv_page; + fragto->offset = fragfrom->offset; } /** @@ -3354,7 +3394,9 @@ static inline struct page *skb_frag_page(const skb_frag_t *frag) */ static inline void __skb_frag_ref(skb_frag_t *frag) { - get_page(skb_frag_page(frag)); + struct page *page = skb_frag_page(frag); + + try_grab_page(page, skb_frag_cleanup(frag)); } /** @@ -3385,7 +3427,7 @@ static inline void __skb_frag_unref(skb_frag_t *frag, bool recycle) if (recycle && page_pool_return_skb_page(page)) return; #endif - put_page(page); + page_put_unpin(page, skb_frag_cleanup(frag)); } /** @@ -3439,19 +3481,7 @@ static inline void *skb_frag_address_safe(const skb_frag_t *frag) static inline void skb_frag_page_copy(skb_frag_t *fragto, const skb_frag_t *fragfrom) { - fragto->bv_page = fragfrom->bv_page; -} - -/** - * __skb_frag_set_page - sets the page contained in a paged fragment - * @frag: the paged fragment - * @page: the page to set - * - * Sets the fragment @frag to contain @page. 
- */ -static inline void __skb_frag_set_page(skb_frag_t *frag, struct page *page) -{ - frag->bv_page = page; + fragto->page_and_mode = fragfrom->page_and_mode; } /** @@ -3459,13 +3489,15 @@ static inline void __skb_frag_set_page(skb_frag_t *frag, struct page *page) * @skb: the buffer * @f: the fragment offset * @page: the page to set + * @cleanup_mode: The cleanup mode to set (0, FOLL_GET, FOLL_PIN) * * Sets the @f'th fragment of @skb to contain @page. */ static inline void skb_frag_set_page(struct sk_buff *skb, int f, - struct page *page) + struct page *page, + unsigned int cleanup_mode) { - __skb_frag_set_page(&skb_shinfo(skb)->frags[f], page); + __skb_frag_set_page(&skb_shinfo(skb)->frags[f], page, cleanup_mode); } bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t prio); diff --git a/io_uring/net.c b/io_uring/net.c index fbc34a7c2743..1d3e24404d75 100644 --- a/io_uring/net.c +++ b/io_uring/net.c @@ -1043,7 +1043,7 @@ static int io_sg_from_iter(struct sock *sk, struct sk_buff *skb, copied += v.bv_len; truesize += PAGE_ALIGN(v.bv_len + v.bv_offset); __skb_fill_page_desc_noacc(shinfo, frag++, v.bv_page, - v.bv_offset, v.bv_len); + v.bv_offset, v.bv_len, FOLL_GET); bvec_iter_advance_single(from->bvec, &bi, v.bv_len); } if (bi.bi_size) diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index 2723623429ac..9ed2de52e1be 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -1370,7 +1370,7 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr, } frag = &sinfo->frags[sinfo->nr_frags++]; - __skb_frag_set_page(frag, page); + __skb_frag_set_page(frag, page, FOLL_GET); data_len = min_t(u32, kattr->test.data_size_in - size, PAGE_SIZE); diff --git a/net/core/datagram.c b/net/core/datagram.c index 9f0914b781ad..122bfb144d32 100644 --- a/net/core/datagram.c +++ b/net/core/datagram.c @@ -678,7 +678,8 @@ int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk, page_ref_sub(last_head, refs); refs = 0; } - skb_fill_page_desc_noacc(skb, frag++, head, start, size); + skb_fill_page_desc_noacc(skb, frag++, head, start, size, + FOLL_GET); } if (refs) page_ref_sub(last_head, refs); diff --git a/net/core/gro.c b/net/core/gro.c index fd8c6a7e8d3e..dfbf2279ce5c 100644 --- a/net/core/gro.c +++ b/net/core/gro.c @@ -228,7 +228,7 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb) pinfo->nr_frags = nr_frags + 1 + skbinfo->nr_frags; - __skb_frag_set_page(frag, page); + __skb_frag_set_page(frag, page, FOLL_GET); skb_frag_off_set(frag, first_offset); skb_frag_size_set(frag, first_size); diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 4a0eb5593275..a6a21a27ebb4 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -765,7 +765,7 @@ EXPORT_SYMBOL(__napi_alloc_skb); void skb_add_rx_frag(struct sk_buff *skb, int i, struct page *page, int off, int size, unsigned int truesize) { - skb_fill_page_desc(skb, i, page, off, size); + skb_fill_page_desc(skb, i, page, off, size, FOLL_GET); skb->len += size; skb->data_len += size; skb->truesize += truesize; @@ -1666,10 +1666,10 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask) /* skb frags point to kernel buffers */ for (i = 0; i < new_frags - 1; i++) { - __skb_fill_page_desc(skb, i, head, 0, PAGE_SIZE); + __skb_fill_page_desc(skb, i, head, 0, PAGE_SIZE, FOLL_GET); head = (struct page *)page_private(head); } - __skb_fill_page_desc(skb, new_frags - 1, head, 0, d_off); + __skb_fill_page_desc(skb, new_frags - 1, head, 0, d_off, FOLL_GET); skb_shinfo(skb)->nr_frags = new_frags; release: @@ 
-3389,7 +3389,7 @@ skb_zerocopy(struct sk_buff *to, struct sk_buff *from, int len, int hlen) if (plen) { page = virt_to_head_page(from->head); offset = from->data - (unsigned char *)page_address(page); - __skb_fill_page_desc(to, 0, page, offset, plen); + __skb_fill_page_desc(to, 0, page, offset, plen, FOLL_GET); get_page(page); j = 1; len -= plen; @@ -4040,7 +4040,7 @@ int skb_append_pagefrags(struct sk_buff *skb, struct page *page, } else if (i < MAX_SKB_FRAGS) { skb_zcopy_downgrade_managed(skb); get_page(page); - skb_fill_page_desc_noacc(skb, i, page, offset, size); + skb_fill_page_desc_noacc(skb, i, page, offset, size, FOLL_GET); } else { return -EMSGSIZE; } @@ -4077,7 +4077,7 @@ static inline skb_frag_t skb_head_frag_to_page_desc(struct sk_buff *frag_skb) struct page *page; page = virt_to_head_page(frag_skb->head); - __skb_frag_set_page(&head_frag, page); + __skb_frag_set_page(&head_frag, page, FOLL_GET); skb_frag_off_set(&head_frag, frag_skb->data - (unsigned char *)page_address(page)); skb_frag_size_set(&head_frag, skb_headlen(frag_skb)); @@ -5521,7 +5521,7 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from, offset = from->data - (unsigned char *)page_address(page); skb_fill_page_desc(to, to_shinfo->nr_frags, - page, offset, skb_headlen(from)); + page, offset, skb_headlen(from), FOLL_GET); *fragstolen = true; } else { if (to_shinfo->nr_frags + @@ -6221,7 +6221,7 @@ struct sk_buff *alloc_skb_with_frags(unsigned long header_len, fill_page: chunk = min_t(unsigned long, data_len, PAGE_SIZE << order); - skb_fill_page_desc(skb, i, page, 0, chunk); + skb_fill_page_desc(skb, i, page, 0, chunk, FOLL_GET); data_len -= chunk; npages -= 1 << order; } diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 922c87ef1ab5..43ea2e7aeeea 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -1221,7 +1221,7 @@ static int __ip_append_data(struct sock *sk, goto error; __skb_fill_page_desc(skb, i, pfrag->page, - pfrag->offset, 0); + pfrag->offset, 0, FOLL_GET); skb_shinfo(skb)->nr_frags = ++i; get_page(pfrag->page); } diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index c567d5e8053e..2cb88e67e152 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1016,7 +1016,7 @@ static struct sk_buff *tcp_build_frag(struct sock *sk, int size_goal, int flags, skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy); } else { get_page(page); - skb_fill_page_desc_noacc(skb, i, page, offset, copy); + skb_fill_page_desc_noacc(skb, i, page, offset, copy, FOLL_GET); } if (!(flags & MSG_NO_SHARED_FRAGS)) @@ -1385,7 +1385,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy); } else { skb_fill_page_desc(skb, i, pfrag->page, - pfrag->offset, copy); + pfrag->offset, copy, FOLL_GET); page_ref_inc(pfrag->page); } pfrag->offset += copy; diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c index 14ed868680c6..13e9d36e132e 100644 --- a/net/ipv6/esp6.c +++ b/net/ipv6/esp6.c @@ -529,7 +529,7 @@ int esp6_output_head(struct xfrm_state *x, struct sk_buff *skb, struct esp_info nfrags = skb_shinfo(skb)->nr_frags; __skb_fill_page_desc(skb, nfrags, page, pfrag->offset, - tailen); + tailen, FOLL_GET); skb_shinfo(skb)->nr_frags = ++nfrags; pfrag->offset = pfrag->offset + allocsize; @@ -635,7 +635,8 @@ int esp6_output_tail(struct xfrm_state *x, struct sk_buff *skb, struct esp_info page = pfrag->page; get_page(page); /* replace page frags in skb with new page */ - __skb_fill_page_desc(skb, 0, page, pfrag->offset, skb->data_len); + 
__skb_fill_page_desc(skb, 0, page, pfrag->offset, skb->data_len, + FOLL_GET); pfrag->offset = pfrag->offset + allocsize; spin_unlock_bh(&x->lock); diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 60fd91bb5171..117fb2bdad02 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1780,7 +1780,7 @@ static int __ip6_append_data(struct sock *sk, goto error; __skb_fill_page_desc(skb, i, pfrag->page, - pfrag->offset, 0); + pfrag->offset, 0, FOLL_GET); skb_shinfo(skb)->nr_frags = ++i; get_page(pfrag->page); } diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index b5ab98ca2511..15c9f17ce7d8 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -2630,7 +2630,7 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb, data += len; flush_dcache_page(page); get_page(page); - skb_fill_page_desc(skb, nr_frags, page, offset, len); + skb_fill_page_desc(skb, nr_frags, page, offset, len, FOLL_GET); to_write -= len; offset = 0; len_max = PAGE_SIZE; diff --git a/net/xfrm/xfrm_ipcomp.c b/net/xfrm/xfrm_ipcomp.c index 80143360bf09..8e9574e00cd0 100644 --- a/net/xfrm/xfrm_ipcomp.c +++ b/net/xfrm/xfrm_ipcomp.c @@ -74,7 +74,7 @@ static int ipcomp_decompress(struct xfrm_state *x, struct sk_buff *skb) if (!page) return -ENOMEM; - __skb_frag_set_page(frag, page); + __skb_frag_set_page(frag, page, FOLL_GET); len = PAGE_SIZE; if (dlen < len) From patchwork Mon Jan 16 23:12:10 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 13103919 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EC54AC46467 for ; Mon, 16 Jan 2023 23:22:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235434AbjAPXWv (ORCPT ); Mon, 16 Jan 2023 18:22:51 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33920 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235475AbjAPXUu (ORCPT ); Mon, 16 Jan 2023 18:20:50 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 85E9934C04 for ; Mon, 16 Jan 2023 15:13:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1673910739; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=RemqltCjkZ1MEJ2va2p40ZyP53eaWFY2Pd4/PSibIZ0=; b=O9PEtYImUX54fBXzeQ/P0wLFkD6cFDoyDbpLEY11c6N68XHpZeW6rzD83TLNJTTo5x23x7 PBG/zA4JubXo1+Tfuk9kdWW+USnc0haUH50uVD/2VtDkPY04RWVxiaIKPXhkIZGRPyqGUK r4avHdNr5zRkeNOgdoxOsrSXdTym2j0= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-631-QixFgPowM6ytp9eOTe22RA-1; Mon, 16 Jan 2023 18:12:13 -0500 X-MC-Unique: QixFgPowM6ytp9eOTe22RA-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 8248E1C0432A; 
Mon, 16 Jan 2023 23:12:12 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.23]) by smtp.corp.redhat.com (Postfix) with ESMTP id B86192026D4B; Mon, 16 Jan 2023 23:12:10 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v6 34/34] net: [RFC][WIP] Make __zerocopy_sg_from_iter() correctly pin or leave pages unref'd From: David Howells To: Al Viro Cc: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , netdev@vger.kernel.org, dhowells@redhat.com, Christoph Hellwig , Matthew Wilcox , Jens Axboe , Jan Kara , Jeff Layton , Logan Gunthorpe , linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 16 Jan 2023 23:12:10 +0000 Message-ID: <167391073019.2311931.11127613443740355536.stgit@warthog.procyon.org.uk> In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> User-Agent: StGit/1.5 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Make __zerocopy_sg_from_iter() call iov_iter_extract_pages() to get pages that have been ref'd, pinned or left alone as appropriate. As this is only used for source buffers, pinning isn't an option, but being unref'd is. The way __zerocopy_sg_from_iter() merges fragments is also altered, such that fragments must also have matching cleanup modes to be merged. An extra helper and a wrapper, folio_put_unpin_sub() and page_put_unpin_sub(), are added to allow multiple refs to be put/unpinned. Signed-off-by: David Howells cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: netdev@vger.kernel.org --- include/linux/mm.h | 2 ++ mm/gup.c | 25 +++++++++++++++++++++++++ net/core/datagram.c | 23 +++++++++++++---------- 3 files changed, 40 insertions(+), 10 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index f14edb192394..e3923b89c75e 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1368,7 +1368,9 @@ static inline bool is_cow_mapping(vm_flags_t flags) #endif void folio_put_unpin(struct folio *folio, unsigned int flags); +void folio_put_unpin_sub(struct folio *folio, unsigned int flags, unsigned int refs); void page_put_unpin(struct page *page, unsigned int flags); +void page_put_unpin_sub(struct page *page, unsigned int flags, unsigned int refs); /* * The identification function is mainly used by the buddy allocator for diff --git a/mm/gup.c b/mm/gup.c index 3ee4b4c7e0cb..49dd27ba6c13 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -213,6 +213,31 @@ void page_put_unpin(struct page *page, unsigned int flags) } EXPORT_SYMBOL_GPL(page_put_unpin); +/** + * folio_put_unpin_sub - Unpin/put a folio as appropriate + * @folio: The folio to release + * @flags: gup flags indicating the mode of release (FOLL_*) + * @refs: Number of refs/pins to drop + * + * Release a folio according to the flags. If FOLL_GET is set, the folio has a + * ref dropped; if FOLL_PIN is set, it is unpinned; otherwise it is left + * unaltered.
+ */ +void folio_put_unpin_sub(struct folio *folio, unsigned int flags, + unsigned int refs) +{ + if (flags & (FOLL_GET | FOLL_PIN)) + gup_put_folio(folio, refs, flags); +} +EXPORT_SYMBOL_GPL(folio_put_unpin_sub); + +void page_put_unpin_sub(struct page *page, unsigned int flags, + unsigned int refs) +{ + folio_put_unpin_sub(page_folio(page), flags, refs); +} +EXPORT_SYMBOL_GPL(page_put_unpin_sub); + /** * try_grab_page() - elevate a page's refcount by a flag-dependent amount * @page: pointer to page to be grabbed diff --git a/net/core/datagram.c b/net/core/datagram.c index 122bfb144d32..63ea1f8817e0 100644 --- a/net/core/datagram.c +++ b/net/core/datagram.c @@ -614,6 +614,7 @@ int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk, struct sk_buff *skb, struct iov_iter *from, size_t length) { + unsigned int cleanup_mode = iov_iter_extract_mode(from, FOLL_SOURCE_BUF); int frag; if (msg && msg->msg_ubuf && msg->sg_from_iter) @@ -622,7 +623,7 @@ int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk, frag = skb_shinfo(skb)->nr_frags; while (length && iov_iter_count(from)) { - struct page *pages[MAX_SKB_FRAGS]; + struct page *pages[MAX_SKB_FRAGS], **ppages = pages; struct page *last_head = NULL; size_t start; ssize_t copied; @@ -632,9 +633,9 @@ int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk, if (frag == MAX_SKB_FRAGS) return -EMSGSIZE; - copied = iov_iter_get_pages(from, pages, length, - MAX_SKB_FRAGS - frag, &start, - FOLL_SOURCE_BUF); + copied = iov_iter_extract_pages(from, &ppages, length, + MAX_SKB_FRAGS - frag, + FOLL_SOURCE_BUF, &start); if (copied < 0) return -EFAULT; @@ -662,12 +663,14 @@ int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk, skb_frag_t *last = &skb_shinfo(skb)->frags[frag - 1]; if (head == skb_frag_page(last) && + cleanup_mode == skb_frag_cleanup(last) && start == skb_frag_off(last) + skb_frag_size(last)) { skb_frag_size_add(last, size); /* We combined this page, we need to release - * a reference. Since compound pages refcount - * is shared among many pages, batch the refcount - * adjustments to limit false sharing. + * a reference or a pin. Since compound pages + * refcount is shared among many pages, batch + * the refcount adjustments to limit false + * sharing. */ last_head = head; refs++; @@ -675,14 +678,14 @@ int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk, } } if (refs) { - page_ref_sub(last_head, refs); + page_put_unpin_sub(last_head, cleanup_mode, refs); refs = 0; } skb_fill_page_desc_noacc(skb, frag++, head, start, size, - FOLL_GET); + cleanup_mode); } if (refs) - page_ref_sub(last_head, refs); + page_put_unpin_sub(last_head, cleanup_mode, refs); } return 0; }
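To round off the series, here is a small user-space C sketch of the batched release that the __zerocopy_sg_from_iter() change above performs when consecutive fragments share one compound head page. All names are invented stand-ins: head_of() models the compound-head/page_folio() lookup and release_sub() models page_put_unpin_sub(); the sketch is simplified in that the real code keeps one reference held by the fragment itself and only batches the surplus drops.

#include <stdio.h>

/* Pretend pages in the same 4-page block belong to one compound page. */
static unsigned long head_of(unsigned long page)
{
	return page & ~0x3UL;
}

/* Stand-in for page_put_unpin_sub(): drop several refs/pins at once. */
static void release_sub(unsigned long head, unsigned int refs)
{
	printf("drop %u refs on head page %#lx\n", refs, head);
}

int main(void)
{
	unsigned long pages[] = { 0x10, 0x11, 0x12, 0x20 };
	unsigned long last_head = 0;
	unsigned int refs = 0;

	for (unsigned int i = 0; i < sizeof(pages) / sizeof(pages[0]); i++) {
		unsigned long head = head_of(pages[i]);

		if (refs && head == last_head) {
			refs++;	/* same compound head: batch the drop */
			continue;
		}
		if (refs)
			release_sub(last_head, refs);	/* flush the batch */
		last_head = head;
		refs = 1;
	}
	if (refs)
		release_sub(last_head, refs);
	return 0;
}

Batching like this matters because a compound page's refcount is shared among its constituent pages, so adjusting it once per run of fragments rather than once per fragment limits false sharing on the counter.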