From patchwork Tue Aug 3 19:18:07 2021
X-Patchwork-Submitter: Andreas Gruenbacher
X-Patchwork-Id: 12417105
From: Andreas Gruenbacher
To: Linus Torvalds, Alexander Viro, Christoph Hellwig, "Darrick J. Wong"
Cc: Jan Kara, Matthew Wilcox, cluster-devel@redhat.com,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    ocfs2-devel@oss.oracle.com, Andreas Gruenbacher
Subject: [PATCH v5 01/12] iov_iter: Fix iov_iter_get_pages{,_alloc} page fault return value
Date: Tue, 3 Aug 2021 21:18:07 +0200
Message-Id: <20210803191818.993968-2-agruenba@redhat.com>
In-Reply-To: <20210803191818.993968-1-agruenba@redhat.com>
References: <20210803191818.993968-1-agruenba@redhat.com>

Both iov_iter_get_pages and iov_iter_get_pages_alloc return the number
of bytes of the iovec they could get the pages for.  When they cannot
get any pages, they're supposed to return 0, but when the start of the
iovec isn't page aligned, the calculation goes wrong and they return a
negative value.  Fix both functions.

In addition, change iov_iter_get_pages_alloc to set *pages to NULL in
that case to prevent resource leaks.

It seems that the cifs and nfs filesystems don't handle the zero case
very well.  Could the maintainers please have a look?

Signed-off-by: Andreas Gruenbacher
---
 lib/iov_iter.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index e23123ae3a13..25dfc48536d7 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1484,7 +1484,7 @@ ssize_t iov_iter_get_pages(struct iov_iter *i,
 		res = get_user_pages_fast(addr, n,
 				iov_iter_rw(i) != WRITE ?  FOLL_WRITE : 0,
 				pages);
-		if (unlikely(res < 0))
+		if (unlikely(res <= 0))
 			return res;
 		return (res == n ? len : res * PAGE_SIZE) - *start;
 	}
@@ -1608,8 +1608,9 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
 			return -ENOMEM;
 		res = get_user_pages_fast(addr, n,
 				iov_iter_rw(i) != WRITE ?  FOLL_WRITE : 0, p);
-		if (unlikely(res < 0)) {
+		if (unlikely(res <= 0)) {
 			kvfree(p);
+			*pages = NULL;
 			return res;
 		}
 		*pages = p;
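To make the broken calculation concrete, here is a standalone sketch
(plain userspace C, not part of the patch) of the old return-value
arithmetic.  With an unaligned iovec, *start holds the offset into the
first page; when get_user_pages_fast() pins zero pages, the old
"res < 0" check falls through and the function returns -*start:

	#include <stdio.h>

	#define PAGE_SIZE 4096

	/* Mirrors the pre-patch logic in iov_iter_get_pages(). */
	static long old_return(long res, long n, long len, long start)
	{
		if (res < 0)		/* misses res == 0 */
			return res;
		return (res == n ? len : res * PAGE_SIZE) - start;
	}

	int main(void)
	{
		/* 100 bytes requested at offset 100 into a page: */
		printf("%ld\n", old_return(1, 1, 200, 100)); /* 100: OK */
		printf("%ld\n", old_return(0, 1, 200, 100)); /* -100, should be 0 */
		return 0;
	}

With the "res <= 0" check, the zero-pages case returns before the
length calculation is reached.
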
From patchwork Tue Aug 3 19:18:08 2021
X-Patchwork-Submitter: Andreas Gruenbacher
X-Patchwork-Id: 12417107
From: Andreas Gruenbacher
To: Linus Torvalds, Alexander Viro, Christoph Hellwig, "Darrick J. Wong",
    Paul Mackerras
Cc: Jan Kara, Matthew Wilcox, cluster-devel@redhat.com,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    ocfs2-devel@oss.oracle.com, Andreas Gruenbacher,
    kvm-ppc@vger.kernel.org, stable@vger.kernel.org
Subject: [PATCH v5 02/12] powerpc/kvm: Fix kvm_use_magic_page
Date: Tue, 3 Aug 2021 21:18:08 +0200
Message-Id: <20210803191818.993968-3-agruenba@redhat.com>
In-Reply-To: <20210803191818.993968-1-agruenba@redhat.com>
References: <20210803191818.993968-1-agruenba@redhat.com>

When switching from __get_user to fault_in_pages_readable, commit
9f9eae5ce717 broke kvm_use_magic_page: fault_in_pages_readable returns
0 on success.  Fix that.

Fixes: 9f9eae5ce717 ("powerpc/kvm: Prefer fault_in_pages_readable function")
Cc: stable@vger.kernel.org # v4.18+
Signed-off-by: Andreas Gruenbacher
---
 arch/powerpc/kernel/kvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index 617eba82531c..d89cf802d9aa 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -669,7 +669,7 @@ static void __init kvm_use_magic_page(void)
 	on_each_cpu(kvm_map_magic_page, &features, 1);
 
 	/* Quick self-test to see if the mapping works */
-	if (!fault_in_pages_readable((const char *)KVM_MAGIC_PAGE, sizeof(u32))) {
+	if (fault_in_pages_readable((const char *)KVM_MAGIC_PAGE, sizeof(u32))) {
 		kvm_patching_worked = false;
 		return;
 	}
From patchwork Tue Aug 3 19:18:09 2021
X-Patchwork-Submitter: Andreas Gruenbacher
X-Patchwork-Id: 12417111
From: Andreas Gruenbacher
To: Linus Torvalds, Alexander Viro, Christoph Hellwig, "Darrick J. Wong"
Cc: Jan Kara, Matthew Wilcox, cluster-devel@redhat.com,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    ocfs2-devel@oss.oracle.com, Andreas Gruenbacher
Subject: [PATCH v5 03/12] Turn fault_in_pages_{readable,writeable} into fault_in_{readable,writeable}
Date: Tue, 3 Aug 2021 21:18:09 +0200
Message-Id: <20210803191818.993968-4-agruenba@redhat.com>
In-Reply-To: <20210803191818.993968-1-agruenba@redhat.com>
References: <20210803191818.993968-1-agruenba@redhat.com>

Turn fault_in_pages_{readable,writeable} into versions that return the
number of bytes faulted in instead of returning a non-zero value when
any of the requested pages couldn't be faulted in.  This supports the
existing users that require all pages to be faulted in, but also new
users that are happy if any pages can be faulted in.

Neither of these functions is entirely trivial and it doesn't seem
useful to inline them, so move them to mm/gup.c.

Rename the functions to fault_in_{readable,writeable} to make sure that
code that uses them can be fixed instead of breaking silently.

Signed-off-by: Andreas Gruenbacher
---
 arch/powerpc/kernel/kvm.c           |  3 +-
 arch/powerpc/kernel/signal_32.c     |  4 +-
 arch/powerpc/kernel/signal_64.c     |  2 +-
 arch/x86/kernel/fpu/signal.c        |  8 ++--
 drivers/gpu/drm/armada/armada_gem.c |  7 ++--
 fs/btrfs/ioctl.c                    |  7 ++--
 include/linux/pagemap.h             | 57 ++---------------------------
 lib/iov_iter.c                      | 12 +++---
 mm/filemap.c                        |  2 +-
 mm/gup.c                            | 52 ++++++++++++++++++++++++++
 10 files changed, 78 insertions(+), 76 deletions(-)

diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index d89cf802d9aa..b8fe9f16dec2 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -669,7 +669,8 @@ static void __init kvm_use_magic_page(void)
 	on_each_cpu(kvm_map_magic_page, &features, 1);
 
 	/* Quick self-test to see if the mapping works */
-	if (fault_in_pages_readable((const char *)KVM_MAGIC_PAGE, sizeof(u32))) {
+	if (fault_in_readable((const char __user *)KVM_MAGIC_PAGE,
+			      sizeof(u32)) != sizeof(u32)) {
 		kvm_patching_worked = false;
 		return;
 	}
diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
index 0608581967f0..4619604b85f6 100644
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -1048,7 +1048,7 @@ SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx,
 	if (new_ctx == NULL)
 		return 0;
 	if (!access_ok(new_ctx, ctx_size) ||
-	    fault_in_pages_readable((u8 __user *)new_ctx, ctx_size))
+	    fault_in_readable((char __user *)new_ctx, ctx_size) != ctx_size)
 		return -EFAULT;
 
 	/*
@@ -1237,7 +1237,7 @@ SYSCALL_DEFINE3(debug_setcontext, struct ucontext __user *, ctx,
 #endif
 
 	if (!access_ok(ctx, sizeof(*ctx)) ||
-	    fault_in_pages_readable((u8 __user *)ctx, sizeof(*ctx)))
+	    fault_in_readable((char __user *)ctx, sizeof(*ctx)) != sizeof(*ctx))
 		return -EFAULT;
 
 	/*
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index 1831bba0582e..889612ac23ca 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -688,7 +688,7 @@ SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx,
 	if (new_ctx == NULL)
 		return 0;
 	if (!access_ok(new_ctx, ctx_size) ||
-	    fault_in_pages_readable((u8 __user *)new_ctx, ctx_size))
+	    fault_in_readable((char __user *)new_ctx, ctx_size) != ctx_size)
 		return -EFAULT;
 
 	/*
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 445c57c9c539..9cb7c4d2c6d2 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -205,7 +205,8 @@ int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
 	fpregs_unlock();
 
 	if (ret) {
-		if (!fault_in_pages_writeable(buf_fx, fpu_user_xstate_size))
+		if (fault_in_writeable(buf_fx, fpu_user_xstate_size) ==
+		    fpu_user_xstate_size)
 			goto retry;
 		return -EFAULT;
 	}
@@ -278,10 +279,9 @@ static int restore_fpregs_from_user(void __user *buf, u64 xrestore,
 		if (ret != -EFAULT)
 			return -EINVAL;
 
-		ret = fault_in_pages_readable(buf, size);
-		if (!ret)
+		if (fault_in_readable(buf, size) == size)
 			goto retry;
-		return ret;
+		return -EFAULT;
 	}
 
 	/*
diff --git a/drivers/gpu/drm/armada/armada_gem.c b/drivers/gpu/drm/armada/armada_gem.c
index 21909642ee4c..ceb68a5ee31f 100644
--- a/drivers/gpu/drm/armada/armada_gem.c
+++ b/drivers/gpu/drm/armada/armada_gem.c
@@ -336,7 +336,7 @@ int armada_gem_pwrite_ioctl(struct drm_device *dev, void *data,
 	struct drm_armada_gem_pwrite *args = data;
 	struct armada_gem_object *dobj;
 	char __user *ptr;
-	int ret;
+	int ret = 0;
 
 	DRM_DEBUG_DRIVER("handle %u off %u size %u ptr 0x%llx\n",
 		args->handle, args->offset, args->size, args->ptr);
@@ -349,9 +349,8 @@ int armada_gem_pwrite_ioctl(struct drm_device *dev, void *data,
 	if (!access_ok(ptr, args->size))
 		return -EFAULT;
 
-	ret = fault_in_pages_readable(ptr, args->size);
-	if (ret)
-		return ret;
+	if (fault_in_readable(ptr, args->size) != args->size)
+		return -EFAULT;
 
 	dobj = armada_gem_object_lookup(file, args->handle);
 	if (dobj == NULL)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 0ba98e08a029..c30382f89544 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2244,9 +2244,10 @@ static noinline int search_ioctl(struct inode *inode,
 	key.offset = sk->min_offset;
 
 	while (1) {
-		ret = fault_in_pages_writeable(ubuf + sk_offset,
-					       *buf_size - sk_offset);
-		if (ret)
+		size_t size = *buf_size - sk_offset;
+
+		ret = -EFAULT;
+		if (fault_in_writeable(ubuf + sk_offset, size) != size)
 			break;
 
 		ret = btrfs_search_forward(root, &key, path, sk->min_transid);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index ed02aa522263..7c9edc9694d9 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -734,61 +734,10 @@ int wait_on_page_private_2_killable(struct page *page);
 extern void add_page_wait_queue(struct page *page, wait_queue_entry_t *waiter);
 
 /*
- * Fault everything in given userspace address range in.
+ * Fault in userspace address range.
 */
-static inline int fault_in_pages_writeable(char __user *uaddr, int size)
-{
-	char __user *end = uaddr + size - 1;
-
-	if (unlikely(size == 0))
-		return 0;
-
-	if (unlikely(uaddr > end))
-		return -EFAULT;
-	/*
-	 * Writing zeroes into userspace here is OK, because we know that if
-	 * the zero gets there, we'll be overwriting it.
-	 */
-	do {
-		if (unlikely(__put_user(0, uaddr) != 0))
-			return -EFAULT;
-		uaddr += PAGE_SIZE;
-	} while (uaddr <= end);
-
-	/* Check whether the range spilled into the next page. */
-	if (((unsigned long)uaddr & PAGE_MASK) ==
-	    ((unsigned long)end & PAGE_MASK))
-		return __put_user(0, end);
-
-	return 0;
-}
-
-static inline int fault_in_pages_readable(const char __user *uaddr, int size)
-{
-	volatile char c;
-	const char __user *end = uaddr + size - 1;
-
-	if (unlikely(size == 0))
-		return 0;
-
-	if (unlikely(uaddr > end))
-		return -EFAULT;
-
-	do {
-		if (unlikely(__get_user(c, uaddr) != 0))
-			return -EFAULT;
-		uaddr += PAGE_SIZE;
-	} while (uaddr <= end);
-
-	/* Check whether the range spilled into the next page. */
-	if (((unsigned long)uaddr & PAGE_MASK) ==
-	    ((unsigned long)end & PAGE_MASK)) {
-		return __get_user(c, end);
-	}
-
-	(void)c;
-	return 0;
-}
+size_t fault_in_writeable(char __user *uaddr, size_t size);
+size_t fault_in_readable(const char __user *uaddr, size_t size);
 
 int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
 		pgoff_t index, gfp_t gfp_mask);
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 25dfc48536d7..92d877a698f0 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -191,7 +191,8 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b
 	buf = iov->iov_base + skip;
 	copy = min(bytes, iov->iov_len - skip);
 
-	if (IS_ENABLED(CONFIG_HIGHMEM) && !fault_in_pages_writeable(buf, copy)) {
+	if (IS_ENABLED(CONFIG_HIGHMEM) &&
+	    fault_in_writeable(buf, copy) == copy) {
 		kaddr = kmap_atomic(page);
 		from = kaddr + offset;
@@ -275,7 +276,8 @@ static size_t copy_page_from_iter_iovec(struct page *page, size_t offset, size_t
 	buf = iov->iov_base + skip;
 	copy = min(bytes, iov->iov_len - skip);
 
-	if (IS_ENABLED(CONFIG_HIGHMEM) && !fault_in_pages_readable(buf, copy)) {
+	if (IS_ENABLED(CONFIG_HIGHMEM) &&
+	    fault_in_readable(buf, copy) == copy) {
 		kaddr = kmap_atomic(page);
 		to = kaddr + offset;
@@ -446,13 +448,11 @@ int iov_iter_fault_in_readable(const struct iov_iter *i, size_t bytes)
 			bytes = i->count;
 		for (p = i->iov, skip = i->iov_offset; bytes; p++, skip = 0) {
 			size_t len = min(bytes, p->iov_len - skip);
-			int err;
 
 			if (unlikely(!len))
 				continue;
-			err = fault_in_pages_readable(p->iov_base + skip, len);
-			if (unlikely(err))
-				return err;
+			if (fault_in_readable(p->iov_base + skip, len) != len)
+				return -EFAULT;
 			bytes -= len;
 		}
 	}
diff --git a/mm/filemap.c b/mm/filemap.c
index d1458ecf2f51..4dec3bc7752e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -88,7 +88,7 @@
  *    ->lock_page		(access_process_vm)
  *
  *  ->i_mutex			(generic_perform_write)
- *    ->mmap_lock		(fault_in_pages_readable->do_page_fault)
+ *    ->mmap_lock		(fault_in_readable->do_page_fault)
  *
  *  bdi->wb.list_lock
  *    sb_lock			(fs/fs-writeback.c)
diff --git a/mm/gup.c b/mm/gup.c
index 42b8b1fa6521..d04984d5d93c 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1669,6 +1669,58 @@ static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
 }
 #endif /* !CONFIG_MMU */
 
+size_t fault_in_writeable(char __user *uaddr, size_t size)
+{
+	char __user *start = uaddr, *end;
+
+	if (unlikely(size == 0))
+		return 0;
+	if (!PAGE_ALIGNED(uaddr)) {
+		if (unlikely(__put_user(0, uaddr) != 0))
+			return 0;
+		uaddr = (char __user *)PAGE_ALIGN((unsigned long)uaddr);
+	}
+	end = (char __user *)PAGE_ALIGN((unsigned long)start + size);
+	if (unlikely(end < start))
+		end = NULL;
+	while (uaddr != end) {
+		if (unlikely(__put_user(0, uaddr) != 0))
+			goto out;
+		uaddr += PAGE_SIZE;
+	}
+
+out:
+	return min_t(size_t, uaddr - start, size);
+}
+EXPORT_SYMBOL(fault_in_writeable);
+
+size_t fault_in_readable(const char __user *uaddr, size_t size)
+{
+	const char __user *start = uaddr, *end;
+	volatile char c;
+
+	if (unlikely(size == 0))
+		return 0;
+	if (!PAGE_ALIGNED(uaddr)) {
+		if (unlikely(__get_user(c, uaddr) != 0))
+			return 0;
+		uaddr = (const char __user *)PAGE_ALIGN((unsigned long)uaddr);
+	}
+	end = (const char __user *)PAGE_ALIGN((unsigned long)start + size);
+	if (unlikely(end < start))
+		end = NULL;
+	while (uaddr != end) {
+		if (unlikely(__get_user(c, uaddr) != 0))
+			goto out;
+		uaddr += PAGE_SIZE;
+	}
+
+out:
+	(void)c;
+	return min_t(size_t, uaddr - start, size);
+}
+EXPORT_SYMBOL(fault_in_readable);
+
 /**
  * get_dump_page() - pin user page in memory while writing it to core dump
  * @addr: user address
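The new return convention supports both styles of caller.  A hedged
sketch of the two patterns against the functions added above (the
surrounding code and variables are hypothetical):

	/* All-or-nothing, as in the signal and btrfs callers above: */
	if (fault_in_readable(ubuf, size) != size)
		return -EFAULT;

	/* Partial progress: shrink the request to the faulted-in prefix. */
	faulted = fault_in_readable(ubuf, size);
	if (faulted == 0)
		return -EFAULT;	/* not even the first page is accessible */
	size = faulted;
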
From patchwork Tue Aug 3 19:18:10 2021
X-Patchwork-Submitter: Andreas Gruenbacher
X-Patchwork-Id: 12417109
From: Andreas Gruenbacher
To: Linus Torvalds, Alexander Viro, Christoph Hellwig, "Darrick J. Wong"
Cc: Jan Kara, Matthew Wilcox, cluster-devel@redhat.com,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    ocfs2-devel@oss.oracle.com, Andreas Gruenbacher
Subject: [PATCH v5 04/12] Turn iov_iter_fault_in_readable into fault_in_iov_iter_readable
Date: Tue, 3 Aug 2021 21:18:10 +0200
Message-Id: <20210803191818.993968-5-agruenba@redhat.com>
In-Reply-To: <20210803191818.993968-1-agruenba@redhat.com>
References: <20210803191818.993968-1-agruenba@redhat.com>

Turn iov_iter_fault_in_readable into a function that returns the number
of bytes faulted in instead of returning a non-zero value when any of
the requested pages couldn't be faulted in.  This supports the existing
users that require all pages to be faulted in, but also new users that
are happy if any pages can be faulted in.

Rename iov_iter_fault_in_readable to an unfortunately clumsy
fault_in_iov_iter_readable to make sure that code that uses it can be
fixed instead of breaking silently.

Fix up the existing users.

Signed-off-by: Andreas Gruenbacher
---
 fs/btrfs/file.c        |  3 ++-
 fs/f2fs/file.c         |  6 +++---
 fs/fuse/file.c         |  2 +-
 fs/iomap/buffered-io.c |  2 +-
 fs/ntfs/file.c         |  2 +-
 include/linux/uio.h    |  2 +-
 lib/iov_iter.c         | 33 ++++++++++++++++++++++-----------
 mm/filemap.c           |  2 +-
 8 files changed, 32 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index ee34497500e1..8ff9e0bb5b0f 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1698,7 +1698,8 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		 * Fault pages before locking them in prepare_pages
 		 * to avoid recursive lock
 		 */
-		if (unlikely(iov_iter_fault_in_readable(i, write_bytes))) {
+		if (unlikely(fault_in_iov_iter_readable(i, write_bytes) !=
+			     write_bytes)) {
 			ret = -EFAULT;
 			break;
 		}
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 6afd4562335f..7c172573f18a 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4255,16 +4255,16 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 
 	ret = generic_write_checks(iocb, from);
 	if (ret > 0) {
+		size_t count = iov_iter_count(from);
 		bool preallocated = false;
 		size_t target_size = 0;
 		int err;
 
-		if (iov_iter_fault_in_readable(from, iov_iter_count(from)))
+		if (fault_in_iov_iter_readable(from, count) != count)
 			set_inode_flag(inode, FI_NO_PREALLOC);
 
 		if ((iocb->ki_flags & IOCB_NOWAIT)) {
-			if (!f2fs_overwrite_io(inode, iocb->ki_pos,
-						iov_iter_count(from)) ||
+			if (!f2fs_overwrite_io(inode, iocb->ki_pos, count) ||
 				f2fs_has_inline_data(inode) ||
 				f2fs_force_buffered_io(inode, iocb, from)) {
 				clear_inode_flag(inode, FI_NO_PREALLOC);
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 97f860cfc195..d5dd01f20f1e 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1160,7 +1160,7 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
 
  again:
 		err = -EFAULT;
-		if (iov_iter_fault_in_readable(ii, bytes))
+		if (fault_in_iov_iter_readable(ii, bytes) != bytes)
 			break;
 
 		err = -ENOMEM;
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 87ccb3438bec..d5de094fef73 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -749,7 +749,7 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		 * same page as we're writing to, without it being marked
 		 * up-to-date.
 		 */
-		if (unlikely(iov_iter_fault_in_readable(i, bytes))) {
+		if (unlikely(fault_in_iov_iter_readable(i, bytes) != bytes)) {
 			status = -EFAULT;
 			break;
 		}
diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c
index ab4f3362466d..cddac274c35a 100644
--- a/fs/ntfs/file.c
+++ b/fs/ntfs/file.c
@@ -1829,7 +1829,7 @@ static ssize_t ntfs_perform_write(struct file *file, struct iov_iter *i,
 		 * pages being swapped out between us bringing them into memory
 		 * and doing the actual copying.
 		 */
-		if (unlikely(iov_iter_fault_in_readable(i, bytes))) {
+		if (unlikely(fault_in_iov_iter_readable(i, bytes) != bytes)) {
 			status = -EFAULT;
 			break;
 		}
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 82c3c3e819e0..12d30246c2e9 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -119,7 +119,7 @@ size_t copy_page_from_iter_atomic(struct page *page, unsigned offset,
 				  size_t bytes, struct iov_iter *i);
 void iov_iter_advance(struct iov_iter *i, size_t bytes);
 void iov_iter_revert(struct iov_iter *i, size_t bytes);
-int iov_iter_fault_in_readable(const struct iov_iter *i, size_t bytes);
+size_t fault_in_iov_iter_readable(const struct iov_iter *i, size_t bytes);
 size_t iov_iter_single_seg_count(const struct iov_iter *i);
 size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
 			 struct iov_iter *i);
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 92d877a698f0..c0fa1618561c 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -432,33 +432,44 @@ static size_t copy_page_to_iter_pipe(struct page *page, size_t offset, size_t by
 }
 
 /*
+ * fault_in_iov_iter_readable - fault in iov iterator for reading
+ * @i: iterator
+ * @size: maximum length
+ *
  * Fault in one or more iovecs of the given iov_iter, to a maximum length of
- * bytes.  For each iovec, fault in each page that constitutes the iovec.
+ * @size.  For each iovec, fault in each page that constitutes the iovec.
+ *
+ * Returns the number of bytes faulted in, or 0 if no bytes could be faulted in
+ * (i.e., because the address is invalid).
  *
- * Return 0 on success, or non-zero if the memory could not be accessed (i.e.
- * because it is an invalid address).
+ * Always returns the number of available bytes for non-user space iterators.
 */
-int iov_iter_fault_in_readable(const struct iov_iter *i, size_t bytes)
+size_t fault_in_iov_iter_readable(const struct iov_iter *i, size_t size)
 {
+	if (size > i->count)
+		size = i->count;
+
 	if (iter_is_iovec(i)) {
 		const struct iovec *p;
+		size_t bytes = size;
 		size_t skip;
 
-		if (bytes > i->count)
-			bytes = i->count;
 		for (p = i->iov, skip = i->iov_offset; bytes; p++, skip = 0) {
 			size_t len = min(bytes, p->iov_len - skip);
+			size_t ret;
 
 			if (unlikely(!len))
 				continue;
-			if (fault_in_readable(p->iov_base + skip, len) != len)
-				return -EFAULT;
-			bytes -= len;
+			ret = fault_in_readable(p->iov_base + skip, len);
+			bytes -= ret;
+			if (ret != len)
+				break;
 		}
+		return size - bytes;
 	}
-	return 0;
+	return size;
 }
-EXPORT_SYMBOL(iov_iter_fault_in_readable);
+EXPORT_SYMBOL(fault_in_iov_iter_readable);
 
 void iov_iter_init(struct iov_iter *i, unsigned int direction,
 		   const struct iovec *iov, unsigned long nr_segs,
diff --git a/mm/filemap.c b/mm/filemap.c
index 4dec3bc7752e..5f5aed060c9e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3643,7 +3643,7 @@ ssize_t generic_perform_write(struct file *file,
 		 * same page as we're writing to, without it being marked
 		 * up-to-date.
 		 */
-		if (unlikely(iov_iter_fault_in_readable(i, bytes))) {
+		if (unlikely(fault_in_iov_iter_readable(i, bytes) != bytes)) {
 			status = -EFAULT;
 			break;
 		}
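A hedged sketch of how a copy loop can use the returned byte count
instead of treating fault-in as all-or-nothing (the loop itself is
hypothetical; gfs2 adopts a variant of this later in the series):

	size_t bytes = iov_iter_count(i);
	size_t faulted = fault_in_iov_iter_readable(i, bytes);

	if (faulted == 0)
		return -EFAULT;		/* first page is simply inaccessible */
	if (faulted < bytes)
		bytes = faulted;	/* copy only what is known resident */
	/* ... copy up to `bytes` from the iterator ... */
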
From patchwork Tue Aug 3 19:18:11 2021
X-Patchwork-Submitter: Andreas Gruenbacher
X-Patchwork-Id: 12417113
From: Andreas Gruenbacher
To: Linus Torvalds, Alexander Viro, Christoph Hellwig, "Darrick J. Wong"
Cc: Jan Kara, Matthew Wilcox, cluster-devel@redhat.com,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    ocfs2-devel@oss.oracle.com, Andreas Gruenbacher
Subject: [PATCH v5 05/12] iov_iter: Introduce fault_in_iov_iter_writeable
Date: Tue, 3 Aug 2021 21:18:11 +0200
Message-Id: <20210803191818.993968-6-agruenba@redhat.com>
In-Reply-To: <20210803191818.993968-1-agruenba@redhat.com>
References: <20210803191818.993968-1-agruenba@redhat.com>

Introduce a new fault_in_iov_iter_writeable helper for safely faulting
in an iterator for writing.  It uses get_user_pages() to fault in the
pages without actually writing to them, which would be destructive.

We'll use fault_in_iov_iter_writeable in gfs2 once we've determined
that the pages of the iterator passed to .read_iter aren't in memory.

Signed-off-by: Andreas Gruenbacher
---
 include/linux/pagemap.h |  1 +
 include/linux/uio.h     |  1 +
 lib/iov_iter.c          | 41 +++++++++++++++++++++++++++
 mm/gup.c                | 61 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 104 insertions(+)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 7c9edc9694d9..a629807edb8c 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -737,6 +737,7 @@ extern void add_page_wait_queue(struct page *page, wait_queue_entry_t *waiter);
  * Fault in userspace address range.
  */
 size_t fault_in_writeable(char __user *uaddr, size_t size);
+size_t fault_in_safe_writeable(const char __user *uaddr, size_t size);
 size_t fault_in_readable(const char __user *uaddr, size_t size);
 
 int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 12d30246c2e9..ffa431aeb067 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -120,6 +120,7 @@ size_t copy_page_from_iter_atomic(struct page *page, unsigned offset,
 void iov_iter_advance(struct iov_iter *i, size_t bytes);
 void iov_iter_revert(struct iov_iter *i, size_t bytes);
 size_t fault_in_iov_iter_readable(const struct iov_iter *i, size_t bytes);
+size_t fault_in_iov_iter_writeable(const struct iov_iter *i, size_t bytes);
 size_t iov_iter_single_seg_count(const struct iov_iter *i);
 size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
 			 struct iov_iter *i);
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index c0fa1618561c..4ffc76801eaa 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -471,6 +471,47 @@ size_t fault_in_iov_iter_readable(const struct iov_iter *i, size_t size)
 }
 EXPORT_SYMBOL(fault_in_iov_iter_readable);
 
+/*
+ * fault_in_iov_iter_writeable - fault in iov iterator for writing
+ * @i: iterator
+ * @size: maximum length
+ *
+ * Faults in the iterator using get_user_pages(), i.e., without triggering
+ * hardware page faults.  This is primarily useful when we know that some or
+ * all of the pages in @i aren't in memory.
+ *
+ * Returns the number of bytes faulted in, or 0 if no bytes could be faulted in
+ * (i.e., because the address is invalid).
+ *
+ * Always returns the number of available bytes for non-user space iterators.
+ */
+size_t fault_in_iov_iter_writeable(const struct iov_iter *i, size_t size)
+{
+	if (size > i->count)
+		size = i->count;
+
+	if (iter_is_iovec(i)) {
+		const struct iovec *p;
+		size_t bytes = size;
+		size_t skip;
+
+		for (p = i->iov, skip = i->iov_offset; bytes; p++, skip = 0) {
+			size_t len = min(bytes, p->iov_len - skip);
+			size_t ret;
+
+			if (unlikely(!len))
+				continue;
+			ret = fault_in_safe_writeable(p->iov_base + skip, len);
+			bytes -= ret;
+			if (ret != len)
+				break;
+		}
+		return size - bytes;
+	}
+	return size;
+}
+EXPORT_SYMBOL(fault_in_iov_iter_writeable);
+
 void iov_iter_init(struct iov_iter *i, unsigned int direction,
 		   const struct iovec *iov, unsigned long nr_segs,
 		   size_t count)
diff --git a/mm/gup.c b/mm/gup.c
index d04984d5d93c..7218e27c2481 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1694,6 +1694,67 @@ size_t fault_in_writeable(char __user *uaddr, size_t size)
 }
 EXPORT_SYMBOL(fault_in_writeable);
 
+/**
+ * fault_in_safe_writeable - fault in an address range for writing
+ * @uaddr: start of address range
+ * @size: length of address range
+ *
+ * Faults in an address range using get_user_pages, i.e., without triggering
+ * hardware page faults.  This is primarily useful when we know that some or
+ * all of the pages in the address range aren't in memory.
+ *
+ * Unlike fault_in_writeable(), this function is non-destructive.
+ *
+ * Note that we don't pin or otherwise hold the pages referenced that we fault
+ * in.  There's no guarantee that they'll stay in memory for any duration of
+ * time.
+ *
+ * Returns the number of bytes faulted in from @uaddr.
+ */
+size_t fault_in_safe_writeable(const char __user *uaddr, size_t size)
+{
+	unsigned long start = (unsigned long)uaddr;
+	unsigned long end, nstart, nend;
+	struct mm_struct *mm = current->mm;
+	struct vm_area_struct *vma = NULL;
+	int locked = 0;
+
+	/* FIXME: Protect against overflow! */
+
+	end = PAGE_ALIGN(start + size);
+	for (nstart = start & PAGE_MASK; nstart < end; nstart = nend) {
+		unsigned long nr_pages;
+		long ret;
+
+		if (!locked) {
+			locked = 1;
+			mmap_read_lock(mm);
+			vma = find_vma(mm, nstart);
+		} else if (nstart >= vma->vm_end)
+			vma = vma->vm_next;
+		if (!vma || vma->vm_start >= end)
+			break;
+		nend = min(end, vma->vm_end);
+		if (vma->vm_flags & (VM_IO | VM_PFNMAP))
+			continue;
+		if (nstart < vma->vm_start)
+			nstart = vma->vm_start;
+		nr_pages = (nend - nstart) / PAGE_SIZE;
+		ret = __get_user_pages_locked(mm, nstart, nr_pages,
+					      NULL, NULL, &locked,
+					      FOLL_TOUCH | FOLL_WRITE);
+		if (ret <= 0)
+			break;
+		nend = nstart + ret * PAGE_SIZE;
+	}
+	if (locked)
+		mmap_read_unlock(mm);
+	if (nstart > start)
+		return min(nstart - start, size);
+	return 0;
+}
+EXPORT_SYMBOL(fault_in_safe_writeable);
+
 size_t fault_in_readable(const char __user *uaddr, size_t size)
 {
 	const char __user *start = uaddr, *end;
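A short sketch of when each writeable variant applies (illustrative
only; ubuf and len are hypothetical):

	/* Destructive: pokes a zero byte into each page, so only safe
	 * when every faulted byte is about to be overwritten, as in the
	 * write path. */
	if (fault_in_writeable(ubuf, len) != len)
		return -EFAULT;

	/* Non-destructive: faults the pages in via get_user_pages()
	 * with FOLL_WRITE, leaving their contents intact.  Safe for
	 * read(2)-style destinations that may end up only partially
	 * overwritten. */
	if (fault_in_safe_writeable(ubuf, len) != len)
		return -EFAULT;
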
Wong" Cc: Jan Kara , Matthew Wilcox , cluster-devel@redhat.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, ocfs2-devel@oss.oracle.com, Andreas Gruenbacher Subject: [PATCH v5 06/12] gfs2: Add wrapper for iomap_file_buffered_write Date: Tue, 3 Aug 2021 21:18:12 +0200 Message-Id: <20210803191818.993968-7-agruenba@redhat.com> In-Reply-To: <20210803191818.993968-1-agruenba@redhat.com> References: <20210803191818.993968-1-agruenba@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Add a wrapper around iomap_file_buffered_write. We'll add code for when the operation needs to be retried here later. Signed-off-by: Andreas Gruenbacher --- fs/gfs2/file.c | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c index 84ec053d43b4..55ec1cadc9e6 100644 --- a/fs/gfs2/file.c +++ b/fs/gfs2/file.c @@ -876,6 +876,18 @@ static ssize_t gfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *to) return written ? written : ret; } +static ssize_t gfs2_file_buffered_write(struct kiocb *iocb, struct iov_iter *from) +{ + struct file *file = iocb->ki_filp; + struct inode *inode = file_inode(file); + ssize_t ret; + + current->backing_dev_info = inode_to_bdi(inode); + ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops); + current->backing_dev_info = NULL; + return ret; +} + /** * gfs2_file_write_iter - Perform a write to a file * @iocb: The io context @@ -927,9 +939,7 @@ static ssize_t gfs2_file_write_iter(struct kiocb *iocb, struct iov_iter *from) goto out_unlock; iocb->ki_flags |= IOCB_DSYNC; - current->backing_dev_info = inode_to_bdi(inode); - buffered = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops); - current->backing_dev_info = NULL; + buffered = gfs2_file_buffered_write(iocb, from); if (unlikely(buffered <= 0)) { if (!ret) ret = buffered; @@ -951,9 +961,7 @@ static ssize_t gfs2_file_write_iter(struct kiocb *iocb, struct iov_iter *from) if (!ret || ret2 > 0) ret += ret2; } else { - current->backing_dev_info = inode_to_bdi(inode); - ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops); - current->backing_dev_info = NULL; + ret = gfs2_file_buffered_write(iocb, from); if (likely(ret > 0)) { iocb->ki_pos += ret; ret = generic_write_sync(iocb, ret); From patchwork Tue Aug 3 19:18:13 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andreas Gruenbacher X-Patchwork-Id: 12417115 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.5 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4F92EC4338F for ; Tue, 3 Aug 2021 19:19:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 35E1A60F56 for ; Tue, 3 Aug 2021 19:19:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239716AbhHCTTZ (ORCPT ); Tue, 3 Aug 2021 15:19:25 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:50427 "EHLO us-smtp-delivery-124.mimecast.com" 
From patchwork Tue Aug 3 19:18:13 2021
X-Patchwork-Submitter: Andreas Gruenbacher
X-Patchwork-Id: 12417115
From: Andreas Gruenbacher
To: Linus Torvalds, Alexander Viro, Christoph Hellwig, "Darrick J. Wong"
Cc: Jan Kara, Matthew Wilcox, cluster-devel@redhat.com,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    ocfs2-devel@oss.oracle.com, Andreas Gruenbacher
Subject: [PATCH v5 07/12] gfs2: Fix mmap + page fault deadlocks for buffered I/O
Date: Tue, 3 Aug 2021 21:18:13 +0200
Message-Id: <20210803191818.993968-8-agruenba@redhat.com>
In-Reply-To: <20210803191818.993968-1-agruenba@redhat.com>
References: <20210803191818.993968-1-agruenba@redhat.com>

In the .read_iter and .write_iter file operations, we're accessing
user-space memory while holding the inode's glock.  There's a
possibility that the memory is mapped to the same file, in which case
we'd recurse on the same glock.  More complex scenarios can involve
multiple glocks, processes, and even cluster nodes.

Avoid these kinds of problems by disabling page faults while holding a
glock.  If a page fault occurs, we either end up with a partial read or
write, or with -EFAULT if nothing could be read or written.  In that
case, we drop the glock, fault in the requested pages manually, and
repeat the operation.

This locking problem in gfs2 was originally reported by Jan Kara.
Linus came up with the proposal to disable page faults.  Many thanks to
Al Viro and Matthew Wilcox for their feedback as well.

Signed-off-by: Andreas Gruenbacher
---
 fs/gfs2/file.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 55ec1cadc9e6..c0f86a28f1bf 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -843,6 +843,12 @@ static ssize_t gfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	size_t written = 0;
 	ssize_t ret;
 
+	/*
+	 * In this function, we disable page faults when we're holding the
+	 * inode glock while doing I/O.  If a page fault occurs, we drop the
+	 * inode glock, fault in the pages manually, and retry.
+	 */
+
 	if (iocb->ki_flags & IOCB_DIRECT) {
 		ret = gfs2_file_direct_read(iocb, to, &gh);
 		if (likely(ret != -ENOTBLK))
@@ -864,13 +870,20 @@ static ssize_t gfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	}
 	ip = GFS2_I(iocb->ki_filp->f_mapping->host);
 	gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, &gh);
+retry:
 	ret = gfs2_glock_nq(&gh);
 	if (ret)
 		goto out_uninit;
+	pagefault_disable();
 	ret = generic_file_read_iter(iocb, to);
+	pagefault_enable();
 	if (ret > 0)
 		written += ret;
 	gfs2_glock_dq(&gh);
+	if (unlikely(iov_iter_count(to) && (ret > 0 || ret == -EFAULT)) &&
+	    iter_is_iovec(to) &&
+	    fault_in_iov_iter_writeable(to, SIZE_MAX) != 0)
+		goto retry;
 out_uninit:
 	gfs2_holder_uninit(&gh);
 	return written ? written : ret;
@@ -882,9 +895,22 @@ static ssize_t gfs2_file_buffered_write(struct kiocb *iocb, struct iov_iter *fro
 	struct inode *inode = file_inode(file);
 	ssize_t ret;
 
+	/*
+	 * In this function, we disable page faults when we're holding the
+	 * inode glock while doing I/O.  If a page fault occurs, we drop the
+	 * inode glock, fault in the pages manually, and retry.
+	 */
+
+retry:
 	current->backing_dev_info = inode_to_bdi(inode);
+	pagefault_disable();
 	ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops);
+	pagefault_enable();
 	current->backing_dev_info = NULL;
+	if (unlikely(ret == -EFAULT) &&
+	    iter_is_iovec(from) &&
+	    fault_in_iov_iter_readable(from, SIZE_MAX) != 0)
+		goto retry;
 	return ret;
 }
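Reduced to its skeleton, the deadlock-avoidance pattern this patch
introduces looks as follows (a hedged sketch: take_fs_lock, drop_fs_lock,
and do_io are hypothetical stand-ins for gfs2's glock and iomap calls):

	retry:
		take_fs_lock();			/* e.g. gfs2_glock_nq() */
		pagefault_disable();
		ret = do_io(iocb, iter);	/* user copies now fail with -EFAULT
						 * instead of taking a page fault */
		pagefault_enable();
		drop_fs_lock();			/* e.g. gfs2_glock_dq() */

		if (ret == -EFAULT &&
		    fault_in_iov_iter_readable(iter, SIZE_MAX) != 0)
			goto retry;		/* pages are resident now */

Faulting the pages in only after the lock has been dropped is what
breaks the lock inversion: the page fault can then safely recurse into
the filesystem.
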
From patchwork Tue Aug 3 19:18:14 2021
X-Patchwork-Submitter: Andreas Gruenbacher
X-Patchwork-Id: 12417117
From: Andreas Gruenbacher
To: Linus Torvalds, Alexander Viro, Christoph Hellwig, "Darrick J. Wong"
Cc: Jan Kara, Matthew Wilcox, cluster-devel@redhat.com,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    ocfs2-devel@oss.oracle.com, Andreas Gruenbacher
Subject: [PATCH v5 08/12] iomap: Fix iomap_dio_rw return value for user copies
Date: Tue, 3 Aug 2021 21:18:14 +0200
Message-Id: <20210803191818.993968-9-agruenba@redhat.com>
In-Reply-To: <20210803191818.993968-1-agruenba@redhat.com>
References: <20210803191818.993968-1-agruenba@redhat.com>

When a user copy fails in one of the helpers of iomap_dio_rw, fail with
-EFAULT instead of returning 0.  This matches what iomap_dio_bio_actor
returns when it gets an -EFAULT from bio_iov_iter_get_pages.  With these
changes, iomap_dio_actor consistently fails with -EFAULT when a user
page cannot be faulted in.

Signed-off-by: Andreas Gruenbacher
---
 fs/iomap/direct-io.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 9398b8c31323..8054f5d6c273 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -370,7 +370,7 @@ iomap_dio_hole_actor(loff_t length, struct iomap_dio *dio)
 {
 	length = iov_iter_zero(length, dio->submit.iter);
 	dio->size += length;
-	return length;
+	return length ? length : -EFAULT;
 }
 
 static loff_t
@@ -397,7 +397,7 @@ iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length,
 		copied = copy_to_iter(iomap->inline_data + pos, length, iter);
 	}
 	dio->size += copied;
-	return copied;
+	return copied ? copied : -EFAULT;
 }
 
 static loff_t
Signed-off-by: Andreas Gruenbacher
---
 fs/iomap/direct-io.c  | 13 +++++++++++++
 include/linux/iomap.h |  7 +++++++
 2 files changed, 20 insertions(+)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 8054f5d6c273..35c3f2bae65a 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -561,6 +561,19 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		ret = iomap_apply(inode, pos, count, iomap_flags, ops, dio,
 				iomap_dio_actor);
 		if (ret <= 0) {
+			if (ret == -EFAULT) {
+				/*
+				 * Finish synchronously and revert the iterator
+				 * when failing the request to allow a retry.
+				 */
+				wait_for_completion = true;
+				if (dio->size &&
+				    (dio_flags & IOMAP_DIO_PARTIAL))
+					ret = 0;
+				else
+					iov_iter_revert(iter, dio->size);
+			}
+
 			/* magic error code to fall back to buffered I/O */
 			if (ret == -ENOTBLK) {
 				wait_for_completion = true;
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 479c1da3e221..bcae4814b8e3 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -267,6 +267,13 @@ struct iomap_dio_ops {
  */
 #define IOMAP_DIO_OVERWRITE_ONLY	(1 << 1)
 
+/*
+ * When a page fault occurs, return a partial synchronous result and allow
+ * the caller to retry the rest of the operation after dealing with the page
+ * fault.
+ */
+#define IOMAP_DIO_PARTIAL		(1 << 2)
+
 ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
 		unsigned int dio_flags);
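For illustration, here is a minimal sketch (not taken from the series) of
how a caller that does not pass IOMAP_DIO_PARTIAL can rely on the reverted
iterator: it faults the user pages in manually and reissues the whole
request.  example_iomap_ops is a hypothetical placeholder, locking is
omitted, and fault_in_iov_iter_readable is the helper introduced earlier
in this series:

	static ssize_t example_dio_write(struct kiocb *iocb, struct iov_iter *from)
	{
		ssize_t ret;

		do {
			/* On -EFAULT, the iterator has been reverted to the start. */
			ret = iomap_dio_rw(iocb, from, &example_iomap_ops, NULL, 0);
			if (ret != -EFAULT)
				break;
			/* Fault the user pages in manually, then retry the request. */
		} while (fault_in_iov_iter_readable(from, iov_iter_count(from)) ==
			 iov_iter_count(from));

		return ret;
	}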
From patchwork Tue Aug 3 19:18:16 2021
X-Patchwork-Submitter: Andreas Gruenbacher
X-Patchwork-Id: 12417119
From: Andreas Gruenbacher
To: Linus Torvalds, Alexander Viro, Christoph Hellwig, "Darrick J. Wong"
Cc: Jan Kara, Matthew Wilcox, cluster-devel@redhat.com,
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 ocfs2-devel@oss.oracle.com, Andreas Gruenbacher
Subject: [PATCH v5 10/12] iomap: Add done_before argument to iomap_dio_rw
Date: Tue, 3 Aug 2021 21:18:16 +0200
Message-Id: <20210803191818.993968-11-agruenba@redhat.com>
In-Reply-To: <20210803191818.993968-1-agruenba@redhat.com>
References: <20210803191818.993968-1-agruenba@redhat.com>

Add a done_before argument to iomap_dio_rw that indicates how much of the
request has already been transferred.  When the request succeeds, we
report that done_before additional bytes were transferred.  This is
useful for finishing a request asynchronously when part of the request
has already been completed synchronously.

We'll use that to allow iomap_dio_rw to be used with page faults
disabled: when a page fault occurs while submitting a request, we
synchronously complete the part of the request that has already been
submitted.  The caller can then take care of the page fault and call
iomap_dio_rw again for the rest of the request, passing in the number of
bytes already transferred.

Signed-off-by: Andreas Gruenbacher
---
 fs/btrfs/file.c       |  5 +++--
 fs/ext4/file.c        |  5 +++--
 fs/gfs2/file.c        |  4 ++--
 fs/iomap/direct-io.c  | 11 ++++++++---
 fs/xfs/xfs_file.c     |  6 +++---
 fs/zonefs/super.c     |  4 ++--
 include/linux/iomap.h |  4 ++--
 7 files changed, 23 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 8ff9e0bb5b0f..c20cc0fc61d9 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1946,7 +1946,7 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
 	}
 
 	dio = __iomap_dio_rw(iocb, from, &btrfs_dio_iomap_ops, &btrfs_dio_ops,
-			     0);
+			     0, 0);
 
 	btrfs_inode_unlock(inode, ilock_flags);
 
@@ -3638,7 +3638,8 @@ static ssize_t btrfs_direct_read(struct kiocb *iocb, struct iov_iter *to)
 		return 0;
 
 	btrfs_inode_lock(inode, BTRFS_ILOCK_SHARED);
-	ret = iomap_dio_rw(iocb, to, &btrfs_dio_iomap_ops, &btrfs_dio_ops, 0);
+	ret = iomap_dio_rw(iocb, to, &btrfs_dio_iomap_ops, &btrfs_dio_ops,
+			   0, 0);
 	btrfs_inode_unlock(inode, BTRFS_ILOCK_SHARED);
 	return ret;
 }
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 816dedcbd541..4a5e7fd31fb5 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -74,7 +74,7 @@ static ssize_t ext4_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
 		return generic_file_read_iter(iocb, to);
 	}
 
-	ret = iomap_dio_rw(iocb, to, &ext4_iomap_ops, NULL, 0);
+	ret = iomap_dio_rw(iocb, to, &ext4_iomap_ops, NULL, 0, 0);
 	inode_unlock_shared(inode);
 
 	file_accessed(iocb->ki_filp);
@@ -566,7 +566,8 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	if (ilock_shared)
 		iomap_ops = &ext4_iomap_overwrite_ops;
 	ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops,
-			   (unaligned_io || extend) ? IOMAP_DIO_FORCE_WAIT : 0);
+			   (unaligned_io || extend) ? IOMAP_DIO_FORCE_WAIT : 0,
+			   0);
 	if (ret == -ENOTBLK)
 		ret = 0;
 
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index c0f86a28f1bf..d98f690097e2 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -792,7 +792,7 @@ static ssize_t gfs2_file_direct_read(struct kiocb *iocb, struct iov_iter *to,
 	if (ret)
 		goto out_uninit;
 
-	ret = iomap_dio_rw(iocb, to, &gfs2_iomap_ops, NULL, 0);
+	ret = iomap_dio_rw(iocb, to, &gfs2_iomap_ops, NULL, 0, 0);
 
 	gfs2_glock_dq(gh);
 out_uninit:
 	gfs2_holder_uninit(gh);
@@ -826,7 +826,7 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from,
 	if (offset + len > i_size_read(&ip->i_inode))
 		goto out;
 
-	ret = iomap_dio_rw(iocb, from, &gfs2_iomap_ops, NULL, 0);
+	ret = iomap_dio_rw(iocb, from, &gfs2_iomap_ops, NULL, 0, 0);
 	if (ret == -ENOTBLK)
 		ret = 0;
 out:
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 35c3f2bae65a..7564e740aff8 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -31,6 +31,7 @@ struct iomap_dio {
 	atomic_t		ref;
 	unsigned		flags;
 	int			error;
+	size_t			done_before;
 	bool			wait_for_completion;
 
 	union {
@@ -126,6 +127,9 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
 	if (ret > 0 && (dio->flags & IOMAP_DIO_NEED_SYNC))
 		ret = generic_write_sync(iocb, ret);
 
+	if (ret >= 0)
+		ret += dio->done_before;
+
 	kfree(dio);
 
 	return ret;
@@ -450,7 +454,7 @@ iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length,
 struct iomap_dio *
 __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
-		unsigned int dio_flags)
+		unsigned int dio_flags, size_t done_before)
 {
 	struct address_space *mapping = iocb->ki_filp->f_mapping;
 	struct inode *inode = file_inode(iocb->ki_filp);
@@ -477,6 +481,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	dio->dops = dops;
 	dio->error = 0;
 	dio->flags = 0;
+	dio->done_before = done_before;
 
 	dio->submit.iter = iter;
 	dio->submit.waiter = current;
@@ -655,11 +660,11 @@ EXPORT_SYMBOL_GPL(__iomap_dio_rw);
 ssize_t
 iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
-		unsigned int dio_flags)
+		unsigned int dio_flags, size_t done_before)
 {
 	struct iomap_dio *dio;
 
-	dio = __iomap_dio_rw(iocb, iter, ops, dops, dio_flags);
+	dio = __iomap_dio_rw(iocb, iter, ops, dops, dio_flags, done_before);
 	if (IS_ERR_OR_NULL(dio))
 		return PTR_ERR_OR_ZERO(dio);
 	return iomap_dio_complete(dio);
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index cc3cfb12df53..3103d9bda466 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -259,7 +259,7 @@ xfs_file_dio_read(
 	ret = xfs_ilock_iocb(iocb, XFS_IOLOCK_SHARED);
 	if (ret)
 		return ret;
-	ret = iomap_dio_rw(iocb, to, &xfs_read_iomap_ops, NULL, 0);
+	ret = iomap_dio_rw(iocb, to, &xfs_read_iomap_ops, NULL, 0, 0);
 	xfs_iunlock(ip, XFS_IOLOCK_SHARED);
 
 	return ret;
@@ -569,7 +569,7 @@ xfs_file_dio_write_aligned(
 	}
 	trace_xfs_file_direct_write(iocb, from);
 	ret = iomap_dio_rw(iocb, from, &xfs_direct_write_iomap_ops,
-			   &xfs_dio_write_ops, 0);
+			   &xfs_dio_write_ops, 0, 0);
 out_unlock:
 	if (iolock)
 		xfs_iunlock(ip, iolock);
@@ -647,7 +647,7 @@ xfs_file_dio_write_unaligned(
 	trace_xfs_file_direct_write(iocb, from);
 	ret = iomap_dio_rw(iocb, from, &xfs_direct_write_iomap_ops,
-			   &xfs_dio_write_ops, flags);
+			   &xfs_dio_write_ops, flags, 0);
 
 	/*
 	 * Retry unaligned I/O with exclusive blocking semantics if the DIO
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index 70055d486bf7..85ca2f5fe06e 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -864,7 +864,7 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
 		ret = zonefs_file_dio_append(iocb, from);
 	else
 		ret = iomap_dio_rw(iocb, from, &zonefs_iomap_ops,
-				   &zonefs_write_dio_ops, 0);
+				   &zonefs_write_dio_ops, 0, 0);
 	if (zi->i_ztype == ZONEFS_ZTYPE_SEQ &&
 	    (ret > 0 || ret == -EIOCBQUEUED)) {
 		if (ret > 0)
@@ -999,7 +999,7 @@ static ssize_t zonefs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 		}
 		file_accessed(iocb->ki_filp);
 		ret = iomap_dio_rw(iocb, to, &zonefs_iomap_ops,
-				   &zonefs_read_dio_ops, 0);
+				   &zonefs_read_dio_ops, 0, 0);
 	} else {
 		ret = generic_file_read_iter(iocb, to);
 		if (ret == -EIO)
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index bcae4814b8e3..908bda10024c 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -276,10 +276,10 @@ struct iomap_dio_ops {
 
 ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
-		unsigned int dio_flags);
+		unsigned int dio_flags, size_t done_before);
 struct iomap_dio *__iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
-		unsigned int dio_flags);
+		unsigned int dio_flags, size_t done_before);
 ssize_t iomap_dio_complete(struct iomap_dio *dio);
 int iomap_dio_iopoll(struct kiocb *kiocb, bool spin);
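Together with IOMAP_DIO_PARTIAL from the previous patch, done_before lets
a caller resume a request instead of restarting it: if a request faults
after transferring, say, 5 MiB, the caller can fault in the remaining
pages and call iomap_dio_rw again with done_before = 5 MiB; because
iomap_dio_complete adds done_before to the result, a fully successful
second call then reports the grand total.  A minimal sketch of that retry
loop, assuming a hypothetical example_iomap_ops and omitting locking (the
real version is the gfs2 read path in patch 12):

	static ssize_t example_dio_read(struct kiocb *iocb, struct iov_iter *to)
	{
		size_t written = 0;	/* bytes completed by earlier attempts */
		ssize_t ret;

	retry:
		ret = iomap_dio_rw(iocb, to, &example_iomap_ops, NULL,
				   IOMAP_DIO_PARTIAL, written);
		if (ret > 0)
			written = ret;	/* already includes done_before */
		/* Not done yet?  Fault in the remaining pages and resume. */
		if (iov_iter_count(to) && (ret > 0 || ret == -EFAULT) &&
		    fault_in_iov_iter_writeable(to, SIZE_MAX) != 0)
			goto retry;
		return ret < 0 ? ret : written;
	}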
From patchwork Tue Aug 3 19:18:17 2021
X-Patchwork-Submitter: Andreas Gruenbacher
X-Patchwork-Id: 12417123
From: Andreas Gruenbacher
To: Linus Torvalds, Alexander Viro, Christoph Hellwig, "Darrick J. Wong"
Cc: Jan Kara, Matthew Wilcox, cluster-devel@redhat.com,
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 ocfs2-devel@oss.oracle.com, Andreas Gruenbacher
Subject: [PATCH v5 11/12] iov_iter: Introduce noio flag to disable page faults
Date: Tue, 3 Aug 2021 21:18:17 +0200
Message-Id: <20210803191818.993968-12-agruenba@redhat.com>
In-Reply-To: <20210803191818.993968-1-agruenba@redhat.com>
References: <20210803191818.993968-1-agruenba@redhat.com>

Introduce a new noio flag to indicate to get_user_pages that it should
use the FOLL_FAST_ONLY flag.  This will cause get_user_pages to fail when
it would otherwise fault in a page.

Currently, the noio flag is only checked in iov_iter_get_pages and
iov_iter_get_pages_alloc.  This is enough for iomap_dio_rw, but it may
make sense to check for this flag in other contexts as well.

Signed-off-by: Andreas Gruenbacher
---
 include/linux/uio.h |  1 +
 lib/iov_iter.c      | 20 +++++++++++++++-----
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index ffa431aeb067..679e48454497 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -29,6 +29,7 @@ enum iter_type {
 
 struct iov_iter {
 	u8 iter_type;
+	bool noio;
 	bool data_source;
 	size_t iov_offset;
 	size_t count;
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 4ffc76801eaa..66f0c9362bb5 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -519,6 +519,7 @@ void iov_iter_init(struct iov_iter *i, unsigned int direction,
 	WARN_ON(direction & ~(READ | WRITE));
 	*i = (struct iov_iter) {
 		.iter_type = ITER_IOVEC,
+		.noio = false,
 		.data_source = direction,
 		.iov = iov,
 		.nr_segs = nr_segs,
@@ -1529,13 +1530,17 @@ ssize_t iov_iter_get_pages(struct iov_iter *i,
 		return 0;
 
 	if (likely(iter_is_iovec(i))) {
+		unsigned int gup_flags = 0;
 		unsigned long addr;
 
+		if (iov_iter_rw(i) != WRITE)
+			gup_flags |= FOLL_WRITE;
+		if (i->noio)
+			gup_flags |= FOLL_FAST_ONLY;
+
 		addr = first_iovec_segment(i, &len, start, maxsize, maxpages);
 		n = DIV_ROUND_UP(len, PAGE_SIZE);
-		res = get_user_pages_fast(addr, n,
-				iov_iter_rw(i) != WRITE ? FOLL_WRITE : 0,
-				pages);
+		res = get_user_pages_fast(addr, n, gup_flags, pages);
 		if (unlikely(res <= 0))
 			return res;
 		return (res == n ? len : res * PAGE_SIZE) - *start;
@@ -1651,15 +1656,20 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
 		return 0;
 
 	if (likely(iter_is_iovec(i))) {
+		unsigned int gup_flags = 0;
 		unsigned long addr;
 
+		if (iov_iter_rw(i) != WRITE)
+			gup_flags |= FOLL_WRITE;
+		if (i->noio)
+			gup_flags |= FOLL_FAST_ONLY;
+
 		addr = first_iovec_segment(i, &len, start, maxsize, ~0U);
 		n = DIV_ROUND_UP(len, PAGE_SIZE);
 		p = get_pages_array(n);
 		if (!p)
 			return -ENOMEM;
-		res = get_user_pages_fast(addr, n,
-				iov_iter_rw(i) != WRITE ? FOLL_WRITE : 0, p);
+		res = get_user_pages_fast(addr, n, gup_flags, p);
 		if (unlikely(res <= 0)) {
 			kvfree(p);
 			*pages = NULL;
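The intended usage pattern, as patch 12 below applies it in gfs2, is to
set the flag around the I/O call while physical page faults are also
disabled; roughly (where iter stands for the request's iov_iter):

	pagefault_disable();	/* suppress physical page faults */
	iter->noio = true;	/* and manual faulting in get_user_pages */
	ret = iomap_dio_rw(iocb, iter, ops, NULL, IOMAP_DIO_PARTIAL, written);
	iter->noio = false;
	pagefault_enable();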
From patchwork Tue Aug 3 19:18:18 2021
X-Patchwork-Submitter: Andreas Gruenbacher
X-Patchwork-Id: 12417125
From: Andreas Gruenbacher
To: Linus Torvalds, Alexander Viro, Christoph Hellwig, "Darrick J. Wong"
Cc: Jan Kara, Matthew Wilcox, cluster-devel@redhat.com,
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 ocfs2-devel@oss.oracle.com, Andreas Gruenbacher
Subject: [PATCH v5 12/12] gfs2: Fix mmap + page fault deadlocks for direct I/O
Date: Tue, 3 Aug 2021 21:18:18 +0200
Message-Id: <20210803191818.993968-13-agruenba@redhat.com>
In-Reply-To: <20210803191818.993968-1-agruenba@redhat.com>
References: <20210803191818.993968-1-agruenba@redhat.com>

Also disable page faults during direct I/O requests and implement the
same kind of retry logic as in the buffered I/O case.

Direct I/O requests differ from buffered I/O requests in that they use
bio_iov_iter_get_pages to grab page references and fault in pages
manually instead of triggering physical page faults.  Those manual page
faults can be disabled with the new iter->noio flag.

Signed-off-by: Andreas Gruenbacher
---
 fs/gfs2/file.c | 47 +++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 45 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index d98f690097e2..ed42b7675551 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -782,21 +782,47 @@ static ssize_t gfs2_file_direct_read(struct kiocb *iocb, struct iov_iter *to,
 	struct file *file = iocb->ki_filp;
 	struct gfs2_inode *ip = GFS2_I(file->f_mapping->host);
 	size_t count = iov_iter_count(to);
+	size_t written = 0;
 	ssize_t ret;
 
+	/*
+	 * In this function, we disable page faults when we're holding the
+	 * inode glock while doing I/O.  If a page fault occurs, we drop the
+	 * inode glock, fault in the pages manually, and retry.
+	 *
+	 * Unlike generic_file_read_iter, for reads, iomap_dio_rw can trigger
+	 * physical as well as manual page faults, and we need to disable both
+	 * kinds.
+	 */
+
 	if (!count)
 		return 0; /* skip atime */
 
 	gfs2_holder_init(ip->i_gl, LM_ST_DEFERRED, 0, gh);
+retry:
 	ret = gfs2_glock_nq(gh);
 	if (ret)
 		goto out_uninit;
 
-	ret = iomap_dio_rw(iocb, to, &gfs2_iomap_ops, NULL, 0, 0);
+	pagefault_disable();
+	to->noio = true;
+	ret = iomap_dio_rw(iocb, to, &gfs2_iomap_ops, NULL,
+			   IOMAP_DIO_PARTIAL, written);
+	to->noio = false;
+	pagefault_enable();
 
 	gfs2_glock_dq(gh);
+	if (ret > 0)
+		written = ret;
+	if (unlikely(iov_iter_count(to) && (ret > 0 || ret == -EFAULT)) &&
+	    iter_is_iovec(to) &&
+	    fault_in_iov_iter_writeable(to, SIZE_MAX) != 0)
+		goto retry;
 out_uninit:
 	gfs2_holder_uninit(gh);
-	return ret;
+	if (ret < 0)
+		return ret;
+	return written;
 }
 
 static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from,
@@ -809,6 +835,15 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from,
 	loff_t offset = iocb->ki_pos;
 	ssize_t ret;
 
+	/*
+	 * In this function, we disable page faults when we're holding the
+	 * inode glock while doing I/O.  If a page fault occurs, we drop the
+	 * inode glock, fault in the pages manually, and retry.
+	 *
+	 * For writes, iomap_dio_rw only triggers manual page faults, so we
+	 * don't need to disable physical ones.
+	 */
+
 	/*
 	 * Deferred lock, even if its a write, since we do no allocation on
 	 * this path.  All we need to change is the atime, and this lock mode
@@ -818,6 +853,7 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from,
 	 * VFS does.
 	 */
 	gfs2_holder_init(ip->i_gl, LM_ST_DEFERRED, 0, gh);
+retry:
 	ret = gfs2_glock_nq(gh);
 	if (ret)
 		goto out_uninit;
@@ -826,11 +862,18 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from,
 	if (offset + len > i_size_read(&ip->i_inode))
 		goto out;
 
+	from->noio = true;
 	ret = iomap_dio_rw(iocb, from, &gfs2_iomap_ops, NULL, 0, 0);
+	from->noio = false;
+
 	if (ret == -ENOTBLK)
 		ret = 0;
 out:
 	gfs2_glock_dq(gh);
+	if (unlikely(ret == -EFAULT) &&
+	    iter_is_iovec(from) &&
+	    fault_in_iov_iter_readable(from, len) == len)
+		goto retry;
 out_uninit:
 	gfs2_holder_uninit(gh);
 	return ret;
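To recap, the direct read path now follows this control flow (a condensed
restatement of gfs2_file_direct_read above, with error paths omitted):

	retry:
		gfs2_glock_nq(gh);		/* take the inode glock */
		pagefault_disable();		/* no physical faults under the glock */
		to->noio = true;		/* no manual faulting in gup either */
		ret = iomap_dio_rw(iocb, to, &gfs2_iomap_ops, NULL,
				   IOMAP_DIO_PARTIAL, written);
		to->noio = false;
		pagefault_enable();
		gfs2_glock_dq(gh);		/* drop the glock, ... */
		if (ret > 0)
			written = ret;
		if (iov_iter_count(to) && (ret > 0 || ret == -EFAULT) &&
		    fault_in_iov_iter_writeable(to, SIZE_MAX) != 0)
			goto retry;		/* ...fault the pages in, and retry */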