From patchwork Wed Apr 6 18:09:20 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Catalin Marinas
X-Patchwork-Id: 12804022
From: Catalin Marinas
To: Linus Torvalds, Andreas Gruenbacher, Josef Bacik
Cc: Al Viro, Andrew Morton, Chris Mason, David Sterba, Will Deacon,
    linux-fsdevel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
    linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3 1/3] mm: Add fault_in_subpage_writeable() to probe at
 sub-page granularity
Date: Wed, 6 Apr 2022 19:09:20 +0100
Message-Id: <20220406180922.1522433-2-catalin.marinas@arm.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20220406180922.1522433-1-catalin.marinas@arm.com>
References: <20220406180922.1522433-1-catalin.marinas@arm.com>
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

On hardware with features like arm64 MTE or SPARC ADI, an access fault
can be triggered at sub-page granularity. Depending on how the
fault_in_writeable() function is used, the caller can get into a
live-lock by continuously retrying the fault-in on an address different
from the one where the uaccess failed.

In the majority of cases progress is ensured by the following
conditions:

1. copy_to_user_nofault() guarantees at least one byte access if the
   user address is not faulting.

2. The fault_in_writeable() loop is resumed from the first address that
   could not be accessed by copy_to_user_nofault().

If the loop iteration is restarted from an earlier (initial) point, the
loop is repeated with the same conditions and it would live-lock.

Introduce an arch-specific probe_subpage_writeable() and call it from
the newly added fault_in_subpage_writeable() function. The arch code
with sub-page faults will have to implement the specific probing
functionality.

Note that no other fault_in_subpage_*() functions are added since they
have no callers currently susceptible to a live-lock.

Signed-off-by: Catalin Marinas
Cc: Andrew Morton
---
 arch/Kconfig            |  7 +++++++
 include/linux/pagemap.h |  1 +
 include/linux/uaccess.h | 22 ++++++++++++++++++++++
 mm/gup.c                | 29 +++++++++++++++++++++++++++++
 4 files changed, 59 insertions(+)

diff --git a/arch/Kconfig b/arch/Kconfig
index 29b0167c088b..b34032279926 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -24,6 +24,13 @@ config KEXEC_ELF
 config HAVE_IMA_KEXEC
 	bool
 
+config ARCH_HAS_SUBPAGE_FAULTS
+	bool
+	help
+	  Select if the architecture can check permissions at sub-page
+	  granularity (e.g. arm64 MTE). The probe_user_*() functions
+	  must be implemented.
+
 config HOTPLUG_SMT
 	bool
 
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 993994cd943a..6165283bdb6f 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -1046,6 +1046,7 @@ void folio_add_wait_queue(struct folio *folio, wait_queue_entry_t *waiter);
  * Fault in userspace address range.
  */
 size_t fault_in_writeable(char __user *uaddr, size_t size);
+size_t fault_in_subpage_writeable(char __user *uaddr, size_t size);
 size_t fault_in_safe_writeable(const char __user *uaddr, size_t size);
 size_t fault_in_readable(const char __user *uaddr, size_t size);
 
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 546179418ffa..8bbb2dabac19 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -231,6 +231,28 @@ static inline bool pagefault_disabled(void)
  */
 #define faulthandler_disabled() (pagefault_disabled() || in_atomic())
 
+#ifndef CONFIG_ARCH_HAS_SUBPAGE_FAULTS
+
+/**
+ * probe_subpage_writeable: probe the user range for write faults at sub-page
+ *			    granularity (e.g. arm64 MTE)
+ * @uaddr: start of address range
+ * @size: size of address range
+ *
+ * Returns 0 on success, the number of bytes not probed on fault.
+ *
+ * It is expected that the caller checked for the write permission of each
+ * page in the range either by put_user() or GUP. The architecture port can
+ * implement a more efficient get_user() probing if the same sub-page faults
+ * are triggered by either a read or a write.
+ */
+static inline size_t probe_subpage_writeable(void __user *uaddr, size_t size)
+{
+	return 0;
+}
+
+#endif /* CONFIG_ARCH_HAS_SUBPAGE_FAULTS */
+
 #ifndef ARCH_HAS_NOCACHE_UACCESS
 
 static inline __must_check unsigned long
diff --git a/mm/gup.c b/mm/gup.c
index f598a037eb04..501bc150792c 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1648,6 +1648,35 @@ size_t fault_in_writeable(char __user *uaddr, size_t size)
 }
 EXPORT_SYMBOL(fault_in_writeable);
 
+/**
+ * fault_in_subpage_writeable - fault in an address range for writing
+ * @uaddr: start of address range
+ * @size: size of address range
+ *
+ * Fault in a user address range for writing while checking for permissions at
+ * sub-page granularity (e.g. arm64 MTE). This function should be used when
+ * the caller cannot guarantee forward progress of a copy_to_user() loop.
+ *
+ * Returns the number of bytes not faulted in (like copy_to_user() and
+ * copy_from_user()).
+ */
+size_t fault_in_subpage_writeable(char __user *uaddr, size_t size)
+{
+	size_t faulted_in;
+
+	/*
+	 * Attempt faulting in at page granularity first for page table
+	 * permission checking. The arch-specific probe_subpage_writeable()
+	 * functions may not check for this.
+	 */
+	faulted_in = size - fault_in_writeable(uaddr, size);
+	if (faulted_in)
+		faulted_in -= probe_subpage_writeable(uaddr, faulted_in);
+
+	return size - faulted_in;
+}
+EXPORT_SYMBOL(fault_in_subpage_writeable);
+
 /*
  * fault_in_safe_writeable - fault in an address range for writing
  * @uaddr: start of address range

From patchwork Wed Apr 6 18:09:21 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Catalin Marinas
X-Patchwork-Id: 12804021
From: Catalin Marinas
To: Linus Torvalds, Andreas Gruenbacher, Josef Bacik
Cc: Al Viro, Andrew Morton, Chris Mason, David Sterba, Will Deacon,
    linux-fsdevel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
    linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3 2/3] arm64: Add support for user sub-page fault probing
Date: Wed, 6 Apr 2022 19:09:21 +0100
Message-Id: <20220406180922.1522433-3-catalin.marinas@arm.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20220406180922.1522433-1-catalin.marinas@arm.com>
References: <20220406180922.1522433-1-catalin.marinas@arm.com>
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

With MTE, even if the pte allows an access, a mismatched tag somewhere
within a page can still cause a fault. Select ARCH_HAS_SUBPAGE_FAULTS if
MTE is enabled and implement the probe_subpage_writeable() function.
Note that get_user() is sufficient for the writeable MTE check since the
same tag mismatch fault would be triggered by a read. The caller of
probe_subpage_writeable() will need to check the pte permissions
(put_user, GUP).

Signed-off-by: Catalin Marinas
Cc: Will Deacon
---
 arch/arm64/Kconfig               |  1 +
 arch/arm64/include/asm/mte.h     |  1 +
 arch/arm64/include/asm/uaccess.h | 15 +++++++++++++++
 arch/arm64/kernel/mte.c          | 30 ++++++++++++++++++++++++++++++
 4 files changed, 47 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 57c4c995965f..290b88238103 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1871,6 +1871,7 @@ config ARM64_MTE
 	depends on AS_HAS_LSE_ATOMICS
 	# Required for tag checking in the uaccess routines
 	depends on ARM64_PAN
+	select ARCH_HAS_SUBPAGE_FAULTS
 	select ARCH_USES_HIGH_VMA_FLAGS
 	help
 	  Memory Tagging (part of the ARMv8.5 Extensions) provides
diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
index adcb937342f1..aa523591a44e 100644
--- a/arch/arm64/include/asm/mte.h
+++ b/arch/arm64/include/asm/mte.h
@@ -47,6 +47,7 @@ long set_mte_ctrl(struct task_struct *task, unsigned long arg);
 long get_mte_ctrl(struct task_struct *task);
 int mte_ptrace_copy_tags(struct task_struct *child, long request,
 			 unsigned long addr, unsigned long data);
+size_t mte_probe_user_range(const char __user *uaddr, size_t size);
 
 #else /* CONFIG_ARM64_MTE */
 
diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index e8dce0cc5eaa..6677aa7e9993 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -460,4 +460,19 @@ static inline int __copy_from_user_flushcache(void *dst, const void __user *src,
 }
 #endif
 
+#ifdef CONFIG_ARCH_HAS_SUBPAGE_FAULTS
+
+/*
+ * Return 0 on success, the number of bytes not probed otherwise.
+ */
+static inline size_t probe_subpage_writeable(const void __user *uaddr,
+					     size_t size)
+{
+	if (!system_supports_mte())
+		return 0;
+	return mte_probe_user_range(uaddr, size);
+}
+
+#endif /* CONFIG_ARCH_HAS_SUBPAGE_FAULTS */
+
 #endif /* __ASM_UACCESS_H */
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index 78b3e0f8e997..35697a09926f 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -15,6 +15,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -543,3 +544,32 @@ static int register_mte_tcf_preferred_sysctl(void)
 	return 0;
 }
 subsys_initcall(register_mte_tcf_preferred_sysctl);
+
+/*
+ * Return 0 on success, the number of bytes not probed otherwise.
+ */
+size_t mte_probe_user_range(const char __user *uaddr, size_t size)
+{
+	const char __user *end = uaddr + size;
+	int err = 0;
+	char val;
+
+	__raw_get_user(val, uaddr, err);
+	if (err)
+		return size;
+
+	uaddr = PTR_ALIGN(uaddr, MTE_GRANULE_SIZE);
+	while (uaddr < end) {
+		/*
+		 * A read is sufficient for mte, the caller should have probed
+		 * for the pte write permission if required.
+		 */
+		__raw_get_user(val, uaddr, err);
+		if (err)
+			return end - uaddr;
+		uaddr += MTE_GRANULE_SIZE;
+	}
+	(void)val;
+
+	return 0;
+}

From patchwork Wed Apr 6 18:09:22 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Catalin Marinas
X-Patchwork-Id: 12804027
From: Catalin Marinas
To: Linus Torvalds, Andreas Gruenbacher, Josef Bacik
Cc: Al Viro, Andrew Morton, Chris Mason, David Sterba, Will Deacon,
    linux-fsdevel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
    linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3 3/3] btrfs: Avoid live-lock in search_ioctl() on hardware
 with sub-page faults
Date: Wed, 6 Apr 2022 19:09:22 +0100
Message-Id: <20220406180922.1522433-4-catalin.marinas@arm.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20220406180922.1522433-1-catalin.marinas@arm.com>
References: <20220406180922.1522433-1-catalin.marinas@arm.com>
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

Commit a48b73eca4ce ("btrfs: fix potential deadlock in the search
ioctl") addressed a lockdep warning by pre-faulting the user pages and
attempting the copy_to_user_nofault() in an infinite loop. On
architectures like arm64 with MTE, an access may fault within a page at
a location different from what fault_in_writeable() probed. Since the
sk_offset is rewound to the previous struct btrfs_ioctl_search_header
boundary, there is no guaranteed forward progress and search_ioctl() may
live-lock.

Use fault_in_subpage_writeable() instead of fault_in_writeable() to
ensure the permission is checked at the right granularity (smaller than
PAGE_SIZE).

Signed-off-by: Catalin Marinas
Fixes: a48b73eca4ce ("btrfs: fix potential deadlock in the search ioctl")
Reported-by: Al Viro
Cc: Chris Mason
Cc: Josef Bacik
Cc: David Sterba
Acked-by: David Sterba
---
 fs/btrfs/ioctl.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 238cee5b5254..d49e8254f823 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2556,8 +2556,13 @@ static noinline int search_ioctl(struct inode *inode,
 	key.offset = sk->min_offset;
 
 	while (1) {
+		size_t len = *buf_size - sk_offset;
 		ret = -EFAULT;
-		if (fault_in_writeable(ubuf + sk_offset, *buf_size - sk_offset))
+		/*
+		 * Ensure that the whole user buffer is faulted in at sub-page
+		 * granularity, otherwise the loop may live-lock.
+		 */
+		if (fault_in_subpage_writeable(ubuf + sk_offset, len))
 			break;
 
 		ret = btrfs_search_forward(root, &key, path, sk->min_transid);
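
For anyone following the arithmetic in fault_in_subpage_writeable() and
mte_probe_user_range(), here is a rough user-space model of the two-stage
probe. It is only a sketch and not kernel code: struct model_mem, the model_*
function names, the 4096-byte page and the 16-byte granule are assumptions
made up for illustration, and a boolean per page or granule stands in for the
real pte and tag checks. Callers are assumed to keep start + size within the
modelled range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model parameters, not taken from the kernel headers. */
#define MODEL_PAGE_SIZE		4096u
#define MODEL_GRANULE_SIZE	16u
#define MODEL_NPAGES		2u
#define MODEL_SIZE		(MODEL_PAGE_SIZE * MODEL_NPAGES)

struct model_mem {
	bool page_ok[MODEL_NPAGES];			   /* stands in for pte checks */
	bool granule_ok[MODEL_SIZE / MODEL_GRANULE_SIZE]; /* stands in for tag checks */
};

static void model_init(struct model_mem *m)
{
	for (size_t i = 0; i < MODEL_NPAGES; i++)
		m->page_ok[i] = true;
	for (size_t i = 0; i < MODEL_SIZE / MODEL_GRANULE_SIZE; i++)
		m->granule_ok[i] = true;
}

/* Page-granularity probe: returns the number of bytes not faulted in. */
static size_t model_fault_in_writeable(const struct model_mem *m,
				       size_t start, size_t size)
{
	size_t end = start + size;

	/* Touch the first byte, then the first byte of each following page. */
	for (size_t addr = start; addr < end;
	     addr = (addr / MODEL_PAGE_SIZE + 1) * MODEL_PAGE_SIZE) {
		if (!m->page_ok[addr / MODEL_PAGE_SIZE])
			return end - addr;
	}
	return 0;
}

/* Granule-granularity probe, following the shape of mte_probe_user_range(). */
static size_t model_probe_subpage(const struct model_mem *m,
				  size_t start, size_t size)
{
	size_t end = start + size;
	size_t addr = start;

	/* Probe the (possibly unaligned) first byte. */
	if (!m->granule_ok[addr / MODEL_GRANULE_SIZE])
		return size;

	/* Round up to the next granule boundary, then walk granule by granule. */
	addr = (addr + MODEL_GRANULE_SIZE - 1) / MODEL_GRANULE_SIZE *
	       MODEL_GRANULE_SIZE;
	while (addr < end) {
		if (!m->granule_ok[addr / MODEL_GRANULE_SIZE])
			return end - addr;
		addr += MODEL_GRANULE_SIZE;
	}
	return 0;
}

/* Two-stage probe: page level first, then sub-page over what succeeded. */
static size_t model_fault_in_subpage_writeable(const struct model_mem *m,
					       size_t start, size_t size)
{
	size_t faulted_in = size - model_fault_in_writeable(m, start, size);

	if (faulted_in)
		faulted_in -= model_probe_subpage(m, start, faulted_in);

	return size - faulted_in;
}
```

With a single bad granule in the second page, the page-level model reports
the whole range as accessible while the sub-page model reports the bytes from
the bad granule onwards as not faulted in. That non-zero return is what lets
a caller such as search_ioctl() bail out with -EFAULT instead of retrying.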