From patchwork Fri Dec 27 07:28:21 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mike Rapoport
X-Patchwork-Id: 13921586
From: Mike Rapoport
To: Andrew Morton
Cc: Andy Lutomirski, Anton Ivanov, Borislav Petkov, Brendan Higgins,
	Daniel Gomez, Daniel Thompson, Dave Hansen, David Gow,
	Douglas Anderson, Ingo Molnar, Jason Wessel, Jiri Kosina,
	Joe Lawrence, Johannes Berg, Josh Poimboeuf, "Kirill A. Shutemov",
	Luis Chamberlain, Mark Rutland, Masami Hiramatsu, Mike Rapoport,
	Miroslav Benes, "H. Peter Anvin", Peter Zijlstra, Petr Mladek,
	Petr Pavlu, Rae Moar, Richard Weinberger, Sami Tolvanen, Shuah Khan,
	Song Liu, Steven Rostedt, Thomas Gleixner,
	kgdb-bugreport@lists.sourceforge.net, kunit-dev@googlegroups.com,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-mm@kvack.org, linux-modules@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org, linux-um@lists.infradead.org,
	live-patching@vger.kernel.org, x86@kernel.org
Subject: [PATCH 4/8] execmem: add API for temporal remapping as RW and restoring ROX afterwards
Date: Fri, 27 Dec 2024 09:28:21 +0200
Message-ID: <20241227072825.1288491-5-rppt@kernel.org>
X-Mailer: git-send-email 2.45.2
In-Reply-To: <20241227072825.1288491-1-rppt@kernel.org>
References: <20241227072825.1288491-1-rppt@kernel.org>

From: "Mike Rapoport (Microsoft)"

Using a writable copy for ROX memory is cumbersome and error prone.

Add an API that allows temporarily remapping ranges in the ROX cache as
writable and then restoring their read-only-execute permissions. This API
will later be used in the modules code and will allow removing the nasty
games with a writable copy in alternatives patching on x86.

Restoring the ROX permissions relies on the ability of the architecture to
reconstruct large pages in its set_memory_rox() method.
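
For illustration only (not part of this patch), the sketch below shows how
a caller that owns a range allocated from the ROX cache is expected to use
the new API; apply_relocations() is a hypothetical stand-in for whatever
modification of the text the caller performs:

	/*
	 * Sketch of the intended calling sequence: remap the range as
	 * writable, update the code, then restore ROX permissions.
	 */
	static int update_rox_text(void *text, size_t size)
	{
		int err;

		err = execmem_make_temp_rw(text, size);
		if (err)
			return err;

		/* hypothetical modification step done by the caller */
		apply_relocations(text, size);

		return execmem_restore_rox(text, size);
	}
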
Signed-off-by: Mike Rapoport (Microsoft)
---
 include/linux/execmem.h |  31 +++++++++++
 mm/execmem.c            | 118 +++++++++++++++++++++++++++++++++-------
 2 files changed, 130 insertions(+), 19 deletions(-)

diff --git a/include/linux/execmem.h b/include/linux/execmem.h
index 64130ae19690..65655a5d1be2 100644
--- a/include/linux/execmem.h
+++ b/include/linux/execmem.h
@@ -65,6 +65,37 @@ enum execmem_range_flags {
  * Architectures that use EXECMEM_ROX_CACHE must implement this.
  */
 void execmem_fill_trapping_insns(void *ptr, size_t size, bool writable);
+
+/**
+ * execmem_make_temp_rw - temporarily remap region with read-write
+ *			  permissions
+ * @ptr:	address of the region to remap
+ * @size:	size of the region to remap
+ *
+ * Remaps a part of the cached large page in the ROX cache in the range
+ * [@ptr, @ptr + @size) as writable and not executable. The caller must
+ * have exclusive ownership of this range and ensure nothing will try to
+ * execute code in this range.
+ *
+ * Return: 0 on success or negative error code on failure.
+ */
+int execmem_make_temp_rw(void *ptr, size_t size);
+
+/**
+ * execmem_restore_rox - restore read-only-execute permissions
+ * @ptr:	address of the region to remap
+ * @size:	size of the region to remap
+ *
+ * Restores read-only-execute permissions on a range [@ptr, @ptr + @size)
+ * after it was temporarily remapped as writable. Relies on architecture
+ * implementation of set_memory_rox() to restore mapping using large pages.
+ *
+ * Return: 0 on success or negative error code on failure.
+ */
+int execmem_restore_rox(void *ptr, size_t size);
+#else
+static inline int execmem_make_temp_rw(void *ptr, size_t size) { return 0; }
+static inline int execmem_restore_rox(void *ptr, size_t size) { return 0; }
 #endif
 
 /**
diff --git a/mm/execmem.c b/mm/execmem.c
index 317b6a8d35be..be6b234c032e 100644
--- a/mm/execmem.c
+++ b/mm/execmem.c
@@ -89,6 +89,12 @@ static void *execmem_vmalloc(struct execmem_range *range, size_t size,
 #endif /* CONFIG_MMU */
 
 #ifdef CONFIG_ARCH_HAS_EXECMEM_ROX
+struct execmem_area {
+	struct vm_struct *vm;
+	unsigned int rw_mappings;
+	size_t size;
+};
+
 struct execmem_cache {
 	struct mutex mutex;
 	struct maple_tree busy_areas;
@@ -135,7 +141,7 @@ static void execmem_cache_clean(struct work_struct *work)
 	struct maple_tree *free_areas = &execmem_cache.free_areas;
 	struct mutex *mutex = &execmem_cache.mutex;
 	MA_STATE(mas, free_areas, 0, ULONG_MAX);
-	void *area;
+	struct execmem_area *area;
 
 	mutex_lock(mutex);
 	mas_for_each(&mas, area, ULONG_MAX) {
@@ -143,11 +149,12 @@ static void execmem_cache_clean(struct work_struct *work)
 
 		if (IS_ALIGNED(size, PMD_SIZE) &&
 		    IS_ALIGNED(mas.index, PMD_SIZE)) {
-			struct vm_struct *vm = find_vm_area(area);
+			struct vm_struct *vm = area->vm;
 
 			execmem_set_direct_map_valid(vm, true);
 			mas_store_gfp(&mas, NULL, GFP_KERNEL);
-			vfree(area);
+			vfree(vm->addr);
+			kfree(area);
 		}
 	}
 	mutex_unlock(mutex);
@@ -155,30 +162,31 @@ static void execmem_cache_clean(struct work_struct *work)
 
 static DECLARE_WORK(execmem_cache_clean_work, execmem_cache_clean);
 
-static int execmem_cache_add(void *ptr, size_t size)
+static int execmem_cache_add(void *ptr, size_t size, struct execmem_area *area)
 {
 	struct maple_tree *free_areas = &execmem_cache.free_areas;
 	struct mutex *mutex = &execmem_cache.mutex;
 	unsigned long addr = (unsigned long)ptr;
 	MA_STATE(mas, free_areas, addr - 1, addr + 1);
+	struct execmem_area *lower_area = NULL;
+	struct execmem_area *upper_area = NULL;
 	unsigned long lower, upper;
-	void *area = NULL;
 	int err;
 
 	lower = addr;
 	upper = addr + size - 1;
 
 	mutex_lock(mutex);
-	area = mas_walk(&mas);
-	if (area && mas.last == addr - 1)
+	lower_area = mas_walk(&mas);
+	if (lower_area && lower_area == area && mas.last == addr - 1)
 		lower = mas.index;
 
-	area = mas_next(&mas, ULONG_MAX);
-	if (area && mas.index == addr + size)
+	upper_area = mas_next(&mas, ULONG_MAX);
+	if (upper_area && upper_area == area && mas.index == addr + size)
 		upper = mas.last;
 
 	mas_set_range(&mas, lower, upper);
-	err = mas_store_gfp(&mas, (void *)lower, GFP_KERNEL);
+	err = mas_store_gfp(&mas, area, GFP_KERNEL);
 	mutex_unlock(mutex);
 	if (err)
 		return err;
@@ -209,7 +217,8 @@ static void *__execmem_cache_alloc(struct execmem_range *range, size_t size)
 	MA_STATE(mas_busy, busy_areas, 0, ULONG_MAX);
 	struct mutex *mutex = &execmem_cache.mutex;
 	unsigned long addr, last, area_size = 0;
-	void *area, *ptr = NULL;
+	struct execmem_area *area;
+	void *ptr = NULL;
 	int err;
 
 	mutex_lock(mutex);
@@ -228,20 +237,18 @@ static void *__execmem_cache_alloc(struct execmem_range *range, size_t size)
 
 	/* insert allocated size to busy_areas at range [addr, addr + size) */
 	mas_set_range(&mas_busy, addr, addr + size - 1);
-	err = mas_store_gfp(&mas_busy, (void *)addr, GFP_KERNEL);
+	err = mas_store_gfp(&mas_busy, area, GFP_KERNEL);
 	if (err)
 		goto out_unlock;
 
 	mas_store_gfp(&mas_free, NULL, GFP_KERNEL);
 	if (area_size > size) {
-		void *ptr = (void *)(addr + size);
-
 		/*
 		 * re-insert remaining free size to free_areas at range
 		 * [addr + size, last]
 		 */
 		mas_set_range(&mas_free, addr + size, last);
-		err = mas_store_gfp(&mas_free, ptr, GFP_KERNEL);
+		err = mas_store_gfp(&mas_free, area, GFP_KERNEL);
 		if (err) {
 			mas_store_gfp(&mas_busy, NULL, GFP_KERNEL);
 			goto out_unlock;
@@ -257,16 +264,21 @@ static void *__execmem_cache_alloc(struct execmem_range *range, size_t size)
 static int execmem_cache_populate(struct execmem_range *range, size_t size)
 {
 	unsigned long vm_flags = VM_ALLOW_HUGE_VMAP;
+	struct execmem_area *area;
 	unsigned long start, end;
 	struct vm_struct *vm;
 	size_t alloc_size;
 	int err = -ENOMEM;
 	void *p;
 
+	area = kzalloc(sizeof(*area), GFP_KERNEL);
+	if (!area)
+		return err;
+
 	alloc_size = round_up(size, PMD_SIZE);
 	p = execmem_vmalloc(range, alloc_size, PAGE_KERNEL, vm_flags);
 	if (!p)
-		return err;
+		goto err_free_area;
 
 	vm = find_vm_area(p);
 	if (!vm)
@@ -289,7 +301,9 @@ static int execmem_cache_populate(struct execmem_range *range, size_t size)
 	if (err)
 		goto err_free_mem;
 
-	err = execmem_cache_add(p, alloc_size);
+	area->size = alloc_size;
+	area->vm = vm;
+	err = execmem_cache_add(p, alloc_size, area);
 	if (err)
 		goto err_free_mem;
 
@@ -297,6 +311,8 @@ static int execmem_cache_populate(struct execmem_range *range, size_t size)
 
 err_free_mem:
 	vfree(p);
+err_free_area:
+	kfree(area);
 	return err;
 }
 
@@ -305,6 +321,9 @@ static void *execmem_cache_alloc(struct execmem_range *range, size_t size)
 	void *p;
 	int err;
 
+	/* make sure everything in the cache is page aligned */
+	size = PAGE_ALIGN(size);
+
 	p = __execmem_cache_alloc(range, size);
 	if (p)
 		return p;
@@ -322,8 +341,8 @@ static bool execmem_cache_free(void *ptr)
 	struct mutex *mutex = &execmem_cache.mutex;
 	unsigned long addr = (unsigned long)ptr;
 	MA_STATE(mas, busy_areas, addr, addr);
+	struct execmem_area *area;
 	size_t size;
-	void *area;
 
 	mutex_lock(mutex);
 	area = mas_walk(&mas);
@@ -338,12 +357,73 @@ static bool execmem_cache_free(void *ptr)
 
 	execmem_fill_trapping_insns(ptr, size, /* writable = */ false);
 
-	execmem_cache_add(ptr, size);
+	execmem_cache_add(ptr, size, area);
 
 	schedule_work(&execmem_cache_clean_work);
 
 	return true;
 }
+
+int execmem_make_temp_rw(void *ptr, size_t size)
+{
+	struct maple_tree *busy_areas = &execmem_cache.busy_areas;
+	unsigned int nr = PAGE_ALIGN(size) >> PAGE_SHIFT;
+	struct mutex *mutex = &execmem_cache.mutex;
+	unsigned long addr = (unsigned long)ptr;
+	MA_STATE(mas, busy_areas, addr, addr);
+	struct execmem_area *area;
+	int ret = -ENOMEM;
+
+	mutex_lock(mutex);
+	area = mas_walk(&mas);
+	if (!area)
+		goto out;
+
+	ret = set_memory_nx(addr, nr);
+	if (ret)
+		goto out;
+
+	/*
+	 * If a split of large page was required, it already happened when
+	 * we marked the pages NX which guarantees that this call won't
+	 * fail
+	 */
+	set_memory_rw(addr, nr);
+	area->rw_mappings++;
+
+out:
+	mutex_unlock(mutex);
+	return ret;
+}
+
+int execmem_restore_rox(void *ptr, size_t size)
+{
+	struct maple_tree *busy_areas = &execmem_cache.busy_areas;
+	struct mutex *mutex = &execmem_cache.mutex;
+	unsigned long addr = (unsigned long)ptr;
+	MA_STATE(mas, busy_areas, addr, addr);
+	struct execmem_area *area;
+	int err = 0;
+
+	size = PAGE_ALIGN(size);
+
+	mutex_lock(mutex);
+	mas_for_each(&mas, area, addr + size - 1) {
+		area->rw_mappings--;
+		if (!area->rw_mappings) {
+			unsigned int nr = area->size >> PAGE_SHIFT;
+
+			addr = (unsigned long)area->vm->addr;
+			err = set_memory_rox(addr, nr);
+			if (err)
+				break;
+		}
+	}
+	mutex_unlock(mutex);
+
+	return err;
+}
+
 #else /* CONFIG_ARCH_HAS_EXECMEM_ROX */
 static void *execmem_cache_alloc(struct execmem_range *range, size_t size)
 {