From patchwork Fri Oct 7 23:43:14 2022
X-Patchwork-Submitter: Song Liu
X-Patchwork-Id: 13001545
From: Song Liu
CC: Song Liu
Subject: [RFC v2 3/4] modules, x86: use vmalloc_exec for module core
Date: Fri, 7 Oct 2022 16:43:14 -0700
Message-ID: <20221007234315.2877365-4-song@kernel.org>
In-Reply-To: <20221007234315.2877365-1-song@kernel.org>
References: <20221007234315.2877365-1-song@kernel.org>

This is a prototype that allows modules to share 2MB text pages with other
modules and BPF programs.

Current version only covers core_layout.
---
 arch/x86/Kconfig              |  1 +
 arch/x86/kernel/alternative.c | 30 ++++++++++++++++++++++++------
 arch/x86/kernel/module.c      |  1 +
 kernel/module/main.c          | 23 +++++++++++++----------
 kernel/module/strict_rwx.c    |  3 ---
 kernel/trace/ftrace.c         |  3 ++-
 6 files changed, 41 insertions(+), 20 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f9920f1341c8..0b1ea05a1da6 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -91,6 +91,7 @@ config X86
 	select ARCH_HAS_SET_DIRECT_MAP
 	select ARCH_HAS_STRICT_KERNEL_RWX
 	select ARCH_HAS_STRICT_MODULE_RWX
+	select ARCH_WANTS_MODULES_DATA_IN_VMALLOC	if X86_64
 	select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
 	select ARCH_HAS_SYSCALL_WRAPPER
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 4f3204364caa..0e47a558c5bc 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -332,7 +332,13 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start,
 
 		DUMP_BYTES(insn_buff, insn_buff_sz, "%px: final_insn: ", instr);
 
-		text_poke_early(instr, insn_buff, insn_buff_sz);
+		if (system_state < SYSTEM_RUNNING) {
+			text_poke_early(instr, insn_buff, insn_buff_sz);
+		} else {
+			mutex_lock(&text_mutex);
+			text_poke(instr, insn_buff, insn_buff_sz);
+			mutex_unlock(&text_mutex);
+		}
 
 next:
 		optimize_nops(instr, a->instrlen);
@@ -503,7 +509,13 @@ void __init_or_module noinline apply_retpolines(s32 *start, s32 *end)
 			optimize_nops(bytes, len);
 			DUMP_BYTES(((u8*)addr), len, "%px: orig: ", addr);
 			DUMP_BYTES(((u8*)bytes), len, "%px: repl: ", addr);
-			text_poke_early(addr, bytes, len);
+			if (system_state == SYSTEM_BOOTING) {
+				text_poke_early(addr, bytes, len);
+			} else {
+				mutex_lock(&text_mutex);
+				text_poke(addr, bytes, len);
+				mutex_unlock(&text_mutex);
+			}
 		}
 	}
 }
@@ -568,7 +580,13 @@ void __init_or_module noinline apply_returns(s32 *start, s32 *end)
 		if (len == insn.length) {
 			DUMP_BYTES(((u8*)addr), len, "%px: orig: ", addr);
 			DUMP_BYTES(((u8*)bytes), len, "%px: repl: ", addr);
-			text_poke_early(addr, bytes, len);
+			if (unlikely(system_state == SYSTEM_BOOTING)) {
+				text_poke_early(addr, bytes, len);
+			} else {
+				mutex_lock(&text_mutex);
+				text_poke(addr, bytes, len);
+				mutex_unlock(&text_mutex);
+			}
 		}
 	}
 }
@@ -609,7 +627,7 @@ void __init_or_module noinline apply_ibt_endbr(s32 *start, s32 *end)
 		 */
 		DUMP_BYTES(((u8*)addr), 4, "%px: orig: ", addr);
 		DUMP_BYTES(((u8*)&poison), 4, "%px: repl: ", addr);
-		text_poke_early(addr, &poison, 4);
+		text_poke(addr, &poison, 4);
 	}
 }
 
@@ -791,7 +809,7 @@ void __init_or_module apply_paravirt(struct paravirt_patch_site *start,
 
 		/* Pad the rest with nops */
 		add_nops(insn_buff + used, p->len - used);
-		text_poke_early(p->instr, insn_buff, p->len);
+		text_poke(p->instr, insn_buff, p->len);
 	}
 }
 extern struct paravirt_patch_site __start_parainstructions[],
@@ -1699,7 +1717,7 @@ void __ref text_poke_bp(void *addr, const void *opcode, size_t len, const void *
 	struct text_poke_loc tp;
 
 	if (unlikely(system_state == SYSTEM_BOOTING)) {
-		text_poke_early(addr, opcode, len);
+		text_poke(addr, opcode, len);
 		return;
 	}
 
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index b1abf663417c..577e31647dc4 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -229,6 +229,7 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 	bool early = me->state == MODULE_STATE_UNFORMED;
 	void *(*write)(void *, const void *, size_t) = memcpy;
 
+	early = false;
 	if (!early) {
 		write = text_poke;
 		mutex_lock(&text_mutex);
diff --git a/kernel/module/main.c b/kernel/module/main.c
index a4e4d84b6f4e..b44806e31a56 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -53,6 +53,7 @@
 #include
 #include
 #include
+#include
 #include
 #include "internal.h"
 
@@ -1203,7 +1204,7 @@ static void free_module(struct module *mod)
 	lockdep_free_key_range(mod->data_layout.base, mod->data_layout.size);
 
 	/* Finally, free the core (containing the module structure) */
-	module_memfree(mod->core_layout.base);
+	vfree_exec(mod->core_layout.base);
 #ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
 	vfree(mod->data_layout.base);
 #endif
@@ -1321,7 +1322,8 @@ static int simplify_symbols(struct module *mod, const struct load_info *info)
 			ksym = resolve_symbol_wait(mod, info, name);
 			/* Ok if resolved. */
 			if (ksym && !IS_ERR(ksym)) {
-				sym[i].st_value = kernel_symbol_value(ksym);
+				unsigned long val = kernel_symbol_value(ksym);
+				bpf_arch_text_copy(&sym[i].st_value, &val, sizeof(val));
 				break;
 			}
 
@@ -1342,7 +1344,8 @@ static int simplify_symbols(struct module *mod, const struct load_info *info)
 				secbase = (unsigned long)mod_percpu(mod);
 			else
 				secbase = info->sechdrs[sym[i].st_shndx].sh_addr;
-			sym[i].st_value += secbase;
+			secbase += sym[i].st_value;
+			bpf_arch_text_copy(&sym[i].st_value, &secbase, sizeof(secbase));
 			break;
 		}
 	}
@@ -2123,7 +2126,7 @@ static int move_module(struct module *mod, struct load_info *info)
 	void *ptr;
 
 	/* Do the allocs. */
-	ptr = module_alloc(mod->core_layout.size);
+	ptr = vmalloc_exec(mod->core_layout.size, PAGE_SIZE);
 	/*
 	 * The pointer to this block is stored in the module structure
 	 * which is inside the block. Just mark it as not being a
@@ -2133,7 +2136,7 @@ static int move_module(struct module *mod, struct load_info *info)
 	if (!ptr)
 		return -ENOMEM;
 
-	memset(ptr, 0, mod->core_layout.size);
+/*	memset(ptr, 0, mod->core_layout.size); */
 	mod->core_layout.base = ptr;
 
 	if (mod->init_layout.size) {
@@ -2146,7 +2149,7 @@ static int move_module(struct module *mod, struct load_info *info)
 		 */
 		kmemleak_ignore(ptr);
 		if (!ptr) {
-			module_memfree(mod->core_layout.base);
+			vfree_exec(mod->core_layout.base);
 			return -ENOMEM;
 		}
 		memset(ptr, 0, mod->init_layout.size);
@@ -2156,7 +2159,7 @@ static int move_module(struct module *mod, struct load_info *info)
 
 #ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
 	/* Do the allocs. */
-	ptr = vzalloc(mod->data_layout.size);
+	ptr = module_alloc(mod->data_layout.size);
 	/*
 	 * The pointer to this block is stored in the module structure
 	 * which is inside the block. Just mark it as not being a
@@ -2164,7 +2167,7 @@ static int move_module(struct module *mod, struct load_info *info)
 	 */
 	kmemleak_not_leak(ptr);
 	if (!ptr) {
-		module_memfree(mod->core_layout.base);
+		vfree_exec(mod->core_layout.base);
 		module_memfree(mod->init_layout.base);
 		return -ENOMEM;
 	}
@@ -2189,7 +2192,7 @@ static int move_module(struct module *mod, struct load_info *info)
 			dest = mod->core_layout.base + shdr->sh_entsize;
 
 		if (shdr->sh_type != SHT_NOBITS)
-			memcpy(dest, (void *)shdr->sh_addr, shdr->sh_size);
+			bpf_arch_text_copy(dest, (void *)shdr->sh_addr, shdr->sh_size);
 		/* Update sh_addr to point to copy in image. */
 		shdr->sh_addr = (unsigned long)dest;
 		pr_debug("\t0x%lx %s\n",
@@ -2345,7 +2348,7 @@ static void module_deallocate(struct module *mod, struct load_info *info)
 	percpu_modfree(mod);
 	module_arch_freeing_init(mod);
 	module_memfree(mod->init_layout.base);
-	module_memfree(mod->core_layout.base);
+	vfree_exec(mod->core_layout.base);
 #ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
 	vfree(mod->data_layout.base);
 #endif
diff --git a/kernel/module/strict_rwx.c b/kernel/module/strict_rwx.c
index 14fbea66f12f..d392eb7bf574 100644
--- a/kernel/module/strict_rwx.c
+++ b/kernel/module/strict_rwx.c
@@ -85,7 +85,6 @@ void module_enable_x(const struct module *mod)
 	    !PAGE_ALIGNED(mod->init_layout.base))
 		return;
 
-	frob_text(&mod->core_layout, set_memory_x);
 	frob_text(&mod->init_layout, set_memory_x);
 }
 
@@ -98,9 +97,7 @@ void module_enable_ro(const struct module *mod, bool after_init)
 		return;
 #endif
 
-	set_vm_flush_reset_perms(mod->core_layout.base);
 	set_vm_flush_reset_perms(mod->init_layout.base);
-	frob_text(&mod->core_layout, set_memory_ro);
 
 	frob_rodata(&mod->data_layout, set_memory_ro);
 	frob_text(&mod->init_layout, set_memory_ro);
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 439e2ab6905e..818418d5b853 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3142,6 +3142,7 @@ static int ftrace_update_code(struct module *mod, struct ftrace_page *new_pgs)
 	if (mod)
 		rec_flags |= FTRACE_FL_DISABLED;
 
+	ftrace_arch_code_modify_prepare();
 	for (pg = new_pgs; pg; pg = pg->next) {
 
 		for (i = 0; i < pg->index; i++) {
@@ -3163,7 +3164,7 @@ static int ftrace_update_code(struct module *mod, struct ftrace_page *new_pgs)
 			update_cnt++;
 		}
 	}
-
+	ftrace_arch_code_modify_post_process();
 	stop = ftrace_now(raw_smp_processor_id());
 	ftrace_update_time = stop - start;
 	ftrace_update_tot_cnt += update_cnt;
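
For readers following the alternative.c hunks, the dispatch they repeat (direct write while booting, text_mutex plus text_poke() afterwards) can be modelled in userspace roughly as below. This is only a sketch: every sim_* name is a hypothetical stand-in invented for illustration, not kernel API, and the assert() merely imitates the kind of lock check the locked path is assumed to rely on.

```c
/* Userspace model only; all sim_* names are hypothetical, not kernel API. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum sim_system_state { SIM_BOOTING, SIM_SCHEDULING, SIM_RUNNING };

static enum sim_system_state sim_state = SIM_BOOTING;
static bool sim_text_mutex_held;	/* stands in for text_mutex */
static int sim_early_pokes;		/* writes done on the early path */
static int sim_locked_pokes;		/* writes done under the lock */

/* Early path: text is assumed to be still directly writable. */
static void sim_text_poke_early(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
	sim_early_pokes++;
}

/* Late path: the caller must hold the (simulated) text mutex. */
static void sim_text_poke(void *dst, const void *src, size_t len)
{
	assert(sim_text_mutex_held);
	memcpy(dst, src, len);
	sim_locked_pokes++;
}

/* One helper so that every patch site would use the same condition. */
static void sim_patch_text(void *dst, const void *src, size_t len)
{
	if (sim_state == SIM_BOOTING) {
		sim_text_poke_early(dst, src, len);
		return;
	}
	sim_text_mutex_held = true;	/* mutex_lock(&text_mutex) analogue */
	sim_text_poke(dst, src, len);
	sim_text_mutex_held = false;	/* mutex_unlock(&text_mutex) analogue */
}
```

Calling sim_patch_text() with sim_state set to SIM_BOOTING takes the early path; any later state takes the locked path. Funnelling all sites through one helper is a design choice of this sketch, not something the patch does: the hunks above open-code the check and use slightly different conditions at different sites.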