[v2,10/20] x86: avoid W^X being broken during modules loading
diff mbox series

Message ID 20190129003422.9328-11-rick.p.edgecombe@intel.com
State New
Headers show
Series
  • Merge text_poke fixes and executable lockdowns
Related show

Commit Message

Edgecombe, Rick P Jan. 29, 2019, 12:34 a.m. UTC
From: Nadav Amit <namit@vmware.com>

When modules and BPF filters are loaded, there is a time window in
which some memory is both writable and executable. An attacker that has
already found another vulnerability (e.g., a dangling pointer) might be
able to exploit this behavior to overwrite kernel code.

Prevent having writable executable PTEs in this stage. In addition,
avoiding having W+X mappings can also slightly simplify the patching of
modules code on initialization (e.g., by alternatives and static-key),
as would be done in the next patch.

To avoid having W+X mappings, set them initially as RW (NX) and after
they are set as RO set them as X as well. Setting them as executable is
done as a separate step to avoid one core in which the old PTE is cached
(hence writable), and another which sees the updated PTE (executable),
which would break the W^X protection.

Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 arch/x86/kernel/alternative.c | 28 +++++++++++++++++++++-------
 arch/x86/kernel/module.c      |  2 +-
 include/linux/filter.h        |  2 +-
 kernel/module.c               |  5 +++++
 4 files changed, 28 insertions(+), 9 deletions(-)

Comments

Borislav Petkov Feb. 11, 2019, 6:29 p.m. UTC | #1
> Subject: Re: [PATCH v2 10/20] x86: avoid W^X being broken during modules loading

For your next submission, please fix all your subjects:

The tip tree preferred format for patch subject prefixes is
'subsys/component:', e.g. 'x86/apic:', 'x86/mm/fault:', 'sched/fair:',
'genirq/core:'. Please do not use file names or complete file paths as
prefix. 'git log path/to/file' should give you a reasonable hint in most
cases.

The condensed patch description in the subject line should start with a
uppercase letter and should be written in imperative tone.


On Mon, Jan 28, 2019 at 04:34:12PM -0800, Rick Edgecombe wrote:
> From: Nadav Amit <namit@vmware.com>
> 
> When modules and BPF filters are loaded, there is a time window in
> which some memory is both writable and executable. An attacker that has
> already found another vulnerability (e.g., a dangling pointer) might be
> able to exploit this behavior to overwrite kernel code.
> 
> Prevent having writable executable PTEs in this stage. In addition,
> avoiding having W+X mappings can also slightly simplify the patching of
> modules code on initialization (e.g., by alternatives and static-key),
> as would be done in the next patch.
> 
> To avoid having W+X mappings, set them initially as RW (NX) and after
> they are set as RO set them as X as well. Setting them as executable is
> done as a separate step to avoid one core in which the old PTE is cached
> (hence writable), and another which sees the updated PTE (executable),
> which would break the W^X protection.
> 
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Suggested-by: Andy Lutomirski <luto@amacapital.net>
> Signed-off-by: Nadav Amit <namit@vmware.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
>  arch/x86/kernel/alternative.c | 28 +++++++++++++++++++++-------
>  arch/x86/kernel/module.c      |  2 +-
>  include/linux/filter.h        |  2 +-
>  kernel/module.c               |  5 +++++
>  4 files changed, 28 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index 76d482a2b716..69f3e650ada8 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -667,15 +667,29 @@ void __init alternative_instructions(void)
>   * handlers seeing an inconsistent instruction while you patch.
>   */
>  void *__init_or_module text_poke_early(void *addr, const void *opcode,
> -					      size_t len)
> +				       size_t len)
>  {
>  	unsigned long flags;
> -	local_irq_save(flags);
> -	memcpy(addr, opcode, len);
> -	local_irq_restore(flags);
> -	sync_core();
> -	/* Could also do a CLFLUSH here to speed up CPU recovery; but
> -	   that causes hangs on some VIA CPUs. */
> +
> +	if (static_cpu_has(X86_FEATURE_NX) &&

Not a fast path - boot_cpu_has() is fine here.
Nadav Amit Feb. 11, 2019, 6:45 p.m. UTC | #2
> On Feb 11, 2019, at 10:29 AM, Borislav Petkov <bp@alien8.de> wrote:
> 
>> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
>> index 76d482a2b716..69f3e650ada8 100644
>> --- a/arch/x86/kernel/alternative.c
>> +++ b/arch/x86/kernel/alternative.c
>> @@ -667,15 +667,29 @@ void __init alternative_instructions(void)
>>  * handlers seeing an inconsistent instruction while you patch.
>>  */
>> void *__init_or_module text_poke_early(void *addr, const void *opcode,
>> -					      size_t len)
>> +				       size_t len)
>> {
>> 	unsigned long flags;
>> -	local_irq_save(flags);
>> -	memcpy(addr, opcode, len);
>> -	local_irq_restore(flags);
>> -	sync_core();
>> -	/* Could also do a CLFLUSH here to speed up CPU recovery; but
>> -	   that causes hangs on some VIA CPUs. */
>> +
>> +	if (static_cpu_has(X86_FEATURE_NX) &&
> 
> Not a fast path - boot_cpu_has() is fine here.

Are you sure about that? This path is still used when modules are loaded.
Borislav Petkov Feb. 11, 2019, 7:01 p.m. UTC | #3
On Mon, Feb 11, 2019 at 10:45:26AM -0800, Nadav Amit wrote:
> Are you sure about that? This path is still used when modules are loaded.

Yes, I'm sure. Loading a module does a gazillion things so saving a
couple of insns - yes, boot_cpu_has() is usually a RIP-relative MOV and a
TEST - doesn't show even as a blip on any radar.
Nadav Amit Feb. 11, 2019, 7:09 p.m. UTC | #4
> On Feb 11, 2019, at 11:01 AM, Borislav Petkov <bp@alien8.de> wrote:
> 
> On Mon, Feb 11, 2019 at 10:45:26AM -0800, Nadav Amit wrote:
>> Are you sure about that? This path is still used when modules are loaded.
> 
> Yes, I'm sure. Loading a module does a gazillion things so saving a
> couple of insns - yes, boot_cpu_has() is usually a RIP-relative MOV and a
> TEST - doesn't show even as a blip on any radar.

I fully agree, if that is the standard.

It is just that I find the use of static_cpu_has()/boot_cpu_has() to be very
inconsistent. I doubt that show_cpuinfo_misc(), copy_fpstate_to_sigframe(),
or i915_memcpy_init_early() that use static_cpu_has() are any hotter than
text_poke_early().

Anyhow, I’ll use boot_cpu_has() as you said.
Borislav Petkov Feb. 11, 2019, 7:10 p.m. UTC | #5
On Mon, Feb 11, 2019 at 11:09:25AM -0800, Nadav Amit wrote:
> It is just that I find the use of static_cpu_has()/boot_cpu_has() to be very
> inconsistent. I doubt that show_cpuinfo_misc(), copy_fpstate_to_sigframe(),
> or i915_memcpy_init_early() that use static_cpu_has() are any hotter than
> text_poke_early().

Would some beefing of the comment over it help?
Nadav Amit Feb. 11, 2019, 7:27 p.m. UTC | #6
> On Feb 11, 2019, at 11:10 AM, Borislav Petkov <bp@alien8.de> wrote:
> 
> On Mon, Feb 11, 2019 at 11:09:25AM -0800, Nadav Amit wrote:
>> It is just that I find the use of static_cpu_has()/boot_cpu_has() to be very
>> inconsistent. I doubt that show_cpuinfo_misc(), copy_fpstate_to_sigframe(),
>> or i915_memcpy_init_early() that use static_cpu_has() are any hotter than
>> text_poke_early().
> 
> Would some beefing of the comment over it help?

Is there any comment over static_cpu_has()? ;-)

Anyhow, obviously a comment would be useful.
Borislav Petkov Feb. 11, 2019, 7:42 p.m. UTC | #7
On Mon, Feb 11, 2019 at 11:27:03AM -0800, Nadav Amit wrote:
> Is there any comment over static_cpu_has()? ;-)

Almost:

/*
 * Static testing of CPU features.  Used the same as boot_cpu_has().
 * These will statically patch the target code for additional
 * performance.
 */
static __always_inline __pure bool _static_cpu_has(u16 bit)
Nadav Amit Feb. 11, 2019, 8:32 p.m. UTC | #8
> On Feb 11, 2019, at 11:42 AM, Borislav Petkov <bp@alien8.de> wrote:
> 
> On Mon, Feb 11, 2019 at 11:27:03AM -0800, Nadav Amit wrote:
>> Is there any comment over static_cpu_has()? ;-)
> 
> Almost:
> 
> /*
> * Static testing of CPU features.  Used the same as boot_cpu_has().
> * These will statically patch the target code for additional
> * performance.
> */
> static __always_inline __pure bool _static_cpu_has(u16 bit)

Oh, I missed this comment.

BTW: the “__pure” attribute is useless when “__always_inline” is used.
Unless it is intended to be some sort of comment, of course.
Borislav Petkov March 7, 2019, 7:29 a.m. UTC | #9
On Mon, Feb 11, 2019 at 08:42:51PM +0100, Borislav Petkov wrote:
> On Mon, Feb 11, 2019 at 11:27:03AM -0800, Nadav Amit wrote:
> > Is there any comment over static_cpu_has()? ;-)
> 
> Almost:
> 
> /*
>  * Static testing of CPU features.  Used the same as boot_cpu_has().
>  * These will statically patch the target code for additional
>  * performance.
>  */
> static __always_inline __pure bool _static_cpu_has(u16 bit)

Ok, I guess something like that along with converting the obvious slow
path callers to boot_cpu_has():

---
diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index ce95b8cbd229..e25d11ad7a88 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -155,9 +155,12 @@ extern void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit);
 #else
 
 /*
- * Static testing of CPU features.  Used the same as boot_cpu_has().
- * These will statically patch the target code for additional
- * performance.
+ * Static testing of CPU features. Used the same as boot_cpu_has(). It
+ * statically patches the target code for additional performance. Use
+ * static_cpu_has() only in fast paths, where every cycle counts. Which
+ * means that the boot_cpu_has() variant is already fast enough for the
+ * majority of cases and you should stick to using it as it is generally
+ * only two instructions: a RIP-relative MOV and a TEST.
  */
 static __always_inline __pure bool _static_cpu_has(u16 bit)
 {
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index fa2c93cb42a2..c525b053b3b3 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -291,7 +291,7 @@ static inline void copy_xregs_to_kernel_booting(struct xregs_state *xstate)
 
 	WARN_ON(system_state != SYSTEM_BOOTING);
 
-	if (static_cpu_has(X86_FEATURE_XSAVES))
+	if (boot_cpu_has(X86_FEATURE_XSAVES))
 		XSTATE_OP(XSAVES, xstate, lmask, hmask, err);
 	else
 		XSTATE_OP(XSAVE, xstate, lmask, hmask, err);
@@ -313,7 +313,7 @@ static inline void copy_kernel_to_xregs_booting(struct xregs_state *xstate)
 
 	WARN_ON(system_state != SYSTEM_BOOTING);
 
-	if (static_cpu_has(X86_FEATURE_XSAVES))
+	if (boot_cpu_has(X86_FEATURE_XSAVES))
 		XSTATE_OP(XRSTORS, xstate, lmask, hmask, err);
 	else
 		XSTATE_OP(XRSTOR, xstate, lmask, hmask, err);
@@ -528,8 +528,7 @@ static inline void fpregs_activate(struct fpu *fpu)
  *  - switch_fpu_finish() restores the new state as
  *    necessary.
  */
-static inline void
-switch_fpu_prepare(struct fpu *old_fpu, int cpu)
+static inline void switch_fpu_prepare(struct fpu *old_fpu, int cpu)
 {
 	if (static_cpu_has(X86_FEATURE_FPU) && old_fpu->initialized) {
 		if (!copy_fpregs_to_fpstate(old_fpu))
diff --git a/arch/x86/kernel/apic/apic_numachip.c b/arch/x86/kernel/apic/apic_numachip.c
index 78778b54f904..a5464b8b6c46 100644
--- a/arch/x86/kernel/apic/apic_numachip.c
+++ b/arch/x86/kernel/apic/apic_numachip.c
@@ -175,7 +175,7 @@ static void fixup_cpu_id(struct cpuinfo_x86 *c, int node)
 	this_cpu_write(cpu_llc_id, node);
 
 	/* Account for nodes per socket in multi-core-module processors */
-	if (static_cpu_has(X86_FEATURE_NODEID_MSR)) {
+	if (boot_cpu_has(X86_FEATURE_NODEID_MSR)) {
 		rdmsrl(MSR_FAM10H_NODE_ID, val);
 		nodes = ((val >> 3) & 7) + 1;
 	}
diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c
index 804c49493938..64d5aec24203 100644
--- a/arch/x86/kernel/cpu/aperfmperf.c
+++ b/arch/x86/kernel/cpu/aperfmperf.c
@@ -83,7 +83,7 @@ unsigned int aperfmperf_get_khz(int cpu)
 	if (!cpu_khz)
 		return 0;
 
-	if (!static_cpu_has(X86_FEATURE_APERFMPERF))
+	if (!boot_cpu_has(X86_FEATURE_APERFMPERF))
 		return 0;
 
 	aperfmperf_snapshot_cpu(cpu, ktime_get(), true);
@@ -99,7 +99,7 @@ void arch_freq_prepare_all(void)
 	if (!cpu_khz)
 		return;
 
-	if (!static_cpu_has(X86_FEATURE_APERFMPERF))
+	if (!boot_cpu_has(X86_FEATURE_APERFMPERF))
 		return;
 
 	for_each_online_cpu(cpu)
@@ -115,7 +115,7 @@ unsigned int arch_freq_get_on_cpu(int cpu)
 	if (!cpu_khz)
 		return 0;
 
-	if (!static_cpu_has(X86_FEATURE_APERFMPERF))
+	if (!boot_cpu_has(X86_FEATURE_APERFMPERF))
 		return 0;
 
 	if (aperfmperf_snapshot_cpu(cpu, ktime_get(), true))
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index cb28e98a0659..95a5faf3a6a0 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1668,7 +1668,7 @@ static void setup_getcpu(int cpu)
 	unsigned long cpudata = vdso_encode_cpunode(cpu, early_cpu_to_node(cpu));
 	struct desc_struct d = { };
 
-	if (static_cpu_has(X86_FEATURE_RDTSCP))
+	if (boot_cpu_has(X86_FEATURE_RDTSCP))
 		write_rdtscp_aux(cpudata);
 
 	/* Store CPU and node number in limit. */
diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
index 8492ef7d9015..3da9a8823e47 100644
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -528,7 +528,7 @@ static void do_inject(void)
 	 * only on the node base core. Refer to D18F3x44[NbMcaToMstCpuEn] for
 	 * Fam10h and later BKDGs.
 	 */
-	if (static_cpu_has(X86_FEATURE_AMD_DCM) &&
+	if (boot_cpu_has(X86_FEATURE_AMD_DCM) &&
 	    b == 4 &&
 	    boot_cpu_data.x86 < 0x17) {
 		toggle_nb_mca_mst_cpu(amd_get_nb_id(cpu));
diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
index 2c8522a39ed5..cb2e49810d68 100644
--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -35,11 +35,11 @@ static void show_cpuinfo_misc(struct seq_file *m, struct cpuinfo_x86 *c)
 		   "fpu_exception\t: %s\n"
 		   "cpuid level\t: %d\n"
 		   "wp\t\t: yes\n",
-		   static_cpu_has_bug(X86_BUG_FDIV) ? "yes" : "no",
-		   static_cpu_has_bug(X86_BUG_F00F) ? "yes" : "no",
-		   static_cpu_has_bug(X86_BUG_COMA) ? "yes" : "no",
-		   static_cpu_has(X86_FEATURE_FPU) ? "yes" : "no",
-		   static_cpu_has(X86_FEATURE_FPU) ? "yes" : "no",
+		   boot_cpu_has_bug(X86_BUG_FDIV) ? "yes" : "no",
+		   boot_cpu_has_bug(X86_BUG_F00F) ? "yes" : "no",
+		   boot_cpu_has_bug(X86_BUG_COMA) ? "yes" : "no",
+		   boot_cpu_has(X86_FEATURE_FPU) ? "yes" : "no",
+		   boot_cpu_has(X86_FEATURE_FPU) ? "yes" : "no",
 		   c->cpuid_level);
 }
 #else
diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index 6135ae8ce036..b2463fcb20a8 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -113,7 +113,7 @@ static void do_sanity_check(struct mm_struct *mm,
 		 * tables.
 		 */
 		WARN_ON(!had_kernel_mapping);
-		if (static_cpu_has(X86_FEATURE_PTI))
+		if (boot_cpu_has(X86_FEATURE_PTI))
 			WARN_ON(!had_user_mapping);
 	} else {
 		/*
@@ -121,7 +121,7 @@ static void do_sanity_check(struct mm_struct *mm,
 		 * Sync the pgd to the usermode tables.
 		 */
 		WARN_ON(had_kernel_mapping);
-		if (static_cpu_has(X86_FEATURE_PTI))
+		if (boot_cpu_has(X86_FEATURE_PTI))
 			WARN_ON(had_user_mapping);
 	}
 }
@@ -156,7 +156,7 @@ static void map_ldt_struct_to_user(struct mm_struct *mm)
 	k_pmd = pgd_to_pmd_walk(k_pgd, LDT_BASE_ADDR);
 	u_pmd = pgd_to_pmd_walk(u_pgd, LDT_BASE_ADDR);
 
-	if (static_cpu_has(X86_FEATURE_PTI) && !mm->context.ldt)
+	if (boot_cpu_has(X86_FEATURE_PTI) && !mm->context.ldt)
 		set_pmd(u_pmd, *k_pmd);
 }
 
@@ -181,7 +181,7 @@ static void map_ldt_struct_to_user(struct mm_struct *mm)
 {
 	pgd_t *pgd = pgd_offset(mm, LDT_BASE_ADDR);
 
-	if (static_cpu_has(X86_FEATURE_PTI) && !mm->context.ldt)
+	if (boot_cpu_has(X86_FEATURE_PTI) && !mm->context.ldt)
 		set_pgd(kernel_to_user_pgdp(pgd), *pgd);
 }
 
@@ -208,7 +208,7 @@ map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
 	spinlock_t *ptl;
 	int i, nr_pages;
 
-	if (!static_cpu_has(X86_FEATURE_PTI))
+	if (!boot_cpu_has(X86_FEATURE_PTI))
 		return 0;
 
 	/*
@@ -271,7 +271,7 @@ static void unmap_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt)
 		return;
 
 	/* LDT map/unmap is only required for PTI */
-	if (!static_cpu_has(X86_FEATURE_PTI))
+	if (!boot_cpu_has(X86_FEATURE_PTI))
 		return;
 
 	nr_pages = DIV_ROUND_UP(ldt->nr_entries * LDT_ENTRY_SIZE, PAGE_SIZE);
@@ -311,7 +311,7 @@ static void free_ldt_pgtables(struct mm_struct *mm)
 	unsigned long start = LDT_BASE_ADDR;
 	unsigned long end = LDT_END_ADDR;
 
-	if (!static_cpu_has(X86_FEATURE_PTI))
+	if (!boot_cpu_has(X86_FEATURE_PTI))
 		return;
 
 	tlb_gather_mmu(&tlb, mm, start, end);
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index c0e0101133f3..7bbaa6baf37f 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -121,7 +121,7 @@ DEFINE_STATIC_KEY_TRUE(virt_spin_lock_key);
 
 void __init native_pv_lock_init(void)
 {
-	if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
+	if (!boot_cpu_has(X86_FEATURE_HYPERVISOR))
 		static_branch_disable(&virt_spin_lock_key);
 }
 
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 58ac7be52c7a..16a7113e91c5 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -236,7 +236,7 @@ static int get_cpuid_mode(void)
 
 static int set_cpuid_mode(struct task_struct *task, unsigned long cpuid_enabled)
 {
-	if (!static_cpu_has(X86_FEATURE_CPUID_FAULT))
+	if (!boot_cpu_has(X86_FEATURE_CPUID_FAULT))
 		return -ENODEV;
 
 	if (cpuid_enabled)
@@ -666,7 +666,7 @@ static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
 	if (c->x86_vendor != X86_VENDOR_INTEL)
 		return 0;
 
-	if (!cpu_has(c, X86_FEATURE_MWAIT) || static_cpu_has_bug(X86_BUG_MONITOR))
+	if (!cpu_has(c, X86_FEATURE_MWAIT) || boot_cpu_has_bug(X86_BUG_MONITOR))
 		return 0;
 
 	return 1;
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 725624b6c0c0..d62ebbc5ec78 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -108,7 +108,7 @@ void __noreturn machine_real_restart(unsigned int type)
 	write_cr3(real_mode_header->trampoline_pgd);
 
 	/* Exiting long mode will fail if CR4.PCIDE is set. */
-	if (static_cpu_has(X86_FEATURE_PCID))
+	if (boot_cpu_has(X86_FEATURE_PCID))
 		cr4_clear_bits(X86_CR4_PCIDE);
 #endif
 
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index a092b6b40c6b..6a38717d179c 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -369,7 +369,7 @@ static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus)
 	preempt_disable();
 	tsk->thread.sp0 += 16;
 
-	if (static_cpu_has(X86_FEATURE_SEP)) {
+	if (boot_cpu_has(X86_FEATURE_SEP)) {
 		tsk->thread.sysenter_cs = 0;
 		refresh_sysenter_cs(&tsk->thread);
 	}
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f13a3a24d360..5ed039bf1b58 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -835,7 +835,7 @@ static void svm_init_erratum_383(void)
 	int err;
 	u64 val;
 
-	if (!static_cpu_has_bug(X86_BUG_AMD_TLB_MMATCH))
+	if (!boot_cpu_has_bug(X86_BUG_AMD_TLB_MMATCH))
 		return;
 
 	/* Use _safe variants to not break nested virtualization */
@@ -889,7 +889,7 @@ static int has_svm(void)
 static void svm_hardware_disable(void)
 {
 	/* Make sure we clean up behind us */
-	if (static_cpu_has(X86_FEATURE_TSCRATEMSR))
+	if (boot_cpu_has(X86_FEATURE_TSCRATEMSR))
 		wrmsrl(MSR_AMD64_TSC_RATIO, TSC_RATIO_DEFAULT);
 
 	cpu_svm_disable();
@@ -931,7 +931,7 @@ static int svm_hardware_enable(void)
 
 	wrmsrl(MSR_VM_HSAVE_PA, page_to_pfn(sd->save_area) << PAGE_SHIFT);
 
-	if (static_cpu_has(X86_FEATURE_TSCRATEMSR)) {
+	if (boot_cpu_has(X86_FEATURE_TSCRATEMSR)) {
 		wrmsrl(MSR_AMD64_TSC_RATIO, TSC_RATIO_DEFAULT);
 		__this_cpu_write(current_tsc_ratio, TSC_RATIO_DEFAULT);
 	}
@@ -2247,7 +2247,7 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++)
 		rdmsrl(host_save_user_msrs[i], svm->host_user_msrs[i]);
 
-	if (static_cpu_has(X86_FEATURE_TSCRATEMSR)) {
+	if (boot_cpu_has(X86_FEATURE_TSCRATEMSR)) {
 		u64 tsc_ratio = vcpu->arch.tsc_scaling_ratio;
 		if (tsc_ratio != __this_cpu_read(current_tsc_ratio)) {
 			__this_cpu_write(current_tsc_ratio, tsc_ratio);
@@ -2255,7 +2255,7 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		}
 	}
 	/* This assumes that the kernel never uses MSR_TSC_AUX */
-	if (static_cpu_has(X86_FEATURE_RDTSCP))
+	if (boot_cpu_has(X86_FEATURE_RDTSCP))
 		wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
 
 	if (sd->current_vmcb != svm->vmcb) {
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 30a6bcd735ec..0ec24853a0e6 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6553,7 +6553,7 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
 		vmx_set_interrupt_shadow(vcpu, 0);
 
-	if (static_cpu_has(X86_FEATURE_PKU) &&
+	if (boot_cpu_has(X86_FEATURE_PKU) &&
 	    kvm_read_cr4_bits(vcpu, X86_CR4_PKE) &&
 	    vcpu->arch.pkru != vmx->host_pkru)
 		__write_pkru(vcpu->arch.pkru);
@@ -6633,7 +6633,7 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	 * back on host, so it is safe to read guest PKRU from current
 	 * XSAVE.
 	 */
-	if (static_cpu_has(X86_FEATURE_PKU) &&
+	if (boot_cpu_has(X86_FEATURE_PKU) &&
 	    kvm_read_cr4_bits(vcpu, X86_CR4_PKE)) {
 		vcpu->arch.pkru = __read_pkru();
 		if (vcpu->arch.pkru != vmx->host_pkru)
diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index e3cdc85ce5b6..b596ac1eed1c 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -579,7 +579,7 @@ void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd)
 void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd, bool user)
 {
 #ifdef CONFIG_PAGE_TABLE_ISOLATION
-	if (user && static_cpu_has(X86_FEATURE_PTI))
+	if (user && boot_cpu_has(X86_FEATURE_PTI))
 		pgd = kernel_to_user_pgdp(pgd);
 #endif
 	ptdump_walk_pgd_level_core(m, pgd, false, false);
@@ -592,7 +592,7 @@ void ptdump_walk_user_pgd_level_checkwx(void)
 	pgd_t *pgd = INIT_PGD;
 
 	if (!(__supported_pte_mask & _PAGE_NX) ||
-	    !static_cpu_has(X86_FEATURE_PTI))
+	    !boot_cpu_has(X86_FEATURE_PTI))
 		return;
 
 	pr_info("x86/mm: Checking user space page tables\n");
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 7bd01709a091..3dbf440d4114 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -190,7 +190,7 @@ static void pgd_dtor(pgd_t *pgd)
  * when PTI is enabled. We need them to map the per-process LDT into the
  * user-space page-table.
  */
-#define PREALLOCATED_USER_PMDS	 (static_cpu_has(X86_FEATURE_PTI) ? \
+#define PREALLOCATED_USER_PMDS	 (boot_cpu_has(X86_FEATURE_PTI) ? \
 					KERNEL_PGD_PTRS : 0)
 #define MAX_PREALLOCATED_USER_PMDS KERNEL_PGD_PTRS
 
@@ -292,7 +292,7 @@ static void pgd_mop_up_pmds(struct mm_struct *mm, pgd_t *pgdp)
 
 #ifdef CONFIG_PAGE_TABLE_ISOLATION
 
-	if (!static_cpu_has(X86_FEATURE_PTI))
+	if (!boot_cpu_has(X86_FEATURE_PTI))
 		return;
 
 	pgdp = kernel_to_user_pgdp(pgdp);
diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index 4fee5c3003ed..8c9a54ebda60 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -626,7 +626,7 @@ void pti_set_kernel_image_nonglobal(void)
  */
 void __init pti_init(void)
 {
-	if (!static_cpu_has(X86_FEATURE_PTI))
+	if (!boot_cpu_has(X86_FEATURE_PTI))
 		return;
 
 	pr_info("enabled\n");
diff --git a/drivers/cpufreq/amd_freq_sensitivity.c b/drivers/cpufreq/amd_freq_sensitivity.c
index 4ac7c3cf34be..6927a8c0e748 100644
--- a/drivers/cpufreq/amd_freq_sensitivity.c
+++ b/drivers/cpufreq/amd_freq_sensitivity.c
@@ -124,7 +124,7 @@ static int __init amd_freq_sensitivity_init(void)
 			PCI_DEVICE_ID_AMD_KERNCZ_SMBUS, NULL);
 
 	if (!pcidev) {
-		if (!static_cpu_has(X86_FEATURE_PROC_FEEDBACK))
+		if (!boot_cpu_has(X86_FEATURE_PROC_FEEDBACK))
 			return -ENODEV;
 	}
 
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index dd66decf2087..9bbc3dfdebe3 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -520,7 +520,7 @@ static s16 intel_pstate_get_epb(struct cpudata *cpu_data)
 	u64 epb;
 	int ret;
 
-	if (!static_cpu_has(X86_FEATURE_EPB))
+	if (!boot_cpu_has(X86_FEATURE_EPB))
 		return -ENXIO;
 
 	ret = rdmsrl_on_cpu(cpu_data->cpu, MSR_IA32_ENERGY_PERF_BIAS, &epb);
@@ -534,7 +534,7 @@ static s16 intel_pstate_get_epp(struct cpudata *cpu_data, u64 hwp_req_data)
 {
 	s16 epp;
 
-	if (static_cpu_has(X86_FEATURE_HWP_EPP)) {
+	if (boot_cpu_has(X86_FEATURE_HWP_EPP)) {
 		/*
 		 * When hwp_req_data is 0, means that caller didn't read
 		 * MSR_HWP_REQUEST, so need to read and get EPP.
@@ -559,7 +559,7 @@ static int intel_pstate_set_epb(int cpu, s16 pref)
 	u64 epb;
 	int ret;
 
-	if (!static_cpu_has(X86_FEATURE_EPB))
+	if (!boot_cpu_has(X86_FEATURE_EPB))
 		return -ENXIO;
 
 	ret = rdmsrl_on_cpu(cpu, MSR_IA32_ENERGY_PERF_BIAS, &epb);
@@ -607,7 +607,7 @@ static int intel_pstate_get_energy_pref_index(struct cpudata *cpu_data)
 	if (epp < 0)
 		return epp;
 
-	if (static_cpu_has(X86_FEATURE_HWP_EPP)) {
+	if (boot_cpu_has(X86_FEATURE_HWP_EPP)) {
 		if (epp == HWP_EPP_PERFORMANCE)
 			return 1;
 		if (epp <= HWP_EPP_BALANCE_PERFORMANCE)
@@ -616,7 +616,7 @@ static int intel_pstate_get_energy_pref_index(struct cpudata *cpu_data)
 			return 3;
 		else
 			return 4;
-	} else if (static_cpu_has(X86_FEATURE_EPB)) {
+	} else if (boot_cpu_has(X86_FEATURE_EPB)) {
 		/*
 		 * Range:
 		 *	0x00-0x03	:	Performance
@@ -644,7 +644,7 @@ static int intel_pstate_set_energy_pref_index(struct cpudata *cpu_data,
 
 	mutex_lock(&intel_pstate_limits_lock);
 
-	if (static_cpu_has(X86_FEATURE_HWP_EPP)) {
+	if (boot_cpu_has(X86_FEATURE_HWP_EPP)) {
 		u64 value;
 
 		ret = rdmsrl_on_cpu(cpu_data->cpu, MSR_HWP_REQUEST, &value);
@@ -819,7 +819,7 @@ static void intel_pstate_hwp_set(unsigned int cpu)
 		epp = cpu_data->epp_powersave;
 	}
 update_epp:
-	if (static_cpu_has(X86_FEATURE_HWP_EPP)) {
+	if (boot_cpu_has(X86_FEATURE_HWP_EPP)) {
 		value &= ~GENMASK_ULL(31, 24);
 		value |= (u64)epp << 24;
 	} else {
@@ -844,7 +844,7 @@ static void intel_pstate_hwp_force_min_perf(int cpu)
 	value |= HWP_MIN_PERF(min_perf);
 
 	/* Set EPP/EPB to min */
-	if (static_cpu_has(X86_FEATURE_HWP_EPP))
+	if (boot_cpu_has(X86_FEATURE_HWP_EPP))
 		value |= HWP_ENERGY_PERF_PREFERENCE(HWP_EPP_POWERSAVE);
 	else
 		intel_pstate_set_epb(cpu, HWP_EPP_BALANCE_POWERSAVE);
@@ -1191,7 +1191,7 @@ static void __init intel_pstate_sysfs_expose_params(void)
 static void intel_pstate_hwp_enable(struct cpudata *cpudata)
 {
 	/* First disable HWP notification interrupt as we don't process them */
-	if (static_cpu_has(X86_FEATURE_HWP_NOTIFY))
+	if (boot_cpu_has(X86_FEATURE_HWP_NOTIFY))
 		wrmsrl_on_cpu(cpudata->cpu, MSR_HWP_INTERRUPT, 0x00);
 
 	wrmsrl_on_cpu(cpudata->cpu, MSR_PM_ENABLE, 0x1);
diff --git a/drivers/cpufreq/powernow-k8.c b/drivers/cpufreq/powernow-k8.c
index fb77b39a4ce3..3c12e03fa343 100644
--- a/drivers/cpufreq/powernow-k8.c
+++ b/drivers/cpufreq/powernow-k8.c
@@ -1178,7 +1178,7 @@ static int powernowk8_init(void)
 	unsigned int i, supported_cpus = 0;
 	int ret;
 
-	if (static_cpu_has(X86_FEATURE_HW_PSTATE)) {
+	if (boot_cpu_has(X86_FEATURE_HW_PSTATE)) {
 		__request_acpi_cpufreq();
 		return -ENODEV;
 	}
H. Peter Anvin March 7, 2019, 4:53 p.m. UTC | #10
On March 6, 2019 11:29:47 PM PST, Borislav Petkov <bp@alien8.de> wrote:
>On Mon, Feb 11, 2019 at 08:42:51PM +0100, Borislav Petkov wrote:
>> On Mon, Feb 11, 2019 at 11:27:03AM -0800, Nadav Amit wrote:
>> > Is there any comment over static_cpu_has()? ;-)
>> 
>> Almost:
>> 
>> /*
>>  * Static testing of CPU features.  Used the same as boot_cpu_has().
>>  * These will statically patch the target code for additional
>>  * performance.
>>  */
>> static __always_inline __pure bool _static_cpu_has(u16 bit)
>
>Ok, I guess something like that along with converting the obvious slow
>path callers to boot_cpu_has():
>
>---
>diff --git a/arch/x86/include/asm/cpufeature.h
>b/arch/x86/include/asm/cpufeature.h
>index ce95b8cbd229..e25d11ad7a88 100644
>--- a/arch/x86/include/asm/cpufeature.h
>+++ b/arch/x86/include/asm/cpufeature.h
>@@ -155,9 +155,12 @@ extern void clear_cpu_cap(struct cpuinfo_x86 *c,
>unsigned int bit);
> #else
> 
> /*
>- * Static testing of CPU features.  Used the same as boot_cpu_has().
>- * These will statically patch the target code for additional
>- * performance.
>+ * Static testing of CPU features. Used the same as boot_cpu_has(). It
>+ * statically patches the target code for additional performance. Use
>+ * static_cpu_has() only in fast paths, where every cycle counts.
>Which
>+ * means that the boot_cpu_has() variant is already fast enough for
>the
>+ * majority of cases and you should stick to using it as it is
>generally
>+ * only two instructions: a RIP-relative MOV and a TEST.
>  */
> static __always_inline __pure bool _static_cpu_has(u16 bit)
> {
>diff --git a/arch/x86/include/asm/fpu/internal.h
>b/arch/x86/include/asm/fpu/internal.h
>index fa2c93cb42a2..c525b053b3b3 100644
>--- a/arch/x86/include/asm/fpu/internal.h
>+++ b/arch/x86/include/asm/fpu/internal.h
>@@ -291,7 +291,7 @@ static inline void
>copy_xregs_to_kernel_booting(struct xregs_state *xstate)
> 
> 	WARN_ON(system_state != SYSTEM_BOOTING);
> 
>-	if (static_cpu_has(X86_FEATURE_XSAVES))
>+	if (boot_cpu_has(X86_FEATURE_XSAVES))
> 		XSTATE_OP(XSAVES, xstate, lmask, hmask, err);
> 	else
> 		XSTATE_OP(XSAVE, xstate, lmask, hmask, err);
>@@ -313,7 +313,7 @@ static inline void
>copy_kernel_to_xregs_booting(struct xregs_state *xstate)
> 
> 	WARN_ON(system_state != SYSTEM_BOOTING);
> 
>-	if (static_cpu_has(X86_FEATURE_XSAVES))
>+	if (boot_cpu_has(X86_FEATURE_XSAVES))
> 		XSTATE_OP(XRSTORS, xstate, lmask, hmask, err);
> 	else
> 		XSTATE_OP(XRSTOR, xstate, lmask, hmask, err);
>@@ -528,8 +528,7 @@ static inline void fpregs_activate(struct fpu *fpu)
>  *  - switch_fpu_finish() restores the new state as
>  *    necessary.
>  */
>-static inline void
>-switch_fpu_prepare(struct fpu *old_fpu, int cpu)
>+static inline void switch_fpu_prepare(struct fpu *old_fpu, int cpu)
> {
> 	if (static_cpu_has(X86_FEATURE_FPU) && old_fpu->initialized) {
> 		if (!copy_fpregs_to_fpstate(old_fpu))
>diff --git a/arch/x86/kernel/apic/apic_numachip.c
>b/arch/x86/kernel/apic/apic_numachip.c
>index 78778b54f904..a5464b8b6c46 100644
>--- a/arch/x86/kernel/apic/apic_numachip.c
>+++ b/arch/x86/kernel/apic/apic_numachip.c
>@@ -175,7 +175,7 @@ static void fixup_cpu_id(struct cpuinfo_x86 *c, int
>node)
> 	this_cpu_write(cpu_llc_id, node);
> 
> 	/* Account for nodes per socket in multi-core-module processors */
>-	if (static_cpu_has(X86_FEATURE_NODEID_MSR)) {
>+	if (boot_cpu_has(X86_FEATURE_NODEID_MSR)) {
> 		rdmsrl(MSR_FAM10H_NODE_ID, val);
> 		nodes = ((val >> 3) & 7) + 1;
> 	}
>diff --git a/arch/x86/kernel/cpu/aperfmperf.c
>b/arch/x86/kernel/cpu/aperfmperf.c
>index 804c49493938..64d5aec24203 100644
>--- a/arch/x86/kernel/cpu/aperfmperf.c
>+++ b/arch/x86/kernel/cpu/aperfmperf.c
>@@ -83,7 +83,7 @@ unsigned int aperfmperf_get_khz(int cpu)
> 	if (!cpu_khz)
> 		return 0;
> 
>-	if (!static_cpu_has(X86_FEATURE_APERFMPERF))
>+	if (!boot_cpu_has(X86_FEATURE_APERFMPERF))
> 		return 0;
> 
> 	aperfmperf_snapshot_cpu(cpu, ktime_get(), true);
>@@ -99,7 +99,7 @@ void arch_freq_prepare_all(void)
> 	if (!cpu_khz)
> 		return;
> 
>-	if (!static_cpu_has(X86_FEATURE_APERFMPERF))
>+	if (!boot_cpu_has(X86_FEATURE_APERFMPERF))
> 		return;
> 
> 	for_each_online_cpu(cpu)
>@@ -115,7 +115,7 @@ unsigned int arch_freq_get_on_cpu(int cpu)
> 	if (!cpu_khz)
> 		return 0;
> 
>-	if (!static_cpu_has(X86_FEATURE_APERFMPERF))
>+	if (!boot_cpu_has(X86_FEATURE_APERFMPERF))
> 		return 0;
> 
> 	if (aperfmperf_snapshot_cpu(cpu, ktime_get(), true))
>diff --git a/arch/x86/kernel/cpu/common.c
>b/arch/x86/kernel/cpu/common.c
>index cb28e98a0659..95a5faf3a6a0 100644
>--- a/arch/x86/kernel/cpu/common.c
>+++ b/arch/x86/kernel/cpu/common.c
>@@ -1668,7 +1668,7 @@ static void setup_getcpu(int cpu)
>	unsigned long cpudata = vdso_encode_cpunode(cpu,
>early_cpu_to_node(cpu));
> 	struct desc_struct d = { };
> 
>-	if (static_cpu_has(X86_FEATURE_RDTSCP))
>+	if (boot_cpu_has(X86_FEATURE_RDTSCP))
> 		write_rdtscp_aux(cpudata);
> 
> 	/* Store CPU and node number in limit. */
>diff --git a/arch/x86/kernel/cpu/mce/inject.c
>b/arch/x86/kernel/cpu/mce/inject.c
>index 8492ef7d9015..3da9a8823e47 100644
>--- a/arch/x86/kernel/cpu/mce/inject.c
>+++ b/arch/x86/kernel/cpu/mce/inject.c
>@@ -528,7 +528,7 @@ static void do_inject(void)
> 	 * only on the node base core. Refer to D18F3x44[NbMcaToMstCpuEn] for
> 	 * Fam10h and later BKDGs.
> 	 */
>-	if (static_cpu_has(X86_FEATURE_AMD_DCM) &&
>+	if (boot_cpu_has(X86_FEATURE_AMD_DCM) &&
> 	    b == 4 &&
> 	    boot_cpu_data.x86 < 0x17) {
> 		toggle_nb_mca_mst_cpu(amd_get_nb_id(cpu));
>diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
>index 2c8522a39ed5..cb2e49810d68 100644
>--- a/arch/x86/kernel/cpu/proc.c
>+++ b/arch/x86/kernel/cpu/proc.c
>@@ -35,11 +35,11 @@ static void show_cpuinfo_misc(struct seq_file *m,
>struct cpuinfo_x86 *c)
> 		   "fpu_exception\t: %s\n"
> 		   "cpuid level\t: %d\n"
> 		   "wp\t\t: yes\n",
>-		   static_cpu_has_bug(X86_BUG_FDIV) ? "yes" : "no",
>-		   static_cpu_has_bug(X86_BUG_F00F) ? "yes" : "no",
>-		   static_cpu_has_bug(X86_BUG_COMA) ? "yes" : "no",
>-		   static_cpu_has(X86_FEATURE_FPU) ? "yes" : "no",
>-		   static_cpu_has(X86_FEATURE_FPU) ? "yes" : "no",
>+		   boot_cpu_has_bug(X86_BUG_FDIV) ? "yes" : "no",
>+		   boot_cpu_has_bug(X86_BUG_F00F) ? "yes" : "no",
>+		   boot_cpu_has_bug(X86_BUG_COMA) ? "yes" : "no",
>+		   boot_cpu_has(X86_FEATURE_FPU) ? "yes" : "no",
>+		   boot_cpu_has(X86_FEATURE_FPU) ? "yes" : "no",
> 		   c->cpuid_level);
> }
> #else
>diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
>index 6135ae8ce036..b2463fcb20a8 100644
>--- a/arch/x86/kernel/ldt.c
>+++ b/arch/x86/kernel/ldt.c
>@@ -113,7 +113,7 @@ static void do_sanity_check(struct mm_struct *mm,
> 		 * tables.
> 		 */
> 		WARN_ON(!had_kernel_mapping);
>-		if (static_cpu_has(X86_FEATURE_PTI))
>+		if (boot_cpu_has(X86_FEATURE_PTI))
> 			WARN_ON(!had_user_mapping);
> 	} else {
> 		/*
>@@ -121,7 +121,7 @@ static void do_sanity_check(struct mm_struct *mm,
> 		 * Sync the pgd to the usermode tables.
> 		 */
> 		WARN_ON(had_kernel_mapping);
>-		if (static_cpu_has(X86_FEATURE_PTI))
>+		if (boot_cpu_has(X86_FEATURE_PTI))
> 			WARN_ON(had_user_mapping);
> 	}
> }
>@@ -156,7 +156,7 @@ static void map_ldt_struct_to_user(struct mm_struct
>*mm)
> 	k_pmd = pgd_to_pmd_walk(k_pgd, LDT_BASE_ADDR);
> 	u_pmd = pgd_to_pmd_walk(u_pgd, LDT_BASE_ADDR);
> 
>-	if (static_cpu_has(X86_FEATURE_PTI) && !mm->context.ldt)
>+	if (boot_cpu_has(X86_FEATURE_PTI) && !mm->context.ldt)
> 		set_pmd(u_pmd, *k_pmd);
> }
> 
>@@ -181,7 +181,7 @@ static void map_ldt_struct_to_user(struct mm_struct
>*mm)
> {
> 	pgd_t *pgd = pgd_offset(mm, LDT_BASE_ADDR);
> 
>-	if (static_cpu_has(X86_FEATURE_PTI) && !mm->context.ldt)
>+	if (boot_cpu_has(X86_FEATURE_PTI) && !mm->context.ldt)
> 		set_pgd(kernel_to_user_pgdp(pgd), *pgd);
> }
> 
>@@ -208,7 +208,7 @@ map_ldt_struct(struct mm_struct *mm, struct
>ldt_struct *ldt, int slot)
> 	spinlock_t *ptl;
> 	int i, nr_pages;
> 
>-	if (!static_cpu_has(X86_FEATURE_PTI))
>+	if (!boot_cpu_has(X86_FEATURE_PTI))
> 		return 0;
> 
> 	/*
>@@ -271,7 +271,7 @@ static void unmap_ldt_struct(struct mm_struct *mm,
>struct ldt_struct *ldt)
> 		return;
> 
> 	/* LDT map/unmap is only required for PTI */
>-	if (!static_cpu_has(X86_FEATURE_PTI))
>+	if (!boot_cpu_has(X86_FEATURE_PTI))
> 		return;
> 
> 	nr_pages = DIV_ROUND_UP(ldt->nr_entries * LDT_ENTRY_SIZE, PAGE_SIZE);
>@@ -311,7 +311,7 @@ static void free_ldt_pgtables(struct mm_struct *mm)
> 	unsigned long start = LDT_BASE_ADDR;
> 	unsigned long end = LDT_END_ADDR;
> 
>-	if (!static_cpu_has(X86_FEATURE_PTI))
>+	if (!boot_cpu_has(X86_FEATURE_PTI))
> 		return;
> 
> 	tlb_gather_mmu(&tlb, mm, start, end);
>diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
>index c0e0101133f3..7bbaa6baf37f 100644
>--- a/arch/x86/kernel/paravirt.c
>+++ b/arch/x86/kernel/paravirt.c
>@@ -121,7 +121,7 @@ DEFINE_STATIC_KEY_TRUE(virt_spin_lock_key);
> 
> void __init native_pv_lock_init(void)
> {
>-	if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
>+	if (!boot_cpu_has(X86_FEATURE_HYPERVISOR))
> 		static_branch_disable(&virt_spin_lock_key);
> }
> 
>diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>index 58ac7be52c7a..16a7113e91c5 100644
>--- a/arch/x86/kernel/process.c
>+++ b/arch/x86/kernel/process.c
>@@ -236,7 +236,7 @@ static int get_cpuid_mode(void)
> 
>static int set_cpuid_mode(struct task_struct *task, unsigned long
>cpuid_enabled)
> {
>-	if (!static_cpu_has(X86_FEATURE_CPUID_FAULT))
>+	if (!boot_cpu_has(X86_FEATURE_CPUID_FAULT))
> 		return -ENODEV;
> 
> 	if (cpuid_enabled)
>@@ -666,7 +666,7 @@ static int prefer_mwait_c1_over_halt(const struct
>cpuinfo_x86 *c)
> 	if (c->x86_vendor != X86_VENDOR_INTEL)
> 		return 0;
> 
>-	if (!cpu_has(c, X86_FEATURE_MWAIT) ||
>static_cpu_has_bug(X86_BUG_MONITOR))
>+	if (!cpu_has(c, X86_FEATURE_MWAIT) ||
>boot_cpu_has_bug(X86_BUG_MONITOR))
> 		return 0;
> 
> 	return 1;
>diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
>index 725624b6c0c0..d62ebbc5ec78 100644
>--- a/arch/x86/kernel/reboot.c
>+++ b/arch/x86/kernel/reboot.c
>@@ -108,7 +108,7 @@ void __noreturn machine_real_restart(unsigned int
>type)
> 	write_cr3(real_mode_header->trampoline_pgd);
> 
> 	/* Exiting long mode will fail if CR4.PCIDE is set. */
>-	if (static_cpu_has(X86_FEATURE_PCID))
>+	if (boot_cpu_has(X86_FEATURE_PCID))
> 		cr4_clear_bits(X86_CR4_PCIDE);
> #endif
> 
>diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
>index a092b6b40c6b..6a38717d179c 100644
>--- a/arch/x86/kernel/vm86_32.c
>+++ b/arch/x86/kernel/vm86_32.c
>@@ -369,7 +369,7 @@ static long do_sys_vm86(struct vm86plus_struct
>__user *user_vm86, bool plus)
> 	preempt_disable();
> 	tsk->thread.sp0 += 16;
> 
>-	if (static_cpu_has(X86_FEATURE_SEP)) {
>+	if (boot_cpu_has(X86_FEATURE_SEP)) {
> 		tsk->thread.sysenter_cs = 0;
> 		refresh_sysenter_cs(&tsk->thread);
> 	}
>diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>index f13a3a24d360..5ed039bf1b58 100644
>--- a/arch/x86/kvm/svm.c
>+++ b/arch/x86/kvm/svm.c
>@@ -835,7 +835,7 @@ static void svm_init_erratum_383(void)
> 	int err;
> 	u64 val;
> 
>-	if (!static_cpu_has_bug(X86_BUG_AMD_TLB_MMATCH))
>+	if (!boot_cpu_has_bug(X86_BUG_AMD_TLB_MMATCH))
> 		return;
> 
> 	/* Use _safe variants to not break nested virtualization */
>@@ -889,7 +889,7 @@ static int has_svm(void)
> static void svm_hardware_disable(void)
> {
> 	/* Make sure we clean up behind us */
>-	if (static_cpu_has(X86_FEATURE_TSCRATEMSR))
>+	if (boot_cpu_has(X86_FEATURE_TSCRATEMSR))
> 		wrmsrl(MSR_AMD64_TSC_RATIO, TSC_RATIO_DEFAULT);
> 
> 	cpu_svm_disable();
>@@ -931,7 +931,7 @@ static int svm_hardware_enable(void)
> 
> 	wrmsrl(MSR_VM_HSAVE_PA, page_to_pfn(sd->save_area) << PAGE_SHIFT);
> 
>-	if (static_cpu_has(X86_FEATURE_TSCRATEMSR)) {
>+	if (boot_cpu_has(X86_FEATURE_TSCRATEMSR)) {
> 		wrmsrl(MSR_AMD64_TSC_RATIO, TSC_RATIO_DEFAULT);
> 		__this_cpu_write(current_tsc_ratio, TSC_RATIO_DEFAULT);
> 	}
>@@ -2247,7 +2247,7 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu,
>int cpu)
> 	for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++)
> 		rdmsrl(host_save_user_msrs[i], svm->host_user_msrs[i]);
> 
>-	if (static_cpu_has(X86_FEATURE_TSCRATEMSR)) {
>+	if (boot_cpu_has(X86_FEATURE_TSCRATEMSR)) {
> 		u64 tsc_ratio = vcpu->arch.tsc_scaling_ratio;
> 		if (tsc_ratio != __this_cpu_read(current_tsc_ratio)) {
> 			__this_cpu_write(current_tsc_ratio, tsc_ratio);
>@@ -2255,7 +2255,7 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu,
>int cpu)
> 		}
> 	}
> 	/* This assumes that the kernel never uses MSR_TSC_AUX */
>-	if (static_cpu_has(X86_FEATURE_RDTSCP))
>+	if (boot_cpu_has(X86_FEATURE_RDTSCP))
> 		wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
> 
> 	if (sd->current_vmcb != svm->vmcb) {
>diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>index 30a6bcd735ec..0ec24853a0e6 100644
>--- a/arch/x86/kvm/vmx/vmx.c
>+++ b/arch/x86/kvm/vmx/vmx.c
>@@ -6553,7 +6553,7 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
> 	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
> 		vmx_set_interrupt_shadow(vcpu, 0);
> 
>-	if (static_cpu_has(X86_FEATURE_PKU) &&
>+	if (boot_cpu_has(X86_FEATURE_PKU) &&
> 	    kvm_read_cr4_bits(vcpu, X86_CR4_PKE) &&
> 	    vcpu->arch.pkru != vmx->host_pkru)
> 		__write_pkru(vcpu->arch.pkru);
>@@ -6633,7 +6633,7 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
> 	 * back on host, so it is safe to read guest PKRU from current
> 	 * XSAVE.
> 	 */
>-	if (static_cpu_has(X86_FEATURE_PKU) &&
>+	if (boot_cpu_has(X86_FEATURE_PKU) &&
> 	    kvm_read_cr4_bits(vcpu, X86_CR4_PKE)) {
> 		vcpu->arch.pkru = __read_pkru();
> 		if (vcpu->arch.pkru != vmx->host_pkru)
>diff --git a/arch/x86/mm/dump_pagetables.c
>b/arch/x86/mm/dump_pagetables.c
>index e3cdc85ce5b6..b596ac1eed1c 100644
>--- a/arch/x86/mm/dump_pagetables.c
>+++ b/arch/x86/mm/dump_pagetables.c
>@@ -579,7 +579,7 @@ void ptdump_walk_pgd_level(struct seq_file *m,
>pgd_t *pgd)
>void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd, bool
>user)
> {
> #ifdef CONFIG_PAGE_TABLE_ISOLATION
>-	if (user && static_cpu_has(X86_FEATURE_PTI))
>+	if (user && boot_cpu_has(X86_FEATURE_PTI))
> 		pgd = kernel_to_user_pgdp(pgd);
> #endif
> 	ptdump_walk_pgd_level_core(m, pgd, false, false);
>@@ -592,7 +592,7 @@ void ptdump_walk_user_pgd_level_checkwx(void)
> 	pgd_t *pgd = INIT_PGD;
> 
> 	if (!(__supported_pte_mask & _PAGE_NX) ||
>-	    !static_cpu_has(X86_FEATURE_PTI))
>+	    !boot_cpu_has(X86_FEATURE_PTI))
> 		return;
> 
> 	pr_info("x86/mm: Checking user space page tables\n");
>diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
>index 7bd01709a091..3dbf440d4114 100644
>--- a/arch/x86/mm/pgtable.c
>+++ b/arch/x86/mm/pgtable.c
>@@ -190,7 +190,7 @@ static void pgd_dtor(pgd_t *pgd)
>* when PTI is enabled. We need them to map the per-process LDT into the
>  * user-space page-table.
>  */
>-#define PREALLOCATED_USER_PMDS	 (static_cpu_has(X86_FEATURE_PTI) ? \
>+#define PREALLOCATED_USER_PMDS	 (boot_cpu_has(X86_FEATURE_PTI) ? \
> 					KERNEL_PGD_PTRS : 0)
> #define MAX_PREALLOCATED_USER_PMDS KERNEL_PGD_PTRS
> 
>@@ -292,7 +292,7 @@ static void pgd_mop_up_pmds(struct mm_struct *mm,
>pgd_t *pgdp)
> 
> #ifdef CONFIG_PAGE_TABLE_ISOLATION
> 
>-	if (!static_cpu_has(X86_FEATURE_PTI))
>+	if (!boot_cpu_has(X86_FEATURE_PTI))
> 		return;
> 
> 	pgdp = kernel_to_user_pgdp(pgdp);
>diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
>index 4fee5c3003ed..8c9a54ebda60 100644
>--- a/arch/x86/mm/pti.c
>+++ b/arch/x86/mm/pti.c
>@@ -626,7 +626,7 @@ void pti_set_kernel_image_nonglobal(void)
>  */
> void __init pti_init(void)
> {
>-	if (!static_cpu_has(X86_FEATURE_PTI))
>+	if (!boot_cpu_has(X86_FEATURE_PTI))
> 		return;
> 
> 	pr_info("enabled\n");
>diff --git a/drivers/cpufreq/amd_freq_sensitivity.c
>b/drivers/cpufreq/amd_freq_sensitivity.c
>index 4ac7c3cf34be..6927a8c0e748 100644
>--- a/drivers/cpufreq/amd_freq_sensitivity.c
>+++ b/drivers/cpufreq/amd_freq_sensitivity.c
>@@ -124,7 +124,7 @@ static int __init amd_freq_sensitivity_init(void)
> 			PCI_DEVICE_ID_AMD_KERNCZ_SMBUS, NULL);
> 
> 	if (!pcidev) {
>-		if (!static_cpu_has(X86_FEATURE_PROC_FEEDBACK))
>+		if (!boot_cpu_has(X86_FEATURE_PROC_FEEDBACK))
> 			return -ENODEV;
> 	}
> 
>diff --git a/drivers/cpufreq/intel_pstate.c
>b/drivers/cpufreq/intel_pstate.c
>index dd66decf2087..9bbc3dfdebe3 100644
>--- a/drivers/cpufreq/intel_pstate.c
>+++ b/drivers/cpufreq/intel_pstate.c
>@@ -520,7 +520,7 @@ static s16 intel_pstate_get_epb(struct cpudata
>*cpu_data)
> 	u64 epb;
> 	int ret;
> 
>-	if (!static_cpu_has(X86_FEATURE_EPB))
>+	if (!boot_cpu_has(X86_FEATURE_EPB))
> 		return -ENXIO;
> 
> 	ret = rdmsrl_on_cpu(cpu_data->cpu, MSR_IA32_ENERGY_PERF_BIAS, &epb);
>@@ -534,7 +534,7 @@ static s16 intel_pstate_get_epp(struct cpudata
>*cpu_data, u64 hwp_req_data)
> {
> 	s16 epp;
> 
>-	if (static_cpu_has(X86_FEATURE_HWP_EPP)) {
>+	if (boot_cpu_has(X86_FEATURE_HWP_EPP)) {
> 		/*
> 		 * When hwp_req_data is 0, means that caller didn't read
> 		 * MSR_HWP_REQUEST, so need to read and get EPP.
>@@ -559,7 +559,7 @@ static int intel_pstate_set_epb(int cpu, s16 pref)
> 	u64 epb;
> 	int ret;
> 
>-	if (!static_cpu_has(X86_FEATURE_EPB))
>+	if (!boot_cpu_has(X86_FEATURE_EPB))
> 		return -ENXIO;
> 
> 	ret = rdmsrl_on_cpu(cpu, MSR_IA32_ENERGY_PERF_BIAS, &epb);
>@@ -607,7 +607,7 @@ static int
>intel_pstate_get_energy_pref_index(struct cpudata *cpu_data)
> 	if (epp < 0)
> 		return epp;
> 
>-	if (static_cpu_has(X86_FEATURE_HWP_EPP)) {
>+	if (boot_cpu_has(X86_FEATURE_HWP_EPP)) {
> 		if (epp == HWP_EPP_PERFORMANCE)
> 			return 1;
> 		if (epp <= HWP_EPP_BALANCE_PERFORMANCE)
>@@ -616,7 +616,7 @@ static int
>intel_pstate_get_energy_pref_index(struct cpudata *cpu_data)
> 			return 3;
> 		else
> 			return 4;
>-	} else if (static_cpu_has(X86_FEATURE_EPB)) {
>+	} else if (boot_cpu_has(X86_FEATURE_EPB)) {
> 		/*
> 		 * Range:
> 		 *	0x00-0x03	:	Performance
>@@ -644,7 +644,7 @@ static int
>intel_pstate_set_energy_pref_index(struct cpudata *cpu_data,
> 
> 	mutex_lock(&intel_pstate_limits_lock);
> 
>-	if (static_cpu_has(X86_FEATURE_HWP_EPP)) {
>+	if (boot_cpu_has(X86_FEATURE_HWP_EPP)) {
> 		u64 value;
> 
> 		ret = rdmsrl_on_cpu(cpu_data->cpu, MSR_HWP_REQUEST, &value);
>@@ -819,7 +819,7 @@ static void intel_pstate_hwp_set(unsigned int cpu)
> 		epp = cpu_data->epp_powersave;
> 	}
> update_epp:
>-	if (static_cpu_has(X86_FEATURE_HWP_EPP)) {
>+	if (boot_cpu_has(X86_FEATURE_HWP_EPP)) {
> 		value &= ~GENMASK_ULL(31, 24);
> 		value |= (u64)epp << 24;
> 	} else {
>@@ -844,7 +844,7 @@ static void intel_pstate_hwp_force_min_perf(int
>cpu)
> 	value |= HWP_MIN_PERF(min_perf);
> 
> 	/* Set EPP/EPB to min */
>-	if (static_cpu_has(X86_FEATURE_HWP_EPP))
>+	if (boot_cpu_has(X86_FEATURE_HWP_EPP))
> 		value |= HWP_ENERGY_PERF_PREFERENCE(HWP_EPP_POWERSAVE);
> 	else
> 		intel_pstate_set_epb(cpu, HWP_EPP_BALANCE_POWERSAVE);
>@@ -1191,7 +1191,7 @@ static void __init
>intel_pstate_sysfs_expose_params(void)
> static void intel_pstate_hwp_enable(struct cpudata *cpudata)
> {
>	/* First disable HWP notification interrupt as we don't process them
>*/
>-	if (static_cpu_has(X86_FEATURE_HWP_NOTIFY))
>+	if (boot_cpu_has(X86_FEATURE_HWP_NOTIFY))
> 		wrmsrl_on_cpu(cpudata->cpu, MSR_HWP_INTERRUPT, 0x00);
> 
> 	wrmsrl_on_cpu(cpudata->cpu, MSR_PM_ENABLE, 0x1);
>diff --git a/drivers/cpufreq/powernow-k8.c
>b/drivers/cpufreq/powernow-k8.c
>index fb77b39a4ce3..3c12e03fa343 100644
>--- a/drivers/cpufreq/powernow-k8.c
>+++ b/drivers/cpufreq/powernow-k8.c
>@@ -1178,7 +1178,7 @@ static int powernowk8_init(void)
> 	unsigned int i, supported_cpus = 0;
> 	int ret;
> 
>-	if (static_cpu_has(X86_FEATURE_HW_PSTATE)) {
>+	if (boot_cpu_has(X86_FEATURE_HW_PSTATE)) {
> 		__request_acpi_cpufreq();
> 		return -ENODEV;
> 	}

I'm confused here, and as I'm on my phone on an aircraft I can't check the back thread, but am I gathering that this tries to unbreak W^X during module loading by removing functions which use alternatives?

The right thing to do is to apply alternatives before the pages are marked +x-w, like we do with the kernel proper during early boot, if this isn't already the case (sorry again, see above.)

If we *do*, what is the issue here? Although boot_cpu_has() isn't slow (it should in general be possible to reduce to one testb instruction followed by a conditional jump)  it seems that "avoiding an alternatives slot" *should* be a *very* weak reason, and seems to me to look like papering over some other problem.
Borislav Petkov March 7, 2019, 5:06 p.m. UTC | #11
On Thu, Mar 07, 2019 at 08:53:34AM -0800, hpa@zytor.com wrote:
> If we *do*, what is the issue here? Although boot_cpu_has() isn't
> slow (it should in general be possible to reduce to one testb
> instruction followed by a conditional jump) it seems that "avoiding an
> alternatives slot" *should* be a *very* weak reason, and seems to me
> to look like papering over some other problem.

Forget the current thread: this is simply trying to document when to use
static_cpu_has() and when to use boot_cpu_has(). I get asked about it at
least once a month.

And then it is replacing clear slow paths using static_cpu_has() with
boot_cpu_has() because there's purely no need to patch there. And having
a RIP-relative MOV and a JMP is good enough for slow paths.

Makes sense?
Andy Lutomirski March 7, 2019, 8:02 p.m. UTC | #12
On Thu, Mar 7, 2019 at 9:06 AM Borislav Petkov <bp@alien8.de> wrote:
>
> On Thu, Mar 07, 2019 at 08:53:34AM -0800, hpa@zytor.com wrote:
> > If we *do*, what is the issue here? Although boot_cpu_has() isn't
> > slow (it should in general be possible to reduce to one testb
> > instruction followed by a conditional jump) it seems that "avoiding an
> > alternatives slot" *should* be a *very* weak reason, and seems to me
> > to look like papering over some other problem.
>
> Forget the current thread: this is simply trying to document when to use
> static_cpu_has() and when to use boot_cpu_has(). I get asked about it at
> least once a month.
>
> And then it is replacing clear slow paths using static_cpu_has() with
> boot_cpu_has() because there's purely no need to patch there. And having
> a RIP-relative MOV and a JMP is good enough for slow paths.
>

Should we maybe rename these functions?  static_cpu_has() is at least
reasonably obvious.  But cpu_feature_enabled() is different for
reasons I've never understood, and boot_cpu_has() is IMO terribly
named.  It's not about the boot cpu -- it's about doing the same thing
but with less bloat and less performance.

(And can we maybe collapse cpu_feature_enabled() and static_cpu_has()
into the same function?)

--Andy
Borislav Petkov March 7, 2019, 8:25 p.m. UTC | #13
On Thu, Mar 07, 2019 at 12:02:13PM -0800, Andy Lutomirski wrote:
> Should we maybe rename these functions?  static_cpu_has() is at least
> reasonably obvious.  But cpu_feature_enabled() is different for
> reasons I've never understood, and boot_cpu_has() is IMO terribly
> named.  It's not about the boot cpu -- it's about doing the same thing
> but with less bloat and less performance.

Well, it does test bits in boot_cpu_data. I don't care about "boot" in
the name though so feel free to suggest something better.

> (And can we maybe collapse cpu_feature_enabled() and static_cpu_has()
> into the same function?)

I'm not sure it would be always ok to involve the DISABLED_MASK*
buildtime stuff in the checks. It probably is but it would need careful
auditing to be sure, first.

Patch
diff mbox series

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 76d482a2b716..69f3e650ada8 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -667,15 +667,29 @@  void __init alternative_instructions(void)
  * handlers seeing an inconsistent instruction while you patch.
  */
 void *__init_or_module text_poke_early(void *addr, const void *opcode,
-					      size_t len)
+				       size_t len)
 {
 	unsigned long flags;
-	local_irq_save(flags);
-	memcpy(addr, opcode, len);
-	local_irq_restore(flags);
-	sync_core();
-	/* Could also do a CLFLUSH here to speed up CPU recovery; but
-	   that causes hangs on some VIA CPUs. */
+
+	if (static_cpu_has(X86_FEATURE_NX) &&
+	    is_module_text_address((unsigned long)addr)) {
+		/*
+		 * Modules text is marked initially as non-executable, so the
+		 * code cannot be running and speculative code-fetches are
+		 * prevented. We can just change the code.
+		 */
+		memcpy(addr, opcode, len);
+	} else {
+		local_irq_save(flags);
+		memcpy(addr, opcode, len);
+		local_irq_restore(flags);
+		sync_core();
+
+		/*
+		 * Could also do a CLFLUSH here to speed up CPU recovery; but
+		 * that causes hangs on some VIA CPUs.
+		 */
+	}
 	return addr;
 }
 
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index b052e883dd8c..cfa3106faee4 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -87,7 +87,7 @@  void *module_alloc(unsigned long size)
 	p = __vmalloc_node_range(size, MODULE_ALIGN,
 				    MODULES_VADDR + get_module_load_offset(),
 				    MODULES_END, GFP_KERNEL,
-				    PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
+				    PAGE_KERNEL, 0, NUMA_NO_NODE,
 				    __builtin_return_address(0));
 	if (p && (kasan_module_alloc(p, size) < 0)) {
 		vfree(p);
diff --git a/include/linux/filter.h b/include/linux/filter.h
index d531d4250bff..9cdfab7f383c 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -681,7 +681,6 @@  bpf_ctx_narrow_access_ok(u32 off, u32 size, u32 size_default)
 
 static inline void bpf_prog_lock_ro(struct bpf_prog *fp)
 {
-	fp->undo_set_mem = 1;
 	set_memory_ro((unsigned long)fp, fp->pages);
 }
 
@@ -694,6 +693,7 @@  static inline void bpf_prog_unlock_ro(struct bpf_prog *fp)
 static inline void bpf_jit_binary_lock_ro(struct bpf_binary_header *hdr)
 {
 	set_memory_ro((unsigned long)hdr, hdr->pages);
+	set_memory_x((unsigned long)hdr, hdr->pages);
 }
 
 static inline void bpf_jit_binary_unlock_ro(struct bpf_binary_header *hdr)
diff --git a/kernel/module.c b/kernel/module.c
index 2ad1b5239910..ae1b77da6a20 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1950,8 +1950,13 @@  void module_enable_ro(const struct module *mod, bool after_init)
 		return;
 
 	frob_text(&mod->core_layout, set_memory_ro);
+	frob_text(&mod->core_layout, set_memory_x);
+
 	frob_rodata(&mod->core_layout, set_memory_ro);
+
 	frob_text(&mod->init_layout, set_memory_ro);
+	frob_text(&mod->init_layout, set_memory_x);
+
 	frob_rodata(&mod->init_layout, set_memory_ro);
 
 	if (after_init)