Message ID: 20190530030531.331779845@linuxfoundation.org (mailing list archive)
State: New, archived
Hi!

> [ Upstream commit f2c65fb3221adc6b73b0549fc7ba892022db9797 ]
>
> When modules and BPF filters are loaded, there is a time window in
> which some memory is both writable and executable. An attacker that has
> already found another vulnerability (e.g., a dangling pointer) might be
> able to exploit this behavior to overwrite kernel code. Prevent having
> writable executable PTEs in this stage.
>
> In addition, avoiding having W+X mappings can also slightly simplify the
> patching of modules code on initialization (e.g., by alternatives and
> static-key), as would be done in the next patch. This was actually the
> main motivation for this patch.
>
> To avoid having W+X mappings, set them initially as RW (NX) and after
> they are set as RO set them as X as well. Setting them as executable is
> done as a separate step to avoid one core in which the old PTE is cached
> (hence writable), and another which sees the updated PTE (executable),
> which would break the W^X protection.

First, is this stable material? Yes, it changes something.

But if you assume the attacker can write into kernel memory during
module load, what prevents him from changing the module as he sees fit
while it is not executable, simply waiting for the system to execute it?

I don't see a security benefit here.

> +++ b/arch/x86/kernel/alternative.c
> @@ -662,15 +662,29 @@ void __init alternative_instructions(void)
>   * handlers seeing an inconsistent instruction while you patch.
>   */
>  void *__init_or_module text_poke_early(void *addr, const void *opcode,
> -				       size_t len)
> +				       size_t len)
>  {
>  	unsigned long flags;
> -	local_irq_save(flags);
> -	memcpy(addr, opcode, len);
> -	local_irq_restore(flags);
> -	sync_core();
> -	/* Could also do a CLFLUSH here to speed up CPU recovery; but
> -	   that causes hangs on some VIA CPUs. */
> +
> +	if (boot_cpu_has(X86_FEATURE_NX) &&
> +	    is_module_text_address((unsigned long)addr)) {
> +		/*
> +		 * Modules text is marked initially as non-executable, so the
> +		 * code cannot be running and speculative code-fetches are
> +		 * prevented. Just change the code.
> +		 */
> +		memcpy(addr, opcode, len);
> +	} else {
> +		local_irq_save(flags);
> +		memcpy(addr, opcode, len);
> +		local_irq_restore(flags);
> +		sync_core();
> +
> +		/*
> +		 * Could also do a CLFLUSH here to speed up CPU recovery; but
> +		 * that causes hangs on some VIA CPUs.
> +		 */

I don't get it. If the code cannot be running here, it cannot be running
in the !NX case either, and we are free to just change it. Speculative
execution should not be a problem either, as CPUs are supposed to mask
it, and there are no known bugs in that area. (Plus, I'd not be
surprised if speculative execution ignored NX... just saying :-) )

Pavel
On May 31, 2019, at 3:37 AM, Pavel Machek <pavel@denx.de> wrote:

> Hi!
>
>> [ Upstream commit f2c65fb3221adc6b73b0549fc7ba892022db9797 ]
>>
>> When modules and BPF filters are loaded, there is a time window in
>> which some memory is both writable and executable. An attacker that has
>> already found another vulnerability (e.g., a dangling pointer) might be
>> able to exploit this behavior to overwrite kernel code. Prevent having
>> writable executable PTEs in this stage.
>>
>> In addition, avoiding having W+X mappings can also slightly simplify the
>> patching of modules code on initialization (e.g., by alternatives and
>> static-key), as would be done in the next patch. This was actually the
>> main motivation for this patch.
>>
>> To avoid having W+X mappings, set them initially as RW (NX) and after
>> they are set as RO set them as X as well. Setting them as executable is
>> done as a separate step to avoid one core in which the old PTE is cached
>> (hence writable), and another which sees the updated PTE (executable),
>> which would break the W^X protection.
>
> First, is this stable material? Yes, it changes something.
>
> But if you assume the attacker can write into kernel memory during
> module load, what prevents him from changing the module as he sees fit
> while it is not executable, simply waiting for the system to execute it?
>
> I don't see a security benefit here.

I agree that at the moment the benefit is limited. I think the benefit
would come later, if the module signature check is performed after the
module has been write-protected, but before it is actually executed.

>> +++ b/arch/x86/kernel/alternative.c
>> @@ -662,15 +662,29 @@ void __init alternative_instructions(void)
>>   * handlers seeing an inconsistent instruction while you patch.
>>   */
>>  void *__init_or_module text_poke_early(void *addr, const void *opcode,
>> -				       size_t len)
>> +				       size_t len)
>>  {
>>  	unsigned long flags;
>> -	local_irq_save(flags);
>> -	memcpy(addr, opcode, len);
>> -	local_irq_restore(flags);
>> -	sync_core();
>> -	/* Could also do a CLFLUSH here to speed up CPU recovery; but
>> -	   that causes hangs on some VIA CPUs. */
>> +
>> +	if (boot_cpu_has(X86_FEATURE_NX) &&
>> +	    is_module_text_address((unsigned long)addr)) {
>> +		/*
>> +		 * Modules text is marked initially as non-executable, so the
>> +		 * code cannot be running and speculative code-fetches are
>> +		 * prevented. Just change the code.
>> +		 */
>> +		memcpy(addr, opcode, len);
>> +	} else {
>> +		local_irq_save(flags);
>> +		memcpy(addr, opcode, len);
>> +		local_irq_restore(flags);
>> +		sync_core();
>> +
>> +		/*
>> +		 * Could also do a CLFLUSH here to speed up CPU recovery; but
>> +		 * that causes hangs on some VIA CPUs.
>> +		 */
>
> I don't get it. If the code cannot be running here, it cannot be running
> in the !NX case either, and we are free to just change it. Speculative
> execution should not be a problem either, as CPUs are supposed to mask
> it, and there are no known bugs in that area. (Plus, I'd not be
> surprised if speculative execution ignored NX... just saying :-) )

Yes, the module code should not run, but speculative execution might
cause it to be cached in the instruction cache (as unlikely as it might
be, but we need to consider malicious users that play with branch
predictors). I am unfamiliar with any bug that might cause the CPU to
speculatively ignore the NX bit. Without underestimating Intel’s ability
to create terrible bugs, I would assume, for now, that it is safe.
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index b9d5e7c9ef43e..918a23704c0c5 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -662,15 +662,29 @@ void __init alternative_instructions(void)
  * handlers seeing an inconsistent instruction while you patch.
  */
 void *__init_or_module text_poke_early(void *addr, const void *opcode,
-				       size_t len)
+				       size_t len)
 {
 	unsigned long flags;
-	local_irq_save(flags);
-	memcpy(addr, opcode, len);
-	local_irq_restore(flags);
-	sync_core();
-	/* Could also do a CLFLUSH here to speed up CPU recovery; but
-	   that causes hangs on some VIA CPUs. */
+
+	if (boot_cpu_has(X86_FEATURE_NX) &&
+	    is_module_text_address((unsigned long)addr)) {
+		/*
+		 * Modules text is marked initially as non-executable, so the
+		 * code cannot be running and speculative code-fetches are
+		 * prevented. Just change the code.
+		 */
+		memcpy(addr, opcode, len);
+	} else {
+		local_irq_save(flags);
+		memcpy(addr, opcode, len);
+		local_irq_restore(flags);
+		sync_core();
+
+		/*
+		 * Could also do a CLFLUSH here to speed up CPU recovery; but
+		 * that causes hangs on some VIA CPUs.
+		 */
+	}

 	return addr;
 }
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index f58336af095c9..6645f123419c6 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -87,7 +87,7 @@ void *module_alloc(unsigned long size)
 	p = __vmalloc_node_range(size, MODULE_ALIGN,
				    MODULES_VADDR + get_module_load_offset(),
				    MODULES_END, GFP_KERNEL,
-				    PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
+				    PAGE_KERNEL, 0, NUMA_NO_NODE,
				    __builtin_return_address(0));
 	if (p && (kasan_module_alloc(p, size) < 0)) {
 		vfree(p);
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 6ecef32fd8bcd..d52a7484aeb2d 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -684,6 +684,7 @@ static inline void bpf_prog_unlock_ro(struct bpf_prog *fp)
 static inline void bpf_jit_binary_lock_ro(struct bpf_binary_header *hdr)
 {
 	set_memory_ro((unsigned long)hdr, hdr->pages);
+	set_memory_x((unsigned long)hdr, hdr->pages);
 }

 static inline void bpf_jit_binary_unlock_ro(struct bpf_binary_header *hdr)
diff --git a/kernel/module.c b/kernel/module.c
index 38bf28b5cc202..f797c6ace7121 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1949,8 +1949,13 @@ void module_enable_ro(const struct module *mod, bool after_init)
 		return;

 	frob_text(&mod->core_layout, set_memory_ro);
+	frob_text(&mod->core_layout, set_memory_x);
+
 	frob_rodata(&mod->core_layout, set_memory_ro);
+
 	frob_text(&mod->init_layout, set_memory_ro);
+	frob_text(&mod->init_layout, set_memory_x);
+
 	frob_rodata(&mod->init_layout, set_memory_ro);

 	if (after_init)