Message ID | 20241004163155.3493183-2-jeffxu@google.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | seal system mappings | expand |
On 10/04, jeffxu@chromium.org wrote: > > It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may > alter the mapping of vdso, vvar, and sigpage during restore > operations. Consequently, this feature cannot be universally enabled > across all systems. Can't review. But as for uprobes, I'd prefer a simpler patch which doesn't need the new CONFIG_ and/or kernel boot options, something like the patch below. And I don't really like the fact that this patch changes the behaviour of the "generic" _install_special_mapping() helper, but I won't argue. Oleg. --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -430,6 +430,8 @@ extern unsigned int kobjsize(const void *objp); #ifdef CONFIG_64BIT /* VM is sealed, in vm_flags */ #define VM_SEALED _BITUL(63) +#else +#define VM_SEALED 0 #endif /* Bits set in the VMA until the stack is in its final location */ diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 40ecab0971ff..388373c11593 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -1510,7 +1510,7 @@ static int xol_add_vma(struct mm_struct *mm, struct xol_area *area) } vma = _install_special_mapping(mm, area->vaddr, PAGE_SIZE, - VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO, + VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO|VM_SEALED, &xol_mapping); if (IS_ERR(vma)) { ret = PTR_ERR(vma);
Sorry for the noise, forgot to mention... On 10/04, jeffxu@chromium.org wrote: > > --- a/Documentation/admin-guide/kernel-parameters.txt > +++ b/Documentation/admin-guide/kernel-parameters.txt > @@ -1535,6 +1535,15 @@ > Permit 'security.evm' to be updated regardless of > current integrity status. > > + exec.seal_system_mappings = [KNL] > + Format: { never | always } > + Seal system mappings: vdso, vvar, sigpage, uprobes, > + vsyscall. > + This overwrites KCONFIG CONFIG_SEAL_SYSTEM_MAPPINGS_* > + - 'never': never seal system mappings. > + - 'always': always seal system mappings. > + If not specified or invalid, default is the KCONFIG value. perhaps the documentation should also mention that this new parameter has no effect if CONFIG_64BIT=n. Oleg.
On Sat, Oct 5, 2024 at 1:21 PM Oleg Nesterov <oleg@redhat.com> wrote: > > Sorry for the noise, forgot to mention... > > On 10/04, jeffxu@chromium.org wrote: > > > > --- a/Documentation/admin-guide/kernel-parameters.txt > > +++ b/Documentation/admin-guide/kernel-parameters.txt > > @@ -1535,6 +1535,15 @@ > > Permit 'security.evm' to be updated regardless of > > current integrity status. > > > > + exec.seal_system_mappings = [KNL] > > + Format: { never | always } > > + Seal system mappings: vdso, vvar, sigpage, uprobes, > > + vsyscall. > > + This overwrites KCONFIG CONFIG_SEAL_SYSTEM_MAPPINGS_* > > + - 'never': never seal system mappings. > > + - 'always': always seal system mappings. > > + If not specified or invalid, default is the KCONFIG value. > > perhaps the documentation should also mention that this new parameter has > no effect if CONFIG_64BIT=n. Good point, I will add that. Thanks > > Oleg. >
On Sat, Oct 5, 2024 at 1:08 PM Oleg Nesterov <oleg@redhat.com> wrote: > > On 10/04, jeffxu@chromium.org wrote: > > > > It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may > > alter the mapping of vdso, vvar, and sigpage during restore > > operations. Consequently, this feature cannot be universally enabled > > across all systems. > > Can't review. > > But as for uprobes, I'd prefer a simpler patch which doesn't need the new > CONFIG_ and/or kernel boot options, something like the patch below. > Ok. > And I don't really like the fact that this patch changes the behaviour > of the "generic" _install_special_mapping() helper, but I won't argue. > This makes the minimum code change currently. If in the future, there is a special mapping type that can't be sealed, we can refactor _install_special_mapping() to support that. > Oleg. > > > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -430,6 +430,8 @@ extern unsigned int kobjsize(const void *objp); > #ifdef CONFIG_64BIT > /* VM is sealed, in vm_flags */ > #define VM_SEALED _BITUL(63) > +#else > +#define VM_SEALED 0 > #endif > > /* Bits set in the VMA until the stack is in its final location */ > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c > index 40ecab0971ff..388373c11593 100644 > --- a/kernel/events/uprobes.c > +++ b/kernel/events/uprobes.c > @@ -1510,7 +1510,7 @@ static int xol_add_vma(struct mm_struct *mm, struct xol_area *area) > } > > vma = _install_special_mapping(mm, area->vaddr, PAGE_SIZE, > - VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO, > + VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO|VM_SEALED, > &xol_mapping); OK. > if (IS_ERR(vma)) { > ret = PTR_ERR(vma); >
Hi Jeff, On 10/4/24 9:31 AM, jeffxu@chromium.org wrote: > From: Jeff Xu <jeffxu@chromium.org> > > Seal vdso, vvar, sigpage, uprobes and vsyscall. > > > Signed-off-by: Jeff Xu <jeffxu@chromium.org> > --- > .../admin-guide/kernel-parameters.txt | 9 ++++ > arch/x86/entry/vsyscall/vsyscall_64.c | 9 +++- > fs/exec.c | 53 +++++++++++++++++++ > include/linux/fs.h | 1 + > mm/mmap.c | 1 + > security/Kconfig | 26 +++++++++ > 6 files changed, 97 insertions(+), 2 deletions(-) > > diff --git a/security/Kconfig b/security/Kconfig > index 28e685f53bd1..e289fbb5d676 100644 > --- a/security/Kconfig > +++ b/security/Kconfig > @@ -51,6 +51,32 @@ config PROC_MEM_NO_FORCE > > endchoice > > +choice > + prompt "Seal system mappings" > + default SEAL_SYSTEM_MAPPINGS_NEVER > + help > + Seal system mappings such as vdso, vvar, sigpage, uprobes and > + vsyscall. > + Note: kernel command line exec.seal_system_mappings overwrite this. overwrites > + > +config SEAL_SYSTEM_MAPPINGS_NEVER > + bool "Traditional behavior - not sealed" > + help > + Do not seal system mappings. > + This is default. > + > +config SEAL_SYSTEM_MAPPINGS_ALWAYS > + bool "Always seal system mappings" > + depends on 64BIT > + depends on !CHECKPOINT_RESTORE > + help > + Seal system mappings such as vdso, vvar, sigpage, uprobes and > + vsyscall. > + Note: CHECKPOINT_RESTORE might relocate vdso mapping during restore, > + and remap will fail if the mapping is sealed, therefore > + !CHECKPOINT_RESTORE is added as dependency. > +endchoice > + > config SECURITY > bool "Enable different security models" > depends on SYSFS
On Mon, Oct 7, 2024 at 4:42 PM Randy Dunlap <rdunlap@infradead.org> wrote: > > Hi Jeff, > > On 10/4/24 9:31 AM, jeffxu@chromium.org wrote: > > From: Jeff Xu <jeffxu@chromium.org> > > > > Seal vdso, vvar, sigpage, uprobes and vsyscall. > > > > > > > Signed-off-by: Jeff Xu <jeffxu@chromium.org> > > --- > > .../admin-guide/kernel-parameters.txt | 9 ++++ > > arch/x86/entry/vsyscall/vsyscall_64.c | 9 +++- > > fs/exec.c | 53 +++++++++++++++++++ > > include/linux/fs.h | 1 + > > mm/mmap.c | 1 + > > security/Kconfig | 26 +++++++++ > > 6 files changed, 97 insertions(+), 2 deletions(-) > > > > > > diff --git a/security/Kconfig b/security/Kconfig > > index 28e685f53bd1..e289fbb5d676 100644 > > --- a/security/Kconfig > > +++ b/security/Kconfig > > @@ -51,6 +51,32 @@ config PROC_MEM_NO_FORCE > > > > endchoice > > > > +choice > > + prompt "Seal system mappings" > > + default SEAL_SYSTEM_MAPPINGS_NEVER > > + help > > + Seal system mappings such as vdso, vvar, sigpage, uprobes and > > + vsyscall. > > + Note: kernel command line exec.seal_system_mappings overwrite this. > > overwrites fixed. Thanks > > > + > > +config SEAL_SYSTEM_MAPPINGS_NEVER > > + bool "Traditional behavior - not sealed" > > + help > > + Do not seal system mappings. > > + This is default. > > + > > +config SEAL_SYSTEM_MAPPINGS_ALWAYS > > + bool "Always seal system mappings" > > + depends on 64BIT > > + depends on !CHECKPOINT_RESTORE > > + help > > + Seal system mappings such as vdso, vvar, sigpage, uprobes and > > + vsyscall. > > + Note: CHECKPOINT_RESTORE might relocate vdso mapping during restore, > > + and remap will fail if the mapping is sealed, therefore > > + !CHECKPOINT_RESTORE is added as dependency. > > +endchoice > > + > > config SECURITY > > bool "Enable different security models" > > depends on SYSFS > > -- > ~Randy
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 8dcec6d1cdb8..871ba308bb04 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -1535,6 +1535,15 @@ Permit 'security.evm' to be updated regardless of current integrity status. + exec.seal_system_mappings = [KNL] + Format: { never | always } + Seal system mappings: vdso, vvar, sigpage, uprobes, + vsyscall. + This overwrites KCONFIG CONFIG_SEAL_SYSTEM_MAPPINGS_* + - 'never': never seal system mappings. + - 'always': always seal system mappings. + If not specified or invalid, default is the KCONFIG value. + early_page_ext [KNL,EARLY] Enforces page_ext initialization to earlier stages so cover more early boot allocations. Please note that as side effect some optimizations diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c index 2fb7d53cf333..20a3000550d2 100644 --- a/arch/x86/entry/vsyscall/vsyscall_64.c +++ b/arch/x86/entry/vsyscall/vsyscall_64.c @@ -32,6 +32,7 @@ #include <linux/mm_types.h> #include <linux/syscalls.h> #include <linux/ratelimit.h> +#include <linux/fs.h> #include <asm/vsyscall.h> #include <asm/unistd.h> @@ -366,8 +367,12 @@ void __init map_vsyscall(void) set_vsyscall_pgtable_user_bits(swapper_pg_dir); } - if (vsyscall_mode == XONLY) - vm_flags_init(&gate_vma, VM_EXEC); + if (vsyscall_mode == XONLY) { + unsigned long vm_flags = VM_EXEC; + + update_seal_exec_system_mappings(&vm_flags); + vm_flags_init(&gate_vma, vm_flags); + } BUILD_BUG_ON((unsigned long)__fix_to_virt(VSYSCALL_PAGE) != (unsigned long)VSYSCALL_ADDR); diff --git a/fs/exec.c b/fs/exec.c index 6c53920795c2..bd4d687c37b1 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -68,6 +68,7 @@ #include <linux/user_events.h> #include <linux/rseq.h> #include <linux/ksm.h> +#include <linux/fs_parser.h> #include <linux/uaccess.h> #include <asm/mmu_context.h> @@ -2169,3 +2170,55 @@ fs_initcall(init_fs_exec_sysctls); #ifdef CONFIG_EXEC_KUNIT_TEST #include "tests/exec_kunit.c" #endif + +#ifdef CONFIG_64BIT +/* + * Kernel cmdline overwrite for CONFIG_SEAL_SYSTEM_MAPPINGS_X + */ +enum seal_system_mappings_type { + SEAL_SYSTEM_MAPPINGS_NEVER, + SEAL_SYSTEM_MAPPINGS_ALWAYS +}; + +static enum seal_system_mappings_type seal_system_mappings __ro_after_init = + IS_ENABLED(CONFIG_SEAL_SYSTEM_MAPPINGS_ALWAYS) ? SEAL_SYSTEM_MAPPINGS_ALWAYS : + SEAL_SYSTEM_MAPPINGS_NEVER; + +static const struct constant_table value_table_sys_mapping[] __initconst = { + { "never", SEAL_SYSTEM_MAPPINGS_NEVER}, + { "always", SEAL_SYSTEM_MAPPINGS_ALWAYS}, + { } +}; + +static int __init early_seal_system_mappings_override(char *buf) +{ + if (!buf) + return -EINVAL; + + seal_system_mappings = lookup_constant(value_table_sys_mapping, + buf, seal_system_mappings); + + return 0; +} + +early_param("exec.seal_system_mappings", early_seal_system_mappings_override); + +static bool seal_system_mappings_enabled(void) +{ + if (seal_system_mappings == SEAL_SYSTEM_MAPPINGS_ALWAYS) + return true; + + return false; +} + +void update_seal_exec_system_mappings(unsigned long *vm_flags) +{ + if (seal_system_mappings_enabled()) + *vm_flags |= VM_SEALED; + +} +#else +void update_seal_exec_system_mappings(unsigned long *vm_flags) +{ +} +#endif /* CONFIG_64BIT */ diff --git a/include/linux/fs.h b/include/linux/fs.h index 1e25267e2e48..7539e89ccd03 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3075,6 +3075,7 @@ ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos); extern ssize_t kernel_write(struct file *, const void *, size_t, loff_t *); extern ssize_t __kernel_write(struct file *, const void *, size_t, loff_t *); extern struct file * open_exec(const char *); +extern void update_seal_exec_system_mappings(unsigned long *vm_flags); /* fs/dcache.c -- generic fs support functions */ extern bool is_subdir(struct dentry *, struct dentry *); diff --git a/mm/mmap.c b/mm/mmap.c index ee8f91eaadb9..eb158e39ce0d 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -2115,6 +2115,7 @@ struct vm_area_struct *_install_special_mapping( unsigned long addr, unsigned long len, unsigned long vm_flags, const struct vm_special_mapping *spec) { + update_seal_exec_system_mappings(&vm_flags); return __install_special_mapping(mm, addr, len, vm_flags, (void *)spec, &special_mapping_vmops); } diff --git a/security/Kconfig b/security/Kconfig index 28e685f53bd1..e289fbb5d676 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -51,6 +51,32 @@ config PROC_MEM_NO_FORCE endchoice +choice + prompt "Seal system mappings" + default SEAL_SYSTEM_MAPPINGS_NEVER + help + Seal system mappings such as vdso, vvar, sigpage, uprobes and + vsyscall. + Note: kernel command line exec.seal_system_mappings overwrite this. + +config SEAL_SYSTEM_MAPPINGS_NEVER + bool "Traditional behavior - not sealed" + help + Do not seal system mappings. + This is default. + +config SEAL_SYSTEM_MAPPINGS_ALWAYS + bool "Always seal system mappings" + depends on 64BIT + depends on !CHECKPOINT_RESTORE + help + Seal system mappings such as vdso, vvar, sigpage, uprobes and + vsyscall. + Note: CHECKPOINT_RESTORE might relocate vdso mapping during restore, + and remap will fail if the mapping is sealed, therefore + !CHECKPOINT_RESTORE is added as dependency. +endchoice + config SECURITY bool "Enable different security models" depends on SYSFS