diff mbox series

[RFC,v1,1/1] exec: seal system mappings

Message ID 20241004163155.3493183-2-jeffxu@google.com (mailing list archive)
State Superseded
Headers show
Series seal system mappings | expand

Commit Message

Jeff Xu Oct. 4, 2024, 4:31 p.m. UTC
From: Jeff Xu <jeffxu@chromium.org>

Seal vdso, vvar, sigpage, uprobes and vsyscall.

Those mappings are readonly or executable only, sealing can protect
them from ever changing during the life time of the process.

System mappings such as vdso, vvar, and sigpage (for arm) are
generated by the kernel during program initialization. These mappings
are designated as non-writable, and sealing them will prevent them
from ever becoming writeable.

Unlike the aforementioned mappings, the uprobe mapping is not
established during program startup. However, its lifetime is the same
as the process's lifetime [1], thus sealable.

The vdso, vvar, sigpage, and uprobe mappings all invoke the
_install_special_mapping() function. As no other mappings utilize this
function, it is logical to incorporate sealing logic within
_install_special_mapping(). This approach avoids the necessity of
modifying code across various architecture-specific implementations.

The vsyscall mapping, which has its own initialization function, is
sealed in the XONLY case, it seems to be the most common and secure
case of using vsyscall.

It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may
alter the mapping of vdso, vvar, and sigpage during restore
operations. Consequently, this feature cannot be universally enabled
across all systems. To address this, a kernel configuration option has
been introduced to enable or disable this functionality. I tested
CONFIG_SEAL_SYSTEM_MAPPINGS_ALWAYS with ChromeOS, which doesn’t use
CHECKPOINT_RESTORE, to verify the sealing works.

[1] https://lore.kernel.org/all/CABi2SkU9BRUnqf70-nksuMCQ+yyiWjo3fM4XkRkL-NrCZxYAyg@mail.gmail.com/

Signed-off-by: Jeff Xu <jeffxu@chromium.org>
---
 .../admin-guide/kernel-parameters.txt         |  9 ++++
 arch/x86/entry/vsyscall/vsyscall_64.c         |  9 +++-
 fs/exec.c                                     | 53 +++++++++++++++++++
 include/linux/fs.h                            |  1 +
 mm/mmap.c                                     |  1 +
 security/Kconfig                              | 26 +++++++++
 6 files changed, 97 insertions(+), 2 deletions(-)

Comments

Oleg Nesterov Oct. 5, 2024, 8:08 p.m. UTC | #1
On 10/04, jeffxu@chromium.org wrote:
>
> It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may
> alter the mapping of vdso, vvar, and sigpage during restore
> operations. Consequently, this feature cannot be universally enabled
> across all systems.

Can't review.

But as for uprobes, I'd prefer a simpler patch which doesn't need the new
CONFIG_ and/or kernel boot options, something like the patch below.

And I don't really like the fact that this patch changes the behaviour
of the "generic" _install_special_mapping() helper, but I won't argue.

Oleg.


--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -430,6 +430,8 @@ extern unsigned int kobjsize(const void *objp);
 #ifdef CONFIG_64BIT
 /* VM is sealed, in vm_flags */
 #define VM_SEALED	_BITUL(63)
+#else
+#define VM_SEALED	0
 #endif
 
 /* Bits set in the VMA until the stack is in its final location */
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 40ecab0971ff..388373c11593 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1510,7 +1510,7 @@ static int xol_add_vma(struct mm_struct *mm, struct xol_area *area)
 	}
 
 	vma = _install_special_mapping(mm, area->vaddr, PAGE_SIZE,
-				VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO,
+				VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO|VM_SEALED,
 				&xol_mapping);
 	if (IS_ERR(vma)) {
 		ret = PTR_ERR(vma);
Oleg Nesterov Oct. 5, 2024, 8:20 p.m. UTC | #2
Sorry for the noise, forgot to mention...

On 10/04, jeffxu@chromium.org wrote:
>
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1535,6 +1535,15 @@
>  			Permit 'security.evm' to be updated regardless of
>  			current integrity status.
>  
> +	exec.seal_system_mappings = [KNL]
> +			Format: { never | always }
> +			Seal system mappings: vdso, vvar, sigpage, uprobes,
> +			vsyscall.
> +			This overwrites KCONFIG CONFIG_SEAL_SYSTEM_MAPPINGS_*
> +			- 'never':  never seal system mappings.
> +			- 'always': always seal system mappings.
> +			If not specified or invalid, default is the KCONFIG value.

perhaps the documentation should also mention that this new parameter has
no effect if CONFIG_64BIT=n.

Oleg.
Jeff Xu Oct. 7, 2024, 3 p.m. UTC | #3
On Sat, Oct 5, 2024 at 1:21 PM Oleg Nesterov <oleg@redhat.com> wrote:
>
> Sorry for the noise, forgot to mention...
>
> On 10/04, jeffxu@chromium.org wrote:
> >
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -1535,6 +1535,15 @@
> >                       Permit 'security.evm' to be updated regardless of
> >                       current integrity status.
> >
> > +     exec.seal_system_mappings = [KNL]
> > +                     Format: { never | always }
> > +                     Seal system mappings: vdso, vvar, sigpage, uprobes,
> > +                     vsyscall.
> > +                     This overwrites KCONFIG CONFIG_SEAL_SYSTEM_MAPPINGS_*
> > +                     - 'never':  never seal system mappings.
> > +                     - 'always': always seal system mappings.
> > +                     If not specified or invalid, default is the KCONFIG value.
>
> perhaps the documentation should also mention that this new parameter has
> no effect if CONFIG_64BIT=n.
Good point, I will add that.

Thanks

>
> Oleg.
>
Jeff Xu Oct. 7, 2024, 3 p.m. UTC | #4
On Sat, Oct 5, 2024 at 1:08 PM Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 10/04, jeffxu@chromium.org wrote:
> >
> > It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may
> > alter the mapping of vdso, vvar, and sigpage during restore
> > operations. Consequently, this feature cannot be universally enabled
> > across all systems.
>
> Can't review.
>
> But as for uprobes, I'd prefer a simpler patch which doesn't need the new
> CONFIG_ and/or kernel boot options, something like the patch below.
>
Ok.

> And I don't really like the fact that this patch changes the behaviour
> of the "generic" _install_special_mapping() helper, but I won't argue.
>
This makes the minimum code change currently.
 If in the future, there is a special mapping type that can't be
sealed, we can refactor _install_special_mapping() to support that.

> Oleg.
>
>
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -430,6 +430,8 @@ extern unsigned int kobjsize(const void *objp);
>  #ifdef CONFIG_64BIT
>  /* VM is sealed, in vm_flags */
>  #define VM_SEALED      _BITUL(63)
> +#else
> +#define VM_SEALED      0
>  #endif
>
>  /* Bits set in the VMA until the stack is in its final location */
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 40ecab0971ff..388373c11593 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -1510,7 +1510,7 @@ static int xol_add_vma(struct mm_struct *mm, struct xol_area *area)
>         }
>
>         vma = _install_special_mapping(mm, area->vaddr, PAGE_SIZE,
> -                               VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO,
> +                               VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO|VM_SEALED,
>                                 &xol_mapping);
OK.

>         if (IS_ERR(vma)) {
>                 ret = PTR_ERR(vma);
>
Randy Dunlap Oct. 7, 2024, 11:42 p.m. UTC | #5
Hi Jeff,

On 10/4/24 9:31 AM, jeffxu@chromium.org wrote:
> From: Jeff Xu <jeffxu@chromium.org>
> 
> Seal vdso, vvar, sigpage, uprobes and vsyscall.
> 

> 
> Signed-off-by: Jeff Xu <jeffxu@chromium.org>
> ---
>  .../admin-guide/kernel-parameters.txt         |  9 ++++
>  arch/x86/entry/vsyscall/vsyscall_64.c         |  9 +++-
>  fs/exec.c                                     | 53 +++++++++++++++++++
>  include/linux/fs.h                            |  1 +
>  mm/mmap.c                                     |  1 +
>  security/Kconfig                              | 26 +++++++++
>  6 files changed, 97 insertions(+), 2 deletions(-)
> 


> diff --git a/security/Kconfig b/security/Kconfig
> index 28e685f53bd1..e289fbb5d676 100644
> --- a/security/Kconfig
> +++ b/security/Kconfig
> @@ -51,6 +51,32 @@ config PROC_MEM_NO_FORCE
>  
>  endchoice
>  
> +choice
> +	prompt "Seal system mappings"
> +	default SEAL_SYSTEM_MAPPINGS_NEVER
> +	help
> +	  Seal system mappings such as vdso, vvar, sigpage, uprobes and
> +	  vsyscall.
> +	  Note: kernel command line exec.seal_system_mappings overwrite this.

	                                                      overwrites

> +
> +config SEAL_SYSTEM_MAPPINGS_NEVER
> +	bool "Traditional behavior - not sealed"
> +	help
> +	  Do not seal system mappings.
> +	  This is default.
> +
> +config SEAL_SYSTEM_MAPPINGS_ALWAYS
> +	bool "Always seal system mappings"
> +	depends on 64BIT
> +	depends on !CHECKPOINT_RESTORE
> +	help
> +	  Seal system mappings such as vdso, vvar, sigpage, uprobes and
> +	  vsyscall.
> +	  Note: CHECKPOINT_RESTORE might relocate vdso mapping during restore,
> +	  and remap will fail if the mapping is sealed, therefore
> +	  !CHECKPOINT_RESTORE is added as dependency.
> +endchoice
> +
>  config SECURITY
>  	bool "Enable different security models"
>  	depends on SYSFS
Jeff Xu Oct. 8, 2024, 4:18 a.m. UTC | #6
On Mon, Oct 7, 2024 at 4:42 PM Randy Dunlap <rdunlap@infradead.org> wrote:
>
> Hi Jeff,
>
> On 10/4/24 9:31 AM, jeffxu@chromium.org wrote:
> > From: Jeff Xu <jeffxu@chromium.org>
> >
> > Seal vdso, vvar, sigpage, uprobes and vsyscall.
> >
>
> >
> > Signed-off-by: Jeff Xu <jeffxu@chromium.org>
> > ---
> >  .../admin-guide/kernel-parameters.txt         |  9 ++++
> >  arch/x86/entry/vsyscall/vsyscall_64.c         |  9 +++-
> >  fs/exec.c                                     | 53 +++++++++++++++++++
> >  include/linux/fs.h                            |  1 +
> >  mm/mmap.c                                     |  1 +
> >  security/Kconfig                              | 26 +++++++++
> >  6 files changed, 97 insertions(+), 2 deletions(-)
> >
>
>
> > diff --git a/security/Kconfig b/security/Kconfig
> > index 28e685f53bd1..e289fbb5d676 100644
> > --- a/security/Kconfig
> > +++ b/security/Kconfig
> > @@ -51,6 +51,32 @@ config PROC_MEM_NO_FORCE
> >
> >  endchoice
> >
> > +choice
> > +     prompt "Seal system mappings"
> > +     default SEAL_SYSTEM_MAPPINGS_NEVER
> > +     help
> > +       Seal system mappings such as vdso, vvar, sigpage, uprobes and
> > +       vsyscall.
> > +       Note: kernel command line exec.seal_system_mappings overwrite this.
>
>                                                               overwrites
fixed.
Thanks

>
> > +
> > +config SEAL_SYSTEM_MAPPINGS_NEVER
> > +     bool "Traditional behavior - not sealed"
> > +     help
> > +       Do not seal system mappings.
> > +       This is default.
> > +
> > +config SEAL_SYSTEM_MAPPINGS_ALWAYS
> > +     bool "Always seal system mappings"
> > +     depends on 64BIT
> > +     depends on !CHECKPOINT_RESTORE
> > +     help
> > +       Seal system mappings such as vdso, vvar, sigpage, uprobes and
> > +       vsyscall.
> > +       Note: CHECKPOINT_RESTORE might relocate vdso mapping during restore,
> > +       and remap will fail if the mapping is sealed, therefore
> > +       !CHECKPOINT_RESTORE is added as dependency.
> > +endchoice
> > +
> >  config SECURITY
> >       bool "Enable different security models"
> >       depends on SYSFS
>
> --
> ~Randy
diff mbox series

Patch

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 8dcec6d1cdb8..871ba308bb04 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1535,6 +1535,15 @@ 
 			Permit 'security.evm' to be updated regardless of
 			current integrity status.
 
+	exec.seal_system_mappings = [KNL]
+			Format: { never | always }
+			Seal system mappings: vdso, vvar, sigpage, uprobes,
+			vsyscall.
+			This overwrites KCONFIG CONFIG_SEAL_SYSTEM_MAPPINGS_*
+			- 'never':  never seal system mappings.
+			- 'always': always seal system mappings.
+			If not specified or invalid, default is the KCONFIG value.
+
 	early_page_ext [KNL,EARLY] Enforces page_ext initialization to earlier
 			stages so cover more early boot allocations.
 			Please note that as side effect some optimizations
diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index 2fb7d53cf333..20a3000550d2 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -32,6 +32,7 @@ 
 #include <linux/mm_types.h>
 #include <linux/syscalls.h>
 #include <linux/ratelimit.h>
+#include <linux/fs.h>
 
 #include <asm/vsyscall.h>
 #include <asm/unistd.h>
@@ -366,8 +367,12 @@  void __init map_vsyscall(void)
 		set_vsyscall_pgtable_user_bits(swapper_pg_dir);
 	}
 
-	if (vsyscall_mode == XONLY)
-		vm_flags_init(&gate_vma, VM_EXEC);
+	if (vsyscall_mode == XONLY) {
+		unsigned long vm_flags = VM_EXEC;
+
+		update_seal_exec_system_mappings(&vm_flags);
+		vm_flags_init(&gate_vma, vm_flags);
+	}
 
 	BUILD_BUG_ON((unsigned long)__fix_to_virt(VSYSCALL_PAGE) !=
 		     (unsigned long)VSYSCALL_ADDR);
diff --git a/fs/exec.c b/fs/exec.c
index 6c53920795c2..bd4d687c37b1 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -68,6 +68,7 @@ 
 #include <linux/user_events.h>
 #include <linux/rseq.h>
 #include <linux/ksm.h>
+#include <linux/fs_parser.h>
 
 #include <linux/uaccess.h>
 #include <asm/mmu_context.h>
@@ -2169,3 +2170,55 @@  fs_initcall(init_fs_exec_sysctls);
 #ifdef CONFIG_EXEC_KUNIT_TEST
 #include "tests/exec_kunit.c"
 #endif
+
+#ifdef CONFIG_64BIT
+/*
+ * Kernel cmdline overwrite for CONFIG_SEAL_SYSTEM_MAPPINGS_X
+ */
+enum seal_system_mappings_type {
+	SEAL_SYSTEM_MAPPINGS_NEVER,
+	SEAL_SYSTEM_MAPPINGS_ALWAYS
+};
+
+static enum seal_system_mappings_type seal_system_mappings __ro_after_init =
+	IS_ENABLED(CONFIG_SEAL_SYSTEM_MAPPINGS_ALWAYS) ? SEAL_SYSTEM_MAPPINGS_ALWAYS :
+	SEAL_SYSTEM_MAPPINGS_NEVER;
+
+static const struct constant_table value_table_sys_mapping[] __initconst = {
+	{ "never", SEAL_SYSTEM_MAPPINGS_NEVER},
+	{ "always", SEAL_SYSTEM_MAPPINGS_ALWAYS},
+	{ }
+};
+
+static int __init early_seal_system_mappings_override(char *buf)
+{
+	if (!buf)
+		return -EINVAL;
+
+	seal_system_mappings = lookup_constant(value_table_sys_mapping,
+			buf, seal_system_mappings);
+
+	return 0;
+}
+
+early_param("exec.seal_system_mappings", early_seal_system_mappings_override);
+
+static bool seal_system_mappings_enabled(void)
+{
+	if (seal_system_mappings == SEAL_SYSTEM_MAPPINGS_ALWAYS)
+		return true;
+
+	return false;
+}
+
+void update_seal_exec_system_mappings(unsigned long *vm_flags)
+{
+	if (seal_system_mappings_enabled())
+		*vm_flags |= VM_SEALED;
+
+}
+#else
+void update_seal_exec_system_mappings(unsigned long *vm_flags)
+{
+}
+#endif /* CONFIG_64BIT */
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 1e25267e2e48..7539e89ccd03 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3075,6 +3075,7 @@  ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos);
 extern ssize_t kernel_write(struct file *, const void *, size_t, loff_t *);
 extern ssize_t __kernel_write(struct file *, const void *, size_t, loff_t *);
 extern struct file * open_exec(const char *);
+extern void update_seal_exec_system_mappings(unsigned long *vm_flags);
  
 /* fs/dcache.c -- generic fs support functions */
 extern bool is_subdir(struct dentry *, struct dentry *);
diff --git a/mm/mmap.c b/mm/mmap.c
index ee8f91eaadb9..eb158e39ce0d 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2115,6 +2115,7 @@  struct vm_area_struct *_install_special_mapping(
 	unsigned long addr, unsigned long len,
 	unsigned long vm_flags, const struct vm_special_mapping *spec)
 {
+	update_seal_exec_system_mappings(&vm_flags);
 	return __install_special_mapping(mm, addr, len, vm_flags, (void *)spec,
 					&special_mapping_vmops);
 }
diff --git a/security/Kconfig b/security/Kconfig
index 28e685f53bd1..e289fbb5d676 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -51,6 +51,32 @@  config PROC_MEM_NO_FORCE
 
 endchoice
 
+choice
+	prompt "Seal system mappings"
+	default SEAL_SYSTEM_MAPPINGS_NEVER
+	help
+	  Seal system mappings such as vdso, vvar, sigpage, uprobes and
+	  vsyscall.
+	  Note: kernel command line exec.seal_system_mappings overwrite this.
+
+config SEAL_SYSTEM_MAPPINGS_NEVER
+	bool "Traditional behavior - not sealed"
+	help
+	  Do not seal system mappings.
+	  This is default.
+
+config SEAL_SYSTEM_MAPPINGS_ALWAYS
+	bool "Always seal system mappings"
+	depends on 64BIT
+	depends on !CHECKPOINT_RESTORE
+	help
+	  Seal system mappings such as vdso, vvar, sigpage, uprobes and
+	  vsyscall.
+	  Note: CHECKPOINT_RESTORE might relocate vdso mapping during restore,
+	  and remap will fail if the mapping is sealed, therefore
+	  !CHECKPOINT_RESTORE is added as dependency.
+endchoice
+
 config SECURITY
 	bool "Enable different security models"
 	depends on SYSFS