mbox series

[v7,0/7] mseal system mappings

Message ID 20250224225246.3712295-1-jeffxu@google.com (mailing list archive)
Headers show
Series mseal system mappings | expand

Message

Jeff Xu Feb. 24, 2025, 10:52 p.m. UTC
From: Jeff Xu <jeffxu@chromium.org>

This is V7 version, addressing comments from V6, without code logic
change.

--------------------------------------------------

History:
V7:
 - Remove cover letter from the first patch (Liam R. Howlett)
 - Change macro name to VM_SEALED_SYSMAP (Liam R. Howlett)
 - logging and fclose() in selftest (Liam R. Howlett)

V6:
  https://lore.kernel.org/all/20250224174513.3600914-1-jeffxu@google.com/

V5
  https://lore.kernel.org/all/20250212032155.1276806-1-jeffxu@google.com/

V4:
  https://lore.kernel.org/all/20241125202021.3684919-1-jeffxu@google.com/

V3:
  https://lore.kernel.org/all/20241113191602.3541870-1-jeffxu@google.com/

V2:
  https://lore.kernel.org/all/20241014215022.68530-1-jeffxu@google.com/

V1:
  https://lore.kernel.org/all/20241004163155.3493183-1-jeffxu@google.com/

--------------------------------------------------
As discussed during mseal() upstream process [1], mseal() protects
the VMAs of a given virtual memory range against modifications, such
as the read/write (RW) and no-execute (NX) bits. For complete
descriptions of memory sealing, please see mseal.rst [2].

The mseal() is useful to mitigate memory corruption issues where a
corrupted pointer is passed to a memory management system. For
example, such an attacker primitive can break control-flow integrity
guarantees since read-only memory that is supposed to be trusted can
become writable or .text pages can get remapped.

The system mappings are readonly only, memory sealing can protect
them from ever changing to writable or unmmap/remapped as different
attributes.

System mappings such as vdso, vvar, and sigpage (arm), vectors (arm)
are created by the kernel during program initialization, and could
be sealed after creation.

Unlike the aforementioned mappings, the uprobe mapping is not
established during program startup. However, its lifetime is the same
as the process's lifetime [3]. It could be sealed from creation.

The vsyscall on x86-64 uses a special address (0xffffffffff600000),
which is outside the mm managed range. This means mprotect, munmap, and
mremap won't work on the vsyscall. Since sealing doesn't enhance
the vsyscall's security, it is skipped in this patch. If we ever seal
the vsyscall, it is probably only for decorative purpose, i.e. showing
the 'sl' flag in the /proc/pid/smaps. For this patch, it is ignored.

It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may
alter the system mappings during restore operations. UML(User Mode Linux)
and gVisor, rr are also known to change the vdso/vvar mappings.
Consequently, this feature cannot be universally enabled across all
systems. As such, CONFIG_MSEAL_SYSTEM_MAPPINGS is disabled by default.

To support mseal of system mappings, architectures must define
CONFIG_ARCH_HAS_MSEAL_SYSTEM_MAPPINGS and update their special mappings
calls to pass mseal flag. Additionally, architectures must confirm they
do not unmap/remap system mappings during the process lifetime.

In this version, we've improved the handling of system mapping sealing from
previous versions, instead of modifying the _install_special_mapping
function itself, which would affect all architectures, we now call
_install_special_mapping with a sealing flag only within the specific
architecture that requires it. This targeted approach offers two key
advantages: 1) It limits the code change's impact to the necessary
architectures, and 2) It aligns with the software architecture by keeping
the core memory management within the mm layer, while delegating the
decision of sealing system mappings to the individual architecture, which
is particularly relevant since 32-bit architectures never require sealing.

Prior to this patch series, we explored sealing special mappings from
userspace using glibc's dynamic linker. This approach revealed several
issues:
- The PT_LOAD header may report an incorrect length for vdso, (smaller
  than its actual size). The dynamic linker, which relies on PT_LOAD
  information to determine mapping size, would then split and partially
  seal the vdso mapping. Since each architecture has its own vdso/vvar
  code, fixing this in the kernel would require going through each
  archiecture. Our initial goal was to enable sealing readonly mappings,
  e.g. .text, across all architectures, sealing vdso from kernel since
  creation appears to be simpler than sealing vdso at glibc.
- The [vvar] mapping header only contains address information, not length
  information. Similar issues might exist for other special mappings.
- Mappings like uprobe are not covered by the dynamic linker,
  and there is no effective solution for them.

This feature's security enhancements will benefit ChromeOS, Android,
and other high security systems.

Testing:
This feature was tested on ChromeOS and Android for both x86-64 and ARM64.
- Enable sealing and verify vdso/vvar, sigpage, vector are sealed properly,
  i.e. "sl" shown in the smaps for those mappings, and mremap is blocked.
- Passing various automation tests (e.g. pre-checkin) on ChromeOS and
  Android to ensure the sealing doesn't affect the functionality of
  Chromebook and Android phone.

I also tested the feature on Ubuntu on x86-64:
- With config disabled, vdso/vvar is not sealed,
- with config enabled, vdso/vvar is sealed, and booting up Ubuntu is OK,
  normal operations such as browsing the web, open/edit doc are OK.

In addition, Benjamin Berg tested this on UML.

Link: https://lore.kernel.org/all/20240415163527.626541-1-jeffxu@chromium.org/ [1]
Link: Documentation/userspace-api/mseal.rst [2]
Link: https://lore.kernel.org/all/CABi2SkU9BRUnqf70-nksuMCQ+yyiWjo3fM4XkRkL-NrCZxYAyg@mail.gmail.com/ [3]




Jeff Xu (7):
  mseal, system mappings: kernel config and header change
  selftests: x86: test_mremap_vdso: skip if vdso is msealed
  mseal, system mappings: enable x86-64
  mseal, system mappings: enable arm64
  mseal, system mappings: enable uml architecture
  mseal, system mappings: uprobe mapping
  mseal, system mappings: update mseal.rst

 Documentation/userspace-api/mseal.rst         |  7 +++
 arch/arm64/Kconfig                            |  1 +
 arch/arm64/kernel/vdso.c                      | 22 +++++++---
 arch/um/Kconfig                               |  1 +
 arch/x86/Kconfig                              |  1 +
 arch/x86/entry/vdso/vma.c                     | 16 ++++---
 arch/x86/um/vdso/vma.c                        |  6 ++-
 include/linux/mm.h                            | 10 +++++
 init/Kconfig                                  | 18 ++++++++
 kernel/events/uprobes.c                       |  5 ++-
 security/Kconfig                              | 18 ++++++++
 .../testing/selftests/x86/test_mremap_vdso.c  | 43 +++++++++++++++++++
 12 files changed, 132 insertions(+), 16 deletions(-)

Comments

Pedro Falcato Feb. 24, 2025, 11:03 p.m. UTC | #1
On Mon, Feb 24, 2025 at 10:52 PM <jeffxu@chromium.org> wrote:
>
> From: Jeff Xu <jeffxu@chromium.org>
>
> This is V7 version, addressing comments from V6, without code logic
> change.
>
> --------------------------------------------------
>
> History:
> V7:
>  - Remove cover letter from the first patch (Liam R. Howlett)
>  - Change macro name to VM_SEALED_SYSMAP (Liam R. Howlett)
>  - logging and fclose() in selftest (Liam R. Howlett)

Jeff, please don't send out new versions of the patchset that quickly.
We were having a discussion on v5, you sent v6 today (acceptable) and
now v7 (while changing barely anything of note). It's hard to track
things this way, and you're just flooding a bunch of mailboxes.

Thanks,
Pedro
Jeff Xu Feb. 24, 2025, 11:07 p.m. UTC | #2
On Mon, Feb 24, 2025 at 3:03 PM Pedro Falcato <pedro.falcato@gmail.com> wrote:
>
> On Mon, Feb 24, 2025 at 10:52 PM <jeffxu@chromium.org> wrote:
> >
> > From: Jeff Xu <jeffxu@chromium.org>
> >
> > This is V7 version, addressing comments from V6, without code logic
> > change.
> >
> > --------------------------------------------------
> >
> > History:
> > V7:
> >  - Remove cover letter from the first patch (Liam R. Howlett)
> >  - Change macro name to VM_SEALED_SYSMAP (Liam R. Howlett)
> >  - logging and fclose() in selftest (Liam R. Howlett)
>
> Jeff, please don't send out new versions of the patchset that quickly.
> We were having a discussion on v5, you sent v6 today (acceptable) and
> now v7 (while changing barely anything of note). It's hard to track
> things this way, and you're just flooding a bunch of mailboxes.
>
Ah, I apologize. Sure.

-Jeff


> Thanks,
> Pedro
Lorenzo Stoakes Feb. 25, 2025, 6:09 a.m. UTC | #3
On Mon, Feb 24, 2025 at 03:07:03PM -0800, Jeff Xu wrote:
> On Mon, Feb 24, 2025 at 3:03 PM Pedro Falcato <pedro.falcato@gmail.com> wrote:
> >
> > On Mon, Feb 24, 2025 at 10:52 PM <jeffxu@chromium.org> wrote:
> > >
> > > From: Jeff Xu <jeffxu@chromium.org>
> > >
> > > This is V7 version, addressing comments from V6, without code logic
> > > change.
> > >
> > > --------------------------------------------------
> > >
> > > History:
> > > V7:
> > >  - Remove cover letter from the first patch (Liam R. Howlett)
> > >  - Change macro name to VM_SEALED_SYSMAP (Liam R. Howlett)
> > >  - logging and fclose() in selftest (Liam R. Howlett)
> >
> > Jeff, please don't send out new versions of the patchset that quickly.
> > We were having a discussion on v5, you sent v6 today (acceptable) and
> > now v7 (while changing barely anything of note). It's hard to track
> > things this way, and you're just flooding a bunch of mailboxes.
> >
> Ah, I apologize. Sure.
>
> -Jeff

Thanks, I am behind the eight ball on this in a post-viral fatigue haze, so I'd
especially appreciate relaxing a bit on series pace here haha.

I mean being reasonable I don't want you to feel you're needing to be told 'ok
send vX' now, but I'd simply suggest - wait until things 'settle down' a bit on
comments + everything's addressed and the 'usual suspects' have commented, then
this is a good time to send next version.

I realise I'm maybe not well placed to say this as I've previouisly been famous
for resending WAY too quick haha, but it's something I've worked on myself so I
guess, I relate...

>
>
> > Thanks,
> > Pedro

Thanks!
Lorenzo Stoakes Feb. 25, 2025, 10:32 a.m. UTC | #4
BTW can we please drop the 'mseal, system mappings' prefixes on this
series, it's really weird and makes it really hard for me to actually read
the individual summary lines for each commit. 'mseal:' will do.

I mean really you could argue it's 'mm: mseal: ...' but I'm not quite
_that_ pedantic :)
Lorenzo Stoakes Feb. 25, 2025, 3:18 p.m. UTC | #5
Jeff - looking further in this series, I asked for a couple things for this
series which you've not provided:

1. Some assurance based on code that the kernel-side code doesn't rely on
   VDSO/VVAR etc. mapping. I gave up waiting for this and went and checked
   myself, it looks fine for arm64, x86-64. But it might have been nice had
   you done it :) Apologies if you had and I just missed it.

2. Tests - could you please add some tests to assert that mremap() fails
   for VDSO for instance? You've edited an existing test for VDSO in x86 to
   skip the test should this be enabled, but this is not the same as a self
   test. And obviously doesn't cover arm64.

   This should be relatively strightforward right? You already have code
   for finding out whether VDSO is msealed, so just use that to see if you
   skip, then attempt mremap(), mmap() over etc. + assert it fails.

   Ideally these tests would cover all the cases you've changed.

Please do try to ensure you address requests from maintainers to save on
iterations, while I get the desire to shoot out new versions (I've been
guilty of this in the past), it makes life so much easier and will reduce
version count if you try to get everything done in a one go.

Having said the above, we're really not far off this being a viable
series. You just need to address comments here (+ in v6...) + provide some
tests.

Thanks!

On Mon, Feb 24, 2025 at 10:52:39PM +0000, jeffxu@chromium.org wrote:
> From: Jeff Xu <jeffxu@chromium.org>
>
> This is V7 version, addressing comments from V6, without code logic
> change.
>
> --------------------------------------------------
>
> History:
> V7:
>  - Remove cover letter from the first patch (Liam R. Howlett)
>  - Change macro name to VM_SEALED_SYSMAP (Liam R. Howlett)
>  - logging and fclose() in selftest (Liam R. Howlett)
>
> V6:
>   https://lore.kernel.org/all/20250224174513.3600914-1-jeffxu@google.com/
>

Nitty, but it's really useful to actually include the history for all of
these.

> V5
>   https://lore.kernel.org/all/20250212032155.1276806-1-jeffxu@google.com/
>
> V4:
>   https://lore.kernel.org/all/20241125202021.3684919-1-jeffxu@google.com/
>
> V3:
>   https://lore.kernel.org/all/20241113191602.3541870-1-jeffxu@google.com/
>
> V2:
>   https://lore.kernel.org/all/20241014215022.68530-1-jeffxu@google.com/
>
> V1:
>   https://lore.kernel.org/all/20241004163155.3493183-1-jeffxu@google.com/
>
> --------------------------------------------------
> As discussed during mseal() upstream process [1], mseal() protects
> the VMAs of a given virtual memory range against modifications, such
> as the read/write (RW) and no-execute (NX) bits. For complete
> descriptions of memory sealing, please see mseal.rst [2].
>
> The mseal() is useful to mitigate memory corruption issues where a
> corrupted pointer is passed to a memory management system. For
> example, such an attacker primitive can break control-flow integrity
> guarantees since read-only memory that is supposed to be trusted can
> become writable or .text pages can get remapped.
>
> The system mappings are readonly only, memory sealing can protect
> them from ever changing to writable or unmmap/remapped as different
> attributes.
>
> System mappings such as vdso, vvar, and sigpage (arm), vectors (arm)
> are created by the kernel during program initialization, and could
> be sealed after creation.
>
> Unlike the aforementioned mappings, the uprobe mapping is not
> established during program startup. However, its lifetime is the same
> as the process's lifetime [3]. It could be sealed from creation.
>
> The vsyscall on x86-64 uses a special address (0xffffffffff600000),
> which is outside the mm managed range. This means mprotect, munmap, and
> mremap won't work on the vsyscall. Since sealing doesn't enhance
> the vsyscall's security, it is skipped in this patch. If we ever seal
> the vsyscall, it is probably only for decorative purpose, i.e. showing
> the 'sl' flag in the /proc/pid/smaps. For this patch, it is ignored.
>
> It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may
> alter the system mappings during restore operations. UML(User Mode Linux)
> and gVisor, rr are also known to change the vdso/vvar mappings.
> Consequently, this feature cannot be universally enabled across all
> systems. As such, CONFIG_MSEAL_SYSTEM_MAPPINGS is disabled by default.
>
> To support mseal of system mappings, architectures must define
> CONFIG_ARCH_HAS_MSEAL_SYSTEM_MAPPINGS and update their special mappings
> calls to pass mseal flag. Additionally, architectures must confirm they
> do not unmap/remap system mappings during the process lifetime.
>
> In this version, we've improved the handling of system mapping sealing from
> previous versions, instead of modifying the _install_special_mapping
> function itself, which would affect all architectures, we now call
> _install_special_mapping with a sealing flag only within the specific
> architecture that requires it. This targeted approach offers two key
> advantages: 1) It limits the code change's impact to the necessary
> architectures, and 2) It aligns with the software architecture by keeping
> the core memory management within the mm layer, while delegating the
> decision of sealing system mappings to the individual architecture, which
> is particularly relevant since 32-bit architectures never require sealing.
>
> Prior to this patch series, we explored sealing special mappings from
> userspace using glibc's dynamic linker. This approach revealed several
> issues:
> - The PT_LOAD header may report an incorrect length for vdso, (smaller
>   than its actual size). The dynamic linker, which relies on PT_LOAD
>   information to determine mapping size, would then split and partially
>   seal the vdso mapping. Since each architecture has its own vdso/vvar
>   code, fixing this in the kernel would require going through each
>   archiecture. Our initial goal was to enable sealing readonly mappings,
>   e.g. .text, across all architectures, sealing vdso from kernel since
>   creation appears to be simpler than sealing vdso at glibc.
> - The [vvar] mapping header only contains address information, not length
>   information. Similar issues might exist for other special mappings.
> - Mappings like uprobe are not covered by the dynamic linker,
>   and there is no effective solution for them.
>
> This feature's security enhancements will benefit ChromeOS, Android,
> and other high security systems.
>
> Testing:
> This feature was tested on ChromeOS and Android for both x86-64 and ARM64.
> - Enable sealing and verify vdso/vvar, sigpage, vector are sealed properly,
>   i.e. "sl" shown in the smaps for those mappings, and mremap is blocked.
> - Passing various automation tests (e.g. pre-checkin) on ChromeOS and
>   Android to ensure the sealing doesn't affect the functionality of
>   Chromebook and Android phone.
>
> I also tested the feature on Ubuntu on x86-64:
> - With config disabled, vdso/vvar is not sealed,
> - with config enabled, vdso/vvar is sealed, and booting up Ubuntu is OK,
>   normal operations such as browsing the web, open/edit doc are OK.
>
> In addition, Benjamin Berg tested this on UML.
>
> Link: https://lore.kernel.org/all/20240415163527.626541-1-jeffxu@chromium.org/ [1]
> Link: Documentation/userspace-api/mseal.rst [2]
> Link: https://lore.kernel.org/all/CABi2SkU9BRUnqf70-nksuMCQ+yyiWjo3fM4XkRkL-NrCZxYAyg@mail.gmail.com/ [3]
>
>
>
>
> Jeff Xu (7):
>   mseal, system mappings: kernel config and header change
>   selftests: x86: test_mremap_vdso: skip if vdso is msealed
>   mseal, system mappings: enable x86-64
>   mseal, system mappings: enable arm64
>   mseal, system mappings: enable uml architecture
>   mseal, system mappings: uprobe mapping
>   mseal, system mappings: update mseal.rst
>
>  Documentation/userspace-api/mseal.rst         |  7 +++
>  arch/arm64/Kconfig                            |  1 +
>  arch/arm64/kernel/vdso.c                      | 22 +++++++---
>  arch/um/Kconfig                               |  1 +
>  arch/x86/Kconfig                              |  1 +
>  arch/x86/entry/vdso/vma.c                     | 16 ++++---
>  arch/x86/um/vdso/vma.c                        |  6 ++-
>  include/linux/mm.h                            | 10 +++++
>  init/Kconfig                                  | 18 ++++++++
>  kernel/events/uprobes.c                       |  5 ++-
>  security/Kconfig                              | 18 ++++++++
>  .../testing/selftests/x86/test_mremap_vdso.c  | 43 +++++++++++++++++++
>  12 files changed, 132 insertions(+), 16 deletions(-)
>
> --
> 2.48.1.658.g4767266eb4-goog
>