Message ID | 20241004163155.3493183-1-jeffxu@google.com (mailing list archive) |
---|---|
Series | seal system mappings |
* jeffxu@chromium.org <jeffxu@chromium.org> [241004 12:32]:
> From: Jeff Xu <jeffxu@google.com>
>
> Seal vdso, vvar, sigpage, uprobes and vsyscall.
>
> Those mappings are read-only or execute-only; sealing protects
> them from ever changing during the lifetime of the process.
>
> System mappings such as vdso, vvar, and sigpage (for arm) are
> generated by the kernel during program initialization. These mappings
> are designated as non-writable, and sealing them will prevent them
> from ever becoming writeable.

But it also means they cannot be unmapped, right?

I'm not saying it's a thing people should do, but recent conversations
with the ppc people seem to indicate that people do 'things' to the vdso
such as removing it.

Won't this change mean they cannot do that, at least if mseal is enabled
on ppc64? In which case we would have a different special mapping for
powerpc, or any other platform that wants to be able to unmap the vdso
(or vvar or whatever else?)

In fact, I came across people removing the vdso to catch callers to
those functions which they didn't want to allow. In this case enabling
the security of mseal would not allow them to stop applications from
vdso calls. Again, I'm not saying this is a good (or bad) idea but it
is happening.

>
> Unlike the aforementioned mappings, the uprobe mapping is not
> established during program startup. However, its lifetime is the same
> as the process's lifetime [1], so it is sealable.
>
> The vdso, vvar, sigpage, and uprobe mappings all invoke the
> _install_special_mapping() function. As no other mappings utilize this
> function, it is logical to incorporate sealing logic within
> _install_special_mapping(). This approach avoids the necessity of
> modifying code across various architecture-specific implementations.
>
> The vsyscall mapping, which has its own initialization function, is
> sealed in the XONLY case, which seems to be the most common and secure
> way of using vsyscall.
>
> It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may
> alter the mapping of vdso, vvar, and sigpage during restore
> operations. Consequently, this feature cannot be universally enabled
> across all systems. To address this, a kernel configuration option has
> been introduced to enable or disable this functionality. I tested
> CONFIG_SEAL_SYSTEM_MAPPINGS_ALWAYS with ChromeOS, which doesn’t use
> CHECKPOINT_RESTORE, to verify the sealing works.

I am hesitant to say that CRIU is the only user of moving the vdso, as
the ppc people wanted the ability for the fallback methods to still
function when the vdso was unmapped.

I am not sure we can change the user expected behaviour based on a
configuration option; users may be able to mmap/munmap but may not be
able to boot their own kernel, but maybe it's okay?

>
> [1] https://lore.kernel.org/all/CABi2SkU9BRUnqf70-nksuMCQ+yyiWjo3fM4XkRkL-NrCZxYAyg@mail.gmail.com/
>
> Jeff Xu (1):
>   exec: seal system mappings
>
>  .../admin-guide/kernel-parameters.txt         |  9 ++++
>  arch/x86/entry/vsyscall/vsyscall_64.c         |  9 +++-
>  fs/exec.c                                     | 53 +++++++++++++++++++
>  include/linux/fs.h                            |  1 +
>  mm/mmap.c                                     |  1 +
>  security/Kconfig                              | 26 +++++++++
>  6 files changed, 97 insertions(+), 2 deletions(-)
>
> --
> 2.47.0.rc0.187.ge670bccf7e-goog
>

Hi Liam,

On Mon, Oct 7, 2024 at 7:19 PM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
>
> * jeffxu@chromium.org <jeffxu@chromium.org> [241004 12:32]:
> > [...]
> > It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may
> > alter the mapping of vdso, vvar, and sigpage during restore
> > operations. Consequently, this feature cannot be universally enabled
> > across all systems. To address this, a kernel configuration option has
> > been introduced to enable or disable this functionality. I tested
> > CONFIG_SEAL_SYSTEM_MAPPINGS_ALWAYS with ChromeOS, which doesn’t use
> > CHECKPOINT_RESTORE, to verify the sealing works.
>
> I am hesitant to say that CRIU is the only user of moving the vdso, as
> the ppc people wanted the ability for the fallback methods to still
> function when the vdso was unmapped.
>
> I am not sure we can change the user expected behaviour based on a
> configuration option; users may be able to mmap/munmap but may not be
> able to boot their own kernel, but maybe it's okay?
>
The text doesn't say CRIU is the **only** feature that is not
compatible with this.

The default config is "CONFIG_SEAL_SYSTEM_MAPPINGS_NEVER", and
distributions need to opt in to this feature, such as ChromeOS,
Android, or other safe-by-default systems that don't allow the vdso
to be unmapped or remapped in production builds.

Thanks
-Jeff

* Jeff Xu <jeffxu@chromium.org> [241008 11:01]:
> Hi Liam,
>
> On Mon, Oct 7, 2024 at 7:19 PM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
> > [...]
> > I am hesitant to say that CRIU is the only user of moving the vdso, as
> > the ppc people wanted the ability for the fallback methods to still
> > function when the vdso was unmapped.
> >
> > I am not sure we can change the user expected behaviour based on a
> > configuration option; users may be able to mmap/munmap but may not be
> > able to boot their own kernel, but maybe it's okay?
> >
> The text doesn't say CRIU is the **only** feature that is not
> compatible with this.

Fair enough.

I read it that way since you pointed out breaking CRIU is the reason for
not enabling this by default, although it's probably the biggest reason
against doing this.

>
> The default config is "CONFIG_SEAL_SYSTEM_MAPPINGS_NEVER", and
> distributions need to opt in to this feature, such as ChromeOS,
> Android, or other safe-by-default systems that don't allow the vdso
> to be unmapped or remapped in production builds.

Okay, but you never stated that they can't be unmapped or remapped in
this change; just that they will never become writeable. It is worth
adding that detail in the description since it isn't entirely obvious
unless you know the workings of mseal.

>
> Thanks
> -Jeff

On Tue, Oct 8, 2024 at 5:42 PM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
>
> * Jeff Xu <jeffxu@chromium.org> [241008 11:01]:
> > [...]
> > The default config is "CONFIG_SEAL_SYSTEM_MAPPINGS_NEVER", and
> > distributions need to opt in to this feature, such as ChromeOS,
> > Android, or other safe-by-default systems that don't allow the vdso
> > to be unmapped or remapped in production builds.
>
> Okay, but you never stated that they can't be unmapped or remapped in
> this change; just that they will never become writeable. It is worth
> adding that detail in the description since it isn't entirely obvious
> unless you know the workings of mseal.
>
Thanks, I will improve this section by adding more details on memory
sealing or maybe point to the mseal.rst document.

From: Jeff Xu <jeffxu@google.com>

Seal vdso, vvar, sigpage, uprobes and vsyscall.

Those mappings are read-only or execute-only; sealing protects
them from ever changing during the lifetime of the process.

System mappings such as vdso, vvar, and sigpage (for arm) are
generated by the kernel during program initialization. These mappings
are designated as non-writable, and sealing them will prevent them
from ever becoming writeable.

Unlike the aforementioned mappings, the uprobe mapping is not
established during program startup. However, its lifetime is the same
as the process's lifetime [1], so it is sealable.

The vdso, vvar, sigpage, and uprobe mappings all invoke the
_install_special_mapping() function. As no other mappings utilize this
function, it is logical to incorporate sealing logic within
_install_special_mapping(). This approach avoids the necessity of
modifying code across various architecture-specific implementations.

The vsyscall mapping, which has its own initialization function, is
sealed in the XONLY case, which seems to be the most common and secure
way of using vsyscall.

It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may
alter the mapping of vdso, vvar, and sigpage during restore
operations. Consequently, this feature cannot be universally enabled
across all systems. To address this, a kernel configuration option has
been introduced to enable or disable this functionality. I tested
CONFIG_SEAL_SYSTEM_MAPPINGS_ALWAYS with ChromeOS, which doesn’t use
CHECKPOINT_RESTORE, to verify the sealing works.

[1] https://lore.kernel.org/all/CABi2SkU9BRUnqf70-nksuMCQ+yyiWjo3fM4XkRkL-NrCZxYAyg@mail.gmail.com/

Jeff Xu (1):
  exec: seal system mappings

 .../admin-guide/kernel-parameters.txt         |  9 ++++
 arch/x86/entry/vsyscall/vsyscall_64.c         |  9 +++-
 fs/exec.c                                     | 53 +++++++++++++++++++
 include/linux/fs.h                            |  1 +
 mm/mmap.c                                     |  1 +
 security/Kconfig                              | 26 +++++++++
 6 files changed, 97 insertions(+), 2 deletions(-)
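
To make the behaviour discussed in this thread concrete, here is a toy user-space model in Python. It is not the patch and not kernel code. It only sketches two ideas from the cover letter: the decision to seal lives in one central install helper, and it is gated by the NEVER/ALWAYS configuration choice. It also shows the consequence raised in review, that a sealed mapping rejects unmapping as well as permission changes. The class name, the dictionary layout and the string permissions are all hypothetical. The use of EPERM follows the documented mseal behaviour.

```python
import errno
from enum import Enum


class SealMode(Enum):
    """Mirrors the two Kconfig choices named in the cover letter."""
    NEVER = "CONFIG_SEAL_SYSTEM_MAPPINGS_NEVER"
    ALWAYS = "CONFIG_SEAL_SYSTEM_MAPPINGS_ALWAYS"


class ToyAddressSpace:
    """Toy model of a process address space (hypothetical, not kernel code)."""

    def __init__(self, mode: SealMode) -> None:
        self.mode = mode
        self.mappings = {}  # name -> {"perms": str, "sealed": bool}

    def install_special_mapping(self, name: str, perms: str) -> None:
        # One central helper decides whether to seal, so no
        # architecture-specific caller has to change.
        sealed = self.mode is SealMode.ALWAYS
        self.mappings[name] = {"perms": perms, "sealed": sealed}

    def _reject_if_sealed(self, name: str) -> None:
        # Sealed mappings refuse modification; model that as EPERM.
        if self.mappings[name]["sealed"]:
            raise PermissionError(errno.EPERM, f"{name} is sealed")

    def mprotect(self, name: str, perms: str) -> None:
        self._reject_if_sealed(name)
        self.mappings[name]["perms"] = perms

    def munmap(self, name: str) -> None:
        self._reject_if_sealed(name)
        del self.mappings[name]
```

With `SealMode.ALWAYS`, both `mprotect("[vdso]", "rwx")` and `munmap("[vdso]")` raise `PermissionError`. With `SealMode.NEVER` they succeed, which is the behaviour CRIU and the ppc use cases mentioned above rely on.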
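
For readers who want to see which ranges the series is talking about, the following sketch parses text in the `/proc/<pid>/maps` format and picks out the system mappings by name. It assumes the mappings show up under the bracketed names `[vdso]`, `[vvar]`, `[sigpage]`, `[uprobes]` and `[vsyscall]`. The exact set depends on architecture and kernel version, so treat the name list as an assumption. The helper names are hypothetical.

```python
from typing import List, NamedTuple, Optional

# Assumed names of the mappings listed in the cover letter, as they
# would appear in the last column of /proc/<pid>/maps.
SYSTEM_MAPPING_NAMES = {"[vdso]", "[vvar]", "[sigpage]", "[uprobes]", "[vsyscall]"}


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    name: str


def parse_maps_line(line: str) -> Optional[Mapping]:
    """Parse one maps line: 'start-end perms offset dev inode [name]'."""
    fields = line.split(None, 5)
    if len(fields) < 5:
        return None  # blank or malformed line
    start_s, _, end_s = fields[0].partition("-")
    name = fields[5].strip() if len(fields) > 5 else ""
    return Mapping(int(start_s, 16), int(end_s, 16), fields[1], name)


def system_mappings(maps_text: str) -> List[Mapping]:
    """Return only the mappings whose name is in SYSTEM_MAPPING_NAMES."""
    found = []
    for line in maps_text.splitlines():
        mapping = parse_maps_line(line)
        if mapping is not None and mapping.name in SYSTEM_MAPPING_NAMES:
            found.append(mapping)
    return found
```

On a Linux system, passing the contents of `/proc/self/maps` to `system_mappings()` lists the address ranges that a later `munmap()` or `mprotect()` would target. Those are the calls the sealing in this series is meant to reject when it is enabled.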