Message ID | 20250212032155.1276806-1-jeffxu@google.com (mailing list archive) |
---|---|
Headers | show |
Series | mseal system mappings | expand |
On Wed, Feb 12, 2025 at 03:21:48AM +0000, jeffxu@chromium.org wrote: > From: Jeff Xu <jeffxu@chromium.org> > > The commit message in the first patch contains the full description of > this series. Sorry to nit, but it'd be useful to reproduce in the cover letter too! But this obviously isn't urgent, just be nice when we un-RFC. Thanks for sending as RFC, appreciated, keen to figure out a way forward with this series and this gives us space to discuss. One thing that came up recently with the LWN article (...!) was that rr is also impacted by this [0]. I think with this behind a config flag we're fine (this refers to my 'opt-in' comment in the reply on LWN) as my concerns about this being enabled in a broken way without an explicit kernel configuration are addressed, and actually we do expose a means by which a user can detect if the VDSO for instance is sealed via /proc/$pid/[s]maps. So tools like rr and such can be updated to check for this. I wonder if we ought to try to liaise with the known problematic ones? It'd be nice to update the documentation to have a list of 'known problematic userland software with sealed VDSO' so we make people aware. Hopefully we are acheiving the opt-in nature of the thing here, but it makes me wonder whether we need a prctl() interface to optionally disable even if the system has it enabled as a whole. That way, rr for instance can just turn it off for debugging purposes. To be clear I am not trying to add additional extranous tasks here - my one and only motive is to ensure that the feature works and we address concerns about any possible breakage. And I _want the series to land_ :>) I suspect we're close now. I am tied up with a number of other tasks at the moment so apologies if I take a second to get back to this series, but just wanted to say thanks for addressing my various points, and that I will definitely provide review (it's on the whiteboard, the only true guarantee I will do something :P). I will also come back to your testing series which I owe you review on, which is equally on the same whiteboard... Thanks, Lorenzo [0]:https://lwn.net/Articles/1007984/ > > ------------------ > History: > > V5 > - Remove kernel cmd line (Lorenzo Stoakes) > - Add test info (Lorenzo Stoakes) > - Add threat model info (Lorenzo Stoakes) > - Fix x86 selftest: test_mremap_vdso > - Restrict code change to ARM64/x86-64/UM arch only. > - Add userprocess.h to include seal_system_mapping(). > - Remove sealing vsyscall. > - Split the patch. > > V4: > https://lore.kernel.org/all/20241125202021.3684919-1-jeffxu@google.com/ > > V3: > https://lore.kernel.org/all/20241113191602.3541870-1-jeffxu@google.com/ > > V2: > https://lore.kernel.org/all/20241014215022.68530-1-jeffxu@google.com/ > > V1: > https://lore.kernel.org/all/20241004163155.3493183-1-jeffxu@google.com/ > > Jeff Xu (7): > mseal, system mappings: kernel config and header change > selftests: x86: test_mremap_vdso: skip if vdso is msealed > mseal, system mappings: enable x86-64 > mseal, system mappings: enable arm64 > mseal, system mappings: enable uml architecture > mseal, system mappings: uprobe mapping > mseal, system mappings: update mseal.rst > > Documentation/userspace-api/mseal.rst | 5 +++ > arch/arm64/Kconfig | 1 + > arch/arm64/kernel/vdso.c | 23 +++++++---- > arch/um/Kconfig | 1 + > arch/x86/Kconfig | 1 + > arch/x86/entry/vdso/vma.c | 17 ++++++--- > arch/x86/um/vdso/vma.c | 7 +++- > include/linux/userprocess.h | 18 +++++++++ > init/Kconfig | 18 +++++++++ > kernel/events/uprobes.c | 6 ++- > security/Kconfig | 18 +++++++++ > .../testing/selftests/x86/test_mremap_vdso.c | 38 +++++++++++++++++++ > 12 files changed, 137 insertions(+), 16 deletions(-) > create mode 100644 include/linux/userprocess.h > > -- > 2.48.1.502.g6dc24dfdaf-goog >
On Wed, Feb 12, 2025 at 11:25 AM Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote: > > On Wed, Feb 12, 2025 at 03:21:48AM +0000, jeffxu@chromium.org wrote: > > From: Jeff Xu <jeffxu@chromium.org> > > > > The commit message in the first patch contains the full description of > > this series. > > Sorry to nit, but it'd be useful to reproduce in the cover letter too! But > this obviously isn't urgent, just be nice when we un-RFC. > > Thanks for sending as RFC, appreciated, keen to figure out a way forward > with this series and this gives us space to discuss. > > One thing that came up recently with the LWN article (...!) was that rr is > also impacted by this [0]. > > I think with this behind a config flag we're fine (this refers to my > 'opt-in' comment in the reply on LWN) as my concerns about this being > enabled in a broken way without an explicit kernel configuration are > addressed, and actually we do expose a means by which a user can detect if > the VDSO for instance is sealed via /proc/$pid/[s]maps. > > So tools like rr and such can be updated to check for this. I wonder if we > ought to try to liaise with the known problematic ones? > > It'd be nice to update the documentation to have a list of 'known > problematic userland software with sealed VDSO' so we make people aware. > > Hopefully we are acheiving the opt-in nature of the thing here, but it > makes me wonder whether we need a prctl() interface to optionally disable > even if the system has it enabled as a whole. Just noting that (as we discussed off-list) doing prctl() would not work, because that would effectively be an munseal for those vdso regions. Possibly something like a personality() flag (that's *not* inherited when AT_SECURE/secureexec). But personalities have other issues... FWIW, although it would (at the moment) be hard to pull off in the libc, I still much prefer it to playing these weird games with CONFIG options and kernel command line options and prctl and personality and whatnot. It seems to me like we're trying to stick policy where it doesn't belong.
(sorry I really am struggling to reply to mail as lore still seems to be broken). On Wed, Feb 12, 2025 at 12:37:50PM +0000, Pedro Falcato wrote: > On Wed, Feb 12, 2025 at 11:25 AM Lorenzo Stoakes > <lorenzo.stoakes@oracle.com> wrote: > > > > On Wed, Feb 12, 2025 at 03:21:48AM +0000, jeffxu@chromium.org wrote: > > > From: Jeff Xu <jeffxu@chromium.org> > > > > > > The commit message in the first patch contains the full description of > > > this series. > > > > Sorry to nit, but it'd be useful to reproduce in the cover letter too! But > > this obviously isn't urgent, just be nice when we un-RFC. > > > > Thanks for sending as RFC, appreciated, keen to figure out a way forward > > with this series and this gives us space to discuss. > > > > One thing that came up recently with the LWN article (...!) was that rr is > > also impacted by this [0]. > > > > I think with this behind a config flag we're fine (this refers to my > > 'opt-in' comment in the reply on LWN) as my concerns about this being > > enabled in a broken way without an explicit kernel configuration are > > addressed, and actually we do expose a means by which a user can detect if > > the VDSO for instance is sealed via /proc/$pid/[s]maps. > > > > So tools like rr and such can be updated to check for this. I wonder if we > > ought to try to liaise with the known problematic ones? > > > > It'd be nice to update the documentation to have a list of 'known > > problematic userland software with sealed VDSO' so we make people aware. > > > > Hopefully we are acheiving the opt-in nature of the thing here, but it > > makes me wonder whether we need a prctl() interface to optionally disable > > even if the system has it enabled as a whole. > > Just noting that (as we discussed off-list) doing prctl() would not > work, because that would effectively be an munseal for those vdso > regions. > Possibly something like a personality() flag (that's *not* inherited > when AT_SECURE/secureexec). But personalities have other issues... Thanks, yeah that's a good point, it would have to be implemented as a personality or something similar otherwise you're essentially relying on 'unsealing' which can't be permitted. I'm not sure how useful that'd be for the likes of rr though. But I suppose if it makes everything exec'd by a child inherit it then maybe that works for a debugging session etc.? > > FWIW, although it would (at the moment) be hard to pull off in the > libc, I still much prefer it to playing these weird games with CONFIG > options and kernel command line options and prctl and personality and > whatnot. It seems to me like we're trying to stick policy where it > doesn't belong. The problem is, as a security feature, you don't want to make it trivially easy to disable. I mean we _need_ a config option to be able to strictly enforce only making the feature enable-able on architectures and configuration option combinations that work. But if there is userspace that will be broken, we really have to have some way of avoiding the disconnect between somebody making policy decision at the kernel level and somebody trying to run something. Because I can easily envision somebody enabling this as a 'good security feature' for a distro release or such, only for somebody else to later try rr, CRIU, or whatever else and for it to just not work or fail subtly and to have no idea why. I mean one option is to have it as a CONFIG_ flag _and_ you have to enable it via a tunable, so then it can become sysctl.d policy for instance. The CONFIG_ flag dependency is critical because we don't want to enable this on arches that have not been tested against it. It's vital at any rate that we document everywhere we can that _this might break some userland that depends on remapping the VDSO_. > > -- > Pedro
On Wed, 2025-02-12 at 14:01 +0000, Lorenzo Stoakes wrote: > Thanks, yeah that's a good point, it would have to be implemented as a > personality or something similar otherwise you're essentially relying on > 'unsealing' which can't be permitted. > > I'm not sure how useful that'd be for the likes of rr though. But I suppose > if it makes everything exec'd by a child inherit it then maybe that works > for a debugging session etc.? For whatever that's worth, ARCH=um should not need 'unsealing' or 'not sealing' it for *itself*, but rather only for the *children* it starts, which are for the userspace processes inside of it. Which I suppose could actually start without a VDSO in the first place, but I don't think that's possible now? Which I'll note should not have access to the host, so in a way this outer security feature (sealing) breaks the inner ARCH=um security, I think. johannes
On Wed, Feb 12, 2025 at 11:24:35AM +0000, Lorenzo Stoakes wrote: > On Wed, Feb 12, 2025 at 03:21:48AM +0000, jeffxu@chromium.org wrote: > > From: Jeff Xu <jeffxu@chromium.org> > > > > The commit message in the first patch contains the full description of > > this series. > > Sorry to nit, but it'd be useful to reproduce in the cover letter too! But > this obviously isn't urgent, just be nice when we un-RFC. I advised Jeff against this because I've found it can sometimes cause "thread splitting" in that some people reply to the cover letter, and some people reply to the first patch, etc. I've tended to try to keep cover letters very general, with the bulk of the prose in the first patch. > It'd be nice to update the documentation to have a list of 'known > problematic userland software with sealed VDSO' so we make people aware. I like this idea! Probably in mseal.rst, as the Kconfig help already points there. -Kees
On Wed, Feb 12, 2025 at 3:24 AM Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote: > It'd be nice to update the documentation to have a list of 'known > problematic userland software with sealed VDSO' so we make people aware. > Sure. It will be added in the next version. > > And I _want the series to land_ :>) I suspect we're close now. > Thank you for reviewing the patch and giving support. > I am tied up with a number of other tasks at the moment so apologies if I > take a second to get back to this series, but just wanted to say thanks for > addressing my various points, and that I will definitely provide review > (it's on the whiteboard, the only true guarantee I will do something :P). > > I will also come back to your testing series which I owe you review on, > which is equally on the same whiteboard... > Thank you for the heads up. -Jeff
On Wed, Feb 12, 2025 at 2:05 PM Kees Cook <kees@kernel.org> wrote: > > > It'd be nice to update the documentation to have a list of 'known > > problematic userland software with sealed VDSO' so we make people aware. > > I like this idea! Probably in mseal.rst, as the Kconfig help already > points there. > Will update mseal.rst to include the above suggestion in the next version. Thanks -Jeff > -Kees > > -- > Kees Cook
* Kees Cook <kees@kernel.org> [250212 17:05]: > On Wed, Feb 12, 2025 at 11:24:35AM +0000, Lorenzo Stoakes wrote: > > On Wed, Feb 12, 2025 at 03:21:48AM +0000, jeffxu@chromium.org wrote: > > > From: Jeff Xu <jeffxu@chromium.org> > > > > > > The commit message in the first patch contains the full description of > > > this series. > > > > Sorry to nit, but it'd be useful to reproduce in the cover letter too! But > > this obviously isn't urgent, just be nice when we un-RFC. > > I advised Jeff against this because I've found it can sometimes cause > "thread splitting" in that some people reply to the cover letter, and > some people reply to the first patch, etc. I've tended to try to keep > cover letters very general, with the bulk of the prose in the first > patch. Interesting idea, but I think thread splitting is less of a concern than diluting the meaning of a patch by including a lengthy change log with a fraction of the text being about the code that follows. I think this is the reason for a cover letter in the first place; not just version control. After all, we could tack the version information into the first patch too and avoid it being in the final commit message. Thanks, Liam
On February 13, 2025 10:35:21 AM PST, "Liam R. Howlett" <Liam.Howlett@oracle.com> wrote: >* Kees Cook <kees@kernel.org> [250212 17:05]: >> On Wed, Feb 12, 2025 at 11:24:35AM +0000, Lorenzo Stoakes wrote: >> > On Wed, Feb 12, 2025 at 03:21:48AM +0000, jeffxu@chromium.org wrote: >> > > From: Jeff Xu <jeffxu@chromium.org> >> > > >> > > The commit message in the first patch contains the full description of >> > > this series. >> > >> > Sorry to nit, but it'd be useful to reproduce in the cover letter too! But >> > this obviously isn't urgent, just be nice when we un-RFC. >> >> I advised Jeff against this because I've found it can sometimes cause >> "thread splitting" in that some people reply to the cover letter, and >> some people reply to the first patch, etc. I've tended to try to keep >> cover letters very general, with the bulk of the prose in the first >> patch. > >Interesting idea, but I think thread splitting is less of a concern than >diluting the meaning of a patch by including a lengthy change log with a >fraction of the text being about the code that follows. > >I think this is the reason for a cover letter in the first place; not >just version control. After all, we could tack the version information >into the first patch too and avoid it being in the final commit message. Okay, so to be clear: you'd prefer to put the rationales and other stuff in the cover, and put more specific details in the first patch? I've not liked this because cover letters aren't (except for akpm's trees) included anywhere in git, which makes archeology much harder. -Kees
On Wed, Feb 12, 2025 at 2:02 PM Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote: > > (sorry I really am struggling to reply to mail as lore still seems to be > broken). > > On Wed, Feb 12, 2025 at 12:37:50PM +0000, Pedro Falcato wrote: > > On Wed, Feb 12, 2025 at 11:25 AM Lorenzo Stoakes > > <lorenzo.stoakes@oracle.com> wrote: > > > > > > On Wed, Feb 12, 2025 at 03:21:48AM +0000, jeffxu@chromium.org wrote: > > > > From: Jeff Xu <jeffxu@chromium.org> > > > > > > > > The commit message in the first patch contains the full description of > > > > this series. > > > > > > Sorry to nit, but it'd be useful to reproduce in the cover letter too! But > > > this obviously isn't urgent, just be nice when we un-RFC. > > > > > > Thanks for sending as RFC, appreciated, keen to figure out a way forward > > > with this series and this gives us space to discuss. > > > > > > One thing that came up recently with the LWN article (...!) was that rr is > > > also impacted by this [0]. > > > > > > I think with this behind a config flag we're fine (this refers to my > > > 'opt-in' comment in the reply on LWN) as my concerns about this being > > > enabled in a broken way without an explicit kernel configuration are > > > addressed, and actually we do expose a means by which a user can detect if > > > the VDSO for instance is sealed via /proc/$pid/[s]maps. > > > > > > So tools like rr and such can be updated to check for this. I wonder if we > > > ought to try to liaise with the known problematic ones? > > > > > > It'd be nice to update the documentation to have a list of 'known > > > problematic userland software with sealed VDSO' so we make people aware. > > > > > > Hopefully we are acheiving the opt-in nature of the thing here, but it > > > makes me wonder whether we need a prctl() interface to optionally disable > > > even if the system has it enabled as a whole. > > > > Just noting that (as we discussed off-list) doing prctl() would not > > work, because that would effectively be an munseal for those vdso > > regions. > > Possibly something like a personality() flag (that's *not* inherited > > when AT_SECURE/secureexec). But personalities have other issues... > > Thanks, yeah that's a good point, it would have to be implemented as a > personality or something similar otherwise you're essentially relying on > 'unsealing' which can't be permitted. > > I'm not sure how useful that'd be for the likes of rr though. But I suppose > if it makes everything exec'd by a child inherit it then maybe that works > for a debugging session etc.? > > > > > FWIW, although it would (at the moment) be hard to pull off in the > > libc, I still much prefer it to playing these weird games with CONFIG > > options and kernel command line options and prctl and personality and > > whatnot. It seems to me like we're trying to stick policy where it > > doesn't belong. > > The problem is, as a security feature, you don't want to make it trivially > easy to disable. > > I mean we _need_ a config option to be able to strictly enforce only making > the feature enable-able on architectures and configuration option > combinations that work. > > But if there is userspace that will be broken, we really have to have some > way of avoiding the disconnect between somebody making policy decision at > the kernel level and somebody trying to run something. > > Because I can easily envision somebody enabling this as a 'good security > feature' for a distro release or such, only for somebody else to later try > rr, CRIU, or whatever else and for it to just not work or fail subtly and > to have no idea why. Ok so I went looking around for the glibc patchset. It seems they're moving away from tunables and there was a nice GNU_PROPERTY_MEMORY_SEAL added to binutils. So my proposal is to parse this property on the binfmt_elf.c side, and mm would use this to know if we should seal these mappings. This seems to tackle compatibility problems, and glibc isn't sealing programs without this program header anyway. Thoughts? > > I mean one option is to have it as a CONFIG_ flag _and_ you have to enable > it via a tunable, so then it can become sysctl.d policy for instance. sysctl is also an option but the idea of dropping a random feature behind a CONFIG_ that's unusable by lots of people (including the general GNU/Linux ecosystem) is really really unappealing to me.
* Kees Cook <kees@kernel.org> [250213 14:34]: > > > On February 13, 2025 10:35:21 AM PST, "Liam R. Howlett" <Liam.Howlett@oracle.com> wrote: > >* Kees Cook <kees@kernel.org> [250212 17:05]: > >> On Wed, Feb 12, 2025 at 11:24:35AM +0000, Lorenzo Stoakes wrote: > >> > On Wed, Feb 12, 2025 at 03:21:48AM +0000, jeffxu@chromium.org wrote: > >> > > From: Jeff Xu <jeffxu@chromium.org> > >> > > > >> > > The commit message in the first patch contains the full description of > >> > > this series. > >> > > >> > Sorry to nit, but it'd be useful to reproduce in the cover letter too! But > >> > this obviously isn't urgent, just be nice when we un-RFC. > >> > >> I advised Jeff against this because I've found it can sometimes cause > >> "thread splitting" in that some people reply to the cover letter, and > >> some people reply to the first patch, etc. I've tended to try to keep > >> cover letters very general, with the bulk of the prose in the first > >> patch. > > > >Interesting idea, but I think thread splitting is less of a concern than > >diluting the meaning of a patch by including a lengthy change log with a > >fraction of the text being about the code that follows. > > > >I think this is the reason for a cover letter in the first place; not > >just version control. After all, we could tack the version information > >into the first patch too and avoid it being in the final commit message. > > Okay, so to be clear: you'd prefer to put the rationales and other stuff in the cover, and put more specific details in the first patch? I've not liked this because cover letters aren't (except for akpm's trees) included anywhere in git, which makes archeology much harder. Yes, rationales in the cover letter. I like the way the akpm's tree does things because it's the best of both worlds. There is also a separation of the cover letter with the actual commit message on the first patch. Having the full cover letter on patch 1 makes it difficult to understand *that* patch on its own during review. I've also gotten emails from Linus asking why in the ____ing ____ I did it this way when I said why in the cover letter.. to that note I like the patches to have _all_ the necessary details for that one patch, including the sometimes "this is changed in the very next patch" lines to spell out in-transit patches, or what ever else is needed from the cover letter/context. Taking this example, we have a 111 line cover letter and a patch that adds a new file with a single function, and two kconfig options. The justification and reason for the patch is in the middle of that huge block of text. That seems silly. That is to say: Cover letters have the rationale and over-arching reason. Patches have more than enough details to code inspect and know why this patch is necessary. Thanks, Liam
On Thu, Feb 13, 2025 at 07:59:48PM +0000, Pedro Falcato wrote: > On Wed, Feb 12, 2025 at 2:02 PM Lorenzo Stoakes > <lorenzo.stoakes@oracle.com> wrote: > > > > (sorry I really am struggling to reply to mail as lore still seems to be > > broken). > > > > On Wed, Feb 12, 2025 at 12:37:50PM +0000, Pedro Falcato wrote: > > > On Wed, Feb 12, 2025 at 11:25 AM Lorenzo Stoakes > > > <lorenzo.stoakes@oracle.com> wrote: > > > > > > > > On Wed, Feb 12, 2025 at 03:21:48AM +0000, jeffxu@chromium.org wrote: > > > > > From: Jeff Xu <jeffxu@chromium.org> > > > > > > > > > > The commit message in the first patch contains the full description of > > > > > this series. > > > > > > [...] > > > > > > FWIW, although it would (at the moment) be hard to pull off in the > > > libc, I still much prefer it to playing these weird games with CONFIG > > > options and kernel command line options and prctl and personality and > > > whatnot. It seems to me like we're trying to stick policy where it > > > doesn't belong. > > > > The problem is, as a security feature, you don't want to make it trivially > > easy to disable. > > > > I mean we _need_ a config option to be able to strictly enforce only making > > the feature enable-able on architectures and configuration option > > combinations that work. > > > > But if there is userspace that will be broken, we really have to have some > > way of avoiding the disconnect between somebody making policy decision at > > the kernel level and somebody trying to run something. > > > > Because I can easily envision somebody enabling this as a 'good security > > feature' for a distro release or such, only for somebody else to later try > > rr, CRIU, or whatever else and for it to just not work or fail subtly and > > to have no idea why. > > Ok so I went looking around for the glibc patchset. It seems they're > moving away from tunables and there was a nice > GNU_PROPERTY_MEMORY_SEAL added to binutils. > So my proposal is to parse this property on the binfmt_elf.c side, and > mm would use this to know if we should seal these mappings. This seems > to tackle compatibility problems, > and glibc isn't sealing programs without this program header anyway. Thoughts? It seems to me that doing this ties it to the binary, rather than execution context, which may want to seal/not-seal, etc. I have a sense that it's be better as a secure bit, or prctl, or something like that. The properties seem to be better suited for "this binary _can_ do a thing" or "this binary _requires_ a thing", like the GNU_STACK bits, etc. But maybe there's more to this I'm not considering? > > I mean one option is to have it as a CONFIG_ flag _and_ you have to enable > > it via a tunable, so then it can become sysctl.d policy for instance. > > sysctl is also an option but the idea of dropping a random feature > behind a CONFIG_ that's unusable by lots of people (including the > general GNU/Linux ecosystem) is really really unappealing to me. I agree 100%, but I think we need to make small steps. Behind a CONFIG means we get it implemented, and then we can look at how to make it more flexible. I'm motivated to figure this out because I've long wanted to have a boot param to disable CRIU since I have distro systems that I don't use CRIU on, and I don't want the (very small) interface changes it makes available into seccomp filter visibility. And if CRIU could be run-time based, so could system mapping sealing. :)
On Thu, Feb 13, 2025 at 8:47 PM Kees Cook <kees@kernel.org> wrote: > > On Thu, Feb 13, 2025 at 07:59:48PM +0000, Pedro Falcato wrote: > > On Wed, Feb 12, 2025 at 2:02 PM Lorenzo Stoakes > > <lorenzo.stoakes@oracle.com> wrote: > > > > > > (sorry I really am struggling to reply to mail as lore still seems to be > > > broken). > > > > > > On Wed, Feb 12, 2025 at 12:37:50PM +0000, Pedro Falcato wrote: > > > > On Wed, Feb 12, 2025 at 11:25 AM Lorenzo Stoakes > > > > <lorenzo.stoakes@oracle.com> wrote: > > > > > > > > > > On Wed, Feb 12, 2025 at 03:21:48AM +0000, jeffxu@chromium.org wrote: > > > > > > From: Jeff Xu <jeffxu@chromium.org> > > > > > > > > > > > > The commit message in the first patch contains the full description of > > > > > > this series. > > > > > > > > [...] > > > > > > > > FWIW, although it would (at the moment) be hard to pull off in the > > > > libc, I still much prefer it to playing these weird games with CONFIG > > > > options and kernel command line options and prctl and personality and > > > > whatnot. It seems to me like we're trying to stick policy where it > > > > doesn't belong. > > > > > > The problem is, as a security feature, you don't want to make it trivially > > > easy to disable. > > > > > > I mean we _need_ a config option to be able to strictly enforce only making > > > the feature enable-able on architectures and configuration option > > > combinations that work. > > > > > > But if there is userspace that will be broken, we really have to have some > > > way of avoiding the disconnect between somebody making policy decision at > > > the kernel level and somebody trying to run something. > > > > > > Because I can easily envision somebody enabling this as a 'good security > > > feature' for a distro release or such, only for somebody else to later try > > > rr, CRIU, or whatever else and for it to just not work or fail subtly and > > > to have no idea why. > > > > Ok so I went looking around for the glibc patchset. It seems they're > > moving away from tunables and there was a nice > > GNU_PROPERTY_MEMORY_SEAL added to binutils. > > So my proposal is to parse this property on the binfmt_elf.c side, and > > mm would use this to know if we should seal these mappings. This seems > > to tackle compatibility problems, > > and glibc isn't sealing programs without this program header anyway. Thoughts? > > It seems to me that doing this ties it to the binary, rather than > execution context, which may want to seal/not-seal, etc. I have a sense > that it's be better as a secure bit, or prctl, or something like that. The > properties seem to be better suited for "this binary _can_ do a thing" > or "this binary _requires_ a thing", like the GNU_STACK bits, etc. But > maybe there's more to this I'm not considering? Doesn't this exactly kind of Just Work though? "This binary can do/tolerate sealing". I would blindly guess that we don't have very opinionated shared libraries that do this kind of shenanigans unilaterally, so that's probably not something we really need to worry about (though I admittedly need to read through the glibc patchset, and nail down what they're thinking about doing with linking mseal-ready and mseal-non-ready ELF execs/shared objects together). The problem with something like prctl is that we either indirectly provide some kind of limited form of munseal, or we require some sort of handover (like personality(2) + execve(2)), which both sound like a huge PITA and still don't solve any of our backwards compat issues... all binaries would need to be patched with this prctl/personality()/whatever call, and old ones wouldn't work. The semantics behind GNU_PROPERTY_MEMORY_SEAL are unclear to me (maybe the toolchain folks could shed some light?), but it sounds like it'd fit perfectly. I suspect we probably want to parse this on the kernel's side anyway (to seal the main program/interp's segments)[1], then extending them to the kernel system mappings should be somewhat trivial... I don't think we'll ever get a program that can't cope with sealing the system mappings but can cope with sealing itself (and if we do, we just won't seal the entire thing and that's _okay_). Deploying mseal-ready programs could then be done in a phased way by distros. e.g chromeOS and android could simply enable the corresponding linker option in LDFLAGS and let it rip. Other more mainstream distros could obviously take a little longer or test/deploy this on all programs not named gVisor and/or after CRIU is okay with all of this. We then might not need a user-configurable CONFIG_ (only an arch HAS_SYSTEM_MAPPINGS_MSEAL or whatever), nor a sysctl, and everyone is happy. I glanced through libc-alpha again and it seems like the glibc folks also seem to have reached the same idea, but I'd love to hear from Adhemerval. Am I missing anything? [1] we should probably nail this responsibility handover down before glibc msealing (or bionic) makes it to a release. It'd probably be a little nicer if we could mseal these segments from the kernel instead of forcing the libc to take care of this, now that we have this property.
On 18/02/25 20:18, Pedro Falcato wrote: > On Thu, Feb 13, 2025 at 8:47 PM Kees Cook <kees@kernel.org> wrote: >> >> On Thu, Feb 13, 2025 at 07:59:48PM +0000, Pedro Falcato wrote: >>> On Wed, Feb 12, 2025 at 2:02 PM Lorenzo Stoakes >>> <lorenzo.stoakes@oracle.com> wrote: >>>> >>>> (sorry I really am struggling to reply to mail as lore still seems to be >>>> broken). >>>> >>>> On Wed, Feb 12, 2025 at 12:37:50PM +0000, Pedro Falcato wrote: >>>>> On Wed, Feb 12, 2025 at 11:25 AM Lorenzo Stoakes >>>>> <lorenzo.stoakes@oracle.com> wrote: >>>>>> >>>>>> On Wed, Feb 12, 2025 at 03:21:48AM +0000, jeffxu@chromium.org wrote: >>>>>>> From: Jeff Xu <jeffxu@chromium.org> >>>>>>> >>>>>>> The commit message in the first patch contains the full description of >>>>>>> this series. >>>>>> >>>> [...] >>>>> >>>>> FWIW, although it would (at the moment) be hard to pull off in the >>>>> libc, I still much prefer it to playing these weird games with CONFIG >>>>> options and kernel command line options and prctl and personality and >>>>> whatnot. It seems to me like we're trying to stick policy where it >>>>> doesn't belong. >>>> >>>> The problem is, as a security feature, you don't want to make it trivially >>>> easy to disable. >>>> >>>> I mean we _need_ a config option to be able to strictly enforce only making >>>> the feature enable-able on architectures and configuration option >>>> combinations that work. >>>> >>>> But if there is userspace that will be broken, we really have to have some >>>> way of avoiding the disconnect between somebody making policy decision at >>>> the kernel level and somebody trying to run something. >>>> >>>> Because I can easily envision somebody enabling this as a 'good security >>>> feature' for a distro release or such, only for somebody else to later try >>>> rr, CRIU, or whatever else and for it to just not work or fail subtly and >>>> to have no idea why. >>> >>> Ok so I went looking around for the glibc patchset. It seems they're >>> moving away from tunables and there was a nice >>> GNU_PROPERTY_MEMORY_SEAL added to binutils. >>> So my proposal is to parse this property on the binfmt_elf.c side, and >>> mm would use this to know if we should seal these mappings. This seems >>> to tackle compatibility problems, >>> and glibc isn't sealing programs without this program header anyway. Thoughts? >> >> It seems to me that doing this ties it to the binary, rather than >> execution context, which may want to seal/not-seal, etc. I have a sense >> that it's be better as a secure bit, or prctl, or something like that. The >> properties seem to be better suited for "this binary _can_ do a thing" >> or "this binary _requires_ a thing", like the GNU_STACK bits, etc. But >> maybe there's more to this I'm not considering? > > Doesn't this exactly kind of Just Work though? "This binary can > do/tolerate sealing". I would blindly guess that we don't have very > opinionated shared libraries that do this kind of shenanigans > unilaterally, so that's probably not something we really need to worry > about (though I admittedly need to read through the glibc patchset, > and nail down what they're thinking about doing with linking > mseal-ready and mseal-non-ready ELF execs/shared objects together). > The problem with something like prctl is that we either indirectly > provide some kind of limited form of munseal, or we require some sort > of handover (like personality(2) + execve(2)), which both sound like a > huge PITA and still don't solve any of our backwards compat issues... > all binaries would need to be patched with this > prctl/personality()/whatever call, and old ones wouldn't work. > > The semantics behind GNU_PROPERTY_MEMORY_SEAL are unclear to me (maybe > the toolchain folks could shed some light?), but it sounds like it'd > fit perfectly. > I suspect we probably want to parse this on the kernel's side anyway > (to seal the main program/interp's segments)[1], then extending them > to the kernel system mappings should be somewhat trivial... > I don't think we'll ever get a program that can't cope with sealing > the system mappings but can cope with sealing itself (and if we do, we > just won't seal the entire thing and that's _okay_). > > Deploying mseal-ready programs could then be done in a phased way by > distros. e.g chromeOS and android could simply enable the > corresponding linker option in LDFLAGS and let it rip. Other more > mainstream distros could obviously take a little longer or test/deploy > this on all programs not named gVisor and/or after CRIU is okay with > all of this. We then might not need a user-configurable CONFIG_ (only > an arch HAS_SYSTEM_MAPPINGS_MSEAL or whatever), nor a sysctl, and > everyone is happy. > > I glanced through libc-alpha again and it seems like the glibc folks > also seem to have reached the same idea, but I'd love to hear from > Adhemerval. > > Am I missing anything? Hi Pedro, After discussing with CRIU developers, I plan to change how glibc will handle GNU_PROPERTY_MEMORY_SEAL to allow a more smooth enablement in distros (the latest version is at [1]). The idea is only enable memory sealing iff the main binary (either ET_EXEC or PIE) has the new sealing attribute. If it were the case, the loader will still check if the dependency has the attribute and seal accordingly. So if for any reason the binary does not support memory sealing at all, like CRIU for the snapshot restore phase, it just need to be built with -Wl,-z,nomemory-seal. It is also for the case for libraries, although I think this will be rare. I will also work on adding a way to enable partially non-sealing sections, since Florian has hinted that he wants to use on some libgcc metadata (similar to how RELRO '.data.rel.ro' works). My initial plan is just mimic what OpenBSD does with PT_OPENBSD_MUTABLE, which only works for ET_DYN (and I think it should not be a problem). But it is a userland problem, so no kernel support would be required. > > > [1] we should probably nail this responsibility handover down before > glibc msealing (or bionic) makes it to a release. It'd probably be a > little nicer if we could mseal these segments from the kernel instead > of forcing the libc to take care of this, now that we have this > property. > Keep in mind that GNU_PROPERTY_MEMORY_SEAL is currently a glibc extension and I am not sure if other runtime would be willing to adopt as well. So its presence can not be implied that sealing will be applied by runtime. [1] https://patchwork.sourceware.org/project/glibc/list/?series=43524
On Tue, Feb 18, 2025 at 6:18 PM Pedro Falcato <pedro.falcato@gmail.com> wrote: > > On Thu, Feb 13, 2025 at 8:47 PM Kees Cook <kees@kernel.org> wrote: > > > > On Thu, Feb 13, 2025 at 07:59:48PM +0000, Pedro Falcato wrote: > > > On Wed, Feb 12, 2025 at 2:02 PM Lorenzo Stoakes > > > <lorenzo.stoakes@oracle.com> wrote: > > > > > > > > (sorry I really am struggling to reply to mail as lore still seems to be > > > > broken). > > > > > > > > On Wed, Feb 12, 2025 at 12:37:50PM +0000, Pedro Falcato wrote: > > > > > On Wed, Feb 12, 2025 at 11:25 AM Lorenzo Stoakes > > > > > <lorenzo.stoakes@oracle.com> wrote: > > > > > > > > > > > > On Wed, Feb 12, 2025 at 03:21:48AM +0000, jeffxu@chromium.org wrote: > > > > > > > From: Jeff Xu <jeffxu@chromium.org> > > > > > > > > > > > > > > The commit message in the first patch contains the full description of > > > > > > > this series. > > > > > > > > > > [...] > > > > > > > > > > FWIW, although it would (at the moment) be hard to pull off in the > > > > > libc, I still much prefer it to playing these weird games with CONFIG > > > > > options and kernel command line options and prctl and personality and > > > > > whatnot. It seems to me like we're trying to stick policy where it > > > > > doesn't belong. > > > > > > > > The problem is, as a security feature, you don't want to make it trivially > > > > easy to disable. > > > > > > > > I mean we _need_ a config option to be able to strictly enforce only making > > > > the feature enable-able on architectures and configuration option > > > > combinations that work. > > > > > > > > But if there is userspace that will be broken, we really have to have some > > > > way of avoiding the disconnect between somebody making policy decision at > > > > the kernel level and somebody trying to run something. > > > > > > > > Because I can easily envision somebody enabling this as a 'good security > > > > feature' for a distro release or such, only for somebody else to later try > > > > rr, CRIU, or whatever else and for it to just not work or fail subtly and > > > > to have no idea why. > > > > > > Ok so I went looking around for the glibc patchset. It seems they're > > > moving away from tunables and there was a nice > > > GNU_PROPERTY_MEMORY_SEAL added to binutils. > > > So my proposal is to parse this property on the binfmt_elf.c side, and > > > mm would use this to know if we should seal these mappings. This seems > > > to tackle compatibility problems, > > > and glibc isn't sealing programs without this program header anyway. Thoughts? > > > > It seems to me that doing this ties it to the binary, rather than > > execution context, which may want to seal/not-seal, etc. I have a sense > > that it's be better as a secure bit, or prctl, or something like that. The > > properties seem to be better suited for "this binary _can_ do a thing" > > or "this binary _requires_ a thing", like the GNU_STACK bits, etc. But > > maybe there's more to this I'm not considering? > > Doesn't this exactly kind of Just Work though? "This binary can > do/tolerate sealing". I would blindly guess that we don't have very > opinionated shared libraries that do this kind of shenanigans > unilaterally, so that's probably not something we really need to worry > about (though I admittedly need to read through the glibc patchset, > and nail down what they're thinking about doing with linking > mseal-ready and mseal-non-ready ELF execs/shared objects together). > The problem with something like prctl is that we either indirectly > provide some kind of limited form of munseal, or we require some sort > of handover (like personality(2) + execve(2)), which both sound like a > huge PITA and still don't solve any of our backwards compat issues... > all binaries would need to be patched with this > prctl/personality()/whatever call, and old ones wouldn't work. > > The semantics behind GNU_PROPERTY_MEMORY_SEAL are unclear to me (maybe > the toolchain folks could shed some light?), but it sounds like it'd > fit perfectly. > I suspect we probably want to parse this on the kernel's side anyway > (to seal the main program/interp's segments)[1], then extending them > to the kernel system mappings should be somewhat trivial... > I don't think we'll ever get a program that can't cope with sealing > the system mappings but can cope with sealing itself (and if we do, we > just won't seal the entire thing and that's _okay_). > > Deploying mseal-ready programs could then be done in a phased way by > distros. e.g chromeOS and android could simply enable the > corresponding linker option in LDFLAGS and let it rip. Other more > mainstream distros could obviously take a little longer or test/deploy > this on all programs not named gVisor and/or after CRIU is okay with > all of this. We then might not need a user-configurable CONFIG_ (only > an arch HAS_SYSTEM_MAPPINGS_MSEAL or whatever), nor a sysctl, and > everyone is happy. > > I glanced through libc-alpha again and it seems like the glibc folks > also seem to have reached the same idea, but I'd love to hear from > Adhemerval. > > Am I missing anything? Android's a bit interesting because there isn't really a "binary" in the usual sense. an Android app is basically a shared library of JNI code dlopen()ed into a clone() of an already-initialized Java runtime (the "zygote"). that said, i'm not expecting sealing the vdso to be problematic (even if i'm not sure how useful it is to do so, being unaware of any exploit that's ever used this?). for me the tricky part is when it's used for regular user shared libraries which seems the most convincing use case to me, albeit mainly for app compat reasons ["stopping apps from poking at implementation details they shouldn't, which later causes breakage if the implementation detail changes"]. the irony being that it'll _cause_ app compat problems by breaking all such things all at once. so that'll be "fun" for someone to try to roll out! it also breaks dlclose() (unless your dlopen() was with RTLD_NODELETE, which is not the default on Android either). that's fine for OS libraries that have already been loaded by the zygote, but hard to make a default. that said, i do expect games and banking apps to be interested to try this when they hear about it. anyway, in general i think an ELF property per-library sounds reasonable, matching what we have for arm64 BTI. i have no strong opinion on the system mappings and what if anything executables should do, for the reasons given above, but the elf property sounds reasonable. > [1] we should probably nail this responsibility handover down before > glibc msealing (or bionic) makes it to a release. It'd probably be a > little nicer if we could mseal these segments from the kernel instead > of forcing the libc to take care of this, now that we have this > property. > > -- > Pedro
From: Jeff Xu <jeffxu@chromium.org> The commit message in the first patch contains the full description of this series. ------------------ History: V5 - Remove kernel cmd line (Lorenzo Stoakes) - Add test info (Lorenzo Stoakes) - Add threat model info (Lorenzo Stoakes) - Fix x86 selftest: test_mremap_vdso - Restrict code change to ARM64/x86-64/UM arch only. - Add userprocess.h to include seal_system_mapping(). - Remove sealing vsyscall. - Split the patch. V4: https://lore.kernel.org/all/20241125202021.3684919-1-jeffxu@google.com/ V3: https://lore.kernel.org/all/20241113191602.3541870-1-jeffxu@google.com/ V2: https://lore.kernel.org/all/20241014215022.68530-1-jeffxu@google.com/ V1: https://lore.kernel.org/all/20241004163155.3493183-1-jeffxu@google.com/ Jeff Xu (7): mseal, system mappings: kernel config and header change selftests: x86: test_mremap_vdso: skip if vdso is msealed mseal, system mappings: enable x86-64 mseal, system mappings: enable arm64 mseal, system mappings: enable uml architecture mseal, system mappings: uprobe mapping mseal, system mappings: update mseal.rst Documentation/userspace-api/mseal.rst | 5 +++ arch/arm64/Kconfig | 1 + arch/arm64/kernel/vdso.c | 23 +++++++---- arch/um/Kconfig | 1 + arch/x86/Kconfig | 1 + arch/x86/entry/vdso/vma.c | 17 ++++++--- arch/x86/um/vdso/vma.c | 7 +++- include/linux/userprocess.h | 18 +++++++++ init/Kconfig | 18 +++++++++ kernel/events/uprobes.c | 6 ++- security/Kconfig | 18 +++++++++ .../testing/selftests/x86/test_mremap_vdso.c | 38 +++++++++++++++++++ 12 files changed, 137 insertions(+), 16 deletions(-) create mode 100644 include/linux/userprocess.h