Message ID | 20230814-memfd-vm-noexec-uapi-fixes-v2-3-7ff9e3e10ba6@cyphar.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | memfd: cleanups for vm.memfd_noexec | expand |
On Mon, Aug 14, 2023 at 06:40:59PM +1000, Aleksa Sarai wrote: > In order to incentivise userspace to switch to passing MFD_EXEC and > MFD_NOEXEC_SEAL, we need to provide a warning on each attempt to call > memfd_create() without the new flags. pr_warn_once() is not useful > because on most systems the one warning is burned up during the boot > process (on my system, systemd does this within the first second of > boot) and thus userspace will in practice never see the warnings to push > them to switch to the new flags. > > The original patchset[1] used pr_warn_ratelimited(), however there were > concerns about the degree of spam in the kernel log[2,3]. The resulting > inability to detect every case was flagged as an issue at the time[4]. > > While we could come up with an alternative rate-limiting scheme such as > only outputting the message if vm.memfd_noexec has been modified, or > only outputting the message once for a given task, these alternatives > have downsides that don't make sense given how low-stakes a single > kernel warning message is. Switching to pr_info_ratelimited() instead > should be fine -- it's possible some monitoring tool will be unhappy > with a stream of warning-level messages but there's already plenty of > info-level message spam in dmesg. > > [1]: https://lore.kernel.org/20221215001205.51969-4-jeffxu@google.com/ > [2]: https://lore.kernel.org/202212161233.85C9783FB@keescook/ > [3]: https://lore.kernel.org/Y5yS8wCnuYGLHMj4@x1n/ > [4]: https://lore.kernel.org/f185bb42-b29c-977e-312e-3349eea15383@linuxfoundation.org/ > > Cc: stable@vger.kernel.org # v6.3+ > Fixes: 105ff5339f49 ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC") > Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> > --- Reviewed-by: Christian Brauner <brauner@kernel.org>
On Mon, 14. Aug 18:40, Aleksa Sarai wrote: > In order to incentivise userspace to switch to passing MFD_EXEC and > MFD_NOEXEC_SEAL, we need to provide a warning on each attempt to call > memfd_create() without the new flags. pr_warn_once() is not useful > because on most systems the one warning is burned up during the boot > process (on my system, systemd does this within the first second of > boot) and thus userspace will in practice never see the warnings to push > them to switch to the new flags. > > The original patchset[1] used pr_warn_ratelimited(), however there were > concerns about the degree of spam in the kernel log[2,3]. The resulting > inability to detect every case was flagged as an issue at the time[4]. > > While we could come up with an alternative rate-limiting scheme such as > only outputting the message if vm.memfd_noexec has been modified, or > only outputting the message once for a given task, these alternatives > have downsides that don't make sense given how low-stakes a single > kernel warning message is. Switching to pr_info_ratelimited() instead > should be fine -- it's possible some monitoring tool will be unhappy > with a stream of warning-level messages but there's already plenty of > info-level message spam in dmesg. > > [1]: https://lore.kernel.org/20221215001205.51969-4-jeffxu@google.com/ > [2]: https://lore.kernel.org/202212161233.85C9783FB@keescook/ > [3]: https://lore.kernel.org/Y5yS8wCnuYGLHMj4@x1n/ > [4]: https://lore.kernel.org/f185bb42-b29c-977e-312e-3349eea15383@linuxfoundation.org/ > > Cc: stable@vger.kernel.org # v6.3+ > Fixes: 105ff5339f49 ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC") > Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> > --- > mm/memfd.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/memfd.c b/mm/memfd.c > index d65485c762de..aa46521057ab 100644 > --- a/mm/memfd.c > +++ b/mm/memfd.c > @@ -315,7 +315,7 @@ SYSCALL_DEFINE2(memfd_create, > return -EINVAL; > > if (!(flags & (MFD_EXEC | MFD_NOEXEC_SEAL))) { > - pr_warn_once( > + pr_info_ratelimited( > "%s[%d]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set\n", > current->comm, task_pid_nr(current)); > } > > -- > 2.41.0 > Hello Sarai, i got a lot of messages in dmesg with this. DMESG is unuseable with this. [ 1390.349462] __do_sys_memfd_create: 5 callbacks suppressed [ 1390.349468] pipewire-pulse[2930]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set [ 1390.350106] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set [ 1390.350366] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set [ 1390.359390] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set [ 1390.359453] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set [ 1390.848813] pipewire-pulse[2930]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set [ 1390.849425] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set [ 1390.849673] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set [ 1390.857629] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set [ 1390.857674] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set [ 1404.819637] __do_sys_memfd_create: 105 callbacks suppressed [ 1404.819641] pipewire-pulse[2930]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set [ 1404.819950] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set [ 1404.820054] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set [ 1404.824240] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set [ 1404.824279] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set [ 1430.373186] pipewire-pulse[2930]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set [ 1430.373906] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set [ 1430.374131] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set [ 1430.382397] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set [ 1430.382485] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set [ 1430.499581] pipewire-pulse[2930]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set [ 1430.500077] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set [ 1430.500265] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set [ 1430.512772] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set [ 1430.512840] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set [ 1444.388519] __do_sys_memfd_create: 60 callbacks suppressed [ 1444.388525] pipewire-pulse[2930]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set [ 1444.389061] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set [ 1444.389335] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set [ 1444.397909] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set [ 1444.397965] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set [ 1444.503514] pipewire-pulse[2930]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set [ 1444.503658] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set [ 1444.503726] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set [ 1444.507841] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set [ 1444.507870] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set [ 1449.707966] __do_sys_memfd_create: 25 callbacks suppressed Best regards Damian
On Fri, 1 Sep 2023 07:13:45 +0200 Damian Tometzki <dtometzki@fedoraproject.org> wrote: > > if (!(flags & (MFD_EXEC | MFD_NOEXEC_SEAL))) { > > - pr_warn_once( > > + pr_info_ratelimited( > > "%s[%d]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set\n", > > current->comm, task_pid_nr(current)); > > } > > > > -- > > 2.41.0 > > > Hello Sarai, > > i got a lot of messages in dmesg with this. DMESG is unuseable with > this. > [ 1390.349462] __do_sys_memfd_create: 5 callbacks suppressed > [ 1390.349468] pipewire-pulse[2930]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set > [ 1390.350106] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set OK, thanks, I'll revert this. Spamming everyone even harder isn't a good way to get developers to fix their stuff.
On 2023-09-02, Andrew Morton <akpm@linux-foundation.org> wrote: > On Fri, 1 Sep 2023 07:13:45 +0200 Damian Tometzki <dtometzki@fedoraproject.org> wrote: > > > > if (!(flags & (MFD_EXEC | MFD_NOEXEC_SEAL))) { > > > - pr_warn_once( > > > + pr_info_ratelimited( > > > "%s[%d]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set\n", > > > current->comm, task_pid_nr(current)); > > > } > > > > > > -- > > > 2.41.0 > > > > > Hello Sarai, > > > > i got a lot of messages in dmesg with this. DMESG is unuseable with > > this. > > [ 1390.349462] __do_sys_memfd_create: 5 callbacks suppressed > > [ 1390.349468] pipewire-pulse[2930]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set > > [ 1390.350106] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set > > OK, thanks, I'll revert this. Spamming everyone even harder isn't a > good way to get developers to fix their stuff. Sorry, I'm on vacation. I will send a follow-up patch to remove this logging entirely -- if we can't do rate-limited logging then logging a single message effectively at boot time makes no sense. I had hoped that this wouldn't be too much (given there is a fair amount of INFO-level spam in the kernel log) but I guess the default ratelimit (5Hz) is too liberal. Perhaps we can re-consider adding some logging in the future, when more programs have migrated. The only other "reasonable" way to reduce the logging would be to add something to task_struct so we only log once per task, but obviously that's massively overkill. (FWIW, I don't think the logging was ever necessary. There's nothing wrong with running an older program that doesn't pass the flags.)
* Andrew Morton: > OK, thanks, I'll revert this. Spamming everyone even harder isn't a > good way to get developers to fix their stuff. Is this really buggy userspace? Are future kernels going to require some of these flags? That's going to break lots of applications which use memfd_create to enable run-time code generation on locked-down systems because it looked like a stable interface (“don't break userspace” and all that). Thanks, Florian
On 2023-09-05, Florian Weimer <fweimer@redhat.com> wrote: > * Andrew Morton: > > > OK, thanks, I'll revert this. Spamming everyone even harder isn't a > > good way to get developers to fix their stuff. > > Is this really buggy userspace? Are future kernels going to require > some of these flags? > > That's going to break lots of applications which use memfd_create to > enable run-time code generation on locked-down systems because it looked > like a stable interface (“don't break userspace” and all that). There is no userspace breakage with the current behaviour and obviously actually requiring these flags to be passed by default would be a pretty clear userspace breakage and would never be merged. The original intention (as far as I can tell -- the logging behaviour came from the original patchset) was to try to incentivise userspace to start passing the flags so that if distributions decide to set vm.memfd_noexec=1 as a default setting you won't end up with programs that _need_ executable memfds (such as container runtimes) crashing unexpectedly. I also suspect there was an aspect of "well, userspace *should* be passing these flags after we've introduced them". I'm sending a patch to just remove this part of the logging because I don't think it makes sense if you can't rate-limit it sanely, and there's probably an argument to be made that it doesn't make sense at all (at least for the default vm.memfd_noexec=0 setting).
diff --git a/mm/memfd.c b/mm/memfd.c index d65485c762de..aa46521057ab 100644 --- a/mm/memfd.c +++ b/mm/memfd.c @@ -315,7 +315,7 @@ SYSCALL_DEFINE2(memfd_create, return -EINVAL; if (!(flags & (MFD_EXEC | MFD_NOEXEC_SEAL))) { - pr_warn_once( + pr_info_ratelimited( "%s[%d]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set\n", current->comm, task_pid_nr(current)); }
In order to incentivise userspace to switch to passing MFD_EXEC and MFD_NOEXEC_SEAL, we need to provide a warning on each attempt to call memfd_create() without the new flags. pr_warn_once() is not useful because on most systems the one warning is burned up during the boot process (on my system, systemd does this within the first second of boot) and thus userspace will in practice never see the warnings to push them to switch to the new flags. The original patchset[1] used pr_warn_ratelimited(), however there were concerns about the degree of spam in the kernel log[2,3]. The resulting inability to detect every case was flagged as an issue at the time[4]. While we could come up with an alternative rate-limiting scheme such as only outputting the message if vm.memfd_noexec has been modified, or only outputting the message once for a given task, these alternatives have downsides that don't make sense given how low-stakes a single kernel warning message is. Switching to pr_info_ratelimited() instead should be fine -- it's possible some monitoring tool will be unhappy with a stream of warning-level messages but there's already plenty of info-level message spam in dmesg. [1]: https://lore.kernel.org/20221215001205.51969-4-jeffxu@google.com/ [2]: https://lore.kernel.org/202212161233.85C9783FB@keescook/ [3]: https://lore.kernel.org/Y5yS8wCnuYGLHMj4@x1n/ [4]: https://lore.kernel.org/f185bb42-b29c-977e-312e-3349eea15383@linuxfoundation.org/ Cc: stable@vger.kernel.org # v6.3+ Fixes: 105ff5339f49 ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC") Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> --- mm/memfd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)