diff mbox series

[v2,3/5] memfd: improve userspace warnings for missing exec-related flags

Message ID 20230814-memfd-vm-noexec-uapi-fixes-v2-3-7ff9e3e10ba6@cyphar.com (mailing list archive)
State New
Headers show
Series memfd: cleanups for vm.memfd_noexec | expand

Commit Message

Aleksa Sarai Aug. 14, 2023, 8:40 a.m. UTC
In order to incentivise userspace to switch to passing MFD_EXEC and
MFD_NOEXEC_SEAL, we need to provide a warning on each attempt to call
memfd_create() without the new flags. pr_warn_once() is not useful
because on most systems the one warning is burned up during the boot
process (on my system, systemd does this within the first second of
boot) and thus userspace will in practice never see the warnings to push
them to switch to the new flags.

The original patchset[1] used pr_warn_ratelimited(), however there were
concerns about the degree of spam in the kernel log[2,3]. The resulting
inability to detect every case was flagged as an issue at the time[4].

While we could come up with an alternative rate-limiting scheme such as
only outputting the message if vm.memfd_noexec has been modified, or
only outputting the message once for a given task, these alternatives
have downsides that don't make sense given how low-stakes a single
kernel warning message is. Switching to pr_info_ratelimited() instead
should be fine -- it's possible some monitoring tool will be unhappy
with a stream of warning-level messages but there's already plenty of
info-level message spam in dmesg.

[1]: https://lore.kernel.org/20221215001205.51969-4-jeffxu@google.com/
[2]: https://lore.kernel.org/202212161233.85C9783FB@keescook/
[3]: https://lore.kernel.org/Y5yS8wCnuYGLHMj4@x1n/
[4]: https://lore.kernel.org/f185bb42-b29c-977e-312e-3349eea15383@linuxfoundation.org/

Cc: stable@vger.kernel.org # v6.3+
Fixes: 105ff5339f49 ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
---
 mm/memfd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Christian Brauner Aug. 22, 2023, 9:10 a.m. UTC | #1
On Mon, Aug 14, 2023 at 06:40:59PM +1000, Aleksa Sarai wrote:
> In order to incentivise userspace to switch to passing MFD_EXEC and
> MFD_NOEXEC_SEAL, we need to provide a warning on each attempt to call
> memfd_create() without the new flags. pr_warn_once() is not useful
> because on most systems the one warning is burned up during the boot
> process (on my system, systemd does this within the first second of
> boot) and thus userspace will in practice never see the warnings to push
> them to switch to the new flags.
> 
> The original patchset[1] used pr_warn_ratelimited(), however there were
> concerns about the degree of spam in the kernel log[2,3]. The resulting
> inability to detect every case was flagged as an issue at the time[4].
> 
> While we could come up with an alternative rate-limiting scheme such as
> only outputting the message if vm.memfd_noexec has been modified, or
> only outputting the message once for a given task, these alternatives
> have downsides that don't make sense given how low-stakes a single
> kernel warning message is. Switching to pr_info_ratelimited() instead
> should be fine -- it's possible some monitoring tool will be unhappy
> with a stream of warning-level messages but there's already plenty of
> info-level message spam in dmesg.
> 
> [1]: https://lore.kernel.org/20221215001205.51969-4-jeffxu@google.com/
> [2]: https://lore.kernel.org/202212161233.85C9783FB@keescook/
> [3]: https://lore.kernel.org/Y5yS8wCnuYGLHMj4@x1n/
> [4]: https://lore.kernel.org/f185bb42-b29c-977e-312e-3349eea15383@linuxfoundation.org/
> 
> Cc: stable@vger.kernel.org # v6.3+
> Fixes: 105ff5339f49 ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC")
> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>
Damian Tometzki Sept. 1, 2023, 5:13 a.m. UTC | #2
On Mon, 14. Aug 18:40, Aleksa Sarai wrote:
> In order to incentivise userspace to switch to passing MFD_EXEC and
> MFD_NOEXEC_SEAL, we need to provide a warning on each attempt to call
> memfd_create() without the new flags. pr_warn_once() is not useful
> because on most systems the one warning is burned up during the boot
> process (on my system, systemd does this within the first second of
> boot) and thus userspace will in practice never see the warnings to push
> them to switch to the new flags.
> 
> The original patchset[1] used pr_warn_ratelimited(), however there were
> concerns about the degree of spam in the kernel log[2,3]. The resulting
> inability to detect every case was flagged as an issue at the time[4].
> 
> While we could come up with an alternative rate-limiting scheme such as
> only outputting the message if vm.memfd_noexec has been modified, or
> only outputting the message once for a given task, these alternatives
> have downsides that don't make sense given how low-stakes a single
> kernel warning message is. Switching to pr_info_ratelimited() instead
> should be fine -- it's possible some monitoring tool will be unhappy
> with a stream of warning-level messages but there's already plenty of
> info-level message spam in dmesg.
> 
> [1]: https://lore.kernel.org/20221215001205.51969-4-jeffxu@google.com/
> [2]: https://lore.kernel.org/202212161233.85C9783FB@keescook/
> [3]: https://lore.kernel.org/Y5yS8wCnuYGLHMj4@x1n/
> [4]: https://lore.kernel.org/f185bb42-b29c-977e-312e-3349eea15383@linuxfoundation.org/
> 
> Cc: stable@vger.kernel.org # v6.3+
> Fixes: 105ff5339f49 ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC")
> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
> ---
>  mm/memfd.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/memfd.c b/mm/memfd.c
> index d65485c762de..aa46521057ab 100644
> --- a/mm/memfd.c
> +++ b/mm/memfd.c
> @@ -315,7 +315,7 @@ SYSCALL_DEFINE2(memfd_create,
>  		return -EINVAL;
>  
>  	if (!(flags & (MFD_EXEC | MFD_NOEXEC_SEAL))) {
> -		pr_warn_once(
> +		pr_info_ratelimited(
>  			"%s[%d]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set\n",
>  			current->comm, task_pid_nr(current));
>  	}
> 
> -- 
> 2.41.0
>
Hello Sarai,

i got a lot of messages in dmesg with this. DMESG is unuseable with
this. 
[ 1390.349462] __do_sys_memfd_create: 5 callbacks suppressed
[ 1390.349468] pipewire-pulse[2930]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 1390.350106] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 1390.350366] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 1390.359390] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 1390.359453] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 1390.848813] pipewire-pulse[2930]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 1390.849425] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 1390.849673] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 1390.857629] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 1390.857674] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 1404.819637] __do_sys_memfd_create: 105 callbacks suppressed
[ 1404.819641] pipewire-pulse[2930]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 1404.819950] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 1404.820054] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 1404.824240] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 1404.824279] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 1430.373186] pipewire-pulse[2930]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 1430.373906] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 1430.374131] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 1430.382397] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 1430.382485] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 1430.499581] pipewire-pulse[2930]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 1430.500077] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 1430.500265] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 1430.512772] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 1430.512840] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 1444.388519] __do_sys_memfd_create: 60 callbacks suppressed
[ 1444.388525] pipewire-pulse[2930]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 1444.389061] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 1444.389335] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 1444.397909] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 1444.397965] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 1444.503514] pipewire-pulse[2930]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 1444.503658] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 1444.503726] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 1444.507841] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 1444.507870] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 1449.707966] __do_sys_memfd_create: 25 callbacks suppressed

Best regards
Damian
Andrew Morton Sept. 2, 2023, 10:58 p.m. UTC | #3
On Fri, 1 Sep 2023 07:13:45 +0200 Damian Tometzki <dtometzki@fedoraproject.org> wrote:

> >  	if (!(flags & (MFD_EXEC | MFD_NOEXEC_SEAL))) {
> > -		pr_warn_once(
> > +		pr_info_ratelimited(
> >  			"%s[%d]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set\n",
> >  			current->comm, task_pid_nr(current));
> >  	}
> > 
> > -- 
> > 2.41.0
> >
> Hello Sarai,
> 
> i got a lot of messages in dmesg with this. DMESG is unuseable with
> this. 
> [ 1390.349462] __do_sys_memfd_create: 5 callbacks suppressed
> [ 1390.349468] pipewire-pulse[2930]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
> [ 1390.350106] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set

OK, thanks, I'll revert this.  Spamming everyone even harder isn't a
good way to get developers to fix their stuff.
Aleksa Sarai Sept. 4, 2023, 7:09 a.m. UTC | #4
On 2023-09-02, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Fri, 1 Sep 2023 07:13:45 +0200 Damian Tometzki <dtometzki@fedoraproject.org> wrote:
> 
> > >  	if (!(flags & (MFD_EXEC | MFD_NOEXEC_SEAL))) {
> > > -		pr_warn_once(
> > > +		pr_info_ratelimited(
> > >  			"%s[%d]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set\n",
> > >  			current->comm, task_pid_nr(current));
> > >  	}
> > > 
> > > -- 
> > > 2.41.0
> > >
> > Hello Sarai,
> > 
> > i got a lot of messages in dmesg with this. DMESG is unuseable with
> > this. 
> > [ 1390.349462] __do_sys_memfd_create: 5 callbacks suppressed
> > [ 1390.349468] pipewire-pulse[2930]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
> > [ 1390.350106] pipewire[2712]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
> 
> OK, thanks, I'll revert this.  Spamming everyone even harder isn't a
> good way to get developers to fix their stuff.

Sorry, I'm on vacation. I will send a follow-up patch to remove this
logging entirely -- if we can't do rate-limited logging then logging a
single message effectively at boot time makes no sense. I had hoped that
this wouldn't be too much (given there is a fair amount of INFO-level
spam in the kernel log) but I guess the default ratelimit (5Hz) is too
liberal.

Perhaps we can re-consider adding some logging in the future, when more
programs have migrated. The only other "reasonable" way to reduce the
logging would be to add something to task_struct so we only log once per
task, but obviously that's massively overkill.

(FWIW, I don't think the logging was ever necessary. There's nothing
wrong with running an older program that doesn't pass the flags.)
Florian Weimer Sept. 5, 2023, 4:20 p.m. UTC | #5
* Andrew Morton:

> OK, thanks, I'll revert this.  Spamming everyone even harder isn't a
> good way to get developers to fix their stuff.

Is this really buggy userspace?  Are future kernels going to require
some of these flags?

That's going to break lots of applications which use memfd_create to
enable run-time code generation on locked-down systems because it looked
like a stable interface (“don't break userspace” and all that).

Thanks,
Florian
Aleksa Sarai Sept. 6, 2023, 6:58 a.m. UTC | #6
On 2023-09-05, Florian Weimer <fweimer@redhat.com> wrote:
> * Andrew Morton:
> 
> > OK, thanks, I'll revert this.  Spamming everyone even harder isn't a
> > good way to get developers to fix their stuff.
> 
> Is this really buggy userspace?  Are future kernels going to require
> some of these flags?
> 
> That's going to break lots of applications which use memfd_create to
> enable run-time code generation on locked-down systems because it looked
> like a stable interface (“don't break userspace” and all that).

There is no userspace breakage with the current behaviour and obviously
actually requiring these flags to be passed by default would be a pretty
clear userspace breakage and would never be merged.

The original intention (as far as I can tell -- the logging behaviour
came from the original patchset) was to try to incentivise userspace to
start passing the flags so that if distributions decide to set
vm.memfd_noexec=1 as a default setting you won't end up with programs
that _need_ executable memfds (such as container runtimes) crashing
unexpectedly. I also suspect there was an aspect of "well, userspace
*should* be passing these flags after we've introduced them".

I'm sending a patch to just remove this part of the logging because I
don't think it makes sense if you can't rate-limit it sanely, and
there's probably an argument to be made that it doesn't make sense at
all (at least for the default vm.memfd_noexec=0 setting).
diff mbox series

Patch

diff --git a/mm/memfd.c b/mm/memfd.c
index d65485c762de..aa46521057ab 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -315,7 +315,7 @@  SYSCALL_DEFINE2(memfd_create,
 		return -EINVAL;
 
 	if (!(flags & (MFD_EXEC | MFD_NOEXEC_SEAL))) {
-		pr_warn_once(
+		pr_info_ratelimited(
 			"%s[%d]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set\n",
 			current->comm, task_pid_nr(current));
 	}