[04/13] Always expose MAP_UNINITIALIZED to userspace
diff mbox

Message ID 1442271047-4908-5-git-send-email-palmer@dabbelt.com
State New
Headers show

Commit Message

Palmer Dabbelt Sept. 14, 2015, 10:50 p.m. UTC
This used to be hidden behind CONFIG_MMAP_ALLOW_UNINITIALIZED, so
userspace wouldn't actually ever see it be non-zero.  While I could
have kept hiding it, the man pages seem to indicate that
MAP_UNINITIALIZED should be visible:

  mmap(2)
  MAP_UNINITIALIZED (since Linux 2.6.33)
    Don't clear anonymous pages.  This flag is intended to improve
    performance on embedded devices.  This flag is honored only if the
    kernel was configured with the CONFIG_MMAP_ALLOW_UNINITIALIZED
    option.  Because of the security implications, that option is
    normally enabled only on embedded devices (i.e., devices where one
    has complete control of the contents of user memory).

and since the only time it shows up in my /usr/include is in this
header I believe this should have been visible to userspace (as
non-zero, which wouldn't do anything when or'd into the flags) all
along.

This change also applies to the xtensa version of this definition,
whic is the same as the generic one.

Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
Reviewed-by: Andrew Waterman <waterman@eecs.berkeley.edu>
Reviewed-by: Albert Ou <aou@eecs.berkeley.edu>
---
 arch/xtensa/include/uapi/asm/mman.h    | 4 +---
 include/uapi/asm-generic/mman-common.h | 4 +---
 2 files changed, 2 insertions(+), 6 deletions(-)

Comments

Kirill A. Shutemov Sept. 15, 2015, 12:23 a.m. UTC | #1
On Mon, Sep 14, 2015 at 03:50:38PM -0700, Palmer Dabbelt wrote:
> This used to be hidden behind CONFIG_MMAP_ALLOW_UNINITIALIZED, so
> userspace wouldn't actually ever see it be non-zero.  While I could
> have kept hiding it, the man pages seem to indicate that
> MAP_UNINITIALIZED should be visible:
> 
>   mmap(2)
>   MAP_UNINITIALIZED (since Linux 2.6.33)
>     Don't clear anonymous pages.  This flag is intended to improve
>     performance on embedded devices.  This flag is honored only if the
>     kernel was configured with the CONFIG_MMAP_ALLOW_UNINITIALIZED
>     option.  Because of the security implications, that option is
>     normally enabled only on embedded devices (i.e., devices where one
>     has complete control of the contents of user memory).
> 
> and since the only time it shows up in my /usr/include is in this
> header I believe this should have been visible to userspace (as
> non-zero, which wouldn't do anything when or'd into the flags) all
> along.

Are you sure about "wouldn't do anything"?
Suspiciously, 0x4000000 is also (1 << MAP_HUGE_SHIFT). I'm not sure if any
architecture has order-1 huge pages, but still looks like we have conflict
here.

I think it's harmful to expose non-zero MAP_UNINITIALIZED to system which
potentially can handle multiple users. Or non-trivial user space in
general.

Should we leave it at least under '#ifndef CONFIG_MMU'? I don't think it's
possible to have single ABI for MMU and MMU-less systems anyway. And we
can avoid conflict with MAP_HUGE_SHIFT this way.

P.S. MAP_UNINITIALIZED itself looks very broken to me. I probably need dig
mailing list on why it was allowed. 
But that's other topic.
Palmer Dabbelt Sept. 15, 2015, 12:52 a.m. UTC | #2
On Mon, 14 Sep 2015 17:23:58 PDT (-0700), kirill@shutemov.name wrote:
> On Mon, Sep 14, 2015 at 03:50:38PM -0700, Palmer Dabbelt wrote:
>> This used to be hidden behind CONFIG_MMAP_ALLOW_UNINITIALIZED, so
>> userspace wouldn't actually ever see it be non-zero.  While I could
>> have kept hiding it, the man pages seem to indicate that
>> MAP_UNINITIALIZED should be visible:
>>
>>   mmap(2)
>>   MAP_UNINITIALIZED (since Linux 2.6.33)
>>     Don't clear anonymous pages.  This flag is intended to improve
>>     performance on embedded devices.  This flag is honored only if the
>>     kernel was configured with the CONFIG_MMAP_ALLOW_UNINITIALIZED
>>     option.  Because of the security implications, that option is
>>     normally enabled only on embedded devices (i.e., devices where one
>>     has complete control of the contents of user memory).
>>
>> and since the only time it shows up in my /usr/include is in this
>> header I believe this should have been visible to userspace (as
>> non-zero, which wouldn't do anything when or'd into the flags) all
>> along.
>
> Are you sure about "wouldn't do anything"?

That was bad writing for me.  I'd originally written something like "I
believe this should have been visible to userspace all along", but
then added the ()'s.  I meant to say:

 * I think MAP_UNINITIALIZED should have been non-zero in userspace.
 * MAP_UNINITAILIZED was zero in userspace.
 * A zero MAP_UNINITIALIZED does nothing when OR'd in.

> Suspiciously, 0x4000000 is also (1 << MAP_HUGE_SHIFT). I'm not sure if any
> architecture has order-1 huge pages, but still looks like we have conflict
> here.
>
> I think it's harmful to expose non-zero MAP_UNINITIALIZED to system which
> potentially can handle multiple users. Or non-trivial user space in
> general.

This doesn't have MAP_UNINITIALIZED do anything by default, it just
defines the flag the same way on all systems.  I was under the
impression that this just happened if I set MAP_UNINITIALIZED.
Looking at MAP_HUGE_SHIFT it mmap.c, that's definitely why my mmap()
test case ignored the set MAP_UNINITIALIZED on my PC.

I'm going to make this

 #ifndef MAP_UNINITAILIZED
 #define MAP_UNINITAILIZED 0
 #endif

and then leave Xtensa's port alone.  This is what Arnd suggested
originally, sorry for the extra work!

> Should we leave it at least under '#ifndef CONFIG_MMU'? I don't think it's
> possible to have single ABI for MMU and MMU-less systems anyway. And we
> can avoid conflict with MAP_HUGE_SHIFT this way.

The whole goal here was to eliminate "#ifndef CONFIG_*" from the
user-visible headers.  This all started because I got bit by a very
similar-looking bug (see patch #1), so I'd prefer not to go down that
route.

> P.S. MAP_UNINITIALIZED itself looks very broken to me. I probably need dig
> mailing list on why it was allowed.
> But that's other topic.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Josh Triplett Sept. 15, 2015, 5:19 a.m. UTC | #3
On Tue, Sep 15, 2015 at 03:23:58AM +0300, Kirill A. Shutemov wrote:
> On Mon, Sep 14, 2015 at 03:50:38PM -0700, Palmer Dabbelt wrote:
> > This used to be hidden behind CONFIG_MMAP_ALLOW_UNINITIALIZED, so
> > userspace wouldn't actually ever see it be non-zero.  While I could
> > have kept hiding it, the man pages seem to indicate that
> > MAP_UNINITIALIZED should be visible:
> > 
> >   mmap(2)
> >   MAP_UNINITIALIZED (since Linux 2.6.33)
> >     Don't clear anonymous pages.  This flag is intended to improve
> >     performance on embedded devices.  This flag is honored only if the
> >     kernel was configured with the CONFIG_MMAP_ALLOW_UNINITIALIZED
> >     option.  Because of the security implications, that option is
> >     normally enabled only on embedded devices (i.e., devices where one
> >     has complete control of the contents of user memory).
> > 
> > and since the only time it shows up in my /usr/include is in this
> > header I believe this should have been visible to userspace (as
> > non-zero, which wouldn't do anything when or'd into the flags) all
> > along.
> 
> Are you sure about "wouldn't do anything"?
> Suspiciously, 0x4000000 is also (1 << MAP_HUGE_SHIFT). I'm not sure if any
> architecture has order-1 huge pages, but still looks like we have conflict
> here.
> 
> I think it's harmful to expose non-zero MAP_UNINITIALIZED to system which
> potentially can handle multiple users. Or non-trivial user space in
> general.

The flag should always exist.  If it was defined to conflict with
something else, that's a serious ABI problem.  But the flag
should always exist, even if the kernel ends up ignoring it.

> Should we leave it at least under '#ifndef CONFIG_MMU'? I don't think it's
> possible to have single ABI for MMU and MMU-less systems anyway. And we
> can avoid conflict with MAP_HUGE_SHIFT this way.

No; even if you have an MMU (which is useful for things like fork()), a
system without user separation (for instance, without CONFIG_MULTIUSER)
can reasonably use MAP_UNINITIALIZED.

> P.S. MAP_UNINITIALIZED itself looks very broken to me. I probably need dig
> mailing list on why it was allowed.

That's what the config option *and* explicit flag are for; there are
more than enough warnings about the implications.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kirill A. Shutemov Sept. 15, 2015, 9:42 a.m. UTC | #4
On Mon, Sep 14, 2015 at 10:19:19PM -0700, Josh Triplett wrote:
> On Tue, Sep 15, 2015 at 03:23:58AM +0300, Kirill A. Shutemov wrote:
> > On Mon, Sep 14, 2015 at 03:50:38PM -0700, Palmer Dabbelt wrote:
> > > This used to be hidden behind CONFIG_MMAP_ALLOW_UNINITIALIZED, so
> > > userspace wouldn't actually ever see it be non-zero.  While I could
> > > have kept hiding it, the man pages seem to indicate that
> > > MAP_UNINITIALIZED should be visible:
> > > 
> > >   mmap(2)
> > >   MAP_UNINITIALIZED (since Linux 2.6.33)
> > >     Don't clear anonymous pages.  This flag is intended to improve
> > >     performance on embedded devices.  This flag is honored only if the
> > >     kernel was configured with the CONFIG_MMAP_ALLOW_UNINITIALIZED
> > >     option.  Because of the security implications, that option is
> > >     normally enabled only on embedded devices (i.e., devices where one
> > >     has complete control of the contents of user memory).
> > > 
> > > and since the only time it shows up in my /usr/include is in this
> > > header I believe this should have been visible to userspace (as
> > > non-zero, which wouldn't do anything when or'd into the flags) all
> > > along.
> > 
> > Are you sure about "wouldn't do anything"?
> > Suspiciously, 0x4000000 is also (1 << MAP_HUGE_SHIFT). I'm not sure if any
> > architecture has order-1 huge pages, but still looks like we have conflict
> > here.
> > 
> > I think it's harmful to expose non-zero MAP_UNINITIALIZED to system which
> > potentially can handle multiple users. Or non-trivial user space in
> > general.
> 
> The flag should always exist.

Sure. And 0 is perfectly fine value for the flag. Like with MAP_FILE.

> If it was defined to conflict with
> something else, that's a serious ABI problem.  But the flag
> should always exist, even if the kernel ends up ignoring it.
> 
> > Should we leave it at least under '#ifndef CONFIG_MMU'? I don't think it's
> > possible to have single ABI for MMU and MMU-less systems anyway. And we
> > can avoid conflict with MAP_HUGE_SHIFT this way.
> 
> No; even if you have an MMU (which is useful for things like fork()), a
> system without user separation (for instance, without CONFIG_MULTIUSER)
> can reasonably use MAP_UNINITIALIZED.

Can? Yes. Reasonably? I don't think so.

> > P.S. MAP_UNINITIALIZED itself looks very broken to me. I probably need dig
> > mailing list on why it was allowed.
> 
> That's what the config option *and* explicit flag are for; there are
> more than enough warnings about the implications.

I think it's misdesigned. It doesn't require explicid opt-in from a
process who owned the page allocated in MAP_UNINITIALIZED mapping before.

#define MAP_LEAK_ME_SOME_DATA MAP_UNINITIALIZED
Josh Triplett Sept. 15, 2015, 2:07 p.m. UTC | #5
On Tue, Sep 15, 2015 at 12:42:00PM +0300, Kirill A. Shutemov wrote:
> On Mon, Sep 14, 2015 at 10:19:19PM -0700, Josh Triplett wrote:
> > On Tue, Sep 15, 2015 at 03:23:58AM +0300, Kirill A. Shutemov wrote:
> > > On Mon, Sep 14, 2015 at 03:50:38PM -0700, Palmer Dabbelt wrote:
> > > > This used to be hidden behind CONFIG_MMAP_ALLOW_UNINITIALIZED, so
> > > > userspace wouldn't actually ever see it be non-zero.  While I could
> > > > have kept hiding it, the man pages seem to indicate that
> > > > MAP_UNINITIALIZED should be visible:
> > > > 
> > > >   mmap(2)
> > > >   MAP_UNINITIALIZED (since Linux 2.6.33)
> > > >     Don't clear anonymous pages.  This flag is intended to improve
> > > >     performance on embedded devices.  This flag is honored only if the
> > > >     kernel was configured with the CONFIG_MMAP_ALLOW_UNINITIALIZED
> > > >     option.  Because of the security implications, that option is
> > > >     normally enabled only on embedded devices (i.e., devices where one
> > > >     has complete control of the contents of user memory).
> > > > 
> > > > and since the only time it shows up in my /usr/include is in this
> > > > header I believe this should have been visible to userspace (as
> > > > non-zero, which wouldn't do anything when or'd into the flags) all
> > > > along.
> > > 
> > > Are you sure about "wouldn't do anything"?
> > > Suspiciously, 0x4000000 is also (1 << MAP_HUGE_SHIFT). I'm not sure if any
> > > architecture has order-1 huge pages, but still looks like we have conflict
> > > here.
> > > 
> > > I think it's harmful to expose non-zero MAP_UNINITIALIZED to system which
> > > potentially can handle multiple users. Or non-trivial user space in
> > > general.
> > 
> > The flag should always exist.
> 
> Sure. And 0 is perfectly fine value for the flag. Like with MAP_FILE.

Rephrasing: the flag should always exist with the correct value.
Whether the kernel handles it or not, the kernel *headers* shouldn't
change to match the kernel, not least of which because they don't
necessarily match the running kernel.  Just like we define the
prototypes for syscalls that the running kernel may return ENOSYS for.

> > If it was defined to conflict with
> > something else, that's a serious ABI problem.  But the flag
> > should always exist, even if the kernel ends up ignoring it.
> > 
> > > Should we leave it at least under '#ifndef CONFIG_MMU'? I don't think it's
> > > possible to have single ABI for MMU and MMU-less systems anyway. And we
> > > can avoid conflict with MAP_HUGE_SHIFT this way.
> > 
> > No; even if you have an MMU (which is useful for things like fork()), a
> > system without user separation (for instance, without CONFIG_MULTIUSER)
> > can reasonably use MAP_UNINITIALIZED.
> 
> Can? Yes. Reasonably? I don't think so.

Not all systems care.  Otherwise you should be complaining more bitterly
about options like CONFIG_MMU=n, which (*gasp*) allow access to *arbitrary
memory*.

> > > P.S. MAP_UNINITIALIZED itself looks very broken to me. I probably need dig
> > > mailing list on why it was allowed.
> > 
> > That's what the config option *and* explicit flag are for; there are
> > more than enough warnings about the implications.
> 
> I think it's misdesigned. It doesn't require explicid opt-in from a
> process who owned the page allocated in MAP_UNINITIALIZED mapping before.
> 
> #define MAP_LEAK_ME_SOME_DATA MAP_UNINITIALIZED

Hence why it has a config option.

The userspace option exists primarily because otherwise userspace might
get surprised by receiving a non-zeroed page.  On a system with the
config option turned on, processes have access to arbitrary freed
memory, as long as they say they can handle not having their memory
pre-zeroed.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Howells Sept. 17, 2015, 10:13 a.m. UTC | #6
Josh Triplett <josh@joshtriplett.org> wrote:

> > Sure. And 0 is perfectly fine value for the flag. Like with MAP_FILE.
> 
> Rephrasing: the flag should always exist with the correct value.
> Whether the kernel handles it or not, the kernel *headers* shouldn't
> change to match the kernel, not least of which because they don't
> necessarily match the running kernel.  Just like we define the
> prototypes for syscalls that the running kernel may return ENOSYS for.

Josh is correct.

CONFIG_xxx *should* *not* be seen in UAPI headers, except inside #ifdef
__KERNEL__ guards under special circumstances - and #ifdef __KERNEL__ guards
*should* *not* be seen in UAPI headers except under special circumstances.

In terms of such special circumstances, take a peek in
include/uapi/linux/acct.h at struct acct with this:

	/* m68k had no padding here. */
	#if !defined(CONFIG_M68K) || !defined(__KERNEL__)
		__u16		ac_ahz;			/* AHZ */
	#endif

in the middle of it...

Or include/{,uapi/}linux/agpgart.h where it defines two different but
same-named variants of several structs.

Now, some of these - particularly things like the latter - can be fixed by
someone who has the time.

David
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch
diff mbox

diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
index 201aec0e0446..2cbc1e717082 100644
--- a/arch/xtensa/include/uapi/asm/mman.h
+++ b/arch/xtensa/include/uapi/asm/mman.h
@@ -55,11 +55,9 @@ 
 #define MAP_NONBLOCK	0x20000		/* do not block on IO */
 #define MAP_STACK	0x40000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x80000		/* create a huge page mapping */
-#ifdef CONFIG_MMAP_ALLOW_UNINITIALIZED
+#ifndef MAP_UNINITIALIZED
 # define MAP_UNINITIALIZED 0x4000000	/* For anonymous mmap, memory could be
 					 * uninitialized */
-#else
-# define MAP_UNINITIALIZED 0x0		/* Don't support this flag */
 #endif
 
 /*
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index ddc3b36f1046..7aeeb12db193 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -19,10 +19,8 @@ 
 #define MAP_TYPE	0x0f		/* Mask for type of mapping */
 #define MAP_FIXED	0x10		/* Interpret addr exactly */
 #define MAP_ANONYMOUS	0x20		/* don't use a file */
-#ifdef CONFIG_MMAP_ALLOW_UNINITIALIZED
+#ifndef MAP_UNINITIALIZED
 # define MAP_UNINITIALIZED 0x4000000	/* For anonymous mmap, memory could be uninitialized */
-#else
-# define MAP_UNINITIALIZED 0x0		/* Don't support this flag */
 #endif
 
 #define MS_ASYNC	1		/* sync memory asynchronously */