[man] mmap.2: Document danger of mappings larger than PTRDIFF_MAX

Message ID 20250409200316.1555164-1-jannh@google.com (mailing list archive)
State New

Commit Message

Jann Horn April 9, 2025, 8:03 p.m. UTC
References:
 - C99 draft: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf
   section "6.5.6 Additive operators", paragraph 9
 - object size restriction in GCC:
   https://gcc.gnu.org/legacy-ml/gcc/2011-08/msg00221.html
 - glibc malloc restricts object size to <=PTRDIFF_MAX in
   checked_request2size()
---
I'm not sure if we can reasonably do anything about this in the kernel,
given that the kernel does not really have any idea of what userspace
object sizes look like, or whether userspace even wants C semantics.
But we can at least document it...

@man-pages maintainer: Please wait a few days before applying this;
I imagine there might be some discussion about this.

 man/man2/mmap.2 | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)


base-commit: 4c4d9f0f5148caf1271394018d0f7381c1b8b400

Comments

Jakub Wilk April 9, 2025, 8:25 p.m. UTC | #1
* Jann Horn <jannh@google.com>, 2025-04-09 22:03:
> - glibc malloc restricts object size to <=PTRDIFF_MAX in
>   checked_request2size()

FWIW, this is done only since glibc v2.30 (released in 2019):
https://sourceware.org/cgit/glibc/commit/?id=9bf8e29ca136094f

Alejandro Colomar April 9, 2025, 8:41 p.m. UTC | #2
Hi Jann,

On Wed, Apr 09, 2025 at 10:03:16PM +0200, Jann Horn wrote:
> References:
>  - C99 draft: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf
>    section "6.5.6 Additive operators", paragraph 9
>  - object size restriction in GCC:
>    https://gcc.gnu.org/legacy-ml/gcc/2011-08/msg00221.html
>  - glibc malloc restricts object size to <=PTRDIFF_MAX in
>    checked_request2size()
> ---
> I'm not sure if we can reasonably do anything about this in the kernel,
> given that the kernel does not really have any idea of what userspace
> object sizes look like,

Hmmm.  Maybe the kernel could reject lengths above PTRDIFF_MAX, which would
at least work for cases where user-space ptrdiff_t matches the kernel's
ptrdiff_t?  Then only users where they don't match would be unprotected,
but those are hopefully extra careful.

> or whether userspace even wants C semantics.

I guess any language will have to link to C at some point, or have
inherent limitations similar to those of C.

> But we can at least document it...

Yep.  Most people are unaware of this, and believe they can get
SIZE_MAX.

> 
> @man-pages maintainer: Please wait a few days before applying this;
> I imagine there might be some discussion about this.

Okay; see some minor comments below.

> 
>  man/man2/mmap.2 | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/man/man2/mmap.2 b/man/man2/mmap.2
> index caf822103..9cb7dacf3 100644
> --- a/man/man2/mmap.2
> +++ b/man/man2/mmap.2
> @@ -785,6 +785,23 @@ correspond to added or removed regions of the file is unspecified.
>  An application can determine which pages of a mapping are
>  currently resident in the buffer/page cache using
>  .BR mincore (2).
> +.P
> +Unlike typical
> +.BR malloc (3)
> +implementations,
> +.BR mmap ()
> +does not prevent creating objects larger than
> +.B PTRDIFF_MAX.

.BR PTRDIFF_MAX .

(since you want the '.' not bold, but roman)

> +Objects that are larger than
> +.B PTRDIFF_MAX
> +only work in limited ways in standard C (in particular, pointer subtraction

Please break the line also before the '('.

> +results in undefined behavior if the result would be bigger than
> +.B PTRDIFF_MAX).

.BR PTRDIFF_MAX ).

(same reasons)

> +On top of that, GCC also assumes that no object is bigger than
> +.B PTRDIFF_MAX.

.BR PTRDIFF_MAX .

> +.B PTRDIFF_MAX
> +is usually half of the address space size; so for 32-bit processes, it is

Please break the line after ';' and after ',' (and not after 'is').

See also man-pages(7):

$ MANWIDTH=72 man man-pages | sed -n '/Use semantic newlines/,/^$/p'
   Use semantic newlines
     In the source of a manual page, new sentences should be started on
     new lines, long sentences should be split  into  lines  at  clause
     breaks  (commas,  semicolons, colons, and so on), and long clauses
     should be split at phrase boundaries.  This convention,  sometimes
     known as "semantic newlines", makes it easier to see the effect of
     patches, which often operate at the level of individual sentences,
     clauses, or phrases.


Have a lovely night!
Alex

> +usually 0x7fffffff (almost 2 GiB).
>  .\"
>  .SS Using MAP_FIXED safely
>  The only safe use for
> 
> base-commit: 4c4d9f0f5148caf1271394018d0f7381c1b8b400
> -- 
> 2.49.0.504.g3bcea36a83-goog
>

Jann Horn April 10, 2025, 6:08 p.m. UTC | #3
On Wed, Apr 9, 2025 at 10:41 PM Alejandro Colomar <alx@kernel.org> wrote:
> On Wed, Apr 09, 2025 at 10:03:16PM +0200, Jann Horn wrote:
> > References:
> >  - C99 draft: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf
> >    section "6.5.6 Additive operators", paragraph 9
> >  - object size restriction in GCC:
> >    https://gcc.gnu.org/legacy-ml/gcc/2011-08/msg00221.html
> >  - glibc malloc restricts object size to <=PTRDIFF_MAX in
> >    checked_request2size()
> > ---
> > I'm not sure if we can reasonably do anything about this in the kernel,
> > given that the kernel does not really have any idea of what userspace
> > object sizes look like,
>
> Hmmm.  Maybe the kernel could reject lengths above PTRDIFF_MAX, which would
> at least work for cases where user-space ptrdiff_t matches the kernel's
> ptrdiff_t?  Then only users where they don't match would be unprotected,
> but those are hopefully extra careful.

Perhaps. But then some tricky things are:

1. How many existing users would we be breaking with such a change?
Probably _someone_ out there is deliberately mapping files over 2G
into 32-bit processes and it sorta worked until now...
2. We don't really have a concept of object size in the kernel, and it
might be hard to reason about whether mmap() is used logically to
create a new object or extend an existing object. I guess we could
limit VMA sizes for 32-bit userspace to 0x7ffff000 and enforce a
1-page gap around mappings that are at least half that size, or
something like that, but that would probably get a bit ugly on the
kernel side...

The first point is really the main concern for me - we might end up
breaking existing users.
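
For reference, the 0x7ffff000 figure in point 2 is what you get when the
32-bit PTRDIFF_MAX (0x7fffffff) is rounded down to a page boundary.
The sketch below shows that arithmetic only. The 4 KiB page size and the
helper name round_down_to_page() are assumptions made for illustration;
neither is stated in the thread, and this is not kernel code.

```c
#include <stdint.h>

/*
 * Hypothetical helper: round a length down to a multiple of page_size.
 * Assumes page_size is a power of two (4096 in the example discussed
 * above, which is itself an assumption about the target system).
 */
static uint32_t round_down_to_page(uint32_t len, uint32_t page_size)
{
    return len & ~(page_size - 1u);
}
```

With these assumptions, round_down_to_page(0x7fffffff, 4096) evaluates to
0x7ffff000, the largest page-aligned length that does not exceed the
32-bit PTRDIFF_MAX.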

> > or whether userspace even wants C semantics.
>
> I guess any language will have to link to C at some point, or have
> inherent limitations similar to those of C.

This limitation is really a result of deciding to make pointer
subtraction return a signed value, so that you can subtract a bigger
pointer from a smaller pointer. I don't know whether other languages
do that.
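
To make the signed-subtraction point concrete: for an object larger than
PTRDIFF_MAX, subtracting a pointer to its start from a pointer to its end
produces a value that does not fit in ptrdiff_t, which C99 6.5.6p9 makes
undefined.  The following is a minimal sketch, not part of the patch.
The helper names are hypothetical, and byte_distance() relies on the
implementation-defined pointer-to-uintptr_t conversion behaving as a flat
address, which is assumed here (as on Linux) rather than guaranteed by
the C standard.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if every pointer difference inside an
 * object of `len` bytes is representable in ptrdiff_t.
 */
static bool len_fits_ptrdiff(size_t len)
{
    return len <= (size_t)PTRDIFF_MAX;
}

/*
 * Hypothetical helper: byte distance from `lo` to `hi`, computed in
 * unsigned arithmetic so the result cannot overflow a signed type.
 * The caller must ensure hi >= lo and that both point into the same
 * mapping.  Assumes a flat address space (assumption, see above).
 */
static size_t byte_distance(const void *lo, const void *hi)
{
    return (size_t)((uintptr_t)hi - (uintptr_t)lo);
}
```

Code that has to handle oversized mappings could use something like
byte_distance() instead of `hi - lo`, but compilers that assume no object
exceeds PTRDIFF_MAX may still miscompile other operations on such objects.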

> > But we can at least document it...
>
> Yep.  Most people are unaware of this, and believe they can get
> SIZE_MAX.
>
> >
> > @man-pages maintainer: Please wait a few days before applying this;
> > I imagine there might be some discussion about this.
>
> Okay; see some minor comments below.

Thanks. (I'll probably be out for the next two weeks or so, I'll
probably get back to this afterwards.)

Alejandro Colomar April 10, 2025, 8:30 p.m. UTC | #4
Hi Jann,

On Thu, Apr 10, 2025 at 08:08:41PM +0200, Jann Horn wrote:
> > Hmmm.  Maybe the kernel could reject lengths above PTRDIFF_MAX, which would
> > at least work for cases where user-space ptrdiff_t matches the kernel's
> > ptrdiff_t?  Then only users where they don't match would be unprotected,
> > but those are hopefully extra careful.
> 
> Perhaps. But then some tricky things are:
> 
> 1. How many existing users would we be breaking with such a change?
> Probably _someone_ out there is deliberately mapping files over 2G
> into 32-bit processes and it sorta worked until now...
> 2. We don't really have a concept of object size in the kernel, and it
> might be hard to reason about whether mmap() is used logically to
> create a new object or extend an existing object. I guess we could
> limit VMA sizes for 32-bit userspace to 0x7ffff000 and enforce a
> 1-page gap around mappings that are at least half that size, or
> something like that, but that would probably get a bit ugly on the
> kernel side...
> 
> The first point is really the main concern for me - we might end up
> breaking existing users.

Hmmm, okay.  If it ends up being too complex, it also would be bad.
It's easier for careful programmers to just check the size before the
call.  So it's fine to not do the check in the kernel.
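
A userspace check of that kind could look roughly like the sketch below.
The wrapper name mmap_checked() is hypothetical (it is not a libc or
kernel interface), and reporting the rejection as ENOMEM is an assumption
chosen for illustration; a caller might prefer a different error.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>

/*
 * Hypothetical wrapper around mmap(2): refuse any length above
 * PTRDIFF_MAX before making the call, so that the resulting mapping
 * can never be an object too large for C pointer subtraction.
 * On rejection it returns MAP_FAILED and sets errno to ENOMEM
 * (the errno value is an assumption of this sketch).
 */
static void *mmap_checked(void *addr, size_t length, int prot, int flags,
                          int fd, off_t offset)
{
    if (length > (size_t)PTRDIFF_MAX) {
        errno = ENOMEM;
        return MAP_FAILED;
    }
    return mmap(addr, length, prot, flags, fd, offset);
}
```

This only bounds a single mapping.  It does not address the case raised
earlier in the thread, where adjacent mappings are treated as one logical
object by the program.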

> > > or whether userspace even wants C semantics.
> >
> > I guess any language will have to link to C at some point, or have
> > inherent limitations similar to those of C.
> 
> This limitation is really a result of deciding to make pointer
> subtraction return a signed value, so that you can subtract a bigger
> pointer from a smaller pointer. I don't know whether other languages
> do that.
>
> > > But we can at least document it...
> >
> > Yep.  Most people are unaware of this, and believe they can get
> > SIZE_MAX.
> >
> > >
> > > @man-pages maintainer: Please wait a few days before applying this;
> > > I imagine there might be some discussion about this.
> >
> > Okay; see some minor comments below.
> 
> Thanks. (I'll probably be out for the next two weeks or so, I'll
> probably get back to this afterwards.)

Okay, no problem
Patch

diff --git a/man/man2/mmap.2 b/man/man2/mmap.2
index caf822103..9cb7dacf3 100644
--- a/man/man2/mmap.2
+++ b/man/man2/mmap.2
@@ -785,6 +785,23 @@  correspond to added or removed regions of the file is unspecified.
 An application can determine which pages of a mapping are
 currently resident in the buffer/page cache using
 .BR mincore (2).
+.P
+Unlike typical
+.BR malloc (3)
+implementations,
+.BR mmap ()
+does not prevent creating objects larger than
+.B PTRDIFF_MAX.
+Objects that are larger than
+.B PTRDIFF_MAX
+only work in limited ways in standard C (in particular, pointer subtraction
+results in undefined behavior if the result would be bigger than
+.B PTRDIFF_MAX).
+On top of that, GCC also assumes that no object is bigger than
+.B PTRDIFF_MAX.
+.B PTRDIFF_MAX
+is usually half of the address space size; so for 32-bit processes, it is
+usually 0x7fffffff (almost 2 GiB).
 .\"
 .SS Using MAP_FIXED safely
 The only safe use for