
[v4] mm: Optional full ASLR for mmap() and mremap()

Message ID 20201026160518.9212-1-toiwoton@gmail.com (mailing list archive)
State New, archived
Series [v4] mm: Optional full ASLR for mmap() and mremap()

Commit Message

Topi Miettinen Oct. 26, 2020, 4:05 p.m. UTC
Writing a new value of 3 to /proc/sys/kernel/randomize_va_space
enables full randomization of memory mappings created with mmap(NULL,
...). With 2, the base of the VMA used for such mappings is random,
but the mappings are created in predictable places within the VMA and
in sequential order. With 3, new VMAs are created to fully randomize
the mappings. Also mremap(..., MREMAP_MAYMOVE) will move the mappings
even if not necessary.

The method is to randomize the new address without considering
existing VMAs. If the address fails checks because it overlaps the
stack area (or, in the case of mremap(), the old mapping), the
operation is retried a few times before falling back to the old method.
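
For illustration, the same pick-a-random-address-and-retry idea can be
sketched in userspace (a simplified sketch of the concept, not the
kernel implementation; the 46-bit address range, the retry count and
mmap_fully_randomized() are illustrative assumptions):

#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/random.h>
#include <unistd.h>

#ifndef MAP_FIXED_NOREPLACE
#define MAP_FIXED_NOREPLACE 0x100000   /* Linux >= 4.17 */
#endif

static void *mmap_fully_randomized(size_t len)
{
    long page = sysconf(_SC_PAGESIZE);

    for (int attempt = 0; attempt < 16; attempt++) {
        uint64_t r;

        if (getrandom(&r, sizeof(r), 0) != (ssize_t)sizeof(r))
            break;
        /* Illustrative 46-bit usable range, aligned down to a page. */
        uintptr_t hint = (uintptr_t)(r & ((1ULL << 46) - 1)) &
                         ~((uintptr_t)page - 1);
        void *p = mmap((void *)hint, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE,
                       -1, 0);
        if (p != MAP_FAILED)
            return p;              /* got the randomly chosen address */
        if (errno != EEXIST && errno != ENOMEM && errno != EPERM)
            break;                 /* unexpected error, stop retrying */
    }
    /* Fall back to the old method: let the kernel pick the address. */
    return mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

int main(void)
{
    for (int i = 0; i < 4; i++)
        printf("mapping %d at %p\n", i, mmap_fully_randomized(1 << 20));
    return 0;
}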

On 32 bit systems this may cause problems due to increased VM
fragmentation if the address space gets crowded.

On all systems, it will reduce performance and increase memory
usage due to less efficient use of page tables and inability to
merge adjacent VMAs with compatible attributes.

In this example with a value of 2, the dynamic loader, libc, anonymous
memory reserved with mmap() and the locale-archive are located close to
each other:

$ cat /proc/self/maps (only first line for each object shown for brevity)
58c1175b1000-58c1175b3000 r--p 00000000 fe:0c 1868624                    /usr/bin/cat
79752ec17000-79752f179000 r--p 00000000 fe:0c 2473999                    /usr/lib/locale/locale-archive
79752f179000-79752f279000 rw-p 00000000 00:00 0
79752f279000-79752f29e000 r--p 00000000 fe:0c 2402415                    /usr/lib/x86_64-linux-gnu/libc-2.31.so
79752f43a000-79752f440000 rw-p 00000000 00:00 0
79752f46f000-79752f470000 r--p 00000000 fe:0c 2400484                    /usr/lib/x86_64-linux-gnu/ld-2.31.so
79752f49b000-79752f49c000 rw-p 00000000 00:00 0
7ffdcad9e000-7ffdcadbf000 rw-p 00000000 00:00 0                          [stack]
7ffdcadd2000-7ffdcadd6000 r--p 00000000 00:00 0                          [vvar]
7ffdcadd6000-7ffdcadd8000 r-xp 00000000 00:00 0                          [vdso]

With 3, they are located at unrelated addresses:
$ echo 3 > /proc/sys/kernel/randomize_va_space
$ cat /proc/self/maps (only first line for each object shown for brevity)
1206a8fa000-1206a8fb000 r--p 00000000 fe:0c 2400484                      /usr/lib/x86_64-linux-gnu/ld-2.31.so
1206a926000-1206a927000 rw-p 00000000 00:00 0
19174173000-19174175000 rw-p 00000000 00:00 0
ac82f419000-ac82f519000 rw-p 00000000 00:00 0
afa66a42000-afa66fa4000 r--p 00000000 fe:0c 2473999                      /usr/lib/locale/locale-archive
d8656ba9000-d8656bce000 r--p 00000000 fe:0c 2402415                      /usr/lib/x86_64-linux-gnu/libc-2.31.so
d8656d6a000-d8656d6e000 rw-p 00000000 00:00 0
5df90b712000-5df90b714000 r--p 00000000 fe:0c 1868624                    /usr/bin/cat
7ffe1be4c000-7ffe1be6d000 rw-p 00000000 00:00 0                          [stack]
7ffe1bf07000-7ffe1bf0b000 r--p 00000000 00:00 0                          [vvar]
7ffe1bf0b000-7ffe1bf0d000 r-xp 00000000 00:00 0                          [vdso]
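
The scattering of anonymous mappings and the mremap(..., MREMAP_MAYMOVE)
behaviour can also be observed with a small test program instead of
reading /proc/self/maps. This is an illustrative sketch (the mapping
sizes are arbitrary):

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    /* Anonymous mappings: adjacent with 2, unrelated addresses with 3. */
    for (int i = 0; i < 3; i++)
        printf("mmap %d: %p\n", i,
               mmap(NULL, 1 << 20, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));

    /* Map 2 MiB, free the upper half, then grow the lower half back.
     * With 2 it can (and normally does) grow in place; with 3 the
     * MREMAP_MAYMOVE mapping should move even though moving is not
     * necessary. */
    char *p = mmap(NULL, 2 << 20, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 1;
    munmap(p + (1 << 20), 1 << 20);
    void *q = mremap(p, 1 << 20, 2 << 20, MREMAP_MAYMOVE);
    printf("mremap: %p -> %p\n", (void *)p, q);
    return 0;
}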

CC: Andrew Morton <akpm@linux-foundation.org>
CC: Jann Horn <jannh@google.com>
CC: Kees Cook <keescook@chromium.org>
CC: Matthew Wilcox <willy@infradead.org>
CC: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Topi Miettinen <toiwoton@gmail.com>
---
v2: also randomize mremap(..., MREMAP_MAYMOVE)
v3: avoid stack area and retry in case of bad random address (Jann
Horn), improve description in kernel.rst (Matthew Wilcox)
v4: use /proc/$pid/maps in the example (Mike Rapoport), CCs (Andrew
Morton), only check randomize_va_space == 3
---
 Documentation/admin-guide/hw-vuln/spectre.rst |  6 ++--
 Documentation/admin-guide/sysctl/kernel.rst   | 15 ++++++++++
 init/Kconfig                                  |  2 +-
 mm/internal.h                                 |  8 +++++
 mm/mmap.c                                     | 30 +++++++++++++------
 mm/mremap.c                                   | 27 +++++++++++++++++
 6 files changed, 75 insertions(+), 13 deletions(-)


base-commit: 3650b228f83adda7e5ee532e2b90429c03f7b9ec

Comments

Matthew Wilcox Nov. 17, 2020, 4:54 p.m. UTC | #1
On Mon, Oct 26, 2020 at 06:05:18PM +0200, Topi Miettinen wrote:
> Writing a new value of 3 to /proc/sys/kernel/randomize_va_space
> enables full randomization of memory mappings created with mmap(NULL,
> ...). With 2, the base of the VMA used for such mappings is random,
> but the mappings are created in predictable places within the VMA and
> in sequential order. With 3, new VMAs are created to fully randomize
> the mappings. Also mremap(..., MREMAP_MAYMOVE) will move the mappings
> even if not necessary.

Is this worth it?

https://www.ndss-symposium.org/ndss2017/ndss-2017-programme/aslrcache-practical-cache-attacks-mmu/
Topi Miettinen Nov. 17, 2020, 8:21 p.m. UTC | #2
On 17.11.2020 18.54, Matthew Wilcox wrote:
> On Mon, Oct 26, 2020 at 06:05:18PM +0200, Topi Miettinen wrote:
>> Writing a new value of 3 to /proc/sys/kernel/randomize_va_space
>> enables full randomization of memory mappings created with mmap(NULL,
>> ...). With 2, the base of the VMA used for such mappings is random,
>> but the mappings are created in predictable places within the VMA and
>> in sequential order. With 3, new VMAs are created to fully randomize
>> the mappings. Also mremap(..., MREMAP_MAYMOVE) will move the mappings
>> even if not necessary.
> 
> Is this worth it?
> 
> https://www.ndss-symposium.org/ndss2017/ndss-2017-programme/aslrcache-practical-cache-attacks-mmu/

Thanks, very interesting. The paper presents an attack (AnC) which can 
break ASLR even from JavaScript in browsers. In the process it compares 
the memory allocators of Firefox and Chrome. Firefox relies on Linux 
mmap() to randomize the memory location, but Chrome internally chooses 
the randomized address. The paper doesn't present exact numbers to break 
ASLR for Chrome case, but it seems to require more effort. Chrome also 
aggressively randomizes the memory on each allocation, which seems to 
enable further possibilities for AnC to probe the MMU tables.

Disregarding the difference in aggressiveness of memory allocators, I 
think with sysctl.kernel.randomize_va_space=3, the effort for breaking 
ASLR with Firefox should be increased closer to Chrome case since mmap() 
will use the address space more randomly.

I have used this setting now for a month without any visible performance 
issues, so I think the extra bits (for some additional effort to 
attackers) are definitely worth the low cost.

Furthermore, the paper does not describe in detail how the attack would 
continue after breaking ASLR. Perhaps there are assumptions which are 
not valid when the different memory areas are no longer sequential. For 
example, if ASLR is initially broken wrt. the JIT buffer but continuing 
the attack would require other locations to be determined (like stack, 
data segment for main exe or libc etc), further efforts may be needed to 
resolve these locations. With randomize_va_space=2, resolving any 
address (JIT buffer) can reveal the addresses of many other memory areas 
but this is not the case with 3.

-Topi
Mike Rapoport Nov. 18, 2020, 5:40 p.m. UTC | #3
(added one of the AnC paper authors)

On Tue, Nov 17, 2020 at 10:21:30PM +0200, Topi Miettinen wrote:
> On 17.11.2020 18.54, Matthew Wilcox wrote:
> > On Mon, Oct 26, 2020 at 06:05:18PM +0200, Topi Miettinen wrote:
> > > Writing a new value of 3 to /proc/sys/kernel/randomize_va_space
> > > enables full randomization of memory mappings created with mmap(NULL,
> > > ...). With 2, the base of the VMA used for such mappings is random,
> > > but the mappings are created in predictable places within the VMA and
> > > in sequential order. With 3, new VMAs are created to fully randomize
> > > the mappings. Also mremap(..., MREMAP_MAYMOVE) will move the mappings
> > > even if not necessary.
> > 
> > Is this worth it?
> > 
> > https://www.ndss-symposium.org/ndss2017/ndss-2017-programme/aslrcache-practical-cache-attacks-mmu/
> 
> Thanks, very interesting. The paper presents an attack (AnC) which can break
> ASLR even from JavaScript in browsers. In the process it compares the memory
> allocators of Firefox and Chrome. Firefox relies on Linux mmap() to
> randomize the memory location, but Chrome internally chooses the randomized
> address. The paper doesn't present exact numbers to break ASLR for Chrome
> case, but it seems to require more effort. Chrome also aggressively
> randomizes the memory on each allocation, which seems to enable further
> possibilities for AnC to probe the MMU tables.
> 
> Disregarding the difference in aggressiveness of memory allocators, I think
> with sysctl.kernel.randomize_va_space=3, the effort for breaking ASLR with
> Firefox should be increased closer to Chrome case since mmap() will use the
> address space more randomly.
> 
> I have used this setting now for a month without any visible performance
> issues, so I think the extra bits (for some additional effort to attackers)
> are definitely worth the low cost.
> 
> Furthermore, the paper does not describe in detail how the attack would
> continue after breaking ASLR. Perhaps there are assumptions which are not
> valid when the different memory areas are no longer sequential. For example,
> if ASLR is initially broken wrt. the JIT buffer but continuing the attack
> would require other locations to be determined (like stack, data segment for
> main exe or libc etc), further efforts may be needed to resolve these
> locations. With randomize_va_space=2, resolving any address (JIT buffer) can
> reveal the addresses of many other memory areas but this is not the case
> with 3.
> 
> -Topi
Cristiano Giuffrida Nov. 18, 2020, 6:49 p.m. UTC | #4
Interesting mitigation and discussion!

Regarding the impact on the AnC attack, indeed fine-grained (or full)
mmap() randomization affects AnC in two ways: (i) it breaks the
contiguity of the mmap() region, crippling the sliding primitive AnC
relies on; (ii) it ensures an attacker leaking an address in a
particular VMA can't easily infer addresses in other VMAs. So, in
short, the mitigation does raise the bar against AnC-like attacks and
I see this as a useful addition.

Indeed, we're aware some vendors implemented a similar randomization
strategy in the browser as a mitigation against AnC.

Nonetheless, some additional notes on the two points I raised above:

- (i) [Sliding] Note that an attacker can do away with sliding
depending on the randomization entropy and other available side
channels. For instance, with the recent TagBleed, we show how to
combine a TLB side channel with AnC to exhaust the KASLR entropy.
However, similar attacks should be possible in userland, again
depending on the randomization entropy used. See
https://download.vusec.net/papers/tagbleed_eurosp20.pdf. Combining
side channels with transient/speculative execution attacks can further
lower the bar.

- (ii) [Leaks] Depending on the software vulnerability used for
exploitation, it might not be difficult for an attacker to break
fine-grained randomization across VMAs. That is, leak an address from
VMA 1, use the vulnerability to trigger a normally illegal access to
VMA 2, leak an address from VMA 2, repeat. Of course, the exploit
might take much longer depending on how far on the pointer chasing
chain the target is.

Best,
Cristiano

Jann Horn Nov. 18, 2020, 10:42 p.m. UTC | #5
On Tue, Nov 17, 2020 at 5:55 PM Matthew Wilcox <willy@infradead.org> wrote:
> On Mon, Oct 26, 2020 at 06:05:18PM +0200, Topi Miettinen wrote:
> > Writing a new value of 3 to /proc/sys/kernel/randomize_va_space
> > enables full randomization of memory mappings created with mmap(NULL,
> > ...). With 2, the base of the VMA used for such mappings is random,
> > but the mappings are created in predictable places within the VMA and
> > in sequential order. With 3, new VMAs are created to fully randomize
> > the mappings. Also mremap(..., MREMAP_MAYMOVE) will move the mappings
> > even if not necessary.
>
> Is this worth it?
>
> https://www.ndss-symposium.org/ndss2017/ndss-2017-programme/aslrcache-practical-cache-attacks-mmu/

Yeah, against local attacks (including from JavaScript), ASLR isn't
very robust; but it should still help against true remote attacks
(modulo crazyness like NetSpectre).

E.g. Mateusz Jurczyk's remote Samsung phone exploit via MMS messages
(https://googleprojectzero.blogspot.com/2020/08/mms-exploit-part-5-defeating-aslr-getting-rce.html)
would've probably been quite a bit harder to pull off if he hadn't
been able to rely on having all those memory mappings sandwiched
together.
Topi Miettinen Nov. 19, 2020, 9:16 a.m. UTC | #6
On 19.11.2020 0.42, Jann Horn wrote:
> On Tue, Nov 17, 2020 at 5:55 PM Matthew Wilcox <willy@infradead.org> wrote:
>> On Mon, Oct 26, 2020 at 06:05:18PM +0200, Topi Miettinen wrote:
>>> Writing a new value of 3 to /proc/sys/kernel/randomize_va_space
>>> enables full randomization of memory mappings created with mmap(NULL,
>>> ...). With 2, the base of the VMA used for such mappings is random,
>>> but the mappings are created in predictable places within the VMA and
>>> in sequential order. With 3, new VMAs are created to fully randomize
>>> the mappings. Also mremap(..., MREMAP_MAYMOVE) will move the mappings
>>> even if not necessary.
>>
>> Is this worth it?
>>
>> https://www.ndss-symposium.org/ndss2017/ndss-2017-programme/aslrcache-practical-cache-attacks-mmu/
> 
> Yeah, against local attacks (including from JavaScript), ASLR isn't
> very robust; but it should still help against true remote attacks
> (modulo crazyness like NetSpectre).
> 
> E.g. Mateusz Jurczyk's remote Samsung phone exploit via MMS messages
> (https://googleprojectzero.blogspot.com/2020/08/mms-exploit-part-5-defeating-aslr-getting-rce.html)
> would've probably been quite a bit harder to pull off if he hadn't
> been able to rely on having all those memory mappings sandwiched
> together.

Compiling the system with -mcmodel=large should also help, since then 
even within one library, the address space layout of various segments 
(text, data, rodata) could be randomized individually and then finding 
the XOM wouldn't aid in finding the other segments. But this model isn't 
so well supported yet (GCC: 
https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html, not sure about 
LLVM).

-Topi
Topi Miettinen Nov. 19, 2020, 9:59 a.m. UTC | #7
On 18.11.2020 20.49, Cristiano Giuffrida wrote:
> Interesting mitigation and discussion!
> 
> Regarding the impact on the AnC attack, indeed fine-grained (or full)
> mmap() randomization affects AnC in two ways: (i) it breaks the
> contiguity of the mmap() region, crippling the sliding primitive AnC
> relies on; (ii) it ensures an attacker leaking an address in a
> particular VMA can't easily infer addresses in other VMAs. So, in
> short, the mitigation does raise the bar against AnC-like attacks and
> I see this as a useful addition.

In your paper the timings for the Chrome attacks were not presented; it 
would be interesting to know whether they are comparable to the effect of 
randomize_va_space=3 for Firefox. What's your estimate: how much slower 
was it to break Chrome ASLR vs. Firefox/randomize_va_space=2?

> Indeed, we're aware some vendors implemented a similar randomization
> strategy in the browser as a mitigation against AnC.
> 
> Nonetheless, some additional notes on the two points I raised above:
> 
> - (i) [Sliding] Note that an attacker can do away with sliding
> depending on the randomization entropy and other available side
> channels. For instance, with the recent TagBleed, we show how to
> combine a TLB side channel with AnC to exhaust the KASLR entropy.
> However, similar attacks should be possible in userland, again
> depending on the randomization entropy used. See
> https://download.vusec.net/papers/tagbleed_eurosp20.pdf. Combining
> side channels with transient/speculative execution attacks can further
> lower the bar.

I think the equivalent of randomize_va_space=3 for KASLR would be that 
various kernel structures could be placed randomly with full use of all 
the bits available in the hardware, instead of low numbers like 9, 10 or 
15 bits. Maybe each module could also be placed at an individual random 
address instead of stuffing all modules together, and likewise, instead 
of a single page_offset_base, vmalloc_base and vmemmap_base, the kernel 
would use the full address space to place its various internal 
structures. I suppose this is not trivial.

> - (ii) [Leaks] Depending on the software vulnerability used for
> exploitation, it might not be difficult for an attacker to break
> fine-grained randomization across VMAs. That is, leak an address from
> VMA 1, use the vulnerability to trigger a normally illegal access to
> VMA 2, leak an address from VMA 2, repeat. Of course, the exploit
> might take much longer depending on how far on the pointer chasing
> chain the target is.

Pointers between VMAs may also exist, for example libz.so needs to call 
open(), close(), malloc(), free() etc. from libc.so.

-Topi

Cristiano Giuffrida Nov. 19, 2020, 10:20 p.m. UTC | #8
On Thu, Nov 19, 2020 at 10:59 AM Topi Miettinen <toiwoton@gmail.com> wrote:
>
> On 18.11.2020 20.49, Cristiano Giuffrida wrote:
> > Interesting mitigation and discussion!
> >
> > Regarding the impact on the AnC attack, indeed fine-grained (or full)
> > mmap() randomization affects AnC in two ways: (i) it breaks the
> > contiguity of the mmap() region, crippling the sliding primitive AnC
> > relies on; (ii) it ensures an attacker leaking an address in a
> > particular VMA can't easily infer addresses in other VMAs. So, in
> > short, the mitigation does raise the bar against AnC-like attacks and
> > I see this as a useful addition.
>
> In your paper the timing for Chrome attacks were not presented, which
> would be interesting if they are comparable to the effect of
> randomize_va_space=3 for Firefox. What's your estimate, how much slower
> it was to break Chrome ASLR vs. Firefox/randomize_va_space=2?
We did present entropy reduction over time for Chrome (see Fig. 8).
But without a proper sliding primitive due to mmap() randomization, we
stopped at 2 bits of residual entropy. Getting the last 2 bits is not
impossible, but indeed slower. Not sure by how much without actually
trying (as mentioned, you might also be able to use other side
channels to compensate).

I forgot to mention that mmap() randomization actually makes attacks
easier in cases where VMAs are not demand paged (see Section VI.B of
the AnC paper), since proper sliding with nonrandomized mmap() would
otherwise need to allocate too much memory.

>
> > Indeed, we're aware some vendors implemented a similar randomization
> > strategy in the browser as a mitigation against AnC.
> >
> > Nonetheless, some additional notes on the two points I raised above:
> >
> > - (i) [Sliding] Note that an attacker can do away with sliding
> > depending on the randomization entropy and other available side
> > channels. For instance, with the recent TagBleed, we show how to
> > combine a TLB side channel with AnC to exhaust the KASLR entropy.
> > However, similar attacks should be possible in userland, again
> > depending on the randomization entropy used. See
> > https://download.vusec.net/papers/tagbleed_eurosp20.pdf. Combining
> > side channels with transient/speculative execution attacks can further
> > lower the bar.
>
> I think the equivalent of randomize_va_space=3 for KASLR would be that
> various kernel structures could be placed randomly with full use of all
> bits in the hardware, instead of low numbers like 9, 10 or 15 bits.
> Maybe also each module could be placed in individual random address
> instead of stuffing all modules together and likewise, instead of single
> page_offset_base, vmalloc_base and vmemmap_base, kernel would use the
> full address space to place various internal structures. I suppose this
> is not trivial.
Indeed it's nontrivial to get similar randomization guarantees for the
kernel. I mentioned TagBleed because similar combined AnC + TLB
attacks should also be possible in the browser. We just happened to
focus on the kernel with TagBleed.

>
> > - (ii) [Leaks] Depending on the software vulnerability used for
> > exploitation, it might not be difficult for an attacker to break
> > fine-grained randomization across VMAs. That is, leak an address from
> > VMA 1, use the vulnerability to trigger a normally illegal access to
> > VMA 2, leak an address from VMA 2, repeat. Of course, the exploit
> > might take much longer depending on how far on the pointer chasing
> > chain the target is.
>
> Pointers between VMAs may also exist, for example libz.so needs to call
> open(), close(), malloc(), free() etc. from libc.so.
Indeed my example above assumed pointers between VMAs. At each step,
you would use a vulnerability to craft a counterfeit object around
existing pointers to other VMAs and move from there.

Note that without existing pointers between VMAs, you can still mount
similar attacks by crafting your own pointers to probe for other VMAs.
Since you'd be blindly probing the address space, you'd need some page
fault suppression mechanism to keep going. But branch misprediction a
la Spectre and similar can do the trick. See our recent BlindSide for
an example of such an attack against the kernel:
https://download.vusec.net/papers/blindside_ccs20.pdf.

Topi Miettinen Nov. 20, 2020, 8:38 a.m. UTC | #9
On 20.11.2020 0.20, Cristiano Giuffrida wrote:
> On Thu, Nov 19, 2020 at 10:59 AM Topi Miettinen <toiwoton@gmail.com> wrote:
>>
>> On 18.11.2020 20.49, Cristiano Giuffrida wrote:
>>> Interesting mitigation and discussion!
>>>
>>> Regarding the impact on the AnC attack, indeed fine-grained (or full)
>>> mmap() randomization affects AnC in two ways: (i) it breaks the
>>> contiguity of the mmap() region, crippling the sliding primitive AnC
>>> relies on; (ii) it ensures an attacker leaking an address in a
>>> particular VMA can't easily infer addresses in other VMAs. So, in
>>> short, the mitigation does raise the bar against AnC-like attacks and
>>> I see this as a useful addition.
>>
>> In your paper the timing for Chrome attacks were not presented, which
>> would be interesting if they are comparable to the effect of
>> randomize_va_space=3 for Firefox. What's your estimate, how much slower
>> it was to break Chrome ASLR vs. Firefox/randomize_va_space=2?
> We did present entropy reduction over time for Chrome (see Fig. 8).
> But without a proper sliding primitive due to mmap() randomization, we
> stopped at 2 bits of residual entropy. Getting the last 2 bits is not
> impossible, but indeed slower. Not sure by how much without actually
> trying (as mentioned, you might also be able to use other side
> channels to compensate).
> 
> I forgot to mention that mmap() randomization actually makes attacks
> easier in cases where VMAs are not demand paged (see Section VI.B of
> the AnC paper), since proper sliding with nonrandomized mmap() would
> otherwise need to allocate too much memory.
> 
>>
>>> Indeed, we're aware some vendors implemented a similar randomization
>>> strategy in the browser as a mitigation against AnC.
>>>
>>> Nonetheless, some additional notes on the two points I raised above:
>>>
>>> - (i) [Sliding] Note that an attacker can do away with sliding
>>> depending on the randomization entropy and other available side
>>> channels. For instance, with the recent TagBleed, we show how to
>>> combine a TLB side channel with AnC to exhaust the KASLR entropy.
>>> However, similar attacks should be possible in userland, again
>>> depending on the randomization entropy used. See
>>> https://download.vusec.net/papers/tagbleed_eurosp20.pdf. Combining
>>> side channels with transient/speculative execution attacks can further
>>> lower the bar.
>>
>> I think the equivalent of randomize_va_space=3 for KASLR would be that
>> various kernel structures could be placed randomly with full use of all
>> bits in the hardware, instead of low numbers like 9, 10 or 15 bits.
>> Maybe also each module could be placed in individual random address
>> instead of stuffing all modules together and likewise, instead of single
>> page_offset_base, vmalloc_base and vmemmap_base, kernel would use the
>> full address space to place various internal structures. I suppose this
>> is not trivial.
> Indeed it's nontrivial to get similar randomization guarantees for the
> kernel. I mentioned TagBleed because similar combined AnC + TLB
> attacks should also be possible in the browser. We just happened to
> focus on the kernel with TagBleed.

Perhaps kernel objects could also be compiled as relocatable shared 
objects, like shared libraries for user applications, so that they 
could be relocated independently, away from the base address of the 
main kernel. Also, compiling the kernel with -mcmodel=large could allow 
the various segments (code, rodata, data) to be located more freely. 
These would force the attacker to do more probing. Again, pointers 
between the objects may make these less useful.

> 
>>
>>> - (ii) [Leaks] Depending on the software vulnerability used for
>>> exploitation, it might not be difficult for an attacker to break
>>> fine-grained randomization across VMAs. That is, leak an address from
>>> VMA 1, use the vulnerability to trigger a normally illegal access to
>>> VMA 2, leak an address from VMA 2, repeat. Of course, the exploit
>>> might take much longer depending on how far on the pointer chasing
>>> chain the target is.
>>
>> Pointers between VMAs may also exist, for example libz.so needs to call
>> open(), close(), malloc(), free() etc. from libc.so.
> Indeed my example above assumed pointers between VMAs. At each step,
> you would use a vulnerability to craft a counterfeit object around
> existing pointers to other VMAs and move from there.
> 
> Note that without existing pointers between VMAs, you can still mount
> similar attacks by crafting your own pointers to probe for other VMAs.
> Since you'd be blindly probing the address space, you'd need some page
> fault suppression mechanism to keep going. But branch misprediction a
> la Spectre and similar can do the trick. See our recent BlindSide for
> an example of such an attack against the kernel:
> https://download.vusec.net/papers/blindside_ccs20.pdf.

In 6.3 the base address of kernel is probed in 0.7s. Wouldn't going from 
9 bits to 32 increase this to 2^21 * 0.7s = ~17 days?

Another mitigation could be to flush all caches on system call entry or 
exit. This would of course decrease performance, but maybe if this was 
done selectively only for critical system services and browsers (maybe 
even only for its JIT thread but not others), perhaps it could be more 
acceptable.

-Topi

Cristiano Giuffrida Nov. 20, 2020, 2:10 p.m. UTC | #10
On Fri, Nov 20, 2020 at 9:38 AM Topi Miettinen <toiwoton@gmail.com> wrote:
>
> On 20.11.2020 0.20, Cristiano Giuffrida wrote:
> > On Thu, Nov 19, 2020 at 10:59 AM Topi Miettinen <toiwoton@gmail.com> wrote:
> >>
> >> On 18.11.2020 20.49, Cristiano Giuffrida wrote:
> >>> Interesting mitigation and discussion!
> >>>
> >>> Regarding the impact on the AnC attack, indeed fine-grained (or full)
> >>> mmap() randomization affects AnC in two ways: (i) it breaks the
> >>> contiguity of the mmap() region, crippling the sliding primitive AnC
> >>> relies on; (ii) it ensures an attacker leaking an address in a
> >>> particular VMA can't easily infer addresses in other VMAs. So, in
> >>> short, the mitigation does raise the bar against AnC-like attacks and
> >>> I see this as a useful addition.
> >>
> >> In your paper the timing for Chrome attacks were not presented, which
> >> would be interesting if they are comparable to the effect of
> >> randomize_va_space=3 for Firefox. What's your estimate, how much slower
> >> it was to break Chrome ASLR vs. Firefox/randomize_va_space=2?
> > We did present entropy reduction over time for Chrome (see Fig. 8).
> > But without a proper sliding primitive due to mmap() randomization, we
> > stopped at 2 bits of residual entropy. Getting the last 2 bits is not
> > impossible, but indeed slower. Not sure by how much without actually
> > trying (as mentioned, you might also be able to use other side
> > channels to compensate).
> >
> > I forgot to mention that mmap() randomization actually makes attacks
> > easier in cases where VMAs are not demand paged (see Section VI.B of
> > the AnC paper), since proper sliding with nonrandomized mmap() would
> > otherwise need to allocate too much memory.
> >
> >>
> >>> Indeed, we're aware some vendors implemented a similar randomization
> >>> strategy in the browser as a mitigation against AnC.
> >>>
> >>> Nonetheless, some additional notes on the two points I raised above:
> >>>
> >>> - (i) [Sliding] Note that an attacker can do away with sliding
> >>> depending on the randomization entropy and other available side
> >>> channels. For instance, with the recent TagBleed, we show how to
> >>> combine a TLB side channel with AnC to exhaust the KASLR entropy.
> >>> However, similar attacks should be possible in userland, again
> >>> depending on the randomization entropy used. See
> >>> https://download.vusec.net/papers/tagbleed_eurosp20.pdf. Combining
> >>> side channels with transient/speculative execution attacks can further
> >>> lower the bar.
> >>
> >> I think the equivalent of randomize_va_space=3 for KASLR would be that
> >> various kernel structures could be placed randomly with full use of all
> >> bits in the hardware, instead of low numbers like 9, 10 or 15 bits.
> >> Maybe also each module could be placed in individual random address
> >> instead of stuffing all modules together and likewise, instead of single
> >> page_offset_base, vmalloc_base and vmemmap_base, kernel would use the
> >> full address space to place various internal structures. I suppose this
> >> is not trivial.
> > Indeed it's nontrivial to get similar randomization guarantees for the
> > kernel. I mentioned TagBleed because similar combined AnC + TLB
> > attacks should also be possible in the browser. We just happened to
> > focus on the kernel with TagBleed.
>
> Perhaps kernel objects could be also compiled as relocatable shared
> objects, like shared libraries for user applications, so that a they
> could be relocated independently away from the base address of main
> kernel. Also compiling the kernel with -mcmodel=large could allow
> various segments (code, rodata, data) to be located more freely. These
> would make the attacker to do more probing. Again, pointers between the
> objects may make these less useful.
>
> >
> >>
> >>> - (ii) [Leaks] Depending on the software vulnerability used for
> >>> exploitation, it might not be difficult for an attacker to break
> >>> fine-grained randomization across VMAs. That is, leak an address from
> >>> VMA 1, use the vulnerability to trigger a normally illegal access to
> >>> VMA 2, leak an address from VMA 2, repeat. Of course, the exploit
> >>> might take much longer depending on how far on the pointer chasing
> >>> chain the target is.
> >>
> >> Pointers between VMAs may also exist, for example libz.so needs to call
> >> open(), close(), malloc(), free() etc. from libc.so.
> > Indeed my example above assumed pointers between VMAs. At each step,
> > you would use a vulnerability to craft a counterfeit object around
> > existing pointers to other VMAs and move from there.
> >
> > Note that without existing pointers between VMAs, you can still mount
> > similar attacks by crafting your own pointers to probe for other VMAs.
> > Since you'd be blindly probing the address space, you'd need some page
> > fault suppression mechanism to keep going. But branch misprediction a
> > la Spectre and similar can do the trick. See our recent BlindSide for
> > an example of such an attack against the kernel:
> > https://download.vusec.net/papers/blindside_ccs20.pdf.
>
> In 6.3 the base address of kernel is probed in 0.7s. Wouldn't going from
> 9 bits to 32 increase this to 2^21 * 0.7s = ~17 days?
In general, increasing the entropy can make the attack much more
difficult to complete in bounded time, yes. However:
- The time to complete a single probe is inherently
vulnerability-specific and the probe we had was not particularly
efficient.
- We didn't really look at optimizations to speed things up, such as
batching multiple probes in a single syscall.
- If you're probing in the browser rather than in the kernel, you
might be able to craft more efficient probes and also more easily fill
up the address space with objects you want to probe for to reduce the
entropy. See our thread spraying paper for an example:
https://www.usenix.net/system/files/conference/usenixsecurity16/sec16_paper_goktas.pdf

>
> Another mitigation could be to flush all caches on system call entry or
> exit. This would of course decrease performance, but maybe if this was
> done selectively only for critical system services and browsers (maybe
> even only for its JIT thread but not others), perhaps it could be more
> acceptable.
Right. Something to keep in mind with these attacks is that flushing
the caches only cripples one particular (although the most common)
kind of covert channel to leak information. But an attacker could in
principle switch to other microarchitectural side effects and covert
channels. See SMoTherSpectre for an example.

Matthew Wilcox Nov. 20, 2020, 3:27 p.m. UTC | #11
On Fri, Nov 20, 2020 at 10:38:21AM +0200, Topi Miettinen wrote:
> On 20.11.2020 0.20, Cristiano Giuffrida wrote:
> > Indeed it's nontrivial to get similar randomization guarantees for the
> > kernel. I mentioned TagBleed because similar combined AnC + TLB
> > attacks should also be possible in the browser. We just happened to
> > focus on the kernel with TagBleed.
> 
> Perhaps kernel objects could be also compiled as relocatable shared objects,
> like shared libraries for user applications, so that a they could be
> relocated independently away from the base address of main kernel. Also
> compiling the kernel with -mcmodel=large could allow various segments (code,
> rodata, data) to be located more freely. These would make the attacker to do
> more probing. Again, pointers between the objects may make these less
> useful.

They are relocatable shared objects.  They're loaded into the vmalloc
area on some architectures but x86 has a special MODULES_VADDR region.
Maybe just jumbling them into the general vmalloc address range would be
beneficial from a security point of view?  I suspect it's not all
that useful because most modules are loaded early on.

We seem to have randomness mixed into the vmalloc allocations with
DEBUG_AUGMENT_LOWEST_MATCH_CHECK, but there doesn't seem to be an
ASLR option to vmalloc ... Uladzislau?
Topi Miettinen Nov. 20, 2020, 7:37 p.m. UTC | #12
On 20.11.2020 16.10, Cristiano Giuffrida wrote:
> On Fri, Nov 20, 2020 at 9:38 AM Topi Miettinen <toiwoton@gmail.com> wrote:
>>
>> On 20.11.2020 0.20, Cristiano Giuffrida wrote:
>>> On Thu, Nov 19, 2020 at 10:59 AM Topi Miettinen <toiwoton@gmail.com> wrote:
>>>>
>>>> On 18.11.2020 20.49, Cristiano Giuffrida wrote:
>>>>> Interesting mitigation and discussion!
>>>>>
>>>>> Regarding the impact on the AnC attack, indeed fine-grained (or full)
>>>>> mmap() randomization affects AnC in two ways: (i) it breaks the
>>>>> contiguity of the mmap() region, crippling the sliding primitive AnC
>>>>> relies on; (ii) it ensures an attacker leaking an address in a
>>>>> particular VMA can't easily infer addresses in other VMAs. So, in
>>>>> short, the mitigation does raise the bar against AnC-like attacks and
>>>>> I see this as a useful addition.
>>>>
>>>> In your paper the timing for Chrome attacks were not presented, which
>>>> would be interesting if they are comparable to the effect of
>>>> randomize_va_space=3 for Firefox. What's your estimate, how much slower
>>>> it was to break Chrome ASLR vs. Firefox/randomize_va_space=2?
>>> We did present entropy reduction over time for Chrome (see Fig. 8).
>>> But without a proper sliding primitive due to mmap() randomization, we
>>> stopped at 2 bits of residual entropy. Getting the last 2 bits is not
>>> impossible, but indeed slower. Not sure by how much without actually
>>> trying (as mentioned, you might also be able to use other side
>>> channels to compensate).
>>>
>>> I forgot to mention that mmap() randomization actually makes attacks
>>> easier in cases where VMAs are not demand paged (see Section VI.B of
>>> the AnC paper), since proper sliding with nonrandomized mmap() would
>>> otherwise need to allocate too much memory.
>>>
>>>>
>>>>> Indeed, we're aware some vendors implemented a similar randomization
>>>>> strategy in the browser as a mitigation against AnC.
>>>>>
>>>>> Nonetheless, some additional notes on the two points I raised above:
>>>>>
>>>>> - (i) [Sliding] Note that an attacker can do away with sliding
>>>>> depending on the randomization entropy and other available side
>>>>> channels. For instance, with the recent TagBleed, we show how to
>>>>> combine a TLB side channel with AnC to exhaust the KASLR entropy.
>>>>> However, similar attacks should be possible in userland, again
>>>>> depending on the randomization entropy used. See
>>>>> https://download.vusec.net/papers/tagbleed_eurosp20.pdf. Combining
>>>>> side channels with transient/speculative execution attacks can further
>>>>> lower the bar.
>>>>
>>>> I think the equivalent of randomize_va_space=3 for KASLR would be that
>>>> various kernel structures could be placed randomly with full use of all
>>>> bits in the hardware, instead of low numbers like 9, 10 or 15 bits.
>>>> Maybe also each module could be placed in individual random address
>>>> instead of stuffing all modules together and likewise, instead of single
>>>> page_offset_base, vmalloc_base and vmemmap_base, kernel would use the
>>>> full address space to place various internal structures. I suppose this
>>>> is not trivial.
>>> Indeed it's nontrivial to get similar randomization guarantees for the
>>> kernel. I mentioned TagBleed because similar combined AnC + TLB
>>> attacks should also be possible in the browser. We just happened to
>>> focus on the kernel with TagBleed.
>>
>> Perhaps kernel objects could be also compiled as relocatable shared
>> objects, like shared libraries for user applications, so that a they
>> could be relocated independently away from the base address of main
>> kernel. Also compiling the kernel with -mcmodel=large could allow
>> various segments (code, rodata, data) to be located more freely. These
>> would make the attacker to do more probing. Again, pointers between the
>> objects may make these less useful.
>>
>>>
>>>>
>>>>> - (ii) [Leaks] Depending on the software vulnerability used for
>>>>> exploitation, it might not be difficult for an attacker to break
>>>>> fine-grained randomization across VMAs. That is, leak an address from
>>>>> VMA 1, use the vulnerability to trigger a normally illegal access to
>>>>> VMA 2, leak an address from VMA 2, repeat. Of course, the exploit
>>>>> might take much longer depending on how far on the pointer chasing
>>>>> chain the target is.
>>>>
>>>> Pointers between VMAs may also exist, for example libz.so needs to call
>>>> open(), close(), malloc(), free() etc. from libc.so.
>>> Indeed my example above assumed pointers between VMAs. At each step,
>>> you would use a vulnerability to craft a counterfeit object around
>>> existing pointers to other VMAs and move from there.
>>>
>>> Note that without existing pointers between VMAs, you can still mount
>>> similar attacks by crafting your own pointers to probe for other VMAs.
>>> Since you'd be blindly probing the address space, you'd need some page
>>> fault suppression mechanism to keep going. But branch misprediction a
>>> la Spectre and similar can do the trick. See our recent BlindSide for
>>> an example of such an attack against the kernel:
>>> https://download.vusec.net/papers/blindside_ccs20.pdf.
>>
>> In 6.3 the base address of kernel is probed in 0.7s. Wouldn't going from
>> 9 bits to 32 increase this to 2^21 * 0.7s = ~17 days?
> In general, increasing the entropy can make the attack much more
> difficult to complete in bounded time, yes. However:
> - The time to complete a single probe is inherently
> vulnerability-specific and the probe we had was not particularly
> efficient.
> - We didn't really look at optimizations to speed things up, such as
> batching multiple probes in a single syscall.
> - If you're probing in the browser rather than in the kernel, you
> might be able to craft more efficient probes and also more easily fill
> up the address space with objects you want to probe for to reduce the
> entropy. See our thread spraying paper for an example:
> https://www.usenix.net/system/files/conference/usenixsecurity16/sec16_paper_goktas.pdf

Can thread spraying (which allocates lots of large memory blocks) be 
caught by limiting the total address space used by the process via 
cgroup controls and resource limits (for example, systemd directives 
MemoryMax= and LimitAS=)?
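
For reference, LimitAS= corresponds to RLIMIT_AS, so the address-space
side of such a cap can be sketched roughly as below (an illustrative
sketch; the 4 GiB cap and the 128 MiB spray size are arbitrary, and
such a limit only bounds how much can be sprayed rather than detecting
the spraying itself):

#include <stdio.h>
#include <sys/mman.h>
#include <sys/resource.h>

int main(void)
{
    /* Roughly what LimitAS= imposes: cap the total address space. */
    struct rlimit rl = { .rlim_cur = 4ULL << 30, .rlim_max = 4ULL << 30 };

    if (setrlimit(RLIMIT_AS, &rl) != 0) {
        perror("setrlimit");
        return 1;
    }

    /* "Spray" large reservations until the cap is hit. */
    size_t total = 0;
    while (mmap(NULL, 128 << 20, PROT_NONE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE,
                -1, 0) != MAP_FAILED)
        total += 128 << 20;

    printf("reserved %zu MiB before hitting RLIMIT_AS\n", total >> 20);
    return 0;
}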

(Reading this pre-Spectre paper gave me the same feeling as looking at 
some pre-Covid stuff: the world was so much simpler back then.)

-Topi

> 
>>
>> Another mitigation could be to flush all caches on system call entry or
>> exit. This would of course decrease performance, but maybe if this was
>> done selectively only for critical system services and browsers (maybe
>> even only for its JIT thread but not others), perhaps it could be more
>> acceptable.
> Right. Something to keep in mind with these attacks is that flushing
> the caches only cripples one particular (although the most common)
> kind of covert channel to leak information. But an attacker could in
> principle switch to other microarchitectural side effects and covert
> channels. See SMoTherSpectre for an example.
> 
>>
>> -Topi
>>
>>>
>>>>
>>>> -Topi
>>>>
>>>>> Best,
>>>>> Cristiano
>>>>>
>>>>> On Wed, Nov 18, 2020 at 6:40 PM Mike Rapoport <rppt@kernel.org> wrote:
>>>>>>
>>>>>> (added one of the AnC paper authors)
>>>>>>
>>>>>> On Tue, Nov 17, 2020 at 10:21:30PM +0200, Topi Miettinen wrote:
>>>>>>> On 17.11.2020 18.54, Matthew Wilcox wrote:
>>>>>>>> On Mon, Oct 26, 2020 at 06:05:18PM +0200, Topi Miettinen wrote:
>>>>>>>>> Writing a new value of 3 to /proc/sys/kernel/randomize_va_space
>>>>>>>>> enables full randomization of memory mappings created with mmap(NULL,
>>>>>>>>> ...). With 2, the base of the VMA used for such mappings is random,
>>>>>>>>> but the mappings are created in predictable places within the VMA and
>>>>>>>>> in sequential order. With 3, new VMAs are created to fully randomize
>>>>>>>>> the mappings. Also mremap(..., MREMAP_MAYMOVE) will move the mappings
>>>>>>>>> even if not necessary.
>>>>>>>>
>>>>>>>> Is this worth it?
>>>>>>>>
>>>>>>>> https://www.ndss-symposium.org/ndss2017/ndss-2017-programme/aslrcache-practical-cache-attacks-mmu/
>>>>>>>
>>>>>>> Thanks, very interesting. The paper presents an attack (AnC) which can break
>>>>>>> ASLR even from JavaScript in browsers. In the process it compares the memory
>>>>>>> allocators of Firefox and Chrome. Firefox relies on Linux mmap() to
>>>>>>> randomize the memory location, but Chrome internally chooses the randomized
>>>>>>> address. The paper doesn't present exact numbers for breaking ASLR in the
>>>>>>> Chrome case, but it seems to require more effort. Chrome also aggressively
>>>>>>> randomizes the memory on each allocation, which seems to enable further
>>>>>>> possibilities for AnC to probe the MMU tables.
>>>>>>>
>>>>>>> Disregarding the difference in aggressiveness of memory allocators, I think
>>>>>>> with sysctl.kernel.randomize_va_space=3, the effort for breaking ASLR with
>>>>>>> Firefox should come closer to the Chrome case, since mmap() will use the
>>>>>>> address space more randomly.
>>>>>>>
>>>>>>> I have used this setting now for a month without any visible performance
>>>>>>> issues, so I think the extra bits (for some additional effort to attackers)
>>>>>>> are definitely worth the low cost.
>>>>>>>
>>>>>>> Furthermore, the paper does not describe in detail how the attack would
>>>>>>> continue after breaking ASLR. Perhaps there are assumptions which are not
>>>>>>> valid when the different memory areas are no longer sequential. For example,
>>>>>>> if ASLR is initially broken with respect to the JIT buffer, but continuing
>>>>>>> the attack requires other locations to be determined (like the stack, or the
>>>>>>> data segments of the main executable or libc), further effort may be needed
>>>>>>> to resolve those locations. With randomize_va_space=2, resolving any single
>>>>>>> address (such as the JIT buffer) can reveal the addresses of many other
>>>>>>> memory areas, but this is not the case with 3.
>>>>>>>
>>>>>>> -Topi
>>>>>>
>>>>>> --
>>>>>> Sincerely yours,
>>>>>> Mike.
>>>>
>>
Vlastimil Babka Nov. 24, 2020, 6:27 p.m. UTC | #13
Please CC linux-api on future versions.

On 10/26/20 5:05 PM, Topi Miettinen wrote:
> Writing a new value of 3 to /proc/sys/kernel/randomize_va_space
> enables full randomization of memory mappings created with mmap(NULL,
> ...). With 2, the base of the VMA used for such mappings is random,
> but the mappings are created in predictable places within the VMA and
> in sequential order. With 3, new VMAs are created to fully randomize
> the mappings. Also mremap(..., MREMAP_MAYMOVE) will move the mappings
> even if not necessary.
> 
> The method is to randomize the new address without considering
> VMAs. If the address fails checks because of overlap with the stack
> area (or in case of mremap(), overlap with the old mapping), the
> operation is retried a few times before falling back to old method.
> 
> On 32 bit systems this may cause problems due to increased VM
> fragmentation if the address space gets crowded.
> 
> On all systems, it will reduce performance and increase memory
> usage due to less efficient use of page tables and inability to
> merge adjacent VMAs with compatible attributes.
> 
> In this example with value of 2, dynamic loader, libc, anonymous
> memory reserved with mmap() and locale-archive are located close to
> each other:
> 
> $ cat /proc/self/maps (only first line for each object shown for brevity)
> 58c1175b1000-58c1175b3000 r--p 00000000 fe:0c 1868624                    /usr/bin/cat
> 79752ec17000-79752f179000 r--p 00000000 fe:0c 2473999                    /usr/lib/locale/locale-archive
> 79752f179000-79752f279000 rw-p 00000000 00:00 0
> 79752f279000-79752f29e000 r--p 00000000 fe:0c 2402415                    /usr/lib/x86_64-linux-gnu/libc-2.31.so
> 79752f43a000-79752f440000 rw-p 00000000 00:00 0
> 79752f46f000-79752f470000 r--p 00000000 fe:0c 2400484                    /usr/lib/x86_64-linux-gnu/ld-2.31.so
> 79752f49b000-79752f49c000 rw-p 00000000 00:00 0
> 7ffdcad9e000-7ffdcadbf000 rw-p 00000000 00:00 0                          [stack]
> 7ffdcadd2000-7ffdcadd6000 r--p 00000000 00:00 0                          [vvar]
> 7ffdcadd6000-7ffdcadd8000 r-xp 00000000 00:00 0                          [vdso]
> 
> With 3, they are located at unrelated addresses:
> $ echo 3 > /proc/sys/kernel/randomize_va_space
> $ cat /proc/self/maps (only first line for each object shown for brevity)
> 1206a8fa000-1206a8fb000 r--p 00000000 fe:0c 2400484                      /usr/lib/x86_64-linux-gnu/ld-2.31.so
> 1206a926000-1206a927000 rw-p 00000000 00:00 0
> 19174173000-19174175000 rw-p 00000000 00:00 0
> ac82f419000-ac82f519000 rw-p 00000000 00:00 0
> afa66a42000-afa66fa4000 r--p 00000000 fe:0c 2473999                      /usr/lib/locale/locale-archive
> d8656ba9000-d8656bce000 r--p 00000000 fe:0c 2402415                      /usr/lib/x86_64-linux-gnu/libc-2.31.so
> d8656d6a000-d8656d6e000 rw-p 00000000 00:00 0
> 5df90b712000-5df90b714000 r--p 00000000 fe:0c 1868624                    /usr/bin/cat
> 7ffe1be4c000-7ffe1be6d000 rw-p 00000000 00:00 0                          [stack]
> 7ffe1bf07000-7ffe1bf0b000 r--p 00000000 00:00 0                          [vvar]
> 7ffe1bf0b000-7ffe1bf0d000 r-xp 00000000 00:00 0                          [vdso]
> 
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Jann Horn <jannh@google.com>
> CC: Kees Cook <keescook@chromium.org>
> CC: Matthew Wilcox <willy@infradead.org>
> CC: Mike Rapoport <rppt@kernel.org>
> Signed-off-by: Topi Miettinen <toiwoton@gmail.com>
> ---
> v2: also randomize mremap(..., MREMAP_MAYMOVE)
> v3: avoid stack area and retry in case of bad random address (Jann
> Horn), improve description in kernel.rst (Matthew Wilcox)
> v4: use /proc/$pid/maps in the example (Mike Rapaport), CCs (Andrew
> Morton), only check randomize_va_space == 3
> ---
>   Documentation/admin-guide/hw-vuln/spectre.rst |  6 ++--
>   Documentation/admin-guide/sysctl/kernel.rst   | 15 ++++++++++
>   init/Kconfig                                  |  2 +-
>   mm/internal.h                                 |  8 +++++
>   mm/mmap.c                                     | 30 +++++++++++++------
>   mm/mremap.c                                   | 27 +++++++++++++++++
>   6 files changed, 75 insertions(+), 13 deletions(-)
> 
> diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst
> index e05e581af5cf..9ea250522077 100644
> --- a/Documentation/admin-guide/hw-vuln/spectre.rst
> +++ b/Documentation/admin-guide/hw-vuln/spectre.rst
> @@ -254,7 +254,7 @@ Spectre variant 2
>      left by the previous process will also be cleared.
>   
>      User programs should use address space randomization to make attacks
> -   more difficult (Set /proc/sys/kernel/randomize_va_space = 1 or 2).
> +   more difficult (Set /proc/sys/kernel/randomize_va_space = 1, 2 or 3).
>   
>   3. A virtualized guest attacking the host
>   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> @@ -499,8 +499,8 @@ Spectre variant 2
>      more overhead and run slower.
>   
>      User programs should use address space randomization
> -   (/proc/sys/kernel/randomize_va_space = 1 or 2) to make attacks more
> -   difficult.
> +   (/proc/sys/kernel/randomize_va_space = 1, 2 or 3) to make attacks
> +   more difficult.
>   
>   3. VM mitigation
>   ^^^^^^^^^^^^^^^^
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index d4b32cc32bb7..bc3bb74d544d 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -1060,6 +1060,21 @@ that support this feature.
>       Systems with ancient and/or broken binaries should be configured
>       with ``CONFIG_COMPAT_BRK`` enabled, which excludes the heap from process
>       address space randomization.
> +
> +3   Additionally enable full randomization of memory mappings created
> +    with mmap(NULL, ...). With 2, the base of the VMA used for such
> +    mappings is random, but the mappings are created in predictable
> +    places within the VMA and in sequential order. With 3, new VMAs
> +    are created to fully randomize the mappings. Also mremap(...,
> +    MREMAP_MAYMOVE) will move the mappings even if not necessary.
> +
> +    On 32 bit systems this may cause problems due to increased VM
> +    fragmentation if the address space gets crowded.
> +
> +    On all systems, it will reduce performance and increase memory
> +    usage due to less efficient use of page tables and inability to
> +    merge adjacent VMAs with compatible attributes.
> +
>   ==  ===========================================================================
>   
>   
> diff --git a/init/Kconfig b/init/Kconfig
> index c9446911cf41..6146e2cd3b77 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1863,7 +1863,7 @@ config COMPAT_BRK
>   	  also breaks ancient binaries (including anything libc5 based).
>   	  This option changes the bootup default to heap randomization
>   	  disabled, and can be overridden at runtime by setting
> -	  /proc/sys/kernel/randomize_va_space to 2.
> +	  /proc/sys/kernel/randomize_va_space to 2 or 3.
>   
>   	  On non-ancient distros (post-2000 ones) N is usually a safe choice.
>   
> diff --git a/mm/internal.h b/mm/internal.h
> index c43ccdddb0f6..b964c8dbb242 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -618,4 +618,12 @@ struct migration_target_control {
>   	gfp_t gfp_mask;
>   };
>   
> +#ifndef arch_get_mmap_end
> +#define arch_get_mmap_end(addr)	(TASK_SIZE)
> +#endif
> +
> +#ifndef arch_get_mmap_base
> +#define arch_get_mmap_base(addr, base) (base)
> +#endif
> +
>   #endif	/* __MM_INTERNAL_H */
> diff --git a/mm/mmap.c b/mm/mmap.c
> index d91ecb00d38c..3677491e999b 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -47,6 +47,7 @@
>   #include <linux/pkeys.h>
>   #include <linux/oom.h>
>   #include <linux/sched/mm.h>
> +#include <linux/elf-randomize.h>
>   
>   #include <linux/uaccess.h>
>   #include <asm/cacheflush.h>
> @@ -73,6 +74,8 @@ const int mmap_rnd_compat_bits_max = CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX;
>   int mmap_rnd_compat_bits __read_mostly = CONFIG_ARCH_MMAP_RND_COMPAT_BITS;
>   #endif
>   
> +#define MAX_RANDOM_MMAP_RETRIES			5
> +
>   static bool ignore_rlimit_data;
>   core_param(ignore_rlimit_data, ignore_rlimit_data, bool, 0644);
>   
> @@ -206,7 +209,7 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
>   #ifdef CONFIG_COMPAT_BRK
>   	/*
>   	 * CONFIG_COMPAT_BRK can still be overridden by setting
> -	 * randomize_va_space to 2, which will still cause mm->start_brk
> +	 * randomize_va_space to >= 2, which will still cause mm->start_brk
>   	 * to be arbitrarily shifted
>   	 */
>   	if (current->brk_randomized)
> @@ -1445,6 +1448,23 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
>   	if (mm->map_count > sysctl_max_map_count)
>   		return -ENOMEM;
>   
> +	/* Pick a random address even outside current VMAs? */
> +	if (!addr && randomize_va_space == 3) {
> +		int i = MAX_RANDOM_MMAP_RETRIES;
> +		unsigned long max_addr = arch_get_mmap_base(addr, mm->mmap_base);
> +
> +		do {
> +			/* Try a few times to find a free area */
> +			addr = arch_mmap_rnd();
> +			if (addr >= max_addr)
> +				continue;
> +			addr = get_unmapped_area(file, addr, len, pgoff, flags);
> +		} while (--i >= 0 && !IS_ERR_VALUE(addr));
> +
> +		if (IS_ERR_VALUE(addr))
> +			addr = 0;
> +	}
> +
>   	/* Obtain the address to map to. we verify (or select) it and ensure
>   	 * that it represents a valid section of the address space.
>   	 */
> @@ -2142,14 +2162,6 @@ unsigned long vm_unmapped_area(struct vm_unmapped_area_info *info)
>   	return addr;
>   }
>   
> -#ifndef arch_get_mmap_end
> -#define arch_get_mmap_end(addr)	(TASK_SIZE)
> -#endif
> -
> -#ifndef arch_get_mmap_base
> -#define arch_get_mmap_base(addr, base) (base)
> -#endif
> -
>   /* Get an address range which is currently unmapped.
>    * For shmat() with addr=0.
>    *
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 138abbae4f75..c5b2ed2bfd2d 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -24,12 +24,15 @@
>   #include <linux/uaccess.h>
>   #include <linux/mm-arch-hooks.h>
>   #include <linux/userfaultfd_k.h>
> +#include <linux/elf-randomize.h>
>   
>   #include <asm/cacheflush.h>
>   #include <asm/tlbflush.h>
>   
>   #include "internal.h"
>   
> +#define MAX_RANDOM_MREMAP_RETRIES		5
> +
>   static pmd_t *get_old_pmd(struct mm_struct *mm, unsigned long addr)
>   {
>   	pgd_t *pgd;
> @@ -720,6 +723,30 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
>   		goto out;
>   	}
>   
> +	if ((flags & MREMAP_MAYMOVE) && randomize_va_space == 3) {
> +		/*
> +		 * Caller is happy with a different address, so let's
> +		 * move even if not necessary!
> +		 */
> +		int i = MAX_RANDOM_MREMAP_RETRIES;
> +		unsigned long max_addr = arch_get_mmap_base(addr, mm->mmap_base);
> +
> +		do {
> +			/* Try a few times to find a free area */
> +			new_addr = arch_mmap_rnd();
> +			if (new_addr >= max_addr)
> +				continue;
> +			ret = mremap_to(addr, old_len, new_addr, new_len,
> +					&locked, flags, &uf, &uf_unmap_early,
> +					&uf_unmap);
> +			if (!IS_ERR_VALUE(ret))
> +				goto out;
> +		} while (--i >= 0);
> +
> +		/* Give up and try the old address */
> +		new_addr = addr;
> +	}
> +
>   	/*
>   	 * Always allow a shrinking remap: that just unmaps
>   	 * the unnecessary pages..
> 
> base-commit: 3650b228f83adda7e5ee532e2b90429c03f7b9ec
>

Patch

diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst
index e05e581af5cf..9ea250522077 100644
--- a/Documentation/admin-guide/hw-vuln/spectre.rst
+++ b/Documentation/admin-guide/hw-vuln/spectre.rst
@@ -254,7 +254,7 @@ Spectre variant 2
    left by the previous process will also be cleared.
 
    User programs should use address space randomization to make attacks
-   more difficult (Set /proc/sys/kernel/randomize_va_space = 1 or 2).
+   more difficult (Set /proc/sys/kernel/randomize_va_space = 1, 2 or 3).
 
 3. A virtualized guest attacking the host
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -499,8 +499,8 @@ Spectre variant 2
    more overhead and run slower.
 
    User programs should use address space randomization
-   (/proc/sys/kernel/randomize_va_space = 1 or 2) to make attacks more
-   difficult.
+   (/proc/sys/kernel/randomize_va_space = 1, 2 or 3) to make attacks
+   more difficult.
 
 3. VM mitigation
 ^^^^^^^^^^^^^^^^
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index d4b32cc32bb7..bc3bb74d544d 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -1060,6 +1060,21 @@ that support this feature.
     Systems with ancient and/or broken binaries should be configured
     with ``CONFIG_COMPAT_BRK`` enabled, which excludes the heap from process
     address space randomization.
+
+3   Additionally enable full randomization of memory mappings created
+    with mmap(NULL, ...). With 2, the base of the VMA used for such
+    mappings is random, but the mappings are created in predictable
+    places within the VMA and in sequential order. With 3, new VMAs
+    are created to fully randomize the mappings. Also mremap(...,
+    MREMAP_MAYMOVE) will move the mappings even if not necessary.
+
+    On 32 bit systems this may cause problems due to increased VM
+    fragmentation if the address space gets crowded.
+
+    On all systems, it will reduce performance and increase memory
+    usage due to less efficient use of page tables and inability to
+    merge adjacent VMAs with compatible attributes.
+
 ==  ===========================================================================
 
 
diff --git a/init/Kconfig b/init/Kconfig
index c9446911cf41..6146e2cd3b77 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1863,7 +1863,7 @@ config COMPAT_BRK
 	  also breaks ancient binaries (including anything libc5 based).
 	  This option changes the bootup default to heap randomization
 	  disabled, and can be overridden at runtime by setting
-	  /proc/sys/kernel/randomize_va_space to 2.
+	  /proc/sys/kernel/randomize_va_space to 2 or 3.
 
 	  On non-ancient distros (post-2000 ones) N is usually a safe choice.
 
diff --git a/mm/internal.h b/mm/internal.h
index c43ccdddb0f6..b964c8dbb242 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -618,4 +618,12 @@ struct migration_target_control {
 	gfp_t gfp_mask;
 };
 
+#ifndef arch_get_mmap_end
+#define arch_get_mmap_end(addr)	(TASK_SIZE)
+#endif
+
+#ifndef arch_get_mmap_base
+#define arch_get_mmap_base(addr, base) (base)
+#endif
+
 #endif	/* __MM_INTERNAL_H */
diff --git a/mm/mmap.c b/mm/mmap.c
index d91ecb00d38c..3677491e999b 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -47,6 +47,7 @@
 #include <linux/pkeys.h>
 #include <linux/oom.h>
 #include <linux/sched/mm.h>
+#include <linux/elf-randomize.h>
 
 #include <linux/uaccess.h>
 #include <asm/cacheflush.h>
@@ -73,6 +74,8 @@ const int mmap_rnd_compat_bits_max = CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX;
 int mmap_rnd_compat_bits __read_mostly = CONFIG_ARCH_MMAP_RND_COMPAT_BITS;
 #endif
 
+#define MAX_RANDOM_MMAP_RETRIES			5
+
 static bool ignore_rlimit_data;
 core_param(ignore_rlimit_data, ignore_rlimit_data, bool, 0644);
 
@@ -206,7 +209,7 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
 #ifdef CONFIG_COMPAT_BRK
 	/*
 	 * CONFIG_COMPAT_BRK can still be overridden by setting
-	 * randomize_va_space to 2, which will still cause mm->start_brk
+	 * randomize_va_space to >= 2, which will still cause mm->start_brk
 	 * to be arbitrarily shifted
 	 */
 	if (current->brk_randomized)
@@ -1445,6 +1448,23 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 	if (mm->map_count > sysctl_max_map_count)
 		return -ENOMEM;
 
+	/* Pick a random address even outside current VMAs? */
+	if (!addr && randomize_va_space == 3) {
+		int i = MAX_RANDOM_MMAP_RETRIES;
+		unsigned long max_addr = arch_get_mmap_base(addr, mm->mmap_base);
+
+		do {
+			/* Try a few times to find a free area */
+			addr = arch_mmap_rnd();
+			if (addr >= max_addr)
+				continue;
+			addr = get_unmapped_area(file, addr, len, pgoff, flags);
+		} while (--i >= 0 && !IS_ERR_VALUE(addr));
+
+		if (IS_ERR_VALUE(addr))
+			addr = 0;
+	}
+
 	/* Obtain the address to map to. we verify (or select) it and ensure
 	 * that it represents a valid section of the address space.
 	 */
@@ -2142,14 +2162,6 @@ unsigned long vm_unmapped_area(struct vm_unmapped_area_info *info)
 	return addr;
 }
 
-#ifndef arch_get_mmap_end
-#define arch_get_mmap_end(addr)	(TASK_SIZE)
-#endif
-
-#ifndef arch_get_mmap_base
-#define arch_get_mmap_base(addr, base) (base)
-#endif
-
 /* Get an address range which is currently unmapped.
  * For shmat() with addr=0.
  *
diff --git a/mm/mremap.c b/mm/mremap.c
index 138abbae4f75..c5b2ed2bfd2d 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -24,12 +24,15 @@
 #include <linux/uaccess.h>
 #include <linux/mm-arch-hooks.h>
 #include <linux/userfaultfd_k.h>
+#include <linux/elf-randomize.h>
 
 #include <asm/cacheflush.h>
 #include <asm/tlbflush.h>
 
 #include "internal.h"
 
+#define MAX_RANDOM_MREMAP_RETRIES		5
+
 static pmd_t *get_old_pmd(struct mm_struct *mm, unsigned long addr)
 {
 	pgd_t *pgd;
@@ -720,6 +723,30 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 		goto out;
 	}
 
+	if ((flags & MREMAP_MAYMOVE) && randomize_va_space == 3) {
+		/*
+		 * Caller is happy with a different address, so let's
+		 * move even if not necessary!
+		 */
+		int i = MAX_RANDOM_MREMAP_RETRIES;
+		unsigned long max_addr = arch_get_mmap_base(addr, mm->mmap_base);
+
+		do {
+			/* Try a few times to find a free area */
+			new_addr = arch_mmap_rnd();
+			if (new_addr >= max_addr)
+				continue;
+			ret = mremap_to(addr, old_len, new_addr, new_len,
+					&locked, flags, &uf, &uf_unmap_early,
+					&uf_unmap);
+			if (!IS_ERR_VALUE(ret))
+				goto out;
+		} while (--i >= 0);
+
+		/* Give up and try the old address */
+		new_addr = addr;
+	}
+
 	/*
 	 * Always allow a shrinking remap: that just unmaps
 	 * the unnecessary pages..