[0/3] mm: Randomize free memory

Message-ID: <153702858249.1603922.12913911825267831671.stgit@dwillia2-desk3.amr.corp.intel.com>
From: Dan Williams <dan.j.williams@intel.com>
To: akpm@linux-foundation.org
Cc: Michal Hocko <mhocko@suse.com>, Dave Hansen <dave.hansen@linux.intel.com>, Kees Cook <keescook@chromium.org>, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH 0/3] mm: Randomize free memory
Date: Sat, 15 Sep 2018 09:23:02 -0700

Dan Williams Sept. 15, 2018, 4:23 p.m. UTC

Data exfiltration attacks via speculative execution and
return-oriented-programming attacks rely on the ability to infer the
location of sensitive data objects. The kernel page allocator has
predictable first-in-first-out behavior for physical pages. Pages are
freed in physical address order when first onlined. There are also
mechanisms like CMA that can free large contiguous areas at once,
increasing the predictability of allocations in physical memory.

In addition to the security implications, this randomization also
stabilizes the average performance of direct-mapped memory-side caches.
This includes memory-side caches like the one on the Knights Landing
processor and those generally described by the ACPI HMAT (Heterogeneous
Memory Attributes Table [1]). Cache conflicts are spread over a random
distribution rather than localized.
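
To illustrate what a conflict means in a direct-mapped cache, here is a
minimal sketch. The cache geometry (16 GiB, 64-byte lines) and the helper
names cache_slot() and cache_conflict() are assumptions made for this
example only; they are not taken from the patches or from any specific
hardware description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical geometry, for illustration only. */
#define CACHE_SIZE (16ULL << 30) /* 16 GiB direct-mapped memory-side cache */
#define LINE_SIZE  64ULL         /* bytes per cache line */

/* In a direct-mapped cache each physical address maps to exactly one slot. */
static uint64_t cache_slot(uint64_t phys_addr)
{
	return (phys_addr % CACHE_SIZE) / LINE_SIZE;
}

/* Two addresses conflict when they are on different lines but share a slot. */
static bool cache_conflict(uint64_t a, uint64_t b)
{
	assert(CACHE_SIZE % LINE_SIZE == 0);
	return a / LINE_SIZE != b / LINE_SIZE && cache_slot(a) == cache_slot(b);
}
```

Under this model, two buffers whose physical addresses differ by a multiple
of CACHE_SIZE evict each other on every access. When pages are handed out
in random order, such pairs should be spread across workloads instead of
landing on one unlucky job.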

Given the performance sensitivity of the page allocator, this
randomization is performed only for MAX_ORDER (4MB by default) pages. A
kernel parameter, page_alloc.shuffle_page_order, is included to change
the page size at which randomization occurs.
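
As a rough userspace sketch of the idea (not the code in this series), the
following shuffles an array of block start PFNs with a Fisher-Yates pass.
The names block_bytes() and shuffle_blocks() are hypothetical, 4 KiB base
pages are assumed, and rand() stands in for whatever random source the
kernel would actually use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Bytes covered by one block of the given order, assuming 4 KiB base pages. */
static size_t block_bytes(unsigned int order)
{
	return (size_t)4096 << order;
}

/*
 * Fisher-Yates shuffle over the start PFNs of equally sized free blocks.
 * rand() is only a placeholder random source; the modulo also introduces
 * a small bias that a real implementation would want to avoid.
 */
static void shuffle_blocks(unsigned long *pfn, size_t n)
{
	assert(pfn != NULL || n == 0);
	for (size_t i = n; i > 1; i--) {
		size_t j = (size_t)rand() % i; /* pick from the unshuffled prefix */
		unsigned long tmp = pfn[i - 1];

		pfn[i - 1] = pfn[j];
		pfn[j] = tmp;
	}
}
```

With 4 KiB pages, block_bytes(10) is the 4MB granularity mentioned above.
Shuffling at a smaller order would mean more blocks to permute, which is
the cost the tunable is meant to let administrators trade against.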

[1]: See ACPI 6.2 Section 5.2.27.5 Memory Side Cache Information Structure 

---

Dan Williams (3):
      mm: Shuffle initial free memory
      mm: Move buddy list manipulations into helpers
      mm: Maintain randomization of page free lists


 include/linux/list.h     |   17 +++
 include/linux/mm.h       |    5 -
 include/linux/mm_types.h |    3 +
 include/linux/mmzone.h   |   57 ++++++++++
 mm/bootmem.c             |    9 +-
 mm/compaction.c          |    4 -
 mm/nobootmem.c           |    7 +
 mm/page_alloc.c          |  267 +++++++++++++++++++++++++++++++++++++++-------
 8 files changed, 317 insertions(+), 52 deletions(-)

Andrew Morton Sept. 17, 2018, 11:12 p.m. UTC | #1

On Sat, 15 Sep 2018 09:23:02 -0700 Dan Williams <dan.j.williams@intel.com> wrote:

> Data exfiltration attacks via speculative execution and
> return-oriented-programming attacks rely on the ability to infer the
> location of sensitive data objects. The kernel page allocator, has
> predictable first-in-first-out behavior for physical pages. Pages are
> freed in physical address order when first onlined. There are also
> mechanisms like CMA that can free large contiguous areas at once
> increasing the predictability of allocations in physical memory.
> 
> In addition to the security implications this randomization also
> stabilizes the average performance of direct-mapped memory-side caches.
> This includes memory-side caches like the one on the Knights Landing
> processor and those generally described by the ACPI HMAT (Heterogeneous
> Memory Attributes Table [1]). Cache conflicts are spread over a random
> distribution rather than localized.
> 
> Given the performance sensitivity of the page allocator this
> randomization is only performed for MAX_ORDER (4MB by default) pages. A
> kernel parameter, page_alloc.shuffle_page_order, is included to change
> the page size where randomization occurs.
> 
> [1]: See ACPI 6.2 Section 5.2.27.5 Memory Side Cache Information Structure 

I'm struggling to understand the justification of all of this.  Are
such attacks known to exist?  Or reasonably expected to exist in the
future?  What is the likelihood and what is their cost?  Or is this all
academic and speculative and possibly pointless?

i.e., something must have motivated you to do this work rather than
<something-else>.  Please spell out that motivation.

The new module parameter should be documented, please.  Let's try to
help people understand why they might ever want to alter the default
and if so, what settings they should be trying.

How come we aren't also shuffling at memory hot-add time?

Kees Cook Sept. 21, 2018, 7:12 p.m. UTC | #2

On Mon, Sep 17, 2018 at 4:12 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Sat, 15 Sep 2018 09:23:02 -0700 Dan Williams <dan.j.williams@intel.com> wrote:
>
>> Data exfiltration attacks via speculative execution and
>> return-oriented-programming attacks rely on the ability to infer the
>> location of sensitive data objects. The kernel page allocator, has
>> predictable first-in-first-out behavior for physical pages. Pages are
>> freed in physical address order when first onlined. There are also
>> mechanisms like CMA that can free large contiguous areas at once
>> increasing the predictability of allocations in physical memory.
>>
>> In addition to the security implications this randomization also
>> stabilizes the average performance of direct-mapped memory-side caches.
>> This includes memory-side caches like the one on the Knights Landing
>> processor and those generally described by the ACPI HMAT (Heterogeneous
>> Memory Attributes Table [1]). Cache conflicts are spread over a random
>> distribution rather than localized.
>>
>> Given the performance sensitivity of the page allocator this
>> randomization is only performed for MAX_ORDER (4MB by default) pages. A
>> kernel parameter, page_alloc.shuffle_page_order, is included to change
>> the page size where randomization occurs.
>>
>> [1]: See ACPI 6.2 Section 5.2.27.5 Memory Side Cache Information Structure
>
> I'm struggling to understand the justification of all of this.  Are
> such attacks known to exist?  Or reasonably expected to exist in the
> future?  What is the likelihood and what is their cost?  Or is this all
> academic and speculative and possibly pointless?

While we already have a base-address randomization
(CONFIG_RANDOMIZE_MEMORY), attacks against the same hardware and
memory layouts would certainly be using the predictability of
allocation ordering (i.e. for attacks where the base address isn't
important: only the relative positions between allocated memory). This
is common in lots of heap-style attacks. They try to gain control over
ordering by spraying allocations, etc.

I'd really like to see this because it gives us something similar to
CONFIG_SLAB_FREELIST_RANDOM but for the page allocator. (This may be
worth mentioning in the series, especially as a comparison to its
behavior and this.)

> ie, something must have motivated you to do this work rather than
> <something-else>.  Please spell out that motivation.

I'd be curious to hear more about the mentioned cache performance
improvements. I love it when a security feature actually _improves_
performance. :)

Thanks for working on this!

-Kees

Elliott, Robert (Servers) Sept. 21, 2018, 11:48 p.m. UTC | #3

> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org <linux-kernel-
> owner@vger.kernel.org> On Behalf Of Kees Cook
> Sent: Friday, September 21, 2018 2:13 PM
> Subject: Re: [PATCH 0/3] mm: Randomize free memory
...
> I'd be curious to hear more about the mentioned cache performance
> improvements. I love it when a security feature actually _improves_
> performance. :)

It's been a problem in the HPC space:
http://www.nersc.gov/research-and-development/knl-cache-mode-performance-coe/

A kernel module called zonesort is available to try to help:
https://software.intel.com/en-us/articles/xeon-phi-software

and this abandoned patch series proposed that for the kernel:
https://lkml.org/lkml/2017/8/23/195

Dan's patch series doesn't attempt to ensure that buffers won't conflict,
but it does reduce the chance that they will. This will make performance
more consistent, albeit slower than "optimal" (which is nearly impossible
to attain in a general-purpose kernel).  That's better than forcing
users to deploy remedies like:
    "To eliminate this gradual degradation, we have added a Stream
     measurement to the Node Health Check that follows each job;
     nodes are rebooted whenever their measured memory bandwidth
     falls below 300 GB/s."

---
Robert Elliott, HPE Persistent Memory

Dan Williams Sept. 22, 2018, 12:06 a.m. UTC | #4

On Fri, Sep 21, 2018 at 4:51 PM Elliott, Robert (Persistent Memory)
<elliott@hpe.com> wrote:
>
>
> > -----Original Message-----
> > From: linux-kernel-owner@vger.kernel.org <linux-kernel-
> > owner@vger.kernel.org> On Behalf Of Kees Cook
> > Sent: Friday, September 21, 2018 2:13 PM
> > Subject: Re: [PATCH 0/3] mm: Randomize free memory
> ...
> > I'd be curious to hear more about the mentioned cache performance
> > improvements. I love it when a security feature actually _improves_
> > performance. :)
>
> It's been a problem in the HPC space:
> http://www.nersc.gov/research-and-development/knl-cache-mode-performance-coe/
>
> A kernel module called zonesort is available to try to help:
> https://software.intel.com/en-us/articles/xeon-phi-software
>
> and this abandoned patch series proposed that for the kernel:
> https://lkml.org/lkml/2017/8/23/195
>
> Dan's patch series doesn't attempt to ensure buffers won't conflict, but
> also reduces the chance that the buffers will. This will make performance
> more consistent, albeit slower than "optimal" (which is near impossible
> to attain in a general-purpose kernel).  That's better than forcing
> users to deploy remedies like:
>     "To eliminate this gradual degradation, we have added a Stream
>      measurement to the Node Health Check that follows each job;
>      nodes are rebooted whenever their measured memory bandwidth
>      falls below 300 GB/s."

Robert, thanks for that! Yes, instead of run-to-run variations
alternating between almost-never-conflict and nearly-always-conflict,
we'll get a random / average distribution of cache conflicts.

Michal Hocko Oct. 2, 2018, 2:30 p.m. UTC | #5

On Sat 15-09-18 09:23:02, Dan Williams wrote:
> Data exfiltration attacks via speculative execution and
> return-oriented-programming attacks rely on the ability to infer the
> location of sensitive data objects. The kernel page allocator, has
> predictable first-in-first-out behavior for physical pages. Pages are
> freed in physical address order when first onlined. There are also
> mechanisms like CMA that can free large contiguous areas at once
> increasing the predictability of allocations in physical memory.
> 
> In addition to the security implications this randomization also
> stabilizes the average performance of direct-mapped memory-side caches.
> This includes memory-side caches like the one on the Knights Landing
> processor and those generally described by the ACPI HMAT (Heterogeneous
> Memory Attributes Table [1]). Cache conflicts are spread over a random
> distribution rather than localized.
> 
> Given the performance sensitivity of the page allocator this
> randomization is only performed for MAX_ORDER (4MB by default) pages. A
> kernel parameter, page_alloc.shuffle_page_order, is included to change
> the page size where randomization occurs.

I have only glanced through the implementation. The boot allocator part
seems unexpectedly large, but I haven't tried to actually think about
simplification.

It is the more general idea that I am not really sure about. First of
all, does it make _any_ sense to randomize 4MB blocks by default? Why
can't we simply have it disabled? The second and more concerning question
is whether it even makes sense to apply this randomization to orders
higher than 0. An attacker might fragment the memory, keep recycling the
lowest order, and get the predictable behavior that we have right now.
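
The recycling concern can be pictured with a toy model. This sketch assumes,
purely for illustration, that a just-freed order-0 page is the next one
handed out; struct free_list and both helpers are hypothetical and do not
mirror the kernel's actual free-list code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FREE 64

/* Toy stack of free order-0 page frame numbers. */
struct free_list {
	unsigned long pfn[MAX_FREE];
	size_t count;
};

/* Freeing pushes the page on top of the stack. */
static void free_page_lifo(struct free_list *fl, unsigned long pfn)
{
	assert(fl->count < MAX_FREE);
	fl->pfn[fl->count++] = pfn;
}

/* Allocating pops the most recently freed page. */
static unsigned long alloc_page_lifo(struct free_list *fl)
{
	assert(fl->count > 0);
	return fl->pfn[--fl->count];
}
```

In this model, whoever frees a page and allocates again immediately gets
the same frame back, no matter how the larger blocks were ordered at boot.
That is the kind of predictability a shuffle applied only at the highest
order would leave untouched.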
