[v4,0/3] KASLR feature to randomize each loadable module

Message ID 1535583579-6138-1-git-send-email-rick.p.edgecombe@intel.com (mailing list archive)

Rick Edgecombe Aug. 29, 2018, 10:59 p.m. UTC
Hi,

This is v4 of the "KASLR feature to randomize each loadable module" patchset.
The purpose is to increase the randomization and also to randomize the modules
relative to each other instead of just the base, so that if the location of one
module leaks, the locations of the others can't be inferred. It is enabled for
x86_64 only for now.

V4 is a few small fixes. I humbly think this is in pretty good shape at this
point, unless anyone has any comments. The only other big change I was
considering was moving the new randomization algorithm into vmalloc so it could
be re-used for other architectures or possibly other vmalloc usages.

A few words on how this was tested: as previously mentioned, the entropy
estimates were done using module text sizes extracted from the in-tree modules.
These sizes were also used to run hundreds of thousands of simulated module
allocations by calling module_alloc from a test module, including testing until
allocation failure. The simulations kept track of every allocation address to
make sure there were no collisions, and verified that the memory was actually
mapped.
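
To make the collision check concrete, here is a minimal userspace sketch of
that kind of simulation. It is not the test module used for the numbers above:
toy_module_alloc, AREA_PAGES and RANDOM_TRIES are hypothetical stand-ins, and it
does not verify that memory is mapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define AREA_PAGES   1024u  /* hypothetical size of the toy module area */
#define RANDOM_TRIES 8      /* hypothetical number of random attempts */

struct alloc_rec { unsigned start, pages; };

static unsigned char used[AREA_PAGES];  /* 1 = page is allocated */

static bool range_is_free(unsigned start, unsigned pages)
{
	for (unsigned i = 0; i < pages; i++)
		if (used[start + i])
			return false;
	return true;
}

/* Toy stand-in for module_alloc(): a few random tries, then first fit. */
static long toy_module_alloc(unsigned pages)
{
	if (pages == 0 || pages > AREA_PAGES)
		return -1;

	unsigned last = AREA_PAGES - pages;  /* highest valid start page */

	for (int t = 0; t < RANDOM_TRIES; t++) {
		unsigned start = (unsigned)rand() % (last + 1);

		if (range_is_free(start, pages)) {
			memset(used + start, 1, pages);
			return (long)start;
		}
	}
	for (unsigned start = 0; start <= last; start++) {
		if (range_is_free(start, pages)) {
			memset(used + start, 1, pages);
			return (long)start;
		}
	}
	return -1;
}

/*
 * Allocate random sizes until the first failure, recording every returned
 * range and checking each new one against all earlier ones. Returns the
 * number of allocations made, or -1 if two allocations overlapped.
 */
long simulate_until_failure(unsigned seed, unsigned max_pages)
{
	static struct alloc_rec seen[AREA_PAGES];
	size_t n = 0;

	if (max_pages == 0)
		max_pages = 1;
	memset(used, 0, sizeof(used));
	srand(seed);

	for (;;) {
		unsigned pages = 1 + (unsigned)rand() % max_pages;
		long start = toy_module_alloc(pages);

		if (start < 0)
			break;
		for (size_t i = 0; i < n; i++) {
			unsigned a = seen[i].start, b = a + seen[i].pages;

			if ((unsigned)start < b && a < (unsigned)start + pages)
				return -1;
		}
		assert(n < AREA_PAGES);  /* each allocation uses >= 1 page */
		seen[n].start = (unsigned)start;
		seen[n].pages = pages;
		n++;
	}
	return (long)n;
}
```

The overlap check deliberately uses its own record of returned ranges rather
than the allocator's bitmap, so a bookkeeping bug in the allocator would still
be caught.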

In addition, the __vmalloc_node_try_addr function has a suite of unit tests that
verify, for a number of edge cases, that it:
 - Allows allocations when it should
 - Reports the right error code if it collides with a lazy-free area or a real
   allocation
 - Frees a lazy-free area when it should
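
As a rough illustration of the behaviors listed above, here is a toy userspace
model. It is only a sketch: the toy_* names, the result codes and the explicit
purge step are assumptions made for this example and are not the interface or
error codes of __vmalloc_node_try_addr.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_AREAS 16

enum toy_state { TOY_LIVE, TOY_LAZY_FREE };

enum toy_result {
	TOY_OK = 0,
	TOY_ERR_LIVE,   /* collides with a real allocation */
	TOY_ERR_LAZY,   /* collides only with a lazy-free area */
	TOY_ERR_INVAL,  /* bad arguments or table full */
};

struct toy_area { unsigned long start, end; enum toy_state state; };
struct toy_vmap { struct toy_area areas[TOY_MAX_AREAS]; size_t n; };

/* Try to claim [start, start + size); report what it collided with. */
enum toy_result toy_try_addr(struct toy_vmap *m, unsigned long start,
			     unsigned long size)
{
	unsigned long end = start + size;
	int hit_lazy = 0;

	assert(m != NULL);
	if (size == 0 || end < start || m->n == TOY_MAX_AREAS)
		return TOY_ERR_INVAL;

	for (size_t i = 0; i < m->n; i++) {
		const struct toy_area *a = &m->areas[i];

		if (start >= a->end || a->start >= end)
			continue;               /* no overlap */
		if (a->state == TOY_LIVE)
			return TOY_ERR_LIVE;    /* a live hit takes priority */
		hit_lazy = 1;
	}
	if (hit_lazy)
		return TOY_ERR_LAZY;

	m->areas[m->n++] = (struct toy_area){ start, end, TOY_LIVE };
	return TOY_OK;
}

/* Freeing only marks the area lazy-free; it still occupies the range. */
int toy_free(struct toy_vmap *m, unsigned long start)
{
	assert(m != NULL);
	for (size_t i = 0; i < m->n; i++) {
		if (m->areas[i].start == start &&
		    m->areas[i].state == TOY_LIVE) {
			m->areas[i].state = TOY_LAZY_FREE;
			return 0;
		}
	}
	return -1;
}

/* Drop all lazy-free areas; returns how many were purged. */
size_t toy_purge_lazy(struct toy_vmap *m)
{
	size_t kept = 0, purged;

	assert(m != NULL);
	for (size_t i = 0; i < m->n; i++)
		if (m->areas[i].state == TOY_LIVE)
			m->areas[kept++] = m->areas[i];
	purged = m->n - kept;
	m->n = kept;
	return purged;
}
```

Distinguishing the two collision cases lets a caller decide whether a purge is
worth its cost: a lazy-free hit can be cleared, a live hit cannot.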

These synthetic tests were also how the performance metrics were gathered.

Changes for V4:
 - Fix issue caused by KASAN, kmemleak being provided different allocation
   lengths (padding).
 - Avoid kmalloc until sure it's needed in __vmalloc_node_try_addr.
 - Fix for debug file hang when the last VA is a lazy purge area.
 - Fix issues reported by 0-day build system.

Changes for V3:
 - Code cleanup based on internal feedback. (thanks to Dave Hansen and Andriy
   Shevchenko)
 - Slight refactor of existing algorithm to more cleanly live alongside new
   one.
 - BPF synthetic benchmark.

Changes for V2:
 - New implementation of __vmalloc_node_try_addr based on the
   __vmalloc_node_range implementation, that only flushes TLB when needed.
 - Modified module loading algorithm to try to reduce the TLB flushes further.
 - Increase "random area" tries in order to increase the number of modules that
   can get high randomness.
 - Increase "random area" size to 2/3 of module area in order to increase the
   number of modules that can get high randomness.
 - Fix for 0day failures on other architectures.
 - Fix for wrong debugfs permissions. (thanks to Jann Horn)
 - Spelling fix. (thanks to Jann Horn)
 - Data on module_alloc performance and TLB flushes. (brought up by Kees Cook
   and Jann Horn)
 - Data on memory usage. (suggested by Jann)
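
To illustrate the "random area" items above, here is a minimal userspace sketch
of picking a page-aligned candidate address inside the first 2/3 of a module
area. It is not the code from the patches: the toy_* names and TOY_PAGE_SIZE are
hypothetical, and placing the random area at the start of the module area is an
assumption of this example.

```c
#include <assert.h>

#define TOY_PAGE_SIZE 4096UL  /* hypothetical page size */

/* Length of the random area: 2/3 of the module area, in whole pages. */
unsigned long toy_random_area_len(unsigned long area_len)
{
	unsigned long pages = area_len / TOY_PAGE_SIZE;

	return (pages * 2 / 3) * TOY_PAGE_SIZE;
}

/*
 * Map a random number onto a page-aligned start address such that
 * [addr, addr + size) lies entirely inside the random area.
 * Returns 0 if the request cannot fit in the random area.
 */
unsigned long toy_pick_addr(unsigned long base, unsigned long area_len,
			    unsigned long size, unsigned long rnd)
{
	unsigned long rlen = toy_random_area_len(area_len);
	unsigned long size_al, slots;

	assert(base % TOY_PAGE_SIZE == 0);
	if (size == 0 || size > rlen)
		return 0;

	/* Round the request up to whole pages. */
	size_al = (size + TOY_PAGE_SIZE - 1) & ~(TOY_PAGE_SIZE - 1);

	/* Number of page-aligned start positions that still fit. */
	slots = (rlen - size_al) / TOY_PAGE_SIZE + 1;

	return base + (rnd % slots) * TOY_PAGE_SIZE;
}
```

A caller would try a bounded number of such candidates and fall back to another
strategy if all of them collide; a larger random area and more tries both raise
the chance that a module lands at a randomly chosen address.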


Rick Edgecombe (3):
  vmalloc: Add __vmalloc_node_try_addr function
  x86/modules: Increase randomization for modules
  vmalloc: Add debugfs modfraginfo

 arch/x86/include/asm/pgtable_64_types.h |   7 +
 arch/x86/kernel/module.c                | 165 ++++++++++++++++---
 include/linux/vmalloc.h                 |   3 +
 mm/vmalloc.c                            | 279 +++++++++++++++++++++++++++++++-
 4 files changed, 429 insertions(+), 25 deletions(-)

Alexei Starovoitov Aug. 30, 2018, 2:27 a.m. UTC | #1
On Wed, Aug 29, 2018 at 03:59:36PM -0700, Rick Edgecombe wrote:
> Changes for V3:
>  - Code cleanup based on internal feedback. (thanks to Dave Hansen and Andriy
>    Shevchenko)
>  - Slight refactor of existing algorithm to more cleanly live alongside new
>    one.
>  - BPF synthetic benchmark

I don't see this benchmark in this patch set.
Could you prepare it as a test in tools/testing/selftests/bpf/ ?
so we can double check what is being tested and run it regularly
like we do for all other tests in there.

Rick Edgecombe Aug. 30, 2018, 6:24 p.m. UTC | #2
On Wed, 2018-08-29 at 19:27 -0700, Alexei Starovoitov wrote:
> On Wed, Aug 29, 2018 at 03:59:36PM -0700, Rick Edgecombe wrote:
> > Changes for V3:
> >  - Code cleanup based on internal feedback. (thanks to Dave Hansen and Andriy
> >    Shevchenko)
> >  - Slight refactor of existing algorithm to more cleanly live alongside new
> >    one.
> >  - BPF synthetic benchmark
>
> I don't see this benchmark in this patch set.
> Could you prepare it as a test in tools/testing/selftests/bpf/ ?
> so we can double check what is being tested and run it regularly
> like we do for all other tests in there.

Sure.

There were two benchmarks I had run with BPF in mind. One was timing the
module_alloc function in different scenarios, to make sure there were no
slowdowns for insertions.

The other was to check if the fragmentation caused any measurable runtime
performance:
"For runtime performance, a synthetic benchmark was run that does 5000000 BPF
JIT invocations each, from varying numbers of parallel processes, while the
kernel compiles sharing the same CPU to stand in for the cache impact of a real
workload. The seccomp filter invocations were just Jann Horn's seccomp filtering
test from this thread http://openwall.com/lists/kernel-hardening/2018/07/18/2,
except non-real time priority. The kernel was configured with KPTI and
retpoline, and pcid was disabled. There wasn't any significant difference
between the new and the old."

From what I know about the bpf kselftest, the first one would probably be a
better fit. I'm not sure the second one would fit, with the kernel compile
sharing the same CPU, a special config, and a huge number of processes being
spawned... I can try to add a micro-benchmark instead if that sounds good.
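
For reference, a userspace sketch of the general shape such a timing
micro-benchmark could take. This is only an illustration under stated
assumptions: bench_alloc_ns is a hypothetical helper, and malloc/free stand in
for module_alloc and its matching free, which are not callable from userspace.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

typedef void *(*alloc_fn)(size_t);
typedef void (*free_fn)(void *);

/*
 * Time `iters` allocate/free pairs of `size` bytes and return the average
 * cost of one pair in nanoseconds, or -1 on error.
 */
int64_t bench_alloc_ns(alloc_fn alloc, free_fn release, size_t size,
		       unsigned iters)
{
	struct timespec t0, t1;
	int64_t ns;

	if (iters == 0 || clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
		return -1;

	for (unsigned i = 0; i < iters; i++) {
		void *p = alloc(size);

		if (!p)
			return -1;
		release(p);
	}

	if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
		return -1;

	ns = (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL +
	     (int64_t)(t1.tv_nsec - t0.tv_nsec);
	return ns / (int64_t)iters;
}
```

Calling it as bench_alloc_ns(malloc, free, 4096, 100000) gives a per-pair
average; comparing that figure between two allocator variants is the kind of
check the first benchmark performs.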

Rick