[v7,00/14] x86/alternative: text_poke() enhancements

Message ID 20181205013408.47725-1-namit@vmware.com (mailing list archive)

Message

Nadav Amit Dec. 5, 2018, 1:33 a.m. UTC
This patch-set addresses some issues that might affect the security and
the correctness of code patching. It was originally small and mainly
intended to remove the text-poking fixmap PTEs, which can remain cached
in the TLBs of remote cores for an unbounded time.
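
As a rough userspace analogy only (this is not the kernel code from the
series), the idea of writing through a short-lived writable alias while the
long-lived mapping stays non-writable can be sketched as below. The names
text_region, text_region_init() and poke() are hypothetical, memfd_create()
is Linux-specific, and the PROT_READ view merely stands in for the
executable kernel text mapping. One difference to keep in mind: here the
alias is visible to every thread of the process while it exists, whereas
the series uses a temporary mm precisely so that other cores do not see the
writable mapping.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for kernel text: memory with a long-lived,
 * never-writable view backed by a memfd. */
struct text_region {
	int fd;                        /* backing memory object */
	size_t size;                   /* size of the region in bytes */
	const unsigned char *ro_view;  /* long-lived read-only mapping */
};

/* Create the backing object and its read-only view. Returns 0 on success. */
static int text_region_init(struct text_region *r, size_t size)
{
	void *p;

	r->fd = memfd_create("text", 0);
	if (r->fd < 0)
		return -1;
	if (ftruncate(r->fd, (off_t)size) < 0) {
		close(r->fd);
		return -1;
	}
	p = mmap(NULL, size, PROT_READ, MAP_SHARED, r->fd, 0);
	if (p == MAP_FAILED) {
		close(r->fd);
		return -1;
	}
	r->size = size;
	r->ro_view = p;
	return 0;
}

/* Write len bytes at offset off through a temporary writable alias,
 * then drop the alias so no writable mapping outlives the poke. */
static int poke(struct text_region *r, size_t off, const void *src, size_t len)
{
	void *alias;

	assert(off <= r->size && len <= r->size - off);

	alias = mmap(NULL, r->size, PROT_READ | PROT_WRITE, MAP_SHARED,
		     r->fd, 0);
	if (alias == MAP_FAILED)
		return -1;
	memcpy((unsigned char *)alias + off, src, len);
	return munmap(alias, r->size);
}
```

A caller would run text_region_init() once and then call poke() for each
modification; the change becomes visible through ro_view because both
mappings share the same backing pages.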

It was then suggested by tglx and Andy that patching of modules can be
simpler by making module code non-executable during setup. This opened a
can of worms, since it required changes to kprobes and ftrace. Ftrace, it
turns out, patches the code using a homegrown mechanism, so it made
sense to make it use text_poke_*() instead. And then, module unloading
seemed fragile and susceptible to attacks, so it required some
attention as well.
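
To illustrate the "non-executable during setup" ordering in isolation, here
is a minimal userspace sketch (an analogy under stated assumptions, not the
module loader): memory is writable but not executable while it is being
filled, and write permission is dropped in the same step that grants
execute permission, so the region is never W+X. The helper name
load_text_wx() is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: copy len bytes of code into fresh memory and only
 * then make it executable. Returns NULL on failure. */
static void *load_text_wx(const void *code, size_t len)
{
	void *mem;

	assert(code != NULL && len > 0);

	/* Setup phase: writable, not executable. */
	mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED)
		return NULL;

	memcpy(mem, code, len);

	/* Final phase: executable, no longer writable. */
	if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
		munmap(mem, len);
		return NULL;
	}
	return mem;
}
```

Any later modification would then have to go through a separate, explicit
patching path rather than a plain store, which is the property the
consolidation onto text_poke_*() is after.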

This whole story is to clarify two points: (a) this patch-set does *not*
have a full threat-model in mind. It does harden security a bit by
shortening the time-window in which writable executable mappings are
set. Since it consolidates kernel code modifications, it may be used as
a basis for a future code integrity mechanism, à la Microsoft's
hypervisor enforced code integrity (HVCI). However, this patch-set does
not provide HVCI-like security.

Which leads me to (b) - the patch-set is big "enough" IMHO. Indeed,
there are open security issues in the kernel when it comes to W^X.  But
some people would want to use Andy's temporary mm-struct for other uses.
So additional security hardening may be left for future patches.

v6->v7:
- Fix kprobes breakage [Rick Edgecombe]
- Fix ftrace breakage [Rick Edgecombe]
- Fix module unloading issues (setting X, race with patching)

v5->v6:
- Panic if anything goes wrong when poking [peterZ]

v4->v5:
- Fix Xen breakage [Damian Tometzki]
- BUG_ON() when poking_mm initialization fails [peterZ]
- Better comments on "x86/mm: temporary mm struct"
- Cleaner removal of the custom poker

v3->v4:
- Setting modules as RO when loading [andy, tglx]
- Adding text_poke_kgdb() to keep the text_mutex assertion [tglx]
- Simpler logic to decide when to use early-poking [peterZ]
- More cleanup

v2->v3:
- Remove the fallback path in text_poke() [peterZ]
- poking_init() was broken due to the local variable poking_addr
- Preallocate tables for the temporary-mm to avoid sleep-in-atomic
- Prevent KASAN from yelling at text_poke()

v1->v2:
- Partial revert of 9222f606506c added to 1/6 [masami]
- Added Masami's reviewed-by tag

RFC->v1:
- Added handling of error in get_locked_pte()
- Remove lockdep assertion, clarify text_mutex use instead [masami]
- Comment fix [peterz]
- Removed remainders of text_poke return value [masami]
- Use __weak for poking_init instead of macros [masami]
- Simplify error handling in poking_init [masami]

Andy Lutomirski (1):
  x86/mm: temporary mm struct

Nadav Amit (13):
  Fix "x86/alternatives: Lockdep-enforce text_mutex in text_poke*()"
  x86/jump_label: Use text_poke_early() during early init
  fork: provide a function for copying init_mm
  x86/alternative: initializing temporary mm for patching
  x86/alternative: use temporary mm for text poking
  x86/kgdb: avoid redundant comparison of patched code
  x86/ftrace: Use text_poke_*() infrastructure
  x86/kprobes: Instruction pages initialization enhancements
  x86: avoid W^X being broken during modules loading
  x86/jump-label: remove support for custom poker
  x86/alternative: Remove the return value of text_poke_*()
  module: Do not set nx for module memory before freeing
  module: Prevent module removal racing with text_poke()

 arch/x86/include/asm/fixmap.h        |   2 -
 arch/x86/include/asm/mmu_context.h   |  32 +++++
 arch/x86/include/asm/pgtable.h       |   3 +
 arch/x86/include/asm/text-patching.h |   7 +-
 arch/x86/kernel/alternative.c        | 197 ++++++++++++++++++++-------
 arch/x86/kernel/ftrace.c             |  74 ++++------
 arch/x86/kernel/jump_label.c         |  19 ++-
 arch/x86/kernel/kgdb.c               |  25 +---
 arch/x86/kernel/kprobes/core.c       |  24 +++-
 arch/x86/kernel/module.c             |   2 +-
 arch/x86/mm/init_64.c                |  35 +++++
 arch/x86/xen/mmu_pv.c                |   2 -
 include/linux/filter.h               |   6 +
 include/linux/sched/task.h           |   1 +
 init/main.c                          |   3 +
 kernel/fork.c                        |  24 +++-
 kernel/module.c                      |  50 +++++--
 17 files changed, 349 insertions(+), 157 deletions(-)

Comments

Peter Zijlstra Dec. 6, 2018, 10:03 a.m. UTC | #1
On Tue, Dec 04, 2018 at 05:33:54PM -0800, Nadav Amit wrote:
> Which leads me to (b) - the patch-set is big "enough" IMHO. Indeed,
> there are open security issues in the kernel when it comes to W^X.  But
> some people would want to use Andy's temporary mm-struct for other uses.
> So additional security hardening may be left for future patches.

Yes, at the very least we should get the first 7 patches merged, since
they work and clean up the text poking irrespective of all that W^X
munging.

(also, I think you lost my ACK)

Nadav Amit Dec. 10, 2018, 1:06 a.m. UTC | #2
> On Dec 6, 2018, at 2:03 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Tue, Dec 04, 2018 at 05:33:54PM -0800, Nadav Amit wrote:
>> Which leads me to (b) - the patch-set is big "enough" IMHO. Indeed,
>> there are open security issues in the kernel when it comes to W^X.  But
>> some people would want to use Andy's temporary mm-struct for other uses.
>> So additional security hardening may be left for future patches.
> 
> Yes, at the very least we should get the first 7 patches merged, since
> they work and clean up the text poking irrespective of all that W^X
> munging.
> 
> (also, I think you lost my ACK)

Sorry about that. I will add it.

But first, Thomas, Andy, are you ok with going with the first 7 patches?

IIRC, you are the one who asked to add the handling of modules, since it was
not clear whether some synchronization is needed after the poking (which is
done with memcpy() at this early stage).

I can add synchronization if needed until the rest of the series gets in.