[v5,00/10] x86/alternative: text_poke() fixes

Message ID 20181113130730.44844-1-namit@vmware.com (mailing list archive)

Message

Nadav Amit Nov. 13, 2018, 1:07 p.m. UTC
This patch-set addresses some issues that might affect the security and
the correctness of code patching.

The main issue the patches deal with is that the fixmap PTEs used for
patching are accessible from other cores and might be exploited. They
are not even flushed from the TLB on remote cores, so the risk is even
higher. This set addresses the issue by introducing a temporary mm that
is only used during patching.
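
A minimal sketch of that scheme, for illustration only (helper names
follow the __text_poke() hunks quoted in the discussion below; IRQ
handling, cross-page writes, the TLB flush and error checking are all
elided):

static void poke_via_temporary_mm(void *addr, const void *opcode, size_t len)
{
	struct page *page = core_kernel_text((unsigned long)addr) ?
			    virt_to_page(addr) : vmalloc_to_page(addr);
	temporary_mm_state_t prev;
	spinlock_t *ptl;
	pte_t *ptep;

	/* Map the page backing 'addr' at poking_addr, but only in poking_mm. */
	ptep = get_locked_pte(poking_mm, poking_addr, &ptl);
	set_pte_at(poking_mm, poking_addr, ptep, mk_pte(page, PAGE_KERNEL));

	/* Switch this CPU to poking_mm; other CPUs never see the mapping. */
	prev = use_temporary_mm(poking_mm);
	memcpy((void *)(poking_addr + offset_in_page(addr)), opcode, len);

	/* Tear the mapping down again (local TLB flush elided here). */
	pte_clear(poking_mm, poking_addr, ptep);
	unuse_temporary_mm(prev);
	pte_unmap_unlock(ptep, ptl);
}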

To do so, we need to avoid using text_poke() before the poking-mm is
initialized and instead use text_poke_early().
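
Very roughly, and purely as an illustration (the wrapper name and the
exact condition below are made up; the series may key the decision
differently):

static void patch_insn(void *addr, const void *opcode, size_t len)
{
	if (unlikely(system_state == SYSTEM_BOOTING))
		text_poke_early(addr, opcode, len);	/* poking_mm not set up yet */
	else
		text_poke(addr, opcode, len);		/* via the temporary mm */
}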

During v3 of this set, Andy & Thomas suggested that early patching of
modules can be improved by simply writing to the memory. This actually
raises a security concern: there should not be any W+X mappings at any
given moment, and module loading breaks this protection for no good
reason. So this set also addresses that issue, while (presumably)
improving patching speed, by making module memory initially RW(+NX) and
changing it to RO(+X) before execution.
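
A minimal sketch of that flip, assuming the existing x86 set_memory_*()
helpers (the function name and exact call site are illustrative, not the
series' actual code):

static void module_text_make_rox(unsigned long text, int npages)
{
	/* Text was loaded RW+NX; it is never W and X at the same time. */
	set_memory_ro(text, npages);	/* drop write access first ... */
	set_memory_x(text, npages);	/* ... only then allow execution */
}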

In addition, the set addresses various issues related to code patching
and does some cleanup. In this version I removed some Tested-by and
Reviewed-by tags due to extensive changes to some of the patches.

v4->v5:
- Fix Xen breakage [Damian Tometzki]
- BUG_ON() when poking_mm initialization fails [PeterZ]
- Better comments on "x86/mm: temporary mm struct"
- Cleaner removal of the custom poker

v3->v4:
- Setting modules as RO when loading [Andy, tglx]
- Adding text_poke_kgdb() to keep the text_mutex assertion [tglx]
- Simpler logic to decide when to use early-poking [peterZ]
- More cleanup

v2->v3:
- Remove the fallback path in text_poke() [peterZ]
- poking_init() was broken due to the local variable poking_addr
- Preallocate tables for the temporary-mm to avoid sleep-in-atomic
- Prevent KASAN from yelling at text_poke()

v1->v2:
- Partial revert of 9222f606506c added to 1/6 [masami]
- Added Masami's reviewed-by tag

RFC->v1:
- Added handling of error in get_locked_pte()
- Remove lockdep assertion, clarify text_mutex use instead [masami]
- Comment fix [peterz]
- Removed remainders of text_poke return value [masami]
- Use __weak for poking_init instead of macros [masami]
- Simplify error handling in poking_init [masami]

Andy Lutomirski (1):
  x86/mm: temporary mm struct

Nadav Amit (9):
  Fix "x86/alternatives: Lockdep-enforce text_mutex in text_poke*()"
  x86/jump_label: Use text_poke_early() during early init
  fork: provide a function for copying init_mm
  x86/alternative: initializing temporary mm for patching
  x86/alternative: use temporary mm for text poking
  x86/kgdb: avoid redundant comparison of patched code
  x86: avoid W^X being broken during modules loading
  x86/jump-label: remove support for custom poker
  x86/alternative: remove the return value of text_poke_*()

 arch/x86/include/asm/fixmap.h        |   2 -
 arch/x86/include/asm/mmu_context.h   |  32 +++++
 arch/x86/include/asm/pgtable.h       |   3 +
 arch/x86/include/asm/text-patching.h |   9 +-
 arch/x86/kernel/alternative.c        | 208 +++++++++++++++++++++------
 arch/x86/kernel/jump_label.c         |  19 ++-
 arch/x86/kernel/kgdb.c               |  19 +--
 arch/x86/kernel/module.c             |   2 +-
 arch/x86/mm/init_64.c                |  35 +++++
 arch/x86/xen/mmu_pv.c                |   2 -
 include/linux/filter.h               |   6 +
 include/linux/sched/task.h           |   1 +
 init/main.c                          |   3 +
 kernel/fork.c                        |  24 +++-
 kernel/module.c                      |  10 ++
 15 files changed, 292 insertions(+), 83 deletions(-)

Comments

Peter Zijlstra Nov. 20, 2018, 12:42 p.m. UTC | #1
On Tue, Nov 13, 2018 at 05:07:20AM -0800, Nadav Amit wrote:
> v4->v5:
> - Fix Xen breakage [Damian Tometzki]
> - BUG_ON() when poking_mm initialization fails [PeterZ]
> - Better comments on "x86/mm: temporary mm struct"
> - Cleaner removal of the custom poker

I'll re-iterate my position: it is impossible for the text not to match,
and if it somehow does not match, something went sideways in an
unrecoverable fashion.

text_poke() must not fail, ever. If it does, our text is inconsistent
and we must abort/panic/bug.

The only way I will accept anything else is if someone can come up with
a sensible scenario of text_poke() failing and recovering from it.
AFAICT there is no possible way to gracefully recover.

Consider a jump label with multiple patch sites; we patch the first,
then fail. In order to restore to a sane state, we must undo the
patching of the first, but undoing text_poke() fails again. Then
what?

Allowing text_poke() to fail only creates an unfixable mess. Esp. since
there is no sane scenario under which it can fail.

---


--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -695,7 +695,7 @@ void __init_or_module text_poke_early(vo
 __ro_after_init struct mm_struct *poking_mm;
 __ro_after_init unsigned long poking_addr;
 
-static int __text_poke(void *addr, const void *opcode, size_t len)
+static void __text_poke(void *addr, const void *opcode, size_t len)
 {
 	bool cross_page_boundary = offset_in_page(addr) + len > PAGE_SIZE;
 	temporary_mm_state_t prev;
@@ -731,13 +731,10 @@ static int __text_poke(void *addr, const
 	 * The lock is not really needed, but this allows to avoid open-coding.
 	 */
 	ptep = get_locked_pte(poking_mm, poking_addr, &ptl);
-
 	/*
-	 * If we failed to allocate a PTE, fail. This should *never* happen,
-	 * since we preallocate the PTE.
+	 * This must not fail; preallocated in poking_init().
 	 */
-	if (WARN_ON_ONCE(!ptep))
-		goto out;
+	VM_BUG_ON(!ptep);
 
 	pte = mk_pte(pages[0], PAGE_KERNEL);
 	set_pte_at(poking_mm, poking_addr, ptep, pte);
@@ -795,12 +792,14 @@ static int __text_poke(void *addr, const
 	unuse_temporary_mm(prev);
 
 	pte_unmap_unlock(ptep, ptl);
-out:
-	if (memcmp(addr, opcode, len))
-		r = -EFAULT;
+
+	/*
+	 * If the text doesn't match what we just wrote, something is
+	 * fundamentally screwy, there's nothing we can really do about that.
+	 */
+	BUG_ON(memcmp(addr, opcode, len));
 
 	local_irq_restore(flags);
-	return r;
 }
 
 /**
@@ -814,21 +813,10 @@ static int __text_poke(void *addr, const
  * in a way that permits an atomic write. It also makes sure we fit on a single
  * page.
  */
-int text_poke(void *addr, const void *opcode, size_t len)
+void text_poke(void *addr, const void *opcode, size_t len)
 {
-	int r;
-
 	lockdep_assert_held(&text_mutex);
-
-	r = __text_poke(addr, opcode, len);
-
-	/*
-	 * TODO: change the callers to consider the return value and remove this
-	 *       historical assertion.
-	 */
-	BUG_ON(r);
-
-	return r;
+	__text_poke(addr, opcode, len);
 }
 
 /**
@@ -847,7 +835,7 @@ int text_poke(void *addr, const void *op
  */
 int text_poke_kgdb(void *addr, const void *opcode, size_t len)
 {
-	return __text_poke(addr, opcode, len);
+	__text_poke(addr, opcode, len);
 }
 
 static void do_sync_core(void *info)
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -767,10 +767,8 @@ int kgdb_arch_set_breakpoint(struct kgdb
 	 */
 	if (mutex_is_locked(&text_mutex))
 		return -EBUSY;
-	err = text_poke_kgdb((void *)bpt->bpt_addr, arch_kgdb_ops.gdb_bpt_instr,
+	text_poke_kgdb((void *)bpt->bpt_addr, arch_kgdb_ops.gdb_bpt_instr,
 			     BREAK_INSTR_SIZE);
-	if (err)
-		return err;
 	bpt->type = BP_POKE_BREAKPOINT;
 
 	return err;
@@ -788,11 +786,8 @@ int kgdb_arch_remove_breakpoint(struct k
 	 */
 	if (mutex_is_locked(&text_mutex))
 		goto knl_write;
-	err = text_poke_kgdb((void *)bpt->bpt_addr, bpt->saved_instr,
-			     BREAK_INSTR_SIZE);
-	if (err)
-		goto knl_write;
-	return err;
+	text_poke_kgdb((void *)bpt->bpt_addr, bpt->saved_instr, BREAK_INSTR_SIZE);
+	return 0;
 
 knl_write:
 	return probe_kernel_write((char *)bpt->bpt_addr,
Nadav Amit Nov. 20, 2018, 6:52 p.m. UTC | #2
> On Nov 20, 2018, at 4:42 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Tue, Nov 13, 2018 at 05:07:20AM -0800, Nadav Amit wrote:
>> v4->v5:
>> - Fix Xen breakage [Damian Tometzki]
>> - BUG_ON() when poking_mm initialization fails [PeterZ]
>> - Better comments on "x86/mm: temporary mm struct"
>> - Cleaner removal of the custom poker
> 
> I'll re-iterate my position: it is impossible for the text not to match,
> and if it somehow does not match, something went sideways in an
> unrecoverable fashion.
> 
> text_poke() must not fail, ever. If it does, our text is inconsistent
> and we must abort/panic/bug.
> 
> The only way I will accept anything else is if someone can come up with
> a sensible scenario of text_poke() failing and recovering from it.
> AFAICT there is no possible way to gracefully recover.
> 
> Consider a jump label with multiple patch sites; we patch the first,
> then fail. In order to restore to a sane state, we must undo the
> patching of the first, but undoing text_poke() fails again. Then
> what?
> 
> Allowing text_poke() to fail only creates an unfixable mess. Esp. since
> there is no sane scenario under which it can fail.

Ok, ok... I tried to stand my ground, but I guess I failed. I don’t feel
strongly enough about this assertion to argue with you. I’m just the “chicken”
kind of guy.

Yet, take into consideration that I will need to use you as my “vest” once I
get “shot” for adding BUG_ON(). ;-)

I will send another version tonight, assuming no new issues are raised.

Regards,
Nadav