From patchwork Wed Dec 12 00:03:50 2018
From: Rick Edgecombe <rick.p.edgecombe@intel.com>
To: akpm@linux-foundation.org, luto@kernel.org, will.deacon@arm.com,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 kernel-hardening@lists.openwall.com, naveen.n.rao@linux.vnet.ibm.com,
 anil.s.keshavamurthy@intel.com, davem@davemloft.net, mhiramat@kernel.org,
 rostedt@goodmis.org, mingo@redhat.com, ast@kernel.org, daniel@iogearbox.net,
 jeyu@kernel.org, namit@vmware.com, netdev@vger.kernel.org,
 ard.biesheuvel@linaro.org, jannh@google.com
Cc: kristen@linux.intel.com, dave.hansen@intel.com, deneen.t.dock@intel.com,
 Rick Edgecombe <rick.p.edgecombe@intel.com>
Subject: [PATCH v2 0/4] Don't leave executable TLB entries to freed pages
Date: Tue, 11 Dec 2018 16:03:50 -0800
Message-Id: <20181212000354.31955-1-rick.p.edgecombe@intel.com>

Sometimes when memory is freed via the module subsystem, a TLB entry with
executable permissions can remain pointing to the freed page. If the page is
re-used to back an address that will receive data from userspace, user data
can end up mapped as executable in the kernel. The root of this behavior is
that vfree lazily flushes the TLB, but does not lazily free the underlying
pages.

This v2 enables vfree to handle freeing memory with special permissions, so
the operation can now be done with no W^X window, the logic for it is
centralized, and on x86 it takes only one TLB flush. I'm not sure whether the
algorithm Andy Lutomirski suggested (doing the whole teardown with one TLB
flush) will work on other architectures, so in this version it lives in an
x86 arch breakout (arch_vunmap). The default arch_vunmap implementation does
what Nadav is proposing users of module_alloc do on teardown, so its behavior
should be unchanged, just centralized.
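Since the discussion leans on what the default arch_vunmap does, here is a
minimal sketch of that teardown path as described above. This is illustrative
only: the arch_vunmap name comes from this series, but the exact signature
and the special-permissions argument are my assumptions, not the code in the
patches.

#include <linux/vmalloc.h>
#include <asm/set_memory.h>

/*
 * Sketch of the default (non-x86) arch_vunmap described above. The
 * signature and details are assumptions for illustration; see patch 4
 * for the more efficient x86 version.
 */
static void arch_vunmap(struct vm_struct *area, int special_perms)
{
	unsigned long addr = (unsigned long)area->addr;

	if (special_perms) {
		/*
		 * Reset permissions while the mapping is still live, so
		 * there is no W^X violating window and no executable TLB
		 * entry can outlive the allocation. Each set_memory_*
		 * call can flush the TLB, which is where the extra flush
		 * mentioned below for BPF teardown comes from.
		 */
		set_memory_nx(addr, area->nr_pages);
		set_memory_rw(addr, area->nr_pages);
	}

	/* Unmap, and on this path flush, before the pages are freed */
	remove_vm_area(area->addr);
}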
The main difference is that on architectures that have set_memory_* defined,
BPF teardown will now get an extra TLB flush, from calling set_memory_nx in
addition to set_memory_rw. On x86, due to the more efficient arch version, it
will be unchanged at one flush. The logic enabling this behavior is plugged
into kernel/module.c and the cross-arch BPF pieces, so it should be enabled
for all architectures for regular .ko modules and BPF, while the other
module_alloc users are unchanged for now (a rough caller-side sketch follows
after the diffstat below).

I did find one small downside with this approach: there is occasionally one
extra direct map page split on module teardown, since one of the module
subsections is RW. The x86 arch_vunmap sets the direct map of the pages
not-present, since it doesn't know that the whole allocation is not
executable, so sometimes this splits an extra large page, because that is the
first special permission the paging structure receives. On the plus side,
many TLB flushes are reduced down to one (on x86 here, and likely other
architectures in the future). The other module_alloc users (BPF, etc.) have
no RW subsections, so their splits will not increase. So I think it's not a
big downside for a few modules, compared to reducing TLB flushes, removing
stale executable TLB entries, and simplifying the code.

Todo:
 - Merge with Nadav Amit's patchset
 - Test on x86 32 bit with highmem
 - Plug into the ftrace and kprobes implementations in the next version of
   Nadav's patchset

Changes since v1:
 - New efficient algorithm on x86 for tearing down executable RO memory, and
   a flag for this (Andy Lutomirski)
 - No W^X violating window on teardown (Nadav Amit)

Rick Edgecombe (4):
  vmalloc: New flags for safe vfree on special perms
  modules: Add new special vfree flags
  bpf: switch to new vmalloc vfree flags
  x86/vmalloc: Add TLB efficient x86 arch_vunmap

 arch/x86/include/asm/set_memory.h |  2 +
 arch/x86/mm/Makefile              |  3 +-
 arch/x86/mm/pageattr.c            | 11 +++--
 arch/x86/mm/vmalloc.c             | 71 ++++++++++++++++++++++++++++++
 include/linux/filter.h            | 26 +++++------
 include/linux/vmalloc.h           |  2 +
 kernel/bpf/core.c                 |  1 -
 kernel/module.c                   | 43 +++++-------
 mm/vmalloc.c                      | 73 ++++++++++++++++++++++++++++---
 9 files changed, 173 insertions(+), 59 deletions(-)
 create mode 100644 arch/x86/mm/vmalloc.c
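To make the caller-side usage concrete, here is a hedged sketch of how a
module_alloc user such as BPF might opt in to the new vfree behavior. The
flag-setting helper and its name (set_vm_special below) are placeholders
made up for illustration; the real flags and interface are introduced in
patch 1 ("vmalloc: New flags for safe vfree on special perms").

#include <linux/vmalloc.h>
#include <linux/moduleloader.h>
#include <asm/set_memory.h>

/* Allocate a region that will hold read-only, executable text. */
static void *alloc_exec_region(unsigned long size)
{
	void *addr = module_alloc(size);	/* size assumed page aligned */

	if (!addr)
		return NULL;

	/*
	 * Mark the mapping as having special permissions so vfree will
	 * reset them and flush before the pages are freed.
	 * set_vm_special() is a hypothetical helper, standing in for
	 * the flags added in patch 1.
	 */
	set_vm_special(addr);

	set_memory_ro((unsigned long)addr, size >> PAGE_SHIFT);
	set_memory_x((unsigned long)addr, size >> PAGE_SHIFT);
	return addr;
}

static void free_exec_region(void *addr)
{
	/*
	 * With the new flags, callers no longer call set_memory_nx/rw
	 * themselves: vfree handles the nx/rw reset and the TLB flush
	 * with no W^X window.
	 */
	vfree(addr);
}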