From patchwork Thu Apr 29 21:48:01 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Collingbourne <pcc@google.com>
X-Patchwork-Id: 12231989
Date: Thu, 29 Apr 2021 14:48:01 -0700
Message-Id: <20210429214801.2583336-1-pcc@google.com>
Mime-Version: 1.0
X-Mailer: git-send-email 2.31.1.527.g47e6f16901-goog
Subject: [PATCH v3] mm: improve mprotect(R|W) efficiency on pages referenced
 once
From: Peter Collingbourne <pcc@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Collingbourne <pcc@google.com>,
 Kostya Kortchinsky <kostyak@google.com>,
	Evgenii Stepanov <eugenis@google.com>, linux-mm@kvack.org

In the Scudo memory allocator [1] we would like to be able to
detect use-after-free vulnerabilities involving large allocations
by issuing mprotect(PROT_NONE) on the memory region used for the
allocation when it is deallocated. Later on, after the memory
region has been "quarantined" for a sufficient period of time, we
would like to be able to use it for another allocation by issuing
mprotect(PROT_READ|PROT_WRITE).

Before this patch, after write access was restored, any writes to
the memory region would take a page fault and enter the
copy-on-write code path, even in the common case where each page
is referenced by only a single PTE, harming performance
unnecessarily. Make it so that dirty pages in writable anonymous
mappings that are referenced by only a single PTE are made
writable immediately during the mprotect, so that we can avoid
the page faults.
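The per-PTE decision that the diff below adds to change_pte_range() can be modelled in plain C as follows. This is a hypothetical userspace sketch for illustration only: struct pte_model, struct vma_model and may_make_writable() are invented names, their fields stand in for the kernel helpers and flags named in the comments, and the dirty_accountable parameter stands in for the existing local variable of that name.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the PTE state the patch inspects. */
struct pte_model {
	bool dirty;      /* pte_dirty(ptent) */
	bool soft_dirty; /* pte_soft_dirty(ptent) */
	int mapcount;    /* page_mapcount(pte_page(ptent)) */
};

/* Hypothetical stand-ins for the VMA state, after the new flags are set. */
struct vma_model {
	bool anonymous;    /* vma_is_anonymous(vma) */
	bool vm_write;     /* vma->vm_flags & VM_WRITE */
	bool vm_softdirty; /* vma->vm_flags & VM_SOFTDIRTY */
};

/*
 * Returns true if the PTE may be made writable up front, so that the first
 * write after mprotect() does not fault. Either the mapping is
 * dirty-accountable (the pre-existing case) or it is a writable anonymous
 * mapping and this PTE is the page's only mapping (the case this patch adds).
 */
static bool may_make_writable(const struct vma_model *vma,
			      const struct pte_model *pte,
			      bool dirty_accountable)
{
	bool anon_writable = vma->anonymous && vma->vm_write;

	assert(pte->mapcount >= 0); /* sanity check on the model's input */
	return (dirty_accountable ||
		(anon_writable && pte->mapcount == 1)) &&
	       pte->dirty &&
	       (pte->soft_dirty || !vma->vm_softdirty);
}
```

In the Scudo sequence the PROT_NONE call never qualifies (vm_write is false), while the later PROT_READ|PROT_WRITE call can qualify for pages that were dirtied before the mapping was protected and are mapped only once.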

This program shows the critical syscall sequence that we intend to
use in the allocator:

  #include <string.h>
  #include <sys/mman.h>

  enum { kSize = 131072 };

  int main(int argc, char **argv) {
    char *addr = (char *)mmap(0, kSize, PROT_READ | PROT_WRITE,
                              MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    for (int i = 0; i != 100000; ++i) {
      memset(addr, i, kSize);
      mprotect((void *)addr, kSize, PROT_NONE);
      mprotect((void *)addr, kSize, PROT_READ | PROT_WRITE);
    }
  }

The effect of this patch on the above program was measured on a
DragonBoard 845c by taking the median real (wall-clock) execution
time of 10 runs.

Before: 2.94s
After:  0.66s
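The timings above are medians of whole-program real time over several runs. One way to collect comparable numbers from inside a single process is sketched below, assuming CLOCK_MONOTONIC is an acceptable wall-clock source on the target. now_seconds(), time_one_run() and median() are hypothetical helpers; this is not the harness that produced the numbers quoted here, and unlike timing the whole program it measures only the loop.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS and clock_gettime() on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

/* Wall-clock seconds from CLOCK_MONOTONIC (unaffected by clock changes). */
static double now_seconds(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/*
 * One run of the memset + mprotect(PROT_NONE) + mprotect(PROT_READ|PROT_WRITE)
 * loop from the first program. Returns elapsed seconds, or -1.0 if mmap fails.
 */
static double time_one_run(size_t size, int iters)
{
	char *addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
			  MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	if (addr == MAP_FAILED)
		return -1.0;

	double start = now_seconds();
	for (int i = 0; i != iters; ++i) {
		memset(addr, i, size);
		mprotect(addr, size, PROT_NONE);
		mprotect(addr, size, PROT_READ | PROT_WRITE);
	}
	double elapsed = now_seconds() - start;

	munmap(addr, size);
	return elapsed;
}

static int cmp_double(const void *a, const void *b)
{
	double x = *(const double *)a, y = *(const double *)b;

	return (x > y) - (x < y);
}

/* Median of n samples; sorts the array in place. */
static double median(double *v, size_t n)
{
	assert(n > 0);
	qsort(v, n, sizeof(*v), cmp_double);
	return n % 2 ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2.0;
}
```

For example, filling an array with ten time_one_run(131072, 100000) results and passing it to median() follows the same methodology; run it once on an unpatched kernel and once on a patched one.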

The effect was also measured using one of the microbenchmarks that
we normally use to benchmark the allocator [2], after modifying it
to make the appropriate mprotect calls [3]. With an allocation size
of 131072 bytes to trigger the allocator's "large allocation" code
path, the per-iteration time was measured as follows:

Before: 27450ns
After:   6010ns

With this patch we do more work in the mprotect call itself in
exchange for less work when the pages are later accessed. The worst
case is when the pages are not accessed at all; the effect of this
patch in that case was measured using the following program:

  #include <string.h>
  #include <sys/mman.h>

  enum { kSize = 131072 };

  int main(int argc, char **argv) {
    char *addr = (char *)mmap(0, kSize, PROT_READ | PROT_WRITE,
                              MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    memset(addr, 1, kSize);
    for (int i = 0; i != 100000; ++i) {
  #ifdef PAGE_FAULT
      memset(addr + (i * 4096) % kSize, i, 4096);
  #endif
      mprotect((void *)addr, kSize, PROT_NONE);
      mprotect((void *)addr, kSize, PROT_READ | PROT_WRITE);
    }
  }

With PAGE_FAULT undefined (no pages touched after write access is
restored), the median real (wall-clock) execution time of 100 runs
was measured as follows:

Before: 0.330260s
After:  0.338836s

With PAGE_FAULT defined (1 page touched per iteration), the
measurements were as follows:

Before: 0.438048s
After:  0.355661s

So it seems that even with a single page fault the new approach
is faster.
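Dividing the differences between these totals by the 100000 iterations gives a rough per-iteration picture. The sketch below assumes each difference can be attributed to a single cause, which is an approximation; delta_ns_per_iter(), the constant names and the three cost functions are hypothetical.

```c
/* Iteration count used by the test program above. */
enum { kIters = 100000 };

/* Median totals in seconds, copied from the kSize = 131072 measurements. */
static const double before_untouched = 0.330260;
static const double after_untouched = 0.338836;
static const double before_touched = 0.438048;
static const double after_touched = 0.355661;

/* Spreads the difference between two run totals over every iteration. */
static double delta_ns_per_iter(double from_s, double to_s)
{
	return (to_s - from_s) / kIters * 1e9;
}

/* Previous approach: approximate cost of the one write fault per iteration. */
static double fault_cost_ns(void)
{
	return delta_ns_per_iter(before_untouched, before_touched);
}

/* New approach: approximate extra mprotect() work per iteration. */
static double extra_mprotect_ns(void)
{
	return delta_ns_per_iter(before_untouched, after_untouched);
}

/* New approach: approximate cost of writing a page that is already writable. */
static double touch_cost_ns(void)
{
	return delta_ns_per_iter(after_untouched, after_touched);
}
```

With the numbers above these come to roughly 1078 ns, 86 ns and 168 ns respectively, so the extra mprotect() work costs less than a tenth of the single fault it avoids.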

I saw similar results when I adjusted the programs to use a larger
mapping size. With kSize = 1048576 I got these numbers with
PAGE_FAULT undefined:

Before: 1.428988s
After:  1.512016s

i.e. a slowdown of around 5.8%.

And these with PAGE_FAULT defined:

Before: 1.518559s
After:  1.524417s

i.e. about the same.
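The percentages for the larger mapping are relative differences between two medians, and the result depends on which time is used as the base. A minimal sketch of the arithmetic, with hypothetical helper names: measured against the Before time, the slowdown without page faults is about 5.8%, while measuring the same difference against the After time gives about 5.5%. With PAGE_FAULT defined the difference is under 0.4%.

```c
/* Increase from `before` to `after`, as a percentage of `before`. */
static double increase_vs_before(double before, double after)
{
	return (after - before) / before * 100.0;
}

/* The same difference, expressed as a percentage of `after` instead. */
static double increase_vs_after(double before, double after)
{
	return (after - before) / after * 100.0;
}
```

For example, increase_vs_before(1.428988, 1.512016) is about 5.81 and increase_vs_after(1.428988, 1.512016) is about 5.49.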

What I think we may conclude from these results is that for smaller
mappings the advantage of the previous approach, although measurable,
is wiped out by a single page fault. We can reasonably expect at
least one access that would result in a page fault (under the
previous approach) after the pages are made writable, since the
program presumably made the pages writable for a reason.

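For larger mappings, we can roughly estimate that the new approach
wins if more than about 0.4% of the mapping's pages would later take
a write fault under the previous approach (one 4096-byte page touched
per iteration in the 1 MiB mapping is 1/256, about 0.39%, and measured
as roughly break-even). But for mappings large enough for this density
to matter (rather than just the absolute number of page faults), the
increase in mprotect latency does not seem likely to be very large
relative to the total mprotect execution time.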
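The 0.4% break-even figure can be cross-checked with a simple linear cost model. The model is an assumption, not something measured directly: per iteration the previous approach costs base + n * fault and the new one costs base + extra + n * touch, where n is the number of pages written after mprotect(PROT_READ|PROT_WRITE). Setting the two equal gives n = extra / (fault - touch). The sketch takes each term from differences between the medians above, so the iteration count cancels out; break_even_pages() and density_percent() are hypothetical helpers, and 4096-byte pages are assumed to match the test program's stride.

```c
/*
 * Break-even number of written pages under the assumed linear model:
 *   old(n) = base + n * fault
 *   new(n) = base + extra + n * touch
 * Inputs are median run totals in seconds.
 */
static double break_even_pages(double before_none, double after_none,
			       double before_one, double after_one)
{
	double extra = after_none - before_none; /* added mprotect() work */
	double fault = before_one - before_none; /* one write fault, previous approach */
	double touch = after_one - after_none;   /* one page write, new approach */

	return extra / (fault - touch);
}

/* Expresses a page count as a percentage of a mapping's 4096-byte pages. */
static double density_percent(double pages, unsigned long mapping_bytes)
{
	return 100.0 * pages / ((double)mapping_bytes / 4096.0);
}
```

With the kSize = 1048576 medians, break_even_pages(1.428988, 1.512016, 1.518559, 1.524417) is about 1.08 pages, or roughly 0.42% of the 256 pages, in line with the estimate above. With the kSize = 131072 medians it is under 0.1 of a page, which is consistent with a single fault already being enough there.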

Signed-off-by: Peter Collingbourne <pcc@google.com>
Link: https://linux-review.googlesource.com/id/I98d75ef90e20330c578871c87494d64b1df3f1b8
Link: [1] https://source.android.com/devices/tech/debug/scudo
Link: [2] https://cs.android.com/android/platform/superproject/+/master:bionic/benchmarks/stdlib_benchmark.cpp;l=53;drc=e8693e78711e8f45ccd2b610e4dbe0b94d551cc9
Link: [3] https://github.com/pcc/llvm-project/commit/scudo-mprotect-secondary2
---
 mm/mprotect.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 94188df1ee55..3cf67de257e0 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -47,6 +47,8 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 	bool prot_numa = cp_flags & MM_CP_PROT_NUMA;
 	bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
 	bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE;
+	bool anon_writable =
+		vma_is_anonymous(vma) && (vma->vm_flags & VM_WRITE);
 
 	/*
 	 * Can be called with only the mmap_lock for reading by
@@ -132,9 +134,12 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 			}
 
 			/* Avoid taking write faults for known dirty pages */
-			if (dirty_accountable && pte_dirty(ptent) &&
-					(pte_soft_dirty(ptent) ||
-					 !(vma->vm_flags & VM_SOFTDIRTY))) {
+			if ((dirty_accountable ||
+			     (anon_writable &&
+			      page_mapcount(pte_page(ptent)) == 1)) &&
+			    pte_dirty(ptent) &&
+			    (pte_soft_dirty(ptent) ||
+			     !(vma->vm_flags & VM_SOFTDIRTY))) {
 				ptent = pte_mkwrite(ptent);
 			}
 			ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);