From patchwork Wed Dec 30 00:41:34 2020
X-Patchwork-Submitter: Peter Collingbourne
X-Patchwork-Id: 11992859
Date: Tue, 29 Dec 2020 16:41:34 -0800
Message-Id: <20201230004134.1185017-1-pcc@google.com>
Subject: [PATCH v2] mm: improve mprotect(R|W) efficiency on pages referenced once
From: Peter Collingbourne
To: Andrew Morton
Cc: Peter Collingbourne, Kostya Kortchinsky, linux-mm@kvack.org

In the Scudo memory allocator [1] we would like to be able to detect
use-after-free vulnerabilities involving large allocations by issuing
mprotect(PROT_NONE) on the memory region used for the allocation when
it is deallocated. Later on, after the memory region has been
"quarantined" for a sufficient period of time, we would like to be able
to use it for another allocation by issuing
mprotect(PROT_READ|PROT_WRITE).

Before this patch, after removing the write protection, any writes to
the memory region would result in page faults and entering the
copy-on-write code path, even in the usual case where the pages are
only referenced by a single PTE, harming performance unnecessarily.

Make it so that any pages in anonymous mappings that are only
referenced by a single PTE are immediately made writable during the
mprotect so that we can avoid the page faults.
This program shows the critical syscall sequence that we intend to use
in the allocator:

  #include <string.h>
  #include <sys/mman.h>

  enum { kSize = 131072 };

  int main(int argc, char **argv) {
    char *addr = (char *)mmap(0, kSize, PROT_READ | PROT_WRITE,
                              MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    for (int i = 0; i != 100000; ++i) {
      memset(addr, i, kSize);
      mprotect((void *)addr, kSize, PROT_NONE);
      mprotect((void *)addr, kSize, PROT_READ | PROT_WRITE);
    }
  }

The effect of this patch on the above program was measured on a
DragonBoard 845c by taking the median real execution time of 10 runs.

Before: 3.19s
After:  0.79s

The effect was also measured using one of the microbenchmarks that we
normally use to benchmark the allocator [2], after modifying it to make
the appropriate mprotect calls [3]. With an allocation size of 131072
bytes to trigger the allocator's "large allocation" code path, the
per-iteration time was measured as follows:

Before: 33364ns
After:   6886ns

This patch means that we do more work during the mprotect call itself
in exchange for less work when the pages are accessed. In the worst
case, the pages are not accessed at all. The effect of this patch in
such cases was measured using the following program:

  #include <string.h>
  #include <sys/mman.h>

  enum { kSize = 131072 };

  int main(int argc, char **argv) {
    char *addr = (char *)mmap(0, kSize, PROT_READ | PROT_WRITE,
                              MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    memset(addr, 1, kSize);
    for (int i = 0; i != 100000; ++i) {
  #ifdef PAGE_FAULT
      memset(addr + (i * 4096) % kSize, i, 4096);
  #endif
      mprotect((void *)addr, kSize, PROT_NONE);
      mprotect((void *)addr, kSize, PROT_READ | PROT_WRITE);
    }
  }

With PAGE_FAULT undefined (0 pages touched after removing write
protection) the median real execution time of 100 runs was measured as
follows:

Before: 0.325928s
After:  0.365493s

With PAGE_FAULT defined (1 page touched) the measurements were as
follows:

Before: 0.441516s
After:  0.380251s

So even with a single page fault the new approach is faster.
I saw similar results if I adjusted the programs to use a larger
mapping size. With kSize = 1048576 I get these numbers with PAGE_FAULT
undefined:

Before: 1.563078s
After:  1.607476s

i.e. around 3% slower. And these with PAGE_FAULT defined:

Before: 1.684663s
After:  1.683272s

i.e. about the same.

What I think we may conclude from these results is that for smaller
mappings the advantage of the previous approach, although measurable,
is wiped out by a single page fault. I think we may expect at least
one access resulting in a page fault (under the previous approach)
after making the pages writable, since the program presumably made the
pages writable for a reason.

For larger mappings we may guesstimate that the new approach wins if
the density of future page faults is > 0.4%. But for mappings that are
large enough for density to matter (not just the absolute number of
page faults) it doesn't seem like the increase in mprotect latency
would be very large relative to the total mprotect execution time.
Signed-off-by: Peter Collingbourne
Link: https://linux-review.googlesource.com/id/I98d75ef90e20330c578871c87494d64b1df3f1b8
Link: [1] https://source.android.com/devices/tech/debug/scudo
Link: [2] https://cs.android.com/android/platform/superproject/+/master:bionic/benchmarks/stdlib_benchmark.cpp;l=53;drc=e8693e78711e8f45ccd2b610e4dbe0b94d551cc9
Link: [3] https://github.com/pcc/llvm-project/commit/scudo-mprotect-secondary
---
 mm/mprotect.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index ab709023e9aa..1f10e041a197 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -47,6 +47,8 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 	bool prot_numa = cp_flags & MM_CP_PROT_NUMA;
 	bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
 	bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE;
+	bool anon_writable =
+		vma_is_anonymous(vma) && (vma->vm_flags & VM_WRITE);
 
 	/*
 	 * Can be called with only the mmap_lock for reading by
@@ -136,7 +138,11 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 			    (pte_soft_dirty(ptent) ||
 			     !(vma->vm_flags & VM_SOFTDIRTY))) {
 				ptent = pte_mkwrite(ptent);
+			} else if (anon_writable &&
+				   page_mapcount(pte_page(ptent)) == 1) {
+				ptent = pte_mkwrite(ptent);
 			}
+
 			ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
 			pages++;
 		} else if (is_swap_pte(oldpte)) {