
mm: introduce reference pages

Message ID: 20200731203241.50427-1-pcc@google.com
State: New, archived
Series: mm: introduce reference pages

Commit Message

Peter Collingbourne July 31, 2020, 8:32 p.m. UTC
Introduce a new mmap flag, MAP_REFPAGE, that creates a mapping similar
to an anonymous mapping, but instead of clean pages being backed by the
zero page, they are instead backed by a so-called reference page, whose
address is specified using the offset argument to mmap. Loads from
the mapping will load directly from the reference page, and initial
stores to the mapping will copy-on-write from the reference page.
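
As an illustration of the proposed interface, here is a minimal userspace
sketch (error checking omitted; the fd of -1, the 0xAA pattern byte and the
1 GiB size are illustrative, and the MAP_REFPAGE value is the one proposed
by this patch):

#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_REFPAGE
#define MAP_REFPAGE 0x200000    /* value proposed by this patch */
#endif

int main(void)
{
    long page_size = sysconf(_SC_PAGESIZE);

    /* Create the reference page and fill it with a pattern byte. */
    unsigned char *refpage = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    memset(refpage, 0xAA, page_size);

    /* Map 1 GiB backed by the reference page; the page's address is
     * passed via the offset argument. */
    size_t len = 1UL << 30;
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_REFPAGE, -1,
                            (off_t)(uintptr_t)refpage);

    /* Loads read the pattern without allocating new pages; the first
     * store to a page copies-on-write from the reference page. */
    unsigned char x = p[12345];  /* 0xAA */
    p[0] = 1;

    return x == 0xAA ? 0 : 1;
}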

Reference pages are useful in circumstances where anonymous mappings
combined with manual stores to memory would impose undesirable costs,
either in terms of performance or RSS. Use cases are focused on heap
allocators and include:

- Pattern initialization for the heap. This is where malloc(3) gives
  you memory whose contents are filled with a non-zero pattern
  byte, in order to help detect and mitigate bugs involving use
  of uninitialized memory. Typically this is implemented by having
  the allocator memset the allocation with the pattern byte before
  returning it to the user, but for large allocations this can result
  in a significant increase in RSS, especially for allocations that
  are used sparsely. Even for dense allocations there is a needless
  impact to startup performance when it may be better to amortize it
  throughout the program. By creating allocations using a reference
  page filled with the pattern byte, we can avoid these costs.

- Pre-tagged heap memory. Memory tagging [1] is an upcoming ARMv8.5
  feature which allows for memory to be tagged in order to detect
  certain kinds of memory errors with low overhead. In order to set
  up an allocation to allow memory errors to be detected, the entire
  allocation needs to have the same tag. The issue here is similar to
  pattern initialization in the sense that large tagged allocations
  will be expensive if the tagging is done up front. The idea is that
  the allocator would create reference pages with each of the possible
  memory tags, and use those reference pages for the large allocations.

In order to measure the performance and RSS impact of reference pages,
a version of this patch backported to kernel version 4.14 was tested on
a Pixel 4 together with a modified [2] version of the Scudo allocator
that uses reference pages to implement pattern initialization. A
PDFium test program was used to collect the measurements like so:

$ wget https://static.docs.arm.com/ddi0487/fb/DDI0487F_b_armv8_arm.pdf
$ /system/bin/time -v ./pdfium_test --pages=1-100 DDI0487F_b_armv8_arm.pdf

and the median of 100 runs was taken for each of three variants of the
allocator:

- "anon" is the baseline (no pattern init)
- "memset" is with pattern init of allocator pages implemented by
  initializing anonymous pages with memset
- "refpage" is with pattern init of allocator pages implemented
  by creating reference pages

All three variants are defined in terms of the allocator patch that I
linked [2]: "anon" is without the patch, "refpage" is with the patch, and
"memset" is with the patch but with "#if 0" in place of "#if 1" in
linux.cpp. The measurements are as follows:

          Real time (s)    Max RSS (KiB)
anon        2.237081         107088
memset      2.252241         112180
refpage     2.251220         103504

We can see that the real time for refpage is about the same as, or
slightly faster than, memset. At this point it is unclear where the
performance discrepancy between anon and refpage comes from; the Pixel 4
kernel has transparent hugepages disabled, so that cannot be the cause.

I wouldn't trust the RSS number for reference pages: with a test
program that uses an anonymous page as a reference page, I saw the
following output in dmesg:

[75768.572560] BUG: Bad rss-counter state mm:00000000f1cdec59 idx:1 val:-2
[75768.572577] BUG: Bad rss-counter state mm:00000000f1cdec59 idx:3 val:2

indicating that I may not have implemented RSS accounting for reference
pages correctly. However, we can see straight away an RSS impact of
about 5% for memset versus anon. Assuming that accounting for anonymous
pages is implemented correctly, we can expect the true RSS number for
refpage to be similar to the one measured for anon.

As an alternative to extending mmap(2), I considered using userfaultfd
to implement reference pages. However, after taking a detailed look at
the interface, it does not seem suitable for use in a general-purpose
allocator. For example, UFFD_FEATURE_FORK support would be required in
order to correctly support fork(2) in a process that uses the allocator
(although POSIX does not guarantee support for allocating after fork,
many allocators, including Scudo, support it, and nothing stops a forked
process from page faulting pre-existing allocations anyway), but
UFFD_FEATURE_FORK has been restricted to root by commit 3c1c24d91ffd
("userfaultfd: require CAP_SYS_PTRACE for UFFD_FEATURE_EVENT_FORK"),
making it unsuitable for use in an allocator. Furthermore, even if the
interface issues were resolved, I suspect (but have not measured) that
the cost of the extra context switches between kernel and userspace on
each fault would be too high for use in an allocator anyway.
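
For reference, this is roughly what the userfaultfd-based approach would
look like (a sketch using only the documented userfaultfd(2) API; error
checking, threading and unregistration omitted, and the 0xAA pattern and
1 GiB size are illustrative). Every missing-page fault takes a round trip
through the handler plus a UFFDIO_COPY ioctl, which is the per-fault cost
referred to above:

#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

static void handle_faults(int uffd, void *pattern_page,
                          unsigned long page_size)
{
    struct uffd_msg msg;

    /* One kernel->user->kernel round trip per faulting page. */
    while (read(uffd, &msg, sizeof(msg)) == sizeof(msg)) {
        if (msg.event != UFFD_EVENT_PAGEFAULT)
            continue;
        struct uffdio_copy copy = {
            .dst = msg.arg.pagefault.address & ~(page_size - 1),
            .src = (unsigned long)pattern_page,
            .len = page_size,
        };
        ioctl(uffd, UFFDIO_COPY, &copy);
    }
}

int main(void)
{
    unsigned long page_size = sysconf(_SC_PAGESIZE);
    int uffd = syscall(SYS_userfaultfd, O_CLOEXEC);

    struct uffdio_api api = { .api = UFFD_API };
    ioctl(uffd, UFFDIO_API, &api);

    /* The allocation whose clean pages should read as the pattern. */
    size_t len = 1UL << 30;
    void *alloc = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    struct uffdio_register reg = {
        .range = { .start = (unsigned long)alloc, .len = len },
        .mode = UFFDIO_REGISTER_MODE_MISSING,
    };
    ioctl(uffd, UFFDIO_REGISTER, &reg);

    /* The page whose contents are copied in on every fault. */
    void *pattern_page = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    memset(pattern_page, 0xAA, page_size);

    /* In a real allocator this loop would run on a dedicated thread. */
    handle_faults(uffd, pattern_page, page_size);
    return 0;
}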

There are unresolved issues with this patch:

- We need to decide on the semantics associated with remapping or
  unmapping the reference page. As currently implemented, the page is
  looked up by address on each page fault, and a segfault ensues if the
  address is not mapped. It may be better to have the mmap(2) call take
  a reference to the page (failing if not mapped) and the underlying
  vma so that future remappings or unmappings have no effect.

- I have not yet looked at interaction with transparent hugepages.

- We probably need to restrict which kinds of pages are supported as
  reference pages (probably only anonymous and file-backed pages). This
  is somewhat tied to the remapping semantics as we would need
  to decide what happens if a supported page is replaced with an
  unsupported page.

- Finally, there are the RSS accounting issues mentioned above.

However, I am sending this first version of the patch in order to get
early feedback on the idea and whether it is suitable to be added to
the kernel.

[1] https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/enhancing-memory-safety
[2] https://github.com/pcc/llvm-project/commit/a05f88aaebc7daf262d6885444d9845052026f4b

Signed-off-by: Peter Collingbourne <pcc@google.com>
---
 arch/mips/kernel/vdso.c                |  2 +-
 include/linux/mm.h                     |  2 +-
 include/uapi/asm-generic/mman-common.h |  1 +
 mm/mmap.c                              | 46 +++++++++++++++++++++++---
 4 files changed, 45 insertions(+), 6 deletions(-)

Comments

John Hubbard Aug. 3, 2020, 3:28 a.m. UTC | #1
On 7/31/20 1:32 PM, Peter Collingbourne wrote:
...

Hi,

I can see why you want to do this. A few points to consider, below.

btw, the patch would *not* apply for me, via `git am`. I finally used
patch(1) and that worked. Probably good to mention which tree and branch
this applies to, as a first step to avoiding that, but I'm not quite sure
what else went wrong. Maybe use stock git, instead of
2.28.0.163.g6104cc2f0b6-goog? Just guessing.

> @@ -1684,9 +1695,33 @@ static inline int accountable_mapping(struct file *file, vm_flags_t vm_flags)
>   	return (vm_flags & (VM_NORESERVE | VM_SHARED | VM_WRITE)) == VM_WRITE;
>   }
>   
> +static vm_fault_t refpage_fault(struct vm_fault *vmf)
> +{
> +	struct page *page;
> +
> +	if (get_user_pages((unsigned long)vmf->vma->vm_private_data, 1, 0,
> +			   &page, 0) != 1)
> +		return VM_FAULT_SIGSEGV;
> +

This will end up overflowing the page->_refcount in some situations.

Some thoughts:

In order to implement this feature, the reference pages need to be made
at least a little bit more special, and probably a little bit more like
zero pages. At one extreme, for example, zero pages could be a special
case of reference pages, although I'm not sure of a clean way to
implement that.


The reason that more special-ness is required, is that things such as
reference counting and locking can be special-cased with zero pages.
Doing so allows avoiding page->_refcount overflows, for example. Your
patch here, however, allows normal pages to be treated *almost* like a
zero page, in that it's a page full of constant value data. But because
a refpage can be any page, not just a special one that is defined at a
single location, that leads to problems with refcounts.


> +	vmf->page = page;
> +	return VM_FAULT_LOCKED;

Is the page really locked, or is this a case of "the page is special and
we can safely claim it is locked"? Maybe I'm just confused about the use
of VM_FAULT_LOCKED: I thought you only should set it after locking the
page.


> +}
> +
> +static void refpage_close(struct vm_area_struct *vma)
> +{
> +	/* This function exists only to prevent is_mergeable_vma from allowing a
> +	 * reference page mapping to be merged with an anonymous mapping.
> +	 */

While it is true that implementing a vma's .close() method will prevent
vma merging, this is an abuse of that function: it depends on how that
function is implemented. And given that refpages represent a significant
new capability, I think they deserve their own "if" clause (and perhaps
a VMA flag) in is_mergeable_vma(), instead of this kind of minor hack.



thanks,
Matthew Wilcox Aug. 3, 2020, 3:51 a.m. UTC | #2
On Sun, Aug 02, 2020 at 08:28:08PM -0700, John Hubbard wrote:
> This will end up overflowing the page->_refcount in some situations.
> 
> Some thoughts:
> 
> In order to implement this feature, the reference pages need to be made
> at least a little bit more special, and probably a little bit more like
> zero pages. At one extreme, for example, zero pages could be a special
> case of reference pages, although I'm not sure of a clean way to
> implement that.
> 
> 
> The reason that more special-ness is required, is that things such as
> reference counting and locking can be special-cased with zero pages.
> Doing so allows avoiding page->_refcount overflows, for example. Your
> patch here, however, allows normal pages to be treated *almost* like a
> zero page, in that it's a page full of constant value data. But because
> a refpage can be any page, not just a special one that is defined at a
> single location, that leads to problems with refcounts.

We could bump the refcount on mmap and only put it on munmap.  That
complexifies a few more paths which now need to check for the VMA special
page as well as the zero page on pte unmap.

Perhaps a better way around this is that the default page can only be one
of the pages in the mmap ... and that page is duplicated (not shared) on
fork().  That way the refcount is at most the number of pages in the mmap.
And if we constrain the size of these mappings to be no more than 8TB,
that constrains the refcount on this page to be no more than 2^31
(8 TiB of 4 KiB pages is 2^31 pages).
Kirill A . Shutemov Aug. 3, 2020, 9:32 a.m. UTC | #3
On Fri, Jul 31, 2020 at 01:32:41PM -0700, Peter Collingbourne wrote:
> Introduce a new mmap flag, MAP_REFPAGE, that creates a mapping similar
> to an anonymous mapping, but instead of clean pages being backed by the
> zero page, they are instead backed by a so-called reference page, whose
> address is specified using the offset argument to mmap. Loads from
> the mapping will load directly from the reference page, and initial
> stores to the mapping will copy-on-write from the reference page.
> 
> Reference pages are useful in circumstances where anonymous mappings
> combined with manual stores to memory would impose undesirable costs,
> either in terms of performance or RSS. Use cases are focused on heap
> allocators and include:
> 
> - Pattern initialization for the heap. This is where malloc(3) gives
>   you memory whose contents are filled with a non-zero pattern
>   byte, in order to help detect and mitigate bugs involving use
>   of uninitialized memory. Typically this is implemented by having
>   the allocator memset the allocation with the pattern byte before
>   returning it to the user, but for large allocations this can result
>   in a significant increase in RSS, especially for allocations that
>   are used sparsely. Even for dense allocations there is a needless
>   impact to startup performance when it may be better to amortize it
>   throughout the program. By creating allocations using a reference
>   page filled with the pattern byte, we can avoid these costs.
> 
> - Pre-tagged heap memory. Memory tagging [1] is an upcoming ARMv8.5
>   feature which allows for memory to be tagged in order to detect
>   certain kinds of memory errors with low overhead. In order to set
>   up an allocation to allow memory errors to be detected, the entire
>   allocation needs to have the same tag. The issue here is similar to
>   pattern initialization in the sense that large tagged allocations
>   will be expensive if the tagging is done up front. The idea is that
>   the allocator would create reference pages with each of the possible
>   memory tags, and use those reference pages for the large allocations.

Looks like it's the wrong layer to implement the functionality. Just
have a special fd that would return the same page for all vm_ops->fault
calls, and map the fd with a normal mmap(MAP_PRIVATE, fd). It will get
you what you want without touching core-mm.
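
For illustration, such a same-page fault handler might look roughly like
the sketch below (conceptual only; the refpage_file_* names and attaching
the page via file->private_data are assumptions, not an existing
interface, and the refcount/mapcount special-casing discussed elsewhere
in this thread would still be needed):

/* Every fault in a MAP_PRIVATE mapping of this file returns the same
 * backing page; writes then copy-on-write from it like any other
 * private file mapping.
 */
static vm_fault_t refpage_file_fault(struct vm_fault *vmf)
{
        struct page *page = vmf->vma->vm_file->private_data;

        get_page(page);
        vmf->page = page;
        return 0;
}

static const struct vm_operations_struct refpage_file_vm_ops = {
        .fault = refpage_file_fault,
};

static int refpage_file_mmap(struct file *file, struct vm_area_struct *vma)
{
        vma->vm_ops = &refpage_file_vm_ops;
        return 0;
}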
Catalin Marinas Aug. 3, 2020, 12:01 p.m. UTC | #4
On Mon, Aug 03, 2020 at 12:32:59PM +0300, Kirill A. Shutemov wrote:
> On Fri, Jul 31, 2020 at 01:32:41PM -0700, Peter Collingbourne wrote:
> > Introduce a new mmap flag, MAP_REFPAGE, that creates a mapping similar
> > to an anonymous mapping, but instead of clean pages being backed by the
> > zero page, they are instead backed by a so-called reference page, whose
> > address is specified using the offset argument to mmap. Loads from
> > the mapping will load directly from the reference page, and initial
> > stores to the mapping will copy-on-write from the reference page.
> > 
> > Reference pages are useful in circumstances where anonymous mappings
> > combined with manual stores to memory would impose undesirable costs,
> > either in terms of performance or RSS. Use cases are focused on heap
> > allocators and include:
> > 
> > - Pattern initialization for the heap. This is where malloc(3) gives
> >   you memory whose contents are filled with a non-zero pattern
> >   byte, in order to help detect and mitigate bugs involving use
> >   of uninitialized memory. Typically this is implemented by having
> >   the allocator memset the allocation with the pattern byte before
> >   returning it to the user, but for large allocations this can result
> >   in a significant increase in RSS, especially for allocations that
> >   are used sparsely. Even for dense allocations there is a needless
> >   impact to startup performance when it may be better to amortize it
> >   throughout the program. By creating allocations using a reference
> >   page filled with the pattern byte, we can avoid these costs.
> > 
> > - Pre-tagged heap memory. Memory tagging [1] is an upcoming ARMv8.5
> >   feature which allows for memory to be tagged in order to detect
> >   certain kinds of memory errors with low overhead. In order to set
> >   up an allocation to allow memory errors to be detected, the entire
> >   allocation needs to have the same tag. The issue here is similar to
> >   pattern initialization in the sense that large tagged allocations
> >   will be expensive if the tagging is done up front. The idea is that
> >   the allocator would create reference pages with each of the possible
> >   memory tags, and use those reference pages for the large allocations.
> 
> Looks like it's the wrong layer to implement the functionality. Just
> have a special fd that would return the same page for all vm_ops->fault
> calls, and map the fd with a normal mmap(MAP_PRIVATE, fd). It will get
> you what you want without touching core-mm.

I think this would work even for the arm64 MTE (though I haven't tried):
use memfd_create() to get such file descriptor, mmap() it as MAP_SHARED
to populate the initial pattern, mmap() it as MAP_PRIVATE for any
subsequent mapping that needs to be copied-on-write.
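
For illustration, a rough sketch of that approach for the
pattern-initialization case (standard memfd_create(2) and mmap(2) calls
only; error checking omitted, and the 0xAA pattern byte is illustrative):

#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    long page_size = sysconf(_SC_PAGESIZE);

    /* One page holding the pattern (or, for MTE, the desired tags). */
    int fd = memfd_create("refpage", 0);
    ftruncate(fd, page_size);

    void *shared = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
    memset(shared, 0xAA, page_size);

    /* A private mapping of the same fd copies-on-write from that page,
     * but covers only page_size bytes per mmap() call, which is what
     * the reply below points out. */
    unsigned char *alloc = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE, fd, 0);

    return alloc[0] == 0xAA ? 0 : 1;
}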
Peter Collingbourne Aug. 4, 2020, 12:50 a.m. UTC | #5
On Mon, Aug 3, 2020 at 5:01 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Mon, Aug 03, 2020 at 12:32:59PM +0300, Kirill A. Shutemov wrote:
> > On Fri, Jul 31, 2020 at 01:32:41PM -0700, Peter Collingbourne wrote:
> > > Introduce a new mmap flag, MAP_REFPAGE, that creates a mapping similar
> > > to an anonymous mapping, but instead of clean pages being backed by the
> > > zero page, they are instead backed by a so-called reference page, whose
> > > address is specified using the offset argument to mmap. Loads from
> > > the mapping will load directly from the reference page, and initial
> > > stores to the mapping will copy-on-write from the reference page.
> > >
> > > Reference pages are useful in circumstances where anonymous mappings
> > > combined with manual stores to memory would impose undesirable costs,
> > > either in terms of performance or RSS. Use cases are focused on heap
> > > allocators and include:
> > >
> > > - Pattern initialization for the heap. This is where malloc(3) gives
> > >   you memory whose contents are filled with a non-zero pattern
> > >   byte, in order to help detect and mitigate bugs involving use
> > >   of uninitialized memory. Typically this is implemented by having
> > >   the allocator memset the allocation with the pattern byte before
> > >   returning it to the user, but for large allocations this can result
> > >   in a significant increase in RSS, especially for allocations that
> > >   are used sparsely. Even for dense allocations there is a needless
> > >   impact to startup performance when it may be better to amortize it
> > >   throughout the program. By creating allocations using a reference
> > >   page filled with the pattern byte, we can avoid these costs.
> > >
> > > - Pre-tagged heap memory. Memory tagging [1] is an upcoming ARMv8.5
> > >   feature which allows for memory to be tagged in order to detect
> > >   certain kinds of memory errors with low overhead. In order to set
> > >   up an allocation to allow memory errors to be detected, the entire
> > >   allocation needs to have the same tag. The issue here is similar to
> > >   pattern initialization in the sense that large tagged allocations
> > >   will be expensive if the tagging is done up front. The idea is that
> > >   the allocator would create reference pages with each of the possible
> > >   memory tags, and use those reference pages for the large allocations.
> >
> > Looks like it's the wrong layer to implement the functionality. Just
> > have a special fd that would return the same page for all vm_ops->fault
> > calls, and map the fd with a normal mmap(MAP_PRIVATE, fd). It will get
> > you what you want without touching core-mm.

Thanks, I like this idea. I will try to implement it.

> I think this would work even for the arm64 MTE (though I haven't tried):
> use memfd_create() to get such file descriptor, mmap() it as MAP_SHARED
> to populate the initial pattern, mmap() it as MAP_PRIVATE for any
> subsequent mapping that needs to be copied-on-write.

That would require a separate mmap() (i.e. separate VMA) for each
page, no? That sounds like it could be expensive both in terms of VMAs
and the number of mmap syscalls required (i.e. N/PAGE_SIZE). You could
decrease these costs by increasing the size of the memfd files to more
than a page, but that would also increase the amount of memory
required for the reference pages.

Peter
Catalin Marinas Aug. 4, 2020, 3:27 p.m. UTC | #6
On Mon, Aug 03, 2020 at 05:50:32PM -0700, Peter Collingbourne wrote:
> On Mon, Aug 3, 2020 at 5:01 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > I think this would work even for the arm64 MTE (though I haven't tried):
> > use memfd_create() to get such file descriptor, mmap() it as MAP_SHARED
> > to populate the initial pattern, mmap() it as MAP_PRIVATE for any
> > subsequent mapping that needs to be copied-on-write.
> 
> That would require a separate mmap() (i.e. separate VMA) for each
> page, no? That sounds like it could be expensive both in terms of VMAs
> and the number of mmap syscalls required (i.e. N/PAGE_SIZE). You could
> decrease these costs by increasing the size of the memfd files to more
> than a page, but that would also increase the amount of memory
> required for the reference pages.

I think I get it now. You'd like a multiple page mmap() to be covered by
a single reference page. The memfd trick wouldn't give you this without
multiple mmap() calls, one for each page.
Kirill A . Shutemov Aug. 4, 2020, 3:48 p.m. UTC | #7
On Tue, Aug 04, 2020 at 04:27:50PM +0100, Catalin Marinas wrote:
> On Mon, Aug 03, 2020 at 05:50:32PM -0700, Peter Collingbourne wrote:
> > On Mon, Aug 3, 2020 at 5:01 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > I think this would work even for the arm64 MTE (though I haven't tried):
> > > use memfd_create() to get such file descriptor, mmap() it as MAP_SHARED
> > > to populate the initial pattern, mmap() it as MAP_PRIVATE for any
> > > subsequent mapping that needs to be copied-on-write.
> > 
> > That would require a separate mmap() (i.e. separate VMA) for each
> > page, no? That sounds like it could be expensive both in terms of VMAs
> > and the number of mmap syscalls required (i.e. N/PAGE_SIZE). You could
> > decrease these costs by increasing the size of the memfd files to more
> > than a page, but that would also increase the amount of memory
> > required for the reference pages.
> 
> I think I get it now. You'd like a multiple page mmap() to be covered by
> a single reference page. The memfd trick wouldn't give you this without
> multiple mmap() calls, one for each page.

That's why I suggested a special file descriptor that would give the same
page on any access. We can piggyback on the memfd infrastructure or create
a new interface.
Peter Collingbourne Aug. 13, 2020, 10:03 p.m. UTC | #8
Hi John,

Thanks for the review and suggestions.

On Sun, Aug 2, 2020 at 8:28 PM John Hubbard <jhubbard@nvidia.com> wrote:
>
> On 7/31/20 1:32 PM, Peter Collingbourne wrote:
> ...
>
> Hi,
>
> I can see why you want to do this. A few points to consider, below.
>
> btw, the patch would *not* apply for me, via `git am`. I finally used
> patch(1) and that worked. Probably good to mention which tree and branch
> this applies to, as a first step to avoiding that, but I'm not quite sure
> what else went wrong. Maybe use stock git, instead of
> 2.28.0.163.g6104cc2f0b6-goog? Just guessing.

Sorry about that. It might have been because I had another patch
applied underneath this one when I created the patch. In the v2 that
I'm about to send I'm based directly on master.

> > @@ -1684,9 +1695,33 @@ static inline int accountable_mapping(struct file *file, vm_flags_t vm_flags)
> >       return (vm_flags & (VM_NORESERVE | VM_SHARED | VM_WRITE)) == VM_WRITE;
> >   }
> >
> > +static vm_fault_t refpage_fault(struct vm_fault *vmf)
> > +{
> > +     struct page *page;
> > +
> > +     if (get_user_pages((unsigned long)vmf->vma->vm_private_data, 1, 0,
> > +                        &page, 0) != 1)
> > +             return VM_FAULT_SIGSEGV;
> > +
>
> This will end up overflowing the page->_refcount in some situations.
>
> Some thoughts:
>
> In order to implement this feature, the reference pages need to be made
> > at least a little bit more special, and probably a little bit more like
> zero pages. At one extreme, for example, zero pages could be a special
> case of reference pages, although I'm not sure of a clean way to
> implement that.
>
>
> The reason that more special-ness is required, is that things such as
> reference counting and locking can be special-cased with zero pages.
> Doing so allows avoiding page->_refcount overflows, for example. Your
> patch here, however, allows normal pages to be treated *almost* like a
> zero page, in that it's a page full of constant value data. But because
> a refpage can be any page, not just a special one that is defined at a
> single location, that leads to problems with refcounts.

You're right, there is a potential reference count issue here. But it
looks like the issue is not with _refcount but with _mapcount. For
example, a program could create a reference page mapping with 2^32
pages, fault every page in the mapping and thereby overflow _mapcount.

It looks like we can avoid this issue by aligning the handling of
reference pages with that of the zero page, as you suggested. Like the
zero page, _mapcount is no longer modified on reference pages to track
PTEs (this is done by having vm_normal_page() return NULL for these
pages, as we do for the zero page). Ownership of the page is moved to
the struct file created by the new refpage_create (bikeshed colors
welcome) syscall, which returns a file descriptor, per Kirill's
suggestion. A struct file's reference count is an atomic_long_t, which
I assume cannot realistically overflow. A pointer to the reference page
is stored in the VMA's vm_private_data, but this is mostly for
convenience, because the page is kept alive by the VMA's struct file
reference. The VMA's vm_ops is now set to NULL, which causes us to
follow the code path for anonymous pages, which has been modified to
handle reference pages. That's all implemented in the v2 that I'm about
to send.

I considered having reference page mappings continue to provide a
custom vm_ops, but this would require changes to the interface to
preserve the specialness of the reference page. For example,
vm_normal_page() would need to know to return null for the reference
page in order to prevent _mapcount from overflowing, which could
probably be done by adding a new interface to vm_ops, but that seemed
more complicated than changing the anonymous page code path.

> > +     vmf->page = page;
> > +     return VM_FAULT_LOCKED;
>
> Is the page really locked, or is this a case of "the page is special and
> we can safely claim it is locked"? Maybe I'm just confused about the use
> of VM_FAULT_LOCKED: I thought you only should set it after locking the
> page.

You're right, it isn't locked at this point. I had confused locking
the page with incrementing its _refcount via get_user_pages(). But
with the new implementation we no longer need this fault handler.

> > +}
> > +
> > +static void refpage_close(struct vm_area_struct *vma)
> > +{
> > +     /* This function exists only to prevent is_mergeable_vma from allowing a
> > +      * reference page mapping to be merged with an anonymous mapping.
> > +      */
>
> While it is true that implementing a vma's .close() method will prevent
> vma merging, this is an abuse of that function: it depends on how that
> function is implemented. And given that refpages represent a significant
> new capability, I think they deserve their own "if" clause (and perhaps
> a VMA flag) in is_mergeable_vma(), instead of this kind of minor hack.

It turns out that with the change to use a file descriptor we do not
need a change to is_mergeable_vma() because the function bails out if
the struct file pointers in the VMAs are different.

Thanks,
Peter
Peter Collingbourne Aug. 13, 2020, 10:03 p.m. UTC | #9
On Sun, Aug 2, 2020 at 8:51 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Sun, Aug 02, 2020 at 08:28:08PM -0700, John Hubbard wrote:
> > This will end up overflowing the page->_refcount in some situations.
> >
> > Some thoughts:
> >
> > In order to implement this feature, the reference pages need to be made
> > at least a little bit more special, and probably a little bit more like
> > zero pages. At one extreme, for example, zero pages could be a special
> > case of reference pages, although I'm not sure of a clean way to
> > implement that.
> >
> >
> > The reason that more special-ness is required, is that things such as
> > reference counting and locking can be special-cased with zero pages.
> > Doing so allows avoiding page->_refcount overflows, for example. Your
> > patch here, however, allows normal pages to be treated *almost* like a
> > zero page, in that it's a page full of constant value data. But because
> > a refpage can be any page, not just a special one that is defined at a
> > single location, that leads to problems with refcounts.
>
> We could bump the refcount on mmap and only put it on munmap.  That
> complexifies a few more paths which now need to check for the VMA special
> page as well as the zero page on pte unmap.
>
> Perhaps a better way around this is that the default page can only be one
> of the pages in the mmap ... and that page is duplicated (not shared) on
> fork().  That way the refcount is at most the number of pages in the mmap.
> And if we constrain the size of these mappings to be no more than 8TB,
> that constrains the refcount on this page to be no more than 2^31.

I'm not a fan of this idea to be honest. It means that we need to
spend a page per mapping to get this behavior, instead of a page
across the entire process. And in an allocator like scudo we can end
up making a lot of mappings. I think there would also be complexities
around VMA splitting, which would probably mean that these mappings
become special enough that we don't gain much with this approach.

Thanks,
Peter

Patch

diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c
index 242dc5e83847..403c00cc1ac3 100644
--- a/arch/mips/kernel/vdso.c
+++ b/arch/mips/kernel/vdso.c
@@ -101,7 +101,7 @@  int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 		/* Map delay slot emulation page */
 		base = mmap_region(NULL, STACK_TOP, PAGE_SIZE,
 				VM_READ | VM_EXEC |
-				VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC,
+				VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC, 0,
 				0, NULL);
 		if (IS_ERR_VALUE(base)) {
 			ret = base;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 256e1bc83460..3b3efa2e3283 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2576,7 +2576,7 @@  extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned lo
 
 extern unsigned long mmap_region(struct file *file, unsigned long addr,
 	unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
-	struct list_head *uf);
+	unsigned long refpage, struct list_head *uf);
 extern unsigned long do_mmap(struct file *file, unsigned long addr,
 	unsigned long len, unsigned long prot, unsigned long flags,
 	unsigned long pgoff, unsigned long *populate, struct list_head *uf);
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index f94f65d429be..f57552dcf99a 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -29,6 +29,7 @@ 
 #define MAP_HUGETLB		0x040000	/* create a huge page mapping */
 #define MAP_SYNC		0x080000 /* perform synchronous page faults for the mapping */
 #define MAP_FIXED_NOREPLACE	0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
+#define MAP_REFPAGE		0x200000	/* use the offset argument as a pointer to a reference page */
 
 #define MAP_UNINITIALIZED 0x4000000	/* For anonymous mmap, memory could be
 					 * uninitialized */
diff --git a/mm/mmap.c b/mm/mmap.c
index d43cc3b0187c..d74d0963d460 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -47,6 +47,7 @@ 
 #include <linux/pkeys.h>
 #include <linux/oom.h>
 #include <linux/sched/mm.h>
+#include <linux/compat.h>
 
 #include <linux/uaccess.h>
 #include <asm/cacheflush.h>
@@ -1371,6 +1372,7 @@  unsigned long do_mmap(struct file *file, unsigned long addr,
 	struct mm_struct *mm = current->mm;
 	vm_flags_t vm_flags;
 	int pkey = 0;
+	unsigned long refpage = 0;
 
 	*populate = 0;
 
@@ -1441,6 +1443,16 @@  unsigned long do_mmap(struct file *file, unsigned long addr,
 	if (mlock_future_check(mm, vm_flags, len))
 		return -EAGAIN;
 
+	if (flags & MAP_REFPAGE) {
+		refpage = pgoff << PAGE_SHIFT;
+		if (in_compat_syscall()) {
+			/* The offset argument may have been sign extended at some
+			 * point, so we need to mask out the high bits.
+			 */
+			refpage &= 0xffffffff;
+		}
+	}
+
 	if (file) {
 		struct inode *inode = file_inode(file);
 		unsigned long flags_mask;
@@ -1541,8 +1553,7 @@  unsigned long do_mmap(struct file *file, unsigned long addr,
 		if (file && is_file_hugepages(file))
 			vm_flags |= VM_NORESERVE;
 	}
-
-	addr = mmap_region(file, addr, len, vm_flags, pgoff, uf);
+	addr = mmap_region(file, addr, len, vm_flags, pgoff, refpage, uf);
 	if (!IS_ERR_VALUE(addr) &&
 	    ((vm_flags & VM_LOCKED) ||
 	     (flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE))
@@ -1557,7 +1568,7 @@  unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len,
 	struct file *file = NULL;
 	unsigned long retval;
 
-	if (!(flags & MAP_ANONYMOUS)) {
+	if (!(flags & (MAP_ANONYMOUS | MAP_REFPAGE))) {
 		audit_mmap_fd(fd, flags);
 		file = fget(fd);
 		if (!file)
@@ -1684,9 +1695,33 @@  static inline int accountable_mapping(struct file *file, vm_flags_t vm_flags)
 	return (vm_flags & (VM_NORESERVE | VM_SHARED | VM_WRITE)) == VM_WRITE;
 }
 
+static vm_fault_t refpage_fault(struct vm_fault *vmf)
+{
+	struct page *page;
+
+	if (get_user_pages((unsigned long)vmf->vma->vm_private_data, 1, 0,
+			   &page, 0) != 1)
+		return VM_FAULT_SIGSEGV;
+
+	vmf->page = page;
+	return VM_FAULT_LOCKED;
+}
+
+static void refpage_close(struct vm_area_struct *vma)
+{
+	/* This function exists only to prevent is_mergeable_vma from allowing a
+	 * reference page mapping to be merged with an anonymous mapping.
+	 */
+}
+
+const struct vm_operations_struct refpage_vmops = {
+	.fault = refpage_fault,
+	.close = refpage_close,
+};
+
 unsigned long mmap_region(struct file *file, unsigned long addr,
 		unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
-		struct list_head *uf)
+		unsigned long refpage, struct list_head *uf)
 {
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma, *prev;
@@ -1788,6 +1823,9 @@  unsigned long mmap_region(struct file *file, unsigned long addr,
 		error = shmem_zero_setup(vma);
 		if (error)
 			goto free_vma;
+	} else if (refpage) {
+		vma->vm_ops = &refpage_vmops;
+		vma->vm_private_data = (void *)refpage;
 	} else {
 		vma_set_anonymous(vma);
 	}