[v2] mm: introduce reference pages
diff mbox series

Message ID 20200813220937.40973-1-pcc@google.com
State New
Headers show
Series
  • [v2] mm: introduce reference pages
Related show

Commit Message

Peter Collingbourne Aug. 13, 2020, 10:09 p.m. UTC
Introduce a new syscall, refpage_create, which returns a file
descriptor which may be mapped using mmap. Such a mapping is similar
to an anonymous mapping, but instead of clean pages being backed by the
zero page, they are instead backed by a so-called reference page, whose
contents are specified using an argument to refpage_create. Loads from
the mapping will load directly from the reference page, and initial
stores to the mapping will copy-on-write from the reference page.

Reference pages are useful in circumstances where anonymous mappings
combined with manual stores to memory would impose undesirable costs,
either in terms of performance or RSS. Use cases are focused on heap
allocators and include:

- Pattern initialization for the heap. This is where malloc(3) gives
  you memory whose contents are filled with a non-zero pattern
  byte, in order to help detect and mitigate bugs involving use
  of uninitialized memory. Typically this is implemented by having
  the allocator memset the allocation with the pattern byte before
  returning it to the user, but for large allocations this can result
  in a significant increase in RSS, especially for allocations that
  are used sparsely. Even for dense allocations there is a needless
  impact to startup performance when it may be better to amortize it
  throughout the program. By creating allocations using a reference
  page filled with the pattern byte, we can avoid these costs.

- Pre-tagged heap memory. Memory tagging [1] is an upcoming ARMv8.5
  feature which allows for memory to be tagged in order to detect
  certain kinds of memory errors with low overhead. In order to set
  up an allocation to allow memory errors to be detected, the entire
  allocation needs to have the same tag. The issue here is similar to
  pattern initialization in the sense that large tagged allocations
  will be expensive if the tagging is done up front. The idea is that
  the allocator would create reference pages with each of the possible
  memory tags, and use those reference pages for the large allocations.

In order to measure the performance and RSS impact of reference pages,
a version of this patch backported to kernel version 4.14 was tested on
a Pixel 4 together with a modified [2] version of the Scudo allocator
that uses reference pages to implement pattern initialization. A
PDFium test program was used to collect the measurements like so:

$ wget https://static.docs.arm.com/ddi0487/fb/DDI0487F_b_armv8_arm.pdf
$ /system/bin/time -v ./pdfium_test --pages=1-100 DDI0487F_b_armv8_arm.pdf

and the median of 100 runs measurement was taken with three variants
of the allocator:

- "anon" is the baseline (no pattern init)
- "memset" is with pattern init of allocator pages implemented by
  initializing anonymous pages with memset
- "refpage" is with pattern init of allocator pages implemented
  by creating reference pages

All three variants are measured using the patch that I linked. "anon"
is without the patch, "refpage" is with the patch and "memset" is
with a previous version of the patch [3] with "#if 0" in place of
"#if 1" in linux.cpp. The measurements are as follows:

          Real time (s)    Max RSS (KiB)
anon        2.237081         107088
memset      2.252241         112180
refpage     2.243786         107128

We can see that RSS for refpage is almost the same as anon, and real
time overhead is 44% that of memset.

As an alternative to introducing this syscall, I considered using
userfaultfd to implement reference pages. However, after having taken
a detailed look at the interface, it does not seem suitable to be
used in the context of a general purpose allocator. For example,
UFFD_FEATURE_FORK support would be required in order to correctly
support fork(2) in a process that uses the allocator (although POSIX
does not guarantee support for allocating after fork, many allocators
including Scudo support it, and nothing stops the forked process from
page faulting pre-existing allocations after forking anyway), but
UFFD_FEATURE_FORK has been restricted to root by commit 3c1c24d91ffd
("userfaultfd: require CAP_SYS_PTRACE for UFFD_FEATURE_EVENT_FORK"),
making it unsuitable for use in an allocator. Furthermore, even if
the interface issues are resolved, I suspect (but have not measured)
that the cost of the multiple context switches between kernel and
userspace would be too high to be used in an allocator anyway.

[1] https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/enhancing-memory-safety
[2] https://github.com/pcc/llvm-project/commit/4871b739f86a631537d1725847a27ac148a392a0
[3] https://github.com/pcc/llvm-project/commit/a05f88aaebc7daf262d6885444d9845052026f4b

Signed-off-by: Peter Collingbourne <pcc@google.com>
---
v2:
- Switch to an approach of adding a new syscall instead of modifying
  mmap(2)
- Move ownership of the reference page to the struct file to avoid
  refcount overflows

 arch/alpha/kernel/syscalls/syscall.tbl      |  1 +
 arch/arm/tools/syscall.tbl                  |  1 +
 arch/arm64/include/asm/unistd.h             |  2 +-
 arch/arm64/include/asm/unistd32.h           |  2 +
 arch/ia64/kernel/syscalls/syscall.tbl       |  1 +
 arch/m68k/kernel/syscalls/syscall.tbl       |  1 +
 arch/microblaze/kernel/syscalls/syscall.tbl |  1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   |  1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl   |  1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl   |  1 +
 arch/parisc/kernel/syscalls/syscall.tbl     |  1 +
 arch/powerpc/kernel/syscalls/syscall.tbl    |  1 +
 arch/s390/kernel/syscalls/syscall.tbl       |  1 +
 arch/sh/kernel/syscalls/syscall.tbl         |  1 +
 arch/sparc/kernel/syscalls/syscall.tbl      |  1 +
 arch/x86/entry/syscalls/syscall_32.tbl      |  1 +
 arch/x86/entry/syscalls/syscall_64.tbl      |  1 +
 arch/xtensa/kernel/syscalls/syscall.tbl     |  1 +
 include/linux/huge_mm.h                     |  7 +++
 include/linux/pgtable.h                     | 10 ++++
 include/linux/syscalls.h                    |  3 ++
 include/uapi/asm-generic/unistd.h           |  4 +-
 kernel/sys_ni.c                             |  1 +
 mm/Makefile                                 |  4 +-
 mm/gup.c                                    |  2 +-
 mm/memory.c                                 | 32 ++++++++----
 mm/migrate.c                                |  4 +-
 mm/refpage.c                                | 56 +++++++++++++++++++++
 28 files changed, 127 insertions(+), 16 deletions(-)
 create mode 100644 mm/refpage.c

Comments

Peter Collingbourne Aug. 13, 2020, 10:20 p.m. UTC | #1
On Thu, Aug 13, 2020 at 3:09 PM Peter Collingbourne <pcc@google.com> wrote:
>
> Introduce a new syscall, refpage_create, which returns a file
> descriptor which may be mapped using mmap. Such a mapping is similar
> to an anonymous mapping, but instead of clean pages being backed by the
> zero page, they are instead backed by a so-called reference page, whose
> contents are specified using an argument to refpage_create. Loads from
> the mapping will load directly from the reference page, and initial
> stores to the mapping will copy-on-write from the reference page.

Catalin, I needed this diff on top of my patch and your latest MTE
series in order for reference pages to cooperate with MTE. The first
hunk is probably fine but without the second one, the tags in the
reference page would not be set correctly in the case where the
mapping used to create the reference page was not mapped with
PROT_MTE. I'm not sure if it's appropriate to have MTE-specific stuff
directly in mm/refpage.c so it probably needs something like a new
architecture interface or a change to an existing one. Do you have any
ideas?

diff --git a/mm/refpage.c b/mm/refpage.c
index c2f62a4f0dc0..7e4e4b2aabe2 100644
--- a/mm/refpage.c
+++ b/mm/refpage.c
@@ -7,6 +7,7 @@ static int refpage_mmap(struct file *file, struct
vm_area_struct *vma)
 {
        vma_set_anonymous(vma);
        vma->vm_private_data = vma->vm_file->private_data;
+       vma->vm_flags |= VM_MTE_ALLOWED;
        return 0;
 }

@@ -44,6 +45,14 @@ SYSCALL_DEFINE2(refpage_create, const void *__user,
content, unsigned long,
        }

        copy_highpage(refpage, userpage);
+
+#ifdef CONFIG_ARM64_MTE
+       if (system_supports_mte() && !test_bit(PG_mte_tagged,
&userpage->flags)) {
+               set_bit(PG_mte_tagged, &refpage->flags);
+               mte_clear_page_tags(page_address(refpage));
+       }
+#endif
+
        put_page(userpage);

        fd = anon_inode_getfd("[refpage]", &refpage_file_operations, refpage,

Peter
Matthew Wilcox Aug. 13, 2020, 10:31 p.m. UTC | #2
On Thu, Aug 13, 2020 at 03:09:37PM -0700, Peter Collingbourne wrote:
> In order to measure the performance and RSS impact of reference pages,
> a version of this patch backported to kernel version 4.14 was tested on
> a Pixel 4 together with a modified [2] version of the Scudo allocator
> that uses reference pages to implement pattern initialization. A
> PDFium test program was used to collect the measurements like so:
> 
> $ wget https://static.docs.arm.com/ddi0487/fb/DDI0487F_b_armv8_arm.pdf
> $ /system/bin/time -v ./pdfium_test --pages=1-100 DDI0487F_b_armv8_arm.pdf
> 
> and the median of 100 runs measurement was taken with three variants
> of the allocator:
> 
> - "anon" is the baseline (no pattern init)
> - "memset" is with pattern init of allocator pages implemented by
>   initializing anonymous pages with memset
> - "refpage" is with pattern init of allocator pages implemented
>   by creating reference pages
> 
> All three variants are measured using the patch that I linked. "anon"
> is without the patch, "refpage" is with the patch and "memset" is
> with a previous version of the patch [3] with "#if 0" in place of
> "#if 1" in linux.cpp. The measurements are as follows:
> 
>           Real time (s)    Max RSS (KiB)
> anon        2.237081         107088
> memset      2.252241         112180
> refpage     2.243786         107128
> 
> We can see that RSS for refpage is almost the same as anon, and real
> time overhead is 44% that of memset.

Umm.  These numbers aren't all /that/ compelling.  memset takes 0.7%
longer than the baseline and consumes 4.8% more memory.  I was expecting
better, to be honest ;-(
Peter Collingbourne Aug. 13, 2020, 11:46 p.m. UTC | #3
On Thu, Aug 13, 2020 at 3:31 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Thu, Aug 13, 2020 at 03:09:37PM -0700, Peter Collingbourne wrote:
> > In order to measure the performance and RSS impact of reference pages,
> > a version of this patch backported to kernel version 4.14 was tested on
> > a Pixel 4 together with a modified [2] version of the Scudo allocator
> > that uses reference pages to implement pattern initialization. A
> > PDFium test program was used to collect the measurements like so:
> >
> > $ wget https://static.docs.arm.com/ddi0487/fb/DDI0487F_b_armv8_arm.pdf
> > $ /system/bin/time -v ./pdfium_test --pages=1-100 DDI0487F_b_armv8_arm.pdf
> >
> > and the median of 100 runs measurement was taken with three variants
> > of the allocator:
> >
> > - "anon" is the baseline (no pattern init)
> > - "memset" is with pattern init of allocator pages implemented by
> >   initializing anonymous pages with memset
> > - "refpage" is with pattern init of allocator pages implemented
> >   by creating reference pages
> >
> > All three variants are measured using the patch that I linked. "anon"
> > is without the patch, "refpage" is with the patch and "memset" is
> > with a previous version of the patch [3] with "#if 0" in place of
> > "#if 1" in linux.cpp. The measurements are as follows:
> >
> >           Real time (s)    Max RSS (KiB)
> > anon        2.237081         107088
> > memset      2.252241         112180
> > refpage     2.243786         107128
> >
> > We can see that RSS for refpage is almost the same as anon, and real
> > time overhead is 44% that of memset.
>
> Umm.  These numbers aren't all /that/ compelling.  memset takes 0.7%
> longer than the baseline and consumes 4.8% more memory.  I was expecting
> better, to be honest ;-(

It really depends on your environment and the program that you're
running. 5% can go a long way on memory constrained hardware like
low-end phones. And the number will go higher if the program makes
sparse use of its allocations.

Peter
kernel test robot Aug. 14, 2020, 1 a.m. UTC | #4
Hi Peter,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[cannot apply to mmotm/master hnaz-linux-mm/master arm64/for-next/core tip/x86/asm v5.8 next-20200813]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Peter-Collingbourne/mm-introduce-reference-pages/20200814-061235
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git dddcbc139e96bd18d8c65ef7b7e440f0d32457c2
config: m68k-randconfig-r035-20200813 (attached as .config)
compiler: m68k-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=m68k 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from arch/m68k/include/asm/page.h:60,
                    from arch/m68k/include/asm/thread_info.h:6,
                    from include/linux/thread_info.h:38,
                    from include/asm-generic/preempt.h:5,
                    from ./arch/m68k/include/generated/asm/preempt.h:1,
                    from include/linux/preempt.h:78,
                    from include/linux/spinlock.h:51,
                    from include/linux/mmzone.h:8,
                    from include/linux/gfp.h:6,
                    from include/linux/mm.h:10,
                    from drivers/char/mem.c:12:
   include/linux/pgtable.h: In function 'is_zero_or_refpage_pfn':
>> arch/m68k/include/asm/page_mm.h:165:23: error: implicit declaration of function 'page_to_nid'; did you mean 'zone_to_nid'? [-Werror=implicit-function-declaration]
     165 |  pgdat = &pg_data_map[page_to_nid(__p)];    \
         |                       ^~~~~~~~~~~
   include/linux/pgtable.h:1063:17: note: in expansion of macro 'page_to_pfn'
    1063 |   return pfn == page_to_pfn((struct page *)vma->vm_private_data);
         |                 ^~~~~~~~~~~
   In file included from drivers/char/mem.c:12:
   include/linux/mm.h: At top level:
   include/linux/mm.h:1283:19: error: static declaration of 'page_to_nid' follows non-static declaration
    1283 | static inline int page_to_nid(const struct page *page)
         |                   ^~~~~~~~~~~
   In file included from arch/m68k/include/asm/page.h:60,
                    from arch/m68k/include/asm/thread_info.h:6,
                    from include/linux/thread_info.h:38,
                    from include/asm-generic/preempt.h:5,
                    from ./arch/m68k/include/generated/asm/preempt.h:1,
                    from include/linux/preempt.h:78,
                    from include/linux/spinlock.h:51,
                    from include/linux/mmzone.h:8,
                    from include/linux/gfp.h:6,
                    from include/linux/mm.h:10,
                    from drivers/char/mem.c:12:
   arch/m68k/include/asm/page_mm.h:165:23: note: previous implicit declaration of 'page_to_nid' was here
     165 |  pgdat = &pg_data_map[page_to_nid(__p)];    \
         |                       ^~~~~~~~~~~
   include/linux/pgtable.h:1063:17: note: in expansion of macro 'page_to_pfn'
    1063 |   return pfn == page_to_pfn((struct page *)vma->vm_private_data);
         |                 ^~~~~~~~~~~
   In file included from include/asm-generic/bug.h:5,
                    from arch/m68k/include/asm/bug.h:32,
                    from include/linux/bug.h:5,
                    from include/linux/mmdebug.h:5,
                    from include/linux/mm.h:9,
                    from drivers/char/mem.c:12:
   include/linux/scatterlist.h: In function 'sg_set_buf':
   arch/m68k/include/asm/page_mm.h:169:49: warning: ordered comparison of pointer with null pointer [-Wextra]
     169 | #define virt_addr_valid(kaddr) ((void *)(kaddr) >= (void *)PAGE_OFFSET && (void *)(kaddr) < high_memory)
         |                                                 ^~
   include/linux/compiler.h:78:42: note: in definition of macro 'unlikely'
      78 | # define unlikely(x) __builtin_expect(!!(x), 0)
         |                                          ^
   include/linux/scatterlist.h:143:2: note: in expansion of macro 'BUG_ON'
     143 |  BUG_ON(!virt_addr_valid(buf));
         |  ^~~~~~
   include/linux/scatterlist.h:143:10: note: in expansion of macro 'virt_addr_valid'
     143 |  BUG_ON(!virt_addr_valid(buf));
         |          ^~~~~~~~~~~~~~~
   In file included from arch/m68k/include/asm/page.h:60,
                    from arch/m68k/include/asm/thread_info.h:6,
                    from include/linux/thread_info.h:38,
                    from include/asm-generic/preempt.h:5,
                    from ./arch/m68k/include/generated/asm/preempt.h:1,
                    from include/linux/preempt.h:78,
                    from include/linux/spinlock.h:51,
                    from include/linux/mmzone.h:8,
                    from include/linux/gfp.h:6,
                    from include/linux/mm.h:10,
                    from drivers/char/mem.c:12:
   drivers/char/mem.c: In function 'mmap_kmem':
   arch/m68k/include/asm/page_mm.h:169:49: warning: ordered comparison of pointer with null pointer [-Wextra]
     169 | #define virt_addr_valid(kaddr) ((void *)(kaddr) >= (void *)PAGE_OFFSET && (void *)(kaddr) < high_memory)
         |                                                 ^~
   arch/m68k/include/asm/page_mm.h:170:25: note: in expansion of macro 'virt_addr_valid'
     170 | #define pfn_valid(pfn)  virt_addr_valid(pfn_to_virt(pfn))
         |                         ^~~~~~~~~~~~~~~
   drivers/char/mem.c:430:7: note: in expansion of macro 'pfn_valid'
     430 |  if (!pfn_valid(pfn))
         |       ^~~~~~~~~
   drivers/char/mem.c: In function 'read_kmem':
   arch/m68k/include/asm/page_mm.h:169:49: warning: ordered comparison of pointer with null pointer [-Wextra]
     169 | #define virt_addr_valid(kaddr) ((void *)(kaddr) >= (void *)PAGE_OFFSET && (void *)(kaddr) < high_memory)
         |                                                 ^~
   drivers/char/mem.c:476:9: note: in expansion of macro 'virt_addr_valid'
     476 |    if (!virt_addr_valid(kbuf))
         |         ^~~~~~~~~~~~~~~
   drivers/char/mem.c: In function 'do_write_kmem':
   arch/m68k/include/asm/page_mm.h:169:49: warning: ordered comparison of pointer with null pointer [-Wextra]
     169 | #define virt_addr_valid(kaddr) ((void *)(kaddr) >= (void *)PAGE_OFFSET && (void *)(kaddr) < high_memory)
         |                                                 ^~
   drivers/char/mem.c:554:8: note: in expansion of macro 'virt_addr_valid'
     554 |   if (!virt_addr_valid(ptr))
         |        ^~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors
--
   In file included from arch/m68k/include/asm/page.h:60,
                    from arch/m68k/include/asm/thread_info.h:6,
                    from include/linux/thread_info.h:38,
                    from include/asm-generic/preempt.h:5,
                    from ./arch/m68k/include/generated/asm/preempt.h:1,
                    from include/linux/preempt.h:78,
                    from arch/m68k/include/asm/irqflags.h:6,
                    from include/linux/irqflags.h:16,
                    from arch/m68k/include/asm/atomic.h:6,
                    from include/linux/atomic.h:7,
                    from include/linux/rcupdate.h:25,
                    from include/linux/rculist.h:11,
                    from include/linux/pid.h:5,
                    from include/linux/sched.h:14,
                    from include/linux/utsname.h:6,
                    from drivers/char/random.c:312:
   include/linux/pgtable.h: In function 'is_zero_or_refpage_pfn':
>> arch/m68k/include/asm/page_mm.h:165:23: error: implicit declaration of function 'page_to_nid'; did you mean 'zone_to_nid'? [-Werror=implicit-function-declaration]
     165 |  pgdat = &pg_data_map[page_to_nid(__p)];    \
         |                       ^~~~~~~~~~~
   include/linux/pgtable.h:1063:17: note: in expansion of macro 'page_to_pfn'
    1063 |   return pfn == page_to_pfn((struct page *)vma->vm_private_data);
         |                 ^~~~~~~~~~~
   In file included from include/linux/bvec.h:13,
                    from include/linux/blk_types.h:10,
                    from include/linux/genhd.h:19,
                    from drivers/char/random.c:323:
   include/linux/mm.h: At top level:
   include/linux/mm.h:1283:19: error: static declaration of 'page_to_nid' follows non-static declaration
    1283 | static inline int page_to_nid(const struct page *page)
         |                   ^~~~~~~~~~~
   In file included from arch/m68k/include/asm/page.h:60,
                    from arch/m68k/include/asm/thread_info.h:6,
                    from include/linux/thread_info.h:38,
                    from include/asm-generic/preempt.h:5,
                    from ./arch/m68k/include/generated/asm/preempt.h:1,
                    from include/linux/preempt.h:78,
                    from arch/m68k/include/asm/irqflags.h:6,
                    from include/linux/irqflags.h:16,
                    from arch/m68k/include/asm/atomic.h:6,
                    from include/linux/atomic.h:7,
                    from include/linux/rcupdate.h:25,
                    from include/linux/rculist.h:11,
                    from include/linux/pid.h:5,
                    from include/linux/sched.h:14,
                    from include/linux/utsname.h:6,
                    from drivers/char/random.c:312:
   arch/m68k/include/asm/page_mm.h:165:23: note: previous implicit declaration of 'page_to_nid' was here
     165 |  pgdat = &pg_data_map[page_to_nid(__p)];    \
         |                       ^~~~~~~~~~~
   include/linux/pgtable.h:1063:17: note: in expansion of macro 'page_to_pfn'
    1063 |   return pfn == page_to_pfn((struct page *)vma->vm_private_data);
         |                 ^~~~~~~~~~~
   In file included from include/linux/kernel.h:11,
                    from include/linux/list.h:9,
                    from include/linux/rculist.h:10,
                    from include/linux/pid.h:5,
                    from include/linux/sched.h:14,
                    from include/linux/utsname.h:6,
                    from drivers/char/random.c:312:
   include/linux/scatterlist.h: In function 'sg_set_buf':
   arch/m68k/include/asm/page_mm.h:169:49: warning: ordered comparison of pointer with null pointer [-Wextra]
     169 | #define virt_addr_valid(kaddr) ((void *)(kaddr) >= (void *)PAGE_OFFSET && (void *)(kaddr) < high_memory)
         |                                                 ^~
   include/linux/compiler.h:78:42: note: in definition of macro 'unlikely'
      78 | # define unlikely(x) __builtin_expect(!!(x), 0)
         |                                          ^
   include/linux/scatterlist.h:143:2: note: in expansion of macro 'BUG_ON'
     143 |  BUG_ON(!virt_addr_valid(buf));
         |  ^~~~~~
   include/linux/scatterlist.h:143:10: note: in expansion of macro 'virt_addr_valid'
     143 |  BUG_ON(!virt_addr_valid(buf));
         |          ^~~~~~~~~~~~~~~
   drivers/char/random.c: At top level:
   drivers/char/random.c:2297:6: warning: no previous prototype for 'add_hwgenerator_randomness' [-Wmissing-prototypes]
    2297 | void add_hwgenerator_randomness(const char *buffer, size_t count,
         |      ^~~~~~~~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors
--
   In file included from arch/m68k/include/asm/page.h:60,
                    from arch/m68k/include/asm/thread_info.h:6,
                    from include/linux/thread_info.h:38,
                    from include/asm-generic/preempt.h:5,
                    from ./arch/m68k/include/generated/asm/preempt.h:1,
                    from include/linux/preempt.h:78,
                    from include/linux/spinlock.h:51,
                    from include/linux/wait.h:9,
                    from include/linux/poll.h:8,
                    from drivers/char/tpm/tpm-chip.c:18:
   include/linux/pgtable.h: In function 'is_zero_or_refpage_pfn':
>> arch/m68k/include/asm/page_mm.h:165:23: error: implicit declaration of function 'page_to_nid'; did you mean 'zone_to_nid'? [-Werror=implicit-function-declaration]
     165 |  pgdat = &pg_data_map[page_to_nid(__p)];    \
         |                       ^~~~~~~~~~~
   include/linux/pgtable.h:1063:17: note: in expansion of macro 'page_to_pfn'
    1063 |   return pfn == page_to_pfn((struct page *)vma->vm_private_data);
         |                 ^~~~~~~~~~~
   In file included from include/linux/highmem.h:8,
                    from include/linux/tpm.h:24,
                    from include/linux/tpm_eventlog.h:6,
                    from drivers/char/tpm/tpm-chip.c:24:
   include/linux/mm.h: At top level:
   include/linux/mm.h:1283:19: error: static declaration of 'page_to_nid' follows non-static declaration
    1283 | static inline int page_to_nid(const struct page *page)
         |                   ^~~~~~~~~~~
   In file included from arch/m68k/include/asm/page.h:60,
                    from arch/m68k/include/asm/thread_info.h:6,
                    from include/linux/thread_info.h:38,
                    from include/asm-generic/preempt.h:5,
                    from ./arch/m68k/include/generated/asm/preempt.h:1,
                    from include/linux/preempt.h:78,
                    from include/linux/spinlock.h:51,
                    from include/linux/wait.h:9,
                    from include/linux/poll.h:8,
                    from drivers/char/tpm/tpm-chip.c:18:
   arch/m68k/include/asm/page_mm.h:165:23: note: previous implicit declaration of 'page_to_nid' was here
     165 |  pgdat = &pg_data_map[page_to_nid(__p)];    \
         |                       ^~~~~~~~~~~
   include/linux/pgtable.h:1063:17: note: in expansion of macro 'page_to_pfn'
    1063 |   return pfn == page_to_pfn((struct page *)vma->vm_private_data);
         |                 ^~~~~~~~~~~
   cc1: some warnings being treated as errors
--
   In file included from arch/m68k/include/asm/page.h:60,
                    from arch/m68k/include/asm/thread_info.h:6,
                    from include/linux/thread_info.h:38,
                    from include/asm-generic/preempt.h:5,
                    from ./arch/m68k/include/generated/asm/preempt.h:1,
                    from include/linux/preempt.h:78,
                    from include/linux/spinlock.h:51,
                    from include/linux/wait.h:9,
                    from include/linux/poll.h:8,
                    from drivers/char/tpm/tpm-interface.c:22:
   include/linux/pgtable.h: In function 'is_zero_or_refpage_pfn':
>> arch/m68k/include/asm/page_mm.h:165:23: error: implicit declaration of function 'page_to_nid'; did you mean 'zone_to_nid'? [-Werror=implicit-function-declaration]
     165 |  pgdat = &pg_data_map[page_to_nid(__p)];    \
         |                       ^~~~~~~~~~~
   include/linux/pgtable.h:1063:17: note: in expansion of macro 'page_to_pfn'
    1063 |   return pfn == page_to_pfn((struct page *)vma->vm_private_data);
         |                 ^~~~~~~~~~~
   In file included from include/linux/kallsyms.h:12,
                    from include/linux/bpf.h:21,
                    from include/linux/bpf-cgroup.h:5,
                    from include/linux/cgroup-defs.h:22,
                    from include/linux/cgroup.h:28,
                    from include/linux/memcontrol.h:13,
                    from include/linux/swap.h:9,
                    from include/linux/suspend.h:5,
                    from drivers/char/tpm/tpm-interface.c:26:
   include/linux/mm.h: At top level:
   include/linux/mm.h:1283:19: error: static declaration of 'page_to_nid' follows non-static declaration
    1283 | static inline int page_to_nid(const struct page *page)
         |                   ^~~~~~~~~~~
   In file included from arch/m68k/include/asm/page.h:60,
                    from arch/m68k/include/asm/thread_info.h:6,
                    from include/linux/thread_info.h:38,
                    from include/asm-generic/preempt.h:5,
                    from ./arch/m68k/include/generated/asm/preempt.h:1,
                    from include/linux/preempt.h:78,
                    from include/linux/spinlock.h:51,
                    from include/linux/wait.h:9,
                    from include/linux/poll.h:8,
                    from drivers/char/tpm/tpm-interface.c:22:
   arch/m68k/include/asm/page_mm.h:165:23: note: previous implicit declaration of 'page_to_nid' was here
     165 |  pgdat = &pg_data_map[page_to_nid(__p)];    \
         |                       ^~~~~~~~~~~
   include/linux/pgtable.h:1063:17: note: in expansion of macro 'page_to_pfn'
    1063 |   return pfn == page_to_pfn((struct page *)vma->vm_private_data);
         |                 ^~~~~~~~~~~
   In file included from include/linux/poll.h:6,
                    from drivers/char/tpm/tpm-interface.c:22:
   include/linux/scatterlist.h: In function 'sg_set_buf':
   arch/m68k/include/asm/page_mm.h:169:49: warning: ordered comparison of pointer with null pointer [-Wextra]
     169 | #define virt_addr_valid(kaddr) ((void *)(kaddr) >= (void *)PAGE_OFFSET && (void *)(kaddr) < high_memory)
         |                                                 ^~
   include/linux/compiler.h:78:42: note: in definition of macro 'unlikely'
      78 | # define unlikely(x) __builtin_expect(!!(x), 0)
         |                                          ^
   include/linux/scatterlist.h:143:2: note: in expansion of macro 'BUG_ON'
     143 |  BUG_ON(!virt_addr_valid(buf));
         |  ^~~~~~
   include/linux/scatterlist.h:143:10: note: in expansion of macro 'virt_addr_valid'
     143 |  BUG_ON(!virt_addr_valid(buf));
         |          ^~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors
--
   In file included from arch/m68k/include/asm/page.h:60,
                    from arch/m68k/include/asm/thread_info.h:6,
                    from include/linux/thread_info.h:38,
                    from include/asm-generic/preempt.h:5,
                    from ./arch/m68k/include/generated/asm/preempt.h:1,
                    from include/linux/preempt.h:78,
                    from include/linux/spinlock.h:51,
                    from include/linux/mmzone.h:8,
                    from include/linux/gfp.h:6,
                    from include/linux/umh.h:4,
                    from include/linux/kmod.h:9,
                    from include/linux/module.h:16,
                    from drivers/char/tpm/st33zp24/i2c.c:7:
   include/linux/pgtable.h: In function 'is_zero_or_refpage_pfn':
>> arch/m68k/include/asm/page_mm.h:165:23: error: implicit declaration of function 'page_to_nid'; did you mean 'zone_to_nid'? [-Werror=implicit-function-declaration]
     165 |  pgdat = &pg_data_map[page_to_nid(__p)];    \
         |                       ^~~~~~~~~~~
   include/linux/pgtable.h:1063:17: note: in expansion of macro 'page_to_pfn'
    1063 |   return pfn == page_to_pfn((struct page *)vma->vm_private_data);
         |                 ^~~~~~~~~~~
   In file included from include/linux/highmem.h:8,
                    from include/linux/tpm.h:24,
                    from drivers/char/tpm/st33zp24/i2c.c:14:
   include/linux/mm.h: At top level:
   include/linux/mm.h:1283:19: error: static declaration of 'page_to_nid' follows non-static declaration
    1283 | static inline int page_to_nid(const struct page *page)
         |                   ^~~~~~~~~~~
   In file included from arch/m68k/include/asm/page.h:60,
                    from arch/m68k/include/asm/thread_info.h:6,
                    from include/linux/thread_info.h:38,
                    from include/asm-generic/preempt.h:5,
                    from ./arch/m68k/include/generated/asm/preempt.h:1,
                    from include/linux/preempt.h:78,
                    from include/linux/spinlock.h:51,
                    from include/linux/mmzone.h:8,
                    from include/linux/gfp.h:6,
                    from include/linux/umh.h:4,
                    from include/linux/kmod.h:9,
                    from include/linux/module.h:16,
                    from drivers/char/tpm/st33zp24/i2c.c:7:
   arch/m68k/include/asm/page_mm.h:165:23: note: previous implicit declaration of 'page_to_nid' was here
     165 |  pgdat = &pg_data_map[page_to_nid(__p)];    \
         |                       ^~~~~~~~~~~
   include/linux/pgtable.h:1063:17: note: in expansion of macro 'page_to_pfn'
    1063 |   return pfn == page_to_pfn((struct page *)vma->vm_private_data);
         |                 ^~~~~~~~~~~
   drivers/char/tpm/st33zp24/i2c.c:291:36: warning: 'st33zp24_i2c_acpi_match' defined but not used [-Wunused-const-variable=]
     291 | static const struct acpi_device_id st33zp24_i2c_acpi_match[] = {
         |                                    ^~~~~~~~~~~~~~~~~~~~~~~
   drivers/char/tpm/st33zp24/i2c.c:285:34: warning: 'of_st33zp24_i2c_match' defined but not used [-Wunused-const-variable=]
     285 | static const struct of_device_id of_st33zp24_i2c_match[] = {
         |                                  ^~~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +165 arch/m68k/include/asm/page_mm.h

12d810c1b8c2b91 include/asm-m68k/page.h         Roman Zippel   2007-05-31  155  
12d810c1b8c2b91 include/asm-m68k/page.h         Roman Zippel   2007-05-31  156  #define pfn_to_page(pfn) ({						\
12d810c1b8c2b91 include/asm-m68k/page.h         Roman Zippel   2007-05-31  157  	unsigned long __pfn = (pfn);					\
12d810c1b8c2b91 include/asm-m68k/page.h         Roman Zippel   2007-05-31  158  	struct pglist_data *pgdat;					\
12d810c1b8c2b91 include/asm-m68k/page.h         Roman Zippel   2007-05-31  159  	pgdat = __virt_to_node((unsigned long)pfn_to_virt(__pfn));	\
12d810c1b8c2b91 include/asm-m68k/page.h         Roman Zippel   2007-05-31  160  	pgdat->node_mem_map + (__pfn - pgdat->node_start_pfn);		\
12d810c1b8c2b91 include/asm-m68k/page.h         Roman Zippel   2007-05-31  161  })
12d810c1b8c2b91 include/asm-m68k/page.h         Roman Zippel   2007-05-31  162  #define page_to_pfn(_page) ({						\
ba8f318471f66d5 arch/m68k/include/asm/page_mm.h Ian Campbell   2011-08-18  163  	const struct page *__p = (_page);				\
12d810c1b8c2b91 include/asm-m68k/page.h         Roman Zippel   2007-05-31  164  	struct pglist_data *pgdat;					\
12d810c1b8c2b91 include/asm-m68k/page.h         Roman Zippel   2007-05-31 @165  	pgdat = &pg_data_map[page_to_nid(__p)];				\
12d810c1b8c2b91 include/asm-m68k/page.h         Roman Zippel   2007-05-31  166  	((__p) - pgdat->node_mem_map) + pgdat->node_start_pfn;		\
12d810c1b8c2b91 include/asm-m68k/page.h         Roman Zippel   2007-05-31  167  })
^1da177e4c3f415 include/asm-m68k/page.h         Linus Torvalds 2005-04-16  168  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
kernel test robot Aug. 14, 2020, 1:01 a.m. UTC | #5
Hi Peter,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[cannot apply to mmotm/master hnaz-linux-mm/master arm64/for-next/core tip/x86/asm v5.8 next-20200813]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Peter-Collingbourne/mm-introduce-reference-pages/20200814-061235
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git dddcbc139e96bd18d8c65ef7b7e440f0d32457c2
config: x86_64-randconfig-a015-20200813 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 62ef1cb2079123b86878e4bfed3c14db448f1373)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from arch/x86/kernel/asm-offsets.c:9:
   In file included from include/linux/crypto.h:20:
   In file included from include/linux/slab.h:136:
   In file included from include/linux/kasan.h:14:
>> include/linux/pgtable.h:1063:17: error: implicit declaration of function 'page_to_section' [-Werror,-Wimplicit-function-declaration]
                   return pfn == page_to_pfn((struct page *)vma->vm_private_data);
                                 ^
   include/asm-generic/memory_model.h:81:21: note: expanded from macro 'page_to_pfn'
   #define page_to_pfn __page_to_pfn
                       ^
   include/asm-generic/memory_model.h:64:14: note: expanded from macro '__page_to_pfn'
           int __sec = page_to_section(__pg);                      \
                       ^
   include/linux/pgtable.h:1063:17: note: did you mean '__nr_to_section'?
   include/asm-generic/memory_model.h:81:21: note: expanded from macro 'page_to_pfn'
   #define page_to_pfn __page_to_pfn
                       ^
   include/asm-generic/memory_model.h:64:14: note: expanded from macro '__page_to_pfn'
           int __sec = page_to_section(__pg);                      \
                       ^
   include/linux/mmzone.h:1220:35: note: '__nr_to_section' declared here
   static inline struct mem_section *__nr_to_section(unsigned long nr)
                                     ^
   In file included from arch/x86/kernel/asm-offsets.c:13:
   In file included from include/linux/suspend.h:5:
   In file included from include/linux/swap.h:9:
   In file included from include/linux/memcontrol.h:13:
   In file included from include/linux/cgroup.h:28:
   In file included from include/linux/cgroup-defs.h:22:
   In file included from include/linux/bpf-cgroup.h:5:
   In file included from include/linux/bpf.h:21:
   In file included from include/linux/kallsyms.h:12:
>> include/linux/mm.h:1444:29: error: static declaration of 'page_to_section' follows non-static declaration
   static inline unsigned long page_to_section(const struct page *page)
                               ^
   include/linux/pgtable.h:1063:17: note: previous implicit declaration is here
                   return pfn == page_to_pfn((struct page *)vma->vm_private_data);
                                 ^
   include/asm-generic/memory_model.h:81:21: note: expanded from macro 'page_to_pfn'
   #define page_to_pfn __page_to_pfn
                       ^
   include/asm-generic/memory_model.h:64:14: note: expanded from macro '__page_to_pfn'
           int __sec = page_to_section(__pg);                      \
                       ^
   2 errors generated.
   make[2]: *** [scripts/Makefile.build:117: arch/x86/kernel/asm-offsets.s] Error 1
   make[2]: Target '__build' not remade because of errors.
   make[1]: *** [Makefile:1203: prepare0] Error 2
   make[1]: Target 'prepare' not remade because of errors.
   make: *** [Makefile:185: __sub-make] Error 2
   make: Target 'prepare' not remade because of errors.

vim +/page_to_section +1063 include/linux/pgtable.h

  1056	
  1057	static inline int is_zero_or_refpage_pfn(struct vm_area_struct *vma,
  1058						 unsigned long pfn)
  1059	{
  1060		if (is_zero_pfn(pfn))
  1061			return true;
  1062		if (unlikely(!vma->vm_ops && vma->vm_private_data))
> 1063			return pfn == page_to_pfn((struct page *)vma->vm_private_data);
  1064		return false;
  1065	}
  1066	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
kernel test robot Aug. 14, 2020, 1:11 a.m. UTC | #6
Hi Peter,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[cannot apply to mmotm/master hnaz-linux-mm/master arm64/for-next/core tip/x86/asm v5.8 next-20200813]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Peter-Collingbourne/mm-introduce-reference-pages/20200814-061235
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git dddcbc139e96bd18d8c65ef7b7e440f0d32457c2
config: s390-allyesconfig (attached as .config)
compiler: s390-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=s390 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   In file included from arch/s390/include/uapi/asm/unistd.h:14,
                    from arch/s390/include/asm/unistd.h:10,
                    from arch/s390/kernel/compat_audit.c:3:
>> ./arch/s390/include/generated/uapi/asm/unistd_32.h:415: warning: "__NR_faccessat2" redefined
     415 | #define __NR_faccessat2 440
         | 
   ./arch/s390/include/generated/uapi/asm/unistd_32.h:414: note: this is the location of the previous definition
     414 | #define __NR_faccessat2 439
         | 

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

Patch
diff mbox series

diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index a28fb211881d..efbdbceba085 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -479,3 +479,4 @@ 
 547	common	openat2				sys_openat2
 548	common	pidfd_getfd			sys_pidfd_getfd
 549	common	faccessat2			sys_faccessat2
+550	common	refpage_create			sys_refpage_create
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index 7e8ee4adf269..68f0a0822ed6 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -453,3 +453,4 @@ 
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	refpage_create			sys_refpage_create
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index 3b859596840d..b3b2019f8d16 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -38,7 +38,7 @@ 
 #define __ARM_NR_compat_set_tls		(__ARM_NR_COMPAT_BASE + 5)
 #define __ARM_NR_COMPAT_END		(__ARM_NR_COMPAT_BASE + 0x800)
 
-#define __NR_compat_syscalls		440
+#define __NR_compat_syscalls		441
 #endif
 
 #define __ARCH_WANT_SYS_CLONE
diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 17e81bd9a2d3..18ff5382341c 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -887,6 +887,8 @@  __SYSCALL(__NR_openat2, sys_openat2)
 __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
 #define __NR_faccessat2 439
 __SYSCALL(__NR_faccessat2, sys_faccessat2)
+#define __NR_refpage_create 440
+__SYSCALL(__NR_refpage_create, sys_refpage_create)
 
 /*
  * Please add new compat syscalls above this comment and update
diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl
index ced9c83e47c9..dd58ddc63d92 100644
--- a/arch/ia64/kernel/syscalls/syscall.tbl
+++ b/arch/ia64/kernel/syscalls/syscall.tbl
@@ -360,3 +360,4 @@ 
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	refpage_create			sys_refpage_create
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index 1a4822de7292..fe9c2ffcbf63 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -439,3 +439,4 @@ 
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	refpage_create			sys_refpage_create
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index a3f4be8e7238..d8ef9318ac7f 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -445,3 +445,4 @@ 
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	refpage_create			sys_refpage_create
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index 6b4ee92e3aed..8970f55475c4 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -378,3 +378,4 @@ 
 437	n32	openat2				sys_openat2
 438	n32	pidfd_getfd			sys_pidfd_getfd
 439	n32	faccessat2			sys_faccessat2
+440	n32	refpage_create			sys_refpage_create
diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
index 391acbf425a0..894645fc00a2 100644
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -354,3 +354,4 @@ 
 437	n64	openat2				sys_openat2
 438	n64	pidfd_getfd			sys_pidfd_getfd
 439	n64	faccessat2			sys_faccessat2
+440	n64	refpage_create			sys_refpage_create
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 5727c5187508..43957e224dbf 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -427,3 +427,4 @@ 
 437	o32	openat2				sys_openat2
 438	o32	pidfd_getfd			sys_pidfd_getfd
 439	o32	faccessat2			sys_faccessat2
+440	o32	refpage_create			sys_refpage_create
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index 292baabefade..d6d8d7c5e60a 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -437,3 +437,4 @@ 
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	refpage_create			sys_refpage_create
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index be9f74546068..a73e79116f43 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -529,3 +529,4 @@ 
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	refpage_create			sys_refpage_create
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index f1fda4375526..956253c47c07 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -442,3 +442,4 @@ 
 437  common	openat2			sys_openat2			sys_openat2
 438  common	pidfd_getfd		sys_pidfd_getfd			sys_pidfd_getfd
 439  common	faccessat2		sys_faccessat2			sys_faccessat2
+440  common	faccessat2		sys_refpage_create		sys_refpage_create
diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
index 96848db9659e..5e3d7f569603 100644
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -442,3 +442,4 @@ 
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	refpage_create			sys_refpage_create
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index 46024e80ee86..8b21deb46ef5 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -485,3 +485,4 @@ 
 437	common	openat2			sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	refpage_create			sys_refpage_create
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index e31a75262c9c..c614da77e1a0 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -444,3 +444,4 @@ 
 437	i386	openat2			sys_openat2
 438	i386	pidfd_getfd		sys_pidfd_getfd
 439	i386	faccessat2		sys_faccessat2
+440	i386	refpage_create		sys_refpage_create
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 9d82078c949a..7f7ab6bab41e 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -361,6 +361,7 @@ 
 437	common	openat2			sys_openat2
 438	common	pidfd_getfd		sys_pidfd_getfd
 439	common	faccessat2		sys_faccessat2
+440	common	refpage_create		sys_refpage_create
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index d216ccba42f7..a086512e8f06 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -410,3 +410,4 @@ 
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+439	common	refpage_create			sys_refpage_create
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 467302056e17..a1dc07ff914a 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -175,6 +175,13 @@  static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
 
 	if (haddr < vma->vm_start || haddr + HPAGE_PMD_SIZE > vma->vm_end)
 		return false;
+
+	/*
+	 * Transparent hugepages not currently supported for anonymous VMAs with
+	 * reference pages
+	 */
+	if (unlikely(vma->vm_private_data))
+		return false;
 	return true;
 }
 
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index a124c21e3204..1059dc75b1e3 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1054,6 +1054,16 @@  static inline unsigned long my_zero_pfn(unsigned long addr)
 }
 #endif
 
+static inline int is_zero_or_refpage_pfn(struct vm_area_struct *vma,
+					 unsigned long pfn)
+{
+	if (is_zero_pfn(pfn))
+		return true;
+	if (unlikely(!vma->vm_ops && vma->vm_private_data))
+		return pfn == page_to_pfn((struct page *)vma->vm_private_data);
+	return false;
+}
+
 #ifdef CONFIG_MMU
 
 #ifndef CONFIG_TRANSPARENT_HUGEPAGE
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index dc2b827c81e5..7ee15611729e 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -831,6 +831,9 @@  asmlinkage long sys_mremap(unsigned long addr,
 			   unsigned long old_len, unsigned long new_len,
 			   unsigned long flags, unsigned long new_addr);
 
+/* mm/refpage.c */
+asmlinkage long sys_refpage_create(const void __user *content, unsigned long flags);
+
 /* security/keys/keyctl.c */
 asmlinkage long sys_add_key(const char __user *_type,
 			    const char __user *_description,
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 995b36c2ea7d..26d99bd30e1e 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -859,9 +859,11 @@  __SYSCALL(__NR_openat2, sys_openat2)
 __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
 #define __NR_faccessat2 439
 __SYSCALL(__NR_faccessat2, sys_faccessat2)
+#define __NR_refpage_create 440
+__SYSCALL(__NR_refpage_create, sys_refpage_create)
 
 #undef __NR_syscalls
-#define __NR_syscalls 440
+#define __NR_syscalls 441
 
 /*
  * 32 bit systems traditionally used different
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 3b69a560a7ac..01af430d31da 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -291,6 +291,7 @@  COND_SYSCALL(migrate_pages);
 COND_SYSCALL_COMPAT(migrate_pages);
 COND_SYSCALL(move_pages);
 COND_SYSCALL_COMPAT(move_pages);
+COND_SYSCALL(refpage_create);
 
 COND_SYSCALL(perf_event_open);
 COND_SYSCALL(accept4);
diff --git a/mm/Makefile b/mm/Makefile
index d5649f1c12c0..b2cc6f66d4e7 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -35,10 +35,10 @@  CFLAGS_init-mm.o += $(call cc-disable-warning, override-init)
 CFLAGS_init-mm.o += $(call cc-disable-warning, initializer-overrides)
 
 mmu-y			:= nommu.o
-mmu-$(CONFIG_MMU)	:= highmem.o memory.o mincore.o \
+mmu-$(CONFIG_MMU)	:= highmem.o ioremap.o memory.o mincore.o \
 			   mlock.o mmap.o mmu_gather.o mprotect.o mremap.o \
 			   msync.o page_vma_mapped.o pagewalk.o \
-			   pgtable-generic.o rmap.o vmalloc.o ioremap.o
+			   pgtable-generic.o refpage.o rmap.o vmalloc.o
 
 
 ifdef CONFIG_CROSS_MEMORY_ATTACH
diff --git a/mm/gup.c b/mm/gup.c
index 39e58df6925d..5b4c3e3c86b9 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -463,7 +463,7 @@  static struct page *follow_page_pte(struct vm_area_struct *vma,
 			goto out;
 		}
 
-		if (is_zero_pfn(pte_pfn(pte))) {
+		if (is_zero_or_refpage_pfn(vma, pte_pfn(pte))) {
 			page = pte_page(pte);
 		} else {
 			ret = follow_pfn_pte(vma, address, ptep, flags);
diff --git a/mm/memory.c b/mm/memory.c
index 228efaca75d3..3289fceae9ca 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -602,7 +602,7 @@  struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 			return vma->vm_ops->find_special_page(vma, addr);
 		if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
 			return NULL;
-		if (is_zero_pfn(pfn))
+		if (is_zero_or_refpage_pfn(vma, pfn))
 			return NULL;
 		if (pte_devmap(pte))
 			return NULL;
@@ -628,7 +628,7 @@  struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 		}
 	}
 
-	if (is_zero_pfn(pfn))
+	if (is_zero_or_refpage_pfn(vma, pfn))
 		return NULL;
 
 check_pfn:
@@ -1880,7 +1880,7 @@  static bool vm_mixed_ok(struct vm_area_struct *vma, pfn_t pfn)
 		return true;
 	if (pfn_t_special(pfn))
 		return true;
-	if (is_zero_pfn(pfn_t_to_pfn(pfn)))
+	if (is_zero_or_refpage_pfn(vma, pfn_t_to_pfn(pfn)))
 		return true;
 	return false;
 }
@@ -3322,6 +3322,7 @@  vm_fault_t do_swap_page(struct vm_fault *vmf)
 static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
+	struct page *refpage = vma->vm_private_data;
 	struct page *page;
 	vm_fault_t ret = 0;
 	pte_t entry;
@@ -3347,11 +3348,16 @@  static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	if (unlikely(pmd_trans_unstable(vmf->pmd)))
 		return 0;
 
-	/* Use the zero-page for reads */
+	/* Use the zero-page, or reference page if set, for reads */
 	if (!(vmf->flags & FAULT_FLAG_WRITE) &&
 			!mm_forbids_zeropage(vma->vm_mm)) {
-		entry = pte_mkspecial(pfn_pte(my_zero_pfn(vmf->address),
-						vma->vm_page_prot));
+		unsigned long pfn;
+
+		if (unlikely(refpage))
+			pfn = page_to_pfn(refpage);
+		else
+			pfn = my_zero_pfn(vmf->address);
+		entry = pte_mkspecial(pfn_pte(pfn, vma->vm_page_prot));
 		vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
 				vmf->address, &vmf->ptl);
 		if (!pte_none(*vmf->pte)) {
@@ -3372,9 +3378,17 @@  static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	/* Allocate our own private page. */
 	if (unlikely(anon_vma_prepare(vma)))
 		goto oom;
-	page = alloc_zeroed_user_highpage_movable(vma, vmf->address);
-	if (!page)
-		goto oom;
+
+	if (unlikely(refpage)) {
+		page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vmf->address);
+		if (!page)
+			goto oom;
+		copy_user_highpage(page, refpage, vmf->address, vma);
+	} else {
+		page = alloc_zeroed_user_highpage_movable(vma, vmf->address);
+		if (!page)
+			goto oom;
+	}
 
 	if (mem_cgroup_charge(page, vma->vm_mm, GFP_KERNEL))
 		goto oom_free_page;
diff --git a/mm/migrate.c b/mm/migrate.c
index 5053439be6ab..6e9246d09e95 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2841,8 +2841,8 @@  static void migrate_vma_insert_page(struct migrate_vma *migrate,
 	pmd_t *pmdp;
 	pte_t *ptep;
 
-	/* Only allow populating anonymous memory */
-	if (!vma_is_anonymous(vma))
+	/* Only allow populating anonymous memory without a reference page */
+	if (!vma_is_anonymous(vma) || vma->private_data)
 		goto abort;
 
 	pgdp = pgd_offset(mm, addr);
diff --git a/mm/refpage.c b/mm/refpage.c
new file mode 100644
index 000000000000..c5fc66a38a51
--- /dev/null
+++ b/mm/refpage.c
@@ -0,0 +1,56 @@ 
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/anon_inodes.h>
+#include <linux/fs_context.h>
+#include <linux/highmem.h>
+#include <linux/mount.h>
+#include <linux/syscalls.h>
+
+static int refpage_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	vma_set_anonymous(vma);
+	vma->vm_private_data = vma->vm_file->private_data;
+	return 0;
+}
+
+static int refpage_release(struct inode *inode, struct file *file)
+{
+	put_page(file->private_data);
+	return 0;
+}
+
+static const struct file_operations refpage_file_operations = {
+	.mmap = refpage_mmap,
+	.release = refpage_release,
+};
+
+SYSCALL_DEFINE2(refpage_create, const void *__user, content, unsigned long,
+		flags)
+{
+	unsigned long content_addr = (unsigned long)content;
+	struct page *userpage, *refpage;
+	int fd;
+
+	if (flags != 0)
+		return -EINVAL;
+
+	refpage = alloc_page(GFP_KERNEL);
+	if (!refpage)
+		return -ENOMEM;
+
+	if ((content_addr & (PAGE_SIZE - 1)) != 0 ||
+	    get_user_pages(content_addr, 1, 0, &userpage, 0) != 1) {
+		put_page(refpage);
+		return -EFAULT;
+	}
+
+	copy_highpage(refpage, userpage);
+	put_page(userpage);
+
+	fd = anon_inode_getfd("[refpage]", &refpage_file_operations, refpage,
+			      O_RDONLY | O_CLOEXEC);
+	if (fd < 0)
+		put_page(refpage);
+
+	return fd;
+}