From patchwork Mon Jul 20 09:24:30 2020
X-Patchwork-Submitter: Mike Rapoport
X-Patchwork-Id: 11673379
From: Mike Rapoport
To: linux-kernel@vger.kernel.org
Cc: Alexander Viro, Andrew Morton, Andy Lutomirski, Arnd Bergmann,
    Borislav Petkov, Catalin Marinas, Christopher Lameter, Dan Williams,
    Dave Hansen, Elena Reshetova, "H. Peter Anvin", Idan Yaniv,
    Ingo Molnar, James Bottomley, "Kirill A. Shutemov", Matthew Wilcox,
    Mike Rapoport, Palmer Dabbelt, Paul Walmsley, Peter Zijlstra,
    Thomas Gleixner, Tycho Andersen, Will Deacon,
    linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org, linux-nvdimm@lists.01.org,
    linux-riscv@lists.infradead.org, x86@kernel.org
Subject: [PATCH 1/6] mm: add definition of PMD_PAGE_ORDER
Date: Mon, 20 Jul 2020 12:24:30 +0300
Message-Id: <20200720092435.17469-2-rppt@kernel.org>
In-Reply-To: <20200720092435.17469-1-rppt@kernel.org>
References: <20200720092435.17469-1-rppt@kernel.org>
X-Mailing-List: linux-fsdevel@vger.kernel.org

From: Mike Rapoport

The definition of PMD_PAGE_ORDER, denoting the number of base pages in a
second-level leaf page, is already used by DAX and may be handy in other
places as well. Several architectures already define PMD_ORDER as the size
of a second-level page table, so to avoid clashing with those definitions
use the name PMD_PAGE_ORDER and update DAX accordingly.
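As a quick illustration of the definition (a sketch with assumed values,
not part of the patch): with the common x86-64 configuration of 4 KiB base
pages and 2 MiB PMD-level leaf pages, PMD_PAGE_ORDER works out to 9, i.e.
512 base pages per PMD page.

	#include <stdio.h>

	/* Assumed values for a typical x86-64 build; the kernel derives
	 * them from its own PAGE_SHIFT and PMD_SHIFT definitions. */
	#define PAGE_SHIFT	12	/* 4 KiB base pages */
	#define PMD_SHIFT	21	/* 2 MiB PMD-level leaf pages */
	#define PMD_PAGE_ORDER	(PMD_SHIFT - PAGE_SHIFT)

	int main(void)
	{
		/* 1 << PMD_PAGE_ORDER base pages form one PMD-size page */
		printf("PMD_PAGE_ORDER = %d, pages per PMD page = %lu\n",
		       PMD_PAGE_ORDER, 1UL << PMD_PAGE_ORDER); /* 9, 512 */
		return 0;
	}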
Signed-off-by: Mike Rapoport
---
 fs/dax.c                | 10 +++++-----
 include/linux/pgtable.h |  3 +++
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 11b16729b86f..b91d8c8dda45 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -50,7 +50,7 @@ static inline unsigned int pe_order(enum page_entry_size pe_size)
 #define PG_PMD_NR	(PMD_SIZE >> PAGE_SHIFT)
 
 /* The order of a PMD entry */
-#define PMD_ORDER	(PMD_SHIFT - PAGE_SHIFT)
+#define PMD_PAGE_ORDER	(PMD_SHIFT - PAGE_SHIFT)
 
 static wait_queue_head_t wait_table[DAX_WAIT_TABLE_ENTRIES];
 
@@ -98,7 +98,7 @@ static bool dax_is_locked(void *entry)
 static unsigned int dax_entry_order(void *entry)
 {
 	if (xa_to_value(entry) & DAX_PMD)
-		return PMD_ORDER;
+		return PMD_PAGE_ORDER;
 	return 0;
 }
 
@@ -1456,7 +1456,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct address_space *mapping = vma->vm_file->f_mapping;
-	XA_STATE_ORDER(xas, &mapping->i_pages, vmf->pgoff, PMD_ORDER);
+	XA_STATE_ORDER(xas, &mapping->i_pages, vmf->pgoff, PMD_PAGE_ORDER);
 	unsigned long pmd_addr = vmf->address & PMD_MASK;
 	bool write = vmf->flags & FAULT_FLAG_WRITE;
 	bool sync;
@@ -1515,7 +1515,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	 * entry is already in the array, for instance), it will return
 	 * VM_FAULT_FALLBACK.
 	 */
-	entry = grab_mapping_entry(&xas, mapping, PMD_ORDER);
+	entry = grab_mapping_entry(&xas, mapping, PMD_PAGE_ORDER);
 	if (xa_is_internal(entry)) {
 		result = xa_to_internal(entry);
 		goto fallback;
@@ -1681,7 +1681,7 @@ dax_insert_pfn_mkwrite(struct vm_fault *vmf, pfn_t pfn, unsigned int order)
 	if (order == 0)
 		ret = vmf_insert_mixed_mkwrite(vmf->vma, vmf->address, pfn);
 #ifdef CONFIG_FS_DAX_PMD
-	else if (order == PMD_ORDER)
+	else if (order == PMD_PAGE_ORDER)
 		ret = vmf_insert_pfn_pmd(vmf, pfn, FAULT_FLAG_WRITE);
 #endif
 	else

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 56c1e8eb7bb0..79f8443609e7 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -28,6 +28,9 @@
 #define USER_PGTABLES_CEILING	0UL
 #endif
 
+/* Number of base pages in a second level leaf page */
+#define PMD_PAGE_ORDER	(PMD_SHIFT - PAGE_SHIFT)
+
 /*
  * A page table page can be thought of an array like this: pXd_t[PTRS_PER_PxD]
  *

From patchwork Mon Jul 20 09:24:31 2020
X-Patchwork-Submitter: Mike Rapoport
X-Patchwork-Id: 11673383
From: Mike Rapoport
To: linux-kernel@vger.kernel.org
Subject: [PATCH 2/6] mmap: make mlock_future_check() global
Date: Mon, 20 Jul 2020 12:24:31 +0300
Message-Id: <20200720092435.17469-3-rppt@kernel.org>
In-Reply-To: <20200720092435.17469-1-rppt@kernel.org>
References: <20200720092435.17469-1-rppt@kernel.org>

From: Mike Rapoport

It will be used by the upcoming secret memory implementation.
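For context (an illustrative sketch, not part of the patch):
mlock_future_check() enforces RLIMIT_MEMLOCK for mappings that will be
locked in memory, and the secret memory mappings introduced later in this
series are locked implicitly. A small userspace snippet, using only
standard POSIX interfaces, to inspect the limit that check applies:

	#include <stdio.h>
	#include <sys/resource.h>

	int main(void)
	{
		struct rlimit rl;

		if (getrlimit(RLIMIT_MEMLOCK, &rl) == 0)
			printf("RLIMIT_MEMLOCK: soft=%llu hard=%llu bytes\n",
			       (unsigned long long)rl.rlim_cur,
			       (unsigned long long)rl.rlim_max);
		/* mmap() of a secretmem descriptor larger than the soft
		 * limit (without CAP_IPC_LOCK) is expected to fail with
		 * EAGAIN. */
		return 0;
	}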
Signed-off-by: Mike Rapoport
---
 mm/internal.h | 3 +++
 mm/mmap.c     | 5 ++---
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 9886db20d94f..af0a92f8f6bc 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -349,6 +349,9 @@ static inline void munlock_vma_pages_all(struct vm_area_struct *vma)
 extern void mlock_vma_page(struct page *page);
 extern unsigned int munlock_vma_page(struct page *page);
 
+extern int mlock_future_check(struct mm_struct *mm, unsigned long flags,
+			      unsigned long len);
+
 /*
  * Clear the page's PageMlocked().  This can be useful in a situation where
  * we want to unconditionally remove a page from the pagecache -- e.g.,

diff --git a/mm/mmap.c b/mm/mmap.c
index 59a4682ebf3f..4dd40a4fedfb 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1310,9 +1310,8 @@ static inline unsigned long round_hint_to_min(unsigned long hint)
 	return hint;
 }
 
-static inline int mlock_future_check(struct mm_struct *mm,
-				     unsigned long flags,
-				     unsigned long len)
+int mlock_future_check(struct mm_struct *mm, unsigned long flags,
+		       unsigned long len)
 {
 	unsigned long locked, lock_limit;
 

From patchwork Mon Jul 20 09:24:32 2020
X-Patchwork-Submitter: Mike Rapoport
X-Patchwork-Id: 11673389
From: Mike Rapoport
To: linux-kernel@vger.kernel.org
Subject: [PATCH 3/6] mm: introduce secretmemfd system call to create "secret" memory areas
Date: Mon, 20 Jul 2020 12:24:32 +0300
Message-Id: <20200720092435.17469-4-rppt@kernel.org>
In-Reply-To: <20200720092435.17469-1-rppt@kernel.org>
References: <20200720092435.17469-1-rppt@kernel.org>

From: Mike Rapoport

Introduce the "secretmemfd" system call to create memory areas that are
visible only in the context of the owning process: they are not mapped
into other processes' address spaces, and they are not mapped in the
kernel page tables either.

The user creates a file descriptor with the secretmemfd system call; the
flags supplied to the call define the desired protection mode for the
memory associated with that file descriptor. Currently there are two
protection modes:

* exclusive - the memory area is unmapped from the kernel direct map and
  is present only in the page tables of the owning mm.
* uncached  - the memory area is present only in the page tables of the
  owning mm and is mapped there as uncached.

For instance, the following creates an uncached mapping (error handling
is omitted):

	fd = secretmemfd(SECRETMEM_UNCACHED);
	ftruncate(fd, MAP_SIZE);
	ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED,
		   fd, 0);
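To make the snippet above concrete, here is a fuller usage sketch (not
part of the patch). Since there is no libc wrapper, it invokes the call
via syscall(2); __NR_secretmemfd is taken as 440 from the x86-64 table
added in patch 4/6, and SECRETMEM_UNCACHED comes from the uapi header
added here.

	#include <stdio.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	#ifndef __NR_secretmemfd
	#define __NR_secretmemfd 440	/* x86-64 number from patch 4/6 */
	#endif
	#define SECRETMEM_UNCACHED 0x2	/* include/uapi/linux/secretmem.h */

	#define MAP_SIZE (4UL * 4096)

	int main(void)
	{
		int fd = syscall(__NR_secretmemfd, SECRETMEM_UNCACHED);
		if (fd < 0) {
			perror("secretmemfd");
			return 1;
		}
		if (ftruncate(fd, MAP_SIZE) < 0) {
			perror("ftruncate");
			return 1;
		}
		char *ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
				 MAP_SHARED, fd, 0);
		if (ptr == MAP_FAILED) {
			perror("mmap");
			return 1;
		}
		/* The data lives only in this process's page tables. */
		strcpy(ptr, "not in the direct map");
		puts(ptr);

		munmap(ptr, MAP_SIZE);
		close(fd);
		return 0;
	}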
Shutemov" , Matthew Wilcox , Mike Rapoport , Mike Rapoport , Palmer Dabbelt , Paul Walmsley , Peter Zijlstra , Thomas Gleixner , Tycho Andersen , Will Deacon , linux-api@vger.kernel.org, linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-nvdimm@lists.01.org, linux-riscv@lists.infradead.org, x86@kernel.org Subject: [PATCH 3/6] mm: introduce secretmemfd system call to create "secret" memory areas Date: Mon, 20 Jul 2020 12:24:32 +0300 Message-Id: <20200720092435.17469-4-rppt@kernel.org> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200720092435.17469-1-rppt@kernel.org> References: <20200720092435.17469-1-rppt@kernel.org> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Mike Rapoport Introduce "secretmemfd" system call with the ability to create memory areas visible only in the context of the owning process and not mapped not only to other processes but in the kernel page tables as well. The user will create a file descriptor using the secretmemfd system call where flags supplied as a parameter to this system call will define the desired protection mode for the memory associated with that file descriptor. Currently there are two protection modes: * exclusive - the memory area is unmapped from the kernel direct map and it is present only in the page tables of the owning mm. * uncached - the memory area is present only in the page tables of the owning mm and it is mapped there as uncached. For instance, the following example will create an uncached mapping (error handling is omitted): fd = secretmemfd(SECRETMEM_UNCACHED); ftruncate(fd, MAP_SIZE); ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); Signed-off-by: Mike Rapoport --- include/uapi/linux/magic.h | 1 + include/uapi/linux/secretmem.h | 9 ++ mm/Kconfig | 4 + mm/Makefile | 1 + mm/secretmem.c | 263 +++++++++++++++++++++++++++++++++ 5 files changed, 278 insertions(+) create mode 100644 include/uapi/linux/secretmem.h create mode 100644 mm/secretmem.c diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h index f3956fc11de6..35687dcb1a42 100644 --- a/include/uapi/linux/magic.h +++ b/include/uapi/linux/magic.h @@ -97,5 +97,6 @@ #define DEVMEM_MAGIC 0x454d444d /* "DMEM" */ #define Z3FOLD_MAGIC 0x33 #define PPC_CMM_MAGIC 0xc7571590 +#define SECRETMEM_MAGIC 0x5345434d /* "SECM" */ #endif /* __LINUX_MAGIC_H__ */ diff --git a/include/uapi/linux/secretmem.h b/include/uapi/linux/secretmem.h new file mode 100644 index 000000000000..cef7a59f7492 --- /dev/null +++ b/include/uapi/linux/secretmem.h @@ -0,0 +1,9 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +#ifndef _UAPI_LINUX_SECRERTMEM_H +#define _UAPI_LINUX_SECRERTMEM_H + +/* secretmem operation modes */ +#define SECRETMEM_EXCLUSIVE 0x1 +#define SECRETMEM_UNCACHED 0x2 + +#endif /* _UAPI_LINUX_SECRERTMEM_H */ diff --git a/mm/Kconfig b/mm/Kconfig index f2104cc0d35c..c5aa948214f9 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -872,4 +872,8 @@ config ARCH_HAS_HUGEPD config MAPPING_DIRTY_HELPERS bool +config SECRETMEM + def_bool ARCH_HAS_SET_DIRECT_MAP + select GENERIC_ALLOCATOR + endmenu diff --git a/mm/Makefile b/mm/Makefile index 6e9d46b2efc9..c2aa7a393b73 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -121,3 +121,4 @@ obj-$(CONFIG_MEMFD_CREATE) += memfd.o obj-$(CONFIG_MAPPING_DIRTY_HELPERS) += mapping_dirty_helpers.o obj-$(CONFIG_PTDUMP_CORE) += ptdump.o obj-$(CONFIG_PAGE_REPORTING) += 
page_reporting.o +obj-$(CONFIG_SECRETMEM) += secretmem.o diff --git a/mm/secretmem.c b/mm/secretmem.c new file mode 100644 index 000000000000..2f65219baf80 --- /dev/null +++ b/mm/secretmem.c @@ -0,0 +1,263 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +#include + +#include "internal.h" + +#undef pr_fmt +#define pr_fmt(fmt) "secretmem: " fmt + +#define SECRETMEM_MODE_MASK (SECRETMEM_EXCLUSIVE | SECRETMEM_UNCACHED) +#define SECRETMEM_FLAGS_MASK SECRETMEM_MODE_MASK + +struct secretmem_ctx { + unsigned int mode; +}; + +static struct page *secretmem_alloc_page(gfp_t gfp) +{ + /* + * FIXME: use a cache of large pages to reduce the direct map + * fragmentation + */ + return alloc_page(gfp); +} + +static vm_fault_t secretmem_fault(struct vm_fault *vmf) +{ + struct address_space *mapping = vmf->vma->vm_file->f_mapping; + struct inode *inode = file_inode(vmf->vma->vm_file); + pgoff_t offset = vmf->pgoff; + unsigned long addr; + struct page *page; + int ret = 0; + + if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode)) + return vmf_error(-EINVAL); + + page = find_get_entry(mapping, offset); + if (!page) { + page = secretmem_alloc_page(vmf->gfp_mask); + if (!page) + return vmf_error(-ENOMEM); + + ret = add_to_page_cache(page, mapping, offset, vmf->gfp_mask); + if (unlikely(ret)) + goto err_put_page; + + ret = set_direct_map_invalid_noflush(page); + if (ret) + goto err_del_page_cache; + + addr = (unsigned long)page_address(page); + flush_tlb_kernel_range(addr, addr + PAGE_SIZE); + + __SetPageUptodate(page); + + ret = VM_FAULT_LOCKED; + } + + vmf->page = page; + return ret; + +err_del_page_cache: + delete_from_page_cache(page); +err_put_page: + put_page(page); + return vmf_error(ret); +} + +static const struct vm_operations_struct secretmem_vm_ops = { + .fault = secretmem_fault, +}; + +static int secretmem_mmap(struct file *file, struct vm_area_struct *vma) +{ + struct secretmem_ctx *ctx = file->private_data; + unsigned long mode = ctx->mode; + unsigned long len = vma->vm_end - vma->vm_start; + + if (!mode) + return -EINVAL; + + if (mlock_future_check(vma->vm_mm, vma->vm_flags | VM_LOCKED, len)) + return -EAGAIN; + + switch (mode) { + case SECRETMEM_UNCACHED: + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); + fallthrough; + case SECRETMEM_EXCLUSIVE: + vma->vm_ops = &secretmem_vm_ops; + break; + default: + return -EINVAL; + } + + vma->vm_flags |= VM_LOCKED; + + return 0; +} + +const struct file_operations secretmem_fops = { + .mmap = secretmem_mmap, +}; + +static bool secretmem_isolate_page(struct page *page, isolate_mode_t mode) +{ + return false; +} + +static int secretmem_migratepage(struct address_space *mapping, + struct page *newpage, struct page *page, + enum migrate_mode mode) +{ + return -EBUSY; +} + +static void secretmem_freepage(struct page *page) +{ + set_direct_map_default_noflush(page); +} + +static const struct address_space_operations secretmem_aops = { + .freepage = secretmem_freepage, + .migratepage = secretmem_migratepage, + .isolate_page = secretmem_isolate_page, +}; + +static struct vfsmount *secretmem_mnt; + +static struct file *secretmem_file_create(unsigned long flags) +{ + struct file *file = ERR_PTR(-ENOMEM); + struct secretmem_ctx *ctx; + struct inode *inode; + + inode = alloc_anon_inode(secretmem_mnt->mnt_sb); + if (IS_ERR(inode)) + return ERR_CAST(inode); + + ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); + if (!ctx) + goto 
err_free_inode; + + file = alloc_file_pseudo(inode, secretmem_mnt, "secretmem", + O_RDWR, &secretmem_fops); + if (IS_ERR(file)) + goto err_free_ctx; + + mapping_set_unevictable(inode->i_mapping); + + inode->i_mapping->private_data = ctx; + inode->i_mapping->a_ops = &secretmem_aops; + + /* pretend we are a normal file with zero size */ + inode->i_mode |= S_IFREG; + inode->i_size = 0; + + file->private_data = ctx; + + ctx->mode = flags & SECRETMEM_MODE_MASK; + + return file; + +err_free_ctx: + kfree(ctx); +err_free_inode: + iput(inode); + return file; +} + +SYSCALL_DEFINE1(secretmemfd, unsigned long, flags) +{ + struct file *file; + unsigned int mode; + int fd, err; + + /* make sure local flags do not confict with global fcntl.h */ + BUILD_BUG_ON(SECRETMEM_FLAGS_MASK & O_CLOEXEC); + + if (flags & ~(SECRETMEM_FLAGS_MASK | O_CLOEXEC)) + return -EINVAL; + + /* modes are mutually exclusive, only one mode bit should be set */ + mode = flags & SECRETMEM_FLAGS_MASK; + if (ffs(mode) != fls(mode)) + return -EINVAL; + + fd = get_unused_fd_flags(flags & O_CLOEXEC); + if (fd < 0) + return fd; + + file = secretmem_file_create(flags); + if (IS_ERR(file)) { + err = PTR_ERR(file); + goto err_put_fd; + } + + file->f_flags |= O_LARGEFILE; + + fd_install(fd, file); + return fd; + +err_put_fd: + put_unused_fd(fd); + return err; +} + +static void secretmem_evict_inode(struct inode *inode) +{ + struct secretmem_ctx *ctx = inode->i_private; + + truncate_inode_pages_final(&inode->i_data); + clear_inode(inode); + kfree(ctx); +} + +static const struct super_operations secretmem_super_ops = { + .evict_inode = secretmem_evict_inode, +}; + +static int secretmem_init_fs_context(struct fs_context *fc) +{ + struct pseudo_fs_context *ctx = init_pseudo(fc, SECRETMEM_MAGIC); + + if (!ctx) + return -ENOMEM; + ctx->ops = &secretmem_super_ops; + + return 0; +} + +static struct file_system_type secretmem_fs = { + .name = "secretmem", + .init_fs_context = secretmem_init_fs_context, + .kill_sb = kill_anon_super, +}; + +static int secretmem_init(void) +{ + int ret = 0; + + secretmem_mnt = kern_mount(&secretmem_fs); + if (IS_ERR(secretmem_mnt)) + ret = PTR_ERR(secretmem_mnt); + + return ret; +} +fs_initcall(secretmem_init); From patchwork Mon Jul 20 09:24:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Rapoport X-Patchwork-Id: 11673397 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 15F3160D for ; Mon, 20 Jul 2020 09:25:49 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id F1C3D2176B for ; Mon, 20 Jul 2020 09:25:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1595237149; bh=YMyAfjljQc8EzCHFIHsxaCYd3kRBN7HbarnPGa14M7k=; h=From:To:Cc:Subject:Date:In-Reply-To:References:List-ID:From; b=lyfi0eftbDLZVFfG53iSjRNtOgrPRmGOUEq0IPiRAjjuuEi9zF2cE+95v7dfqxtF+ ShbjFJnfGVY/ZcCPXAc2gkMTCkAt/1JS1FADMIEdDXtOZKsawU0+W6O/dpe2Wyqxa0 yy1TebkyjVtAn0xGvoj2YLH2/IoPXvau2eFwgixk= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728300AbgGTJZp (ORCPT ); Mon, 20 Jul 2020 05:25:45 -0400 Received: from mail.kernel.org ([198.145.29.99]:39570 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726730AbgGTJZo (ORCPT ); Mon, 20 Jul 2020 05:25:44 -0400 Received: from aquarius.haifa.ibm.com 
From: Mike Rapoport
To: linux-kernel@vger.kernel.org
Subject: [PATCH 4/6] arch, mm: wire up secretmemfd system call where relevant
Date: Mon, 20 Jul 2020 12:24:33 +0300
Message-Id: <20200720092435.17469-5-rppt@kernel.org>
In-Reply-To: <20200720092435.17469-1-rppt@kernel.org>
References: <20200720092435.17469-1-rppt@kernel.org>

From: Mike Rapoport

Wire up the secretmemfd system call on the architectures that define
ARCH_HAS_SET_DIRECT_MAP, namely arm64, risc-v and x86.
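For illustration (a sketch, not part of the patch): because the call is
only wired up on some architectures, a program can probe for it at run
time and handle ENOSYS. The number 440 is assumed from the x86-64 and
asm-generic tables in this patch.

	#include <errno.h>
	#include <stdio.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	#ifndef __NR_secretmemfd
	#define __NR_secretmemfd 440
	#endif

	int main(void)
	{
		long fd = syscall(__NR_secretmemfd, 0UL);

		if (fd < 0 && errno == ENOSYS) {
			puts("secretmemfd not wired up on this kernel/arch");
			return 0;
		}
		/* As posted, flags == 0 appears to be accepted here and only
		 * rejected later, at mmap() time; close the fd if we got one. */
		if (fd >= 0)
			close((int)fd);
		else
			perror("secretmemfd");
		return 0;
	}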
Signed-off-by: Mike Rapoport
Acked-by: Palmer Dabbelt
---
 arch/arm64/include/asm/unistd32.h      | 2 ++
 arch/arm64/include/uapi/asm/unistd.h   | 1 +
 arch/riscv/include/asm/unistd.h        | 1 +
 arch/x86/entry/syscalls/syscall_32.tbl | 1 +
 arch/x86/entry/syscalls/syscall_64.tbl | 1 +
 include/linux/syscalls.h               | 1 +
 include/uapi/asm-generic/unistd.h      | 7 ++++++-
 7 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 6d95d0c8bf2f..f9e00baa67f5 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -885,6 +885,8 @@ __SYSCALL(__NR_openat2, sys_openat2)
 __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
 #define __NR_faccessat2 439
 __SYSCALL(__NR_faccessat2, sys_faccessat2)
+#define __NR_secretmemfd 439
+__SYSCALL(__NR_secretmemfd, sys_secretmemfd)
 
 /*
  * Please add new compat syscalls above this comment and update

diff --git a/arch/arm64/include/uapi/asm/unistd.h b/arch/arm64/include/uapi/asm/unistd.h
index f83a70e07df8..f2693f05fc80 100644
--- a/arch/arm64/include/uapi/asm/unistd.h
+++ b/arch/arm64/include/uapi/asm/unistd.h
@@ -20,5 +20,6 @@
 #define __ARCH_WANT_SET_GET_RLIMIT
 #define __ARCH_WANT_TIME32_SYSCALLS
 #define __ARCH_WANT_SYS_CLONE3
+#define __ARCH_WANT_SECRETMEMFD
 
 #include

diff --git a/arch/riscv/include/asm/unistd.h b/arch/riscv/include/asm/unistd.h
index 977ee6181dab..9e47d9aed5eb 100644
--- a/arch/riscv/include/asm/unistd.h
+++ b/arch/riscv/include/asm/unistd.h
@@ -9,6 +9,7 @@
  */
 
 #define __ARCH_WANT_SYS_CLONE
+#define __ARCH_WANT_SECRETMEMFD
 
 #include
 

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index d8f8a1a69ed1..7b91a932ed13 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -443,3 +443,4 @@
 437	i386	openat2			sys_openat2
 438	i386	pidfd_getfd		sys_pidfd_getfd
 439	i386	faccessat2		sys_faccessat2
+440	i386	secretmemfd		sys_secretmemfd

diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 78847b32e137..9cddea4ec1ce 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -360,6 +360,7 @@
 437	common	openat2			sys_openat2
 438	common	pidfd_getfd		sys_pidfd_getfd
 439	common	faccessat2		sys_faccessat2
+440	common	secretmemfd		sys_secretmemfd
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact

diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index b951a87da987..8fe242fc70ea 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -1005,6 +1005,7 @@ asmlinkage long sys_pidfd_send_signal(int pidfd, int sig,
 				       siginfo_t __user *info,
 				       unsigned int flags);
 asmlinkage long sys_pidfd_getfd(int pidfd, int fd, unsigned int flags);
+asmlinkage long sys_secretmemfd(unsigned long flags);
 
 /*
  * Architecture-specific system calls

diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index f4a01305d9a6..aadaf25cbf5a 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -858,8 +858,13 @@ __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
 #define __NR_faccessat2 439
 __SYSCALL(__NR_faccessat2, sys_faccessat2)
 
+#ifdef __ARCH_WANT_SECRETMEMFD
+#define __NR_secretmemfd 440
+__SYSCALL(__NR_secretmemfd, sys_secretmemfd)
+#endif
+
 #undef __NR_syscalls
-#define __NR_syscalls 440
+#define __NR_syscalls 441
 
 /*
  * 32 bit systems traditionally used different

From patchwork Mon Jul 20 09:24:34 2020
X-Patchwork-Submitter: Mike Rapoport
X-Patchwork-Id: 11673403
From: Mike Rapoport
To: linux-kernel@vger.kernel.org
Subject: [PATCH 5/6] mm: secretmem: use PMD-size pages to amortize direct map fragmentation
Date: Mon, 20 Jul 2020 12:24:34 +0300
Message-Id: <20200720092435.17469-6-rppt@kernel.org>
In-Reply-To: <20200720092435.17469-1-rppt@kernel.org>
References: <20200720092435.17469-1-rppt@kernel.org>

From: Mike Rapoport

Removing a PAGE_SIZE page from the direct map every time such a page is
allocated for a secret memory mapping will cause severe fragmentation of
the direct map. This fragmentation can be reduced by using PMD-size pages
as a pool from which the small pages for secret memory mappings are
allocated.

Add a gen_pool per secretmem inode and lazily populate this pool with
PMD-size pages.
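To illustrate the amortization (a sketch with assumed x86-64 values of a
4 KiB PAGE_SIZE and a 2 MiB PMD_SIZE; not part of the patch): with a
PMD-size pool, the direct map is modified once per 2 MiB chunk instead of
once per 4 KiB page.

	#include <stdio.h>

	int main(void)
	{
		const unsigned long page_size = 4096;		/* PAGE_SIZE */
		const unsigned long pmd_size  = 2UL << 20;	/* PMD_SIZE  */
		const unsigned long secret_bytes = 64UL << 20;	/* 64 MiB    */

		/* one direct-map modification per allocation unit */
		printf("4 KiB granularity: %lu direct-map changes\n",
		       secret_bytes / page_size);		/* 16384 */
		printf("2 MiB granularity: %lu direct-map changes\n",
		       secret_bytes / pmd_size);		/* 32 */
		return 0;
	}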
Signed-off-by: Mike Rapoport --- mm/secretmem.c | 107 ++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 88 insertions(+), 19 deletions(-) diff --git a/mm/secretmem.c b/mm/secretmem.c index 2f65219baf80..dce56f84968f 100644 --- a/mm/secretmem.c +++ b/mm/secretmem.c @@ -6,6 +6,7 @@ #include #include #include +#include #include #include #include @@ -25,24 +26,66 @@ #define SECRETMEM_FLAGS_MASK SECRETMEM_MODE_MASK struct secretmem_ctx { + struct gen_pool *pool; unsigned int mode; }; -static struct page *secretmem_alloc_page(gfp_t gfp) +static int secretmem_pool_increase(struct secretmem_ctx *ctx, gfp_t gfp) { - /* - * FIXME: use a cache of large pages to reduce the direct map - * fragmentation - */ - return alloc_page(gfp); + unsigned long nr_pages = (1 << PMD_PAGE_ORDER); + struct gen_pool *pool = ctx->pool; + unsigned long addr; + struct page *page; + int err; + + page = alloc_pages(gfp, PMD_PAGE_ORDER); + if (!page) + return -ENOMEM; + + addr = (unsigned long)page_address(page); + split_page(page, PMD_PAGE_ORDER); + + err = gen_pool_add(pool, addr, PMD_SIZE, NUMA_NO_NODE); + if (err) { + __free_pages(page, PMD_PAGE_ORDER); + return err; + } + + __kernel_map_pages(page, nr_pages, 0); + + return 0; +} + +static struct page *secretmem_alloc_page(struct secretmem_ctx *ctx, + gfp_t gfp) +{ + struct gen_pool *pool = ctx->pool; + unsigned long addr; + struct page *page; + int err; + + if (gen_pool_avail(pool) < PAGE_SIZE) { + err = secretmem_pool_increase(ctx, gfp); + if (err) + return NULL; + } + + addr = gen_pool_alloc(pool, PAGE_SIZE); + if (!addr) + return NULL; + + page = virt_to_page(addr); + get_page(page); + + return page; } static vm_fault_t secretmem_fault(struct vm_fault *vmf) { + struct secretmem_ctx *ctx = vmf->vma->vm_file->private_data; struct address_space *mapping = vmf->vma->vm_file->f_mapping; struct inode *inode = file_inode(vmf->vma->vm_file); pgoff_t offset = vmf->pgoff; - unsigned long addr; struct page *page; int ret = 0; @@ -51,7 +94,7 @@ static vm_fault_t secretmem_fault(struct vm_fault *vmf) page = find_get_entry(mapping, offset); if (!page) { - page = secretmem_alloc_page(vmf->gfp_mask); + page = secretmem_alloc_page(ctx, vmf->gfp_mask); if (!page) return vmf_error(-ENOMEM); @@ -59,14 +102,8 @@ static vm_fault_t secretmem_fault(struct vm_fault *vmf) if (unlikely(ret)) goto err_put_page; - ret = set_direct_map_invalid_noflush(page); - if (ret) - goto err_del_page_cache; - - addr = (unsigned long)page_address(page); - flush_tlb_kernel_range(addr, addr + PAGE_SIZE); - __SetPageUptodate(page); + set_page_private(page, (unsigned long)ctx); ret = VM_FAULT_LOCKED; } @@ -74,8 +111,6 @@ static vm_fault_t secretmem_fault(struct vm_fault *vmf) vmf->page = page; return ret; -err_del_page_cache: - delete_from_page_cache(page); err_put_page: put_page(page); return vmf_error(ret); @@ -131,7 +166,11 @@ static int secretmem_migratepage(struct address_space *mapping, static void secretmem_freepage(struct page *page) { - set_direct_map_default_noflush(page); + unsigned long addr = (unsigned long)page_address(page); + struct secretmem_ctx *ctx = (struct secretmem_ctx *)page_private(page); + struct gen_pool *pool = ctx->pool; + + gen_pool_free(pool, addr, PAGE_SIZE); } static const struct address_space_operations secretmem_aops = { @@ -156,13 +195,18 @@ static struct file *secretmem_file_create(unsigned long flags) if (!ctx) goto err_free_inode; + ctx->pool = gen_pool_create(PAGE_SHIFT, NUMA_NO_NODE); + if (!ctx->pool) + goto err_free_ctx; + file = alloc_file_pseudo(inode, 
secretmem_mnt, "secretmem", O_RDWR, &secretmem_fops); if (IS_ERR(file)) - goto err_free_ctx; + goto err_free_pool; mapping_set_unevictable(inode->i_mapping); + inode->i_private = ctx; inode->i_mapping->private_data = ctx; inode->i_mapping->a_ops = &secretmem_aops; @@ -176,6 +220,8 @@ static struct file *secretmem_file_create(unsigned long flags) return file; +err_free_pool: + gen_pool_destroy(ctx->pool); err_free_ctx: kfree(ctx); err_free_inode: @@ -220,11 +266,34 @@ SYSCALL_DEFINE1(secretmemfd, unsigned long, flags) return err; } +static void secretmem_cleanup_chunk(struct gen_pool *pool, + struct gen_pool_chunk *chunk, void *data) +{ + unsigned long start = chunk->start_addr; + unsigned long end = chunk->end_addr; + unsigned long nr_pages, addr; + + nr_pages = (end - start + 1) / PAGE_SIZE; + __kernel_map_pages(virt_to_page(start), nr_pages, 1); + + for (addr = start; addr < end; addr += PAGE_SIZE) + put_page(virt_to_page(addr)); +} + +static void secretmem_cleanup_pool(struct secretmem_ctx *ctx) +{ + struct gen_pool *pool = ctx->pool; + + gen_pool_for_each_chunk(pool, secretmem_cleanup_chunk, ctx); + gen_pool_destroy(pool); +} + static void secretmem_evict_inode(struct inode *inode) { struct secretmem_ctx *ctx = inode->i_private; truncate_inode_pages_final(&inode->i_data); + secretmem_cleanup_pool(ctx); clear_inode(inode); kfree(ctx); } From patchwork Mon Jul 20 09:24:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Rapoport X-Patchwork-Id: 11673409 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0892260D for ; Mon, 20 Jul 2020 09:26:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DFFEA22BF3 for ; Mon, 20 Jul 2020 09:26:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1595237167; bh=xIW2F2IJFqxTzMmaAs6C8+3yiJvBuUFpN7ht+vECA5A=; h=From:To:Cc:Subject:Date:In-Reply-To:References:List-ID:From; b=kJy2iqMAg5qwdfWSzybpJpsuhTOJOFgANKj7hHBWI3jYODO0ZPgtzffQPIsCKF/3Q Evk2HfufdyJV6wQxoZpvr2vFODe7bwAuelMuigK/iKUFEpMPgD34fF1Ahko5a9eqA+ 9c/4dHtViRdgm8LkvpOGNyv2T2AqFKAUvaXPEKfs= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728320AbgGTJ0B (ORCPT ); Mon, 20 Jul 2020 05:26:01 -0400 Received: from mail.kernel.org ([198.145.29.99]:40148 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728115AbgGTJ0B (ORCPT ); Mon, 20 Jul 2020 05:26:01 -0400 Received: from aquarius.haifa.ibm.com (nesher1.haifa.il.ibm.com [195.110.40.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id C6D012176B; Mon, 20 Jul 2020 09:25:52 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1595237160; bh=xIW2F2IJFqxTzMmaAs6C8+3yiJvBuUFpN7ht+vECA5A=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=rrNLOpGtsBkLwzoA0ZoHKqviCVJ3KJRCdTwu93rSdZ36ELhztooGsQ2e4T8KUYbwf g95kfDYYhWT0rijSXjiKDluNKSd11vd3VHc2VWrEE5zvJz5wMrVk6WVbraWxItZhwN Z/wN2S6aNtjO+InKcFYziVst7pgrShtls+s5WRPo= From: Mike Rapoport To: linux-kernel@vger.kernel.org Cc: Alexander Viro , Andrew Morton , Andy Lutomirski , Arnd Bergmann , Borislav Petkov , Catalin Marinas , Christopher Lameter , Dan Williams , Dave Hansen , Elena Reshetova , "H. 
Peter Anvin" , Idan Yaniv , Ingo Molnar , James Bottomley , "Kirill A. Shutemov" , Matthew Wilcox , Mike Rapoport , Mike Rapoport , Palmer Dabbelt , Paul Walmsley , Peter Zijlstra , Thomas Gleixner , Tycho Andersen , Will Deacon , linux-api@vger.kernel.org, linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-nvdimm@lists.01.org, linux-riscv@lists.infradead.org, x86@kernel.org Subject: [PATCH 6/6] mm: secretmem: add ability to reserve memory at boot Date: Mon, 20 Jul 2020 12:24:35 +0300 Message-Id: <20200720092435.17469-7-rppt@kernel.org> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200720092435.17469-1-rppt@kernel.org> References: <20200720092435.17469-1-rppt@kernel.org> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Mike Rapoport Taking pages out from the direct map and bringing them back may create undesired fragmentation and usage of the smaller pages in the direct mapping of the physical memory. This can be avoided if a significantly large area of the physical memory would be reserved for secretmem purposes at boot time. Add ability to reserve physical memory for secretmem at boot time using "secretmem" kernel parameter and then use that reserved memory as a global pool for secret memory needs. Signed-off-by: Mike Rapoport --- mm/secretmem.c | 134 ++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 126 insertions(+), 8 deletions(-) diff --git a/mm/secretmem.c b/mm/secretmem.c index dce56f84968f..322f425dbb22 100644 --- a/mm/secretmem.c +++ b/mm/secretmem.c @@ -8,6 +8,7 @@ #include #include #include +#include #include #include #include @@ -30,6 +31,39 @@ struct secretmem_ctx { unsigned int mode; }; +struct secretmem_pool { + struct gen_pool *pool; + unsigned long reserved_size; + void *reserved; +}; + +static struct secretmem_pool secretmem_pool; + +static struct page *secretmem_alloc_huge_page(gfp_t gfp) +{ + struct gen_pool *pool = secretmem_pool.pool; + unsigned long addr = 0; + struct page *page = NULL; + + if (pool) { + if (gen_pool_avail(pool) < PMD_SIZE) + return NULL; + + addr = gen_pool_alloc(pool, PMD_SIZE); + if (!addr) + return NULL; + + page = virt_to_page(addr); + } else { + page = alloc_pages(gfp, PMD_PAGE_ORDER); + + if (page) + split_page(page, PMD_PAGE_ORDER); + } + + return page; +} + static int secretmem_pool_increase(struct secretmem_ctx *ctx, gfp_t gfp) { unsigned long nr_pages = (1 << PMD_PAGE_ORDER); @@ -38,12 +72,11 @@ static int secretmem_pool_increase(struct secretmem_ctx *ctx, gfp_t gfp) struct page *page; int err; - page = alloc_pages(gfp, PMD_PAGE_ORDER); + page = secretmem_alloc_huge_page(gfp); if (!page) return -ENOMEM; addr = (unsigned long)page_address(page); - split_page(page, PMD_PAGE_ORDER); err = gen_pool_add(pool, addr, PMD_SIZE, NUMA_NO_NODE); if (err) { @@ -266,11 +299,13 @@ SYSCALL_DEFINE1(secretmemfd, unsigned long, flags) return err; } -static void secretmem_cleanup_chunk(struct gen_pool *pool, - struct gen_pool_chunk *chunk, void *data) +static void secretmem_recycle_range(unsigned long start, unsigned long end) +{ + gen_pool_free(secretmem_pool.pool, start, PMD_SIZE); +} + +static void secretmem_release_range(unsigned long start, unsigned long end) { - unsigned long start = chunk->start_addr; - unsigned long end = chunk->end_addr; unsigned long nr_pages, addr; nr_pages = (end - start + 1) / PAGE_SIZE; @@ -280,6 +315,18 @@ static void secretmem_cleanup_chunk(struct 
gen_pool *pool, put_page(virt_to_page(addr)); } +static void secretmem_cleanup_chunk(struct gen_pool *pool, + struct gen_pool_chunk *chunk, void *data) +{ + unsigned long start = chunk->start_addr; + unsigned long end = chunk->end_addr; + + if (secretmem_pool.pool) + secretmem_recycle_range(start, end); + else + secretmem_release_range(start, end); +} + static void secretmem_cleanup_pool(struct secretmem_ctx *ctx) { struct gen_pool *pool = ctx->pool; @@ -319,14 +366,85 @@ static struct file_system_type secretmem_fs = { .kill_sb = kill_anon_super, }; +static int secretmem_reserved_mem_init(void) +{ + struct gen_pool *pool; + struct page *page; + void *addr; + int err; + + if (!secretmem_pool.reserved) + return 0; + + pool = gen_pool_create(PMD_SHIFT, NUMA_NO_NODE); + if (!pool) + return -ENOMEM; + + err = gen_pool_add(pool, (unsigned long)secretmem_pool.reserved, + secretmem_pool.reserved_size, NUMA_NO_NODE); + if (err) + goto err_destroy_pool; + + for (addr = secretmem_pool.reserved; + addr < secretmem_pool.reserved + secretmem_pool.reserved_size; + addr += PAGE_SIZE) { + page = virt_to_page(addr); + __ClearPageReserved(page); + set_page_count(page, 1); + } + + secretmem_pool.pool = pool; + page = virt_to_page(secretmem_pool.reserved); + __kernel_map_pages(page, secretmem_pool.reserved_size / PAGE_SIZE, 0); + return 0; + +err_destroy_pool: + gen_pool_destroy(pool); + return err; +} + static int secretmem_init(void) { - int ret = 0; + int ret; + + ret = secretmem_reserved_mem_init(); + if (ret) + return ret; secretmem_mnt = kern_mount(&secretmem_fs); - if (IS_ERR(secretmem_mnt)) + if (IS_ERR(secretmem_mnt)) { + gen_pool_destroy(secretmem_pool.pool); ret = PTR_ERR(secretmem_mnt); + } return ret; } fs_initcall(secretmem_init); + +static int __init secretmem_setup(char *str) +{ + phys_addr_t align = PMD_SIZE; + unsigned long reserved_size; + void *reserved; + + reserved_size = memparse(str, NULL); + if (!reserved_size) + return 0; + + if (reserved_size * 2 > PUD_SIZE) + align = PUD_SIZE; + + reserved = memblock_alloc(reserved_size, align); + if (!reserved) { + pr_err("failed to reserve %zu bytes\n", secretmem_pool.reserved_size); + return 0; + } + + secretmem_pool.reserved_size = reserved_size; + secretmem_pool.reserved = reserved; + + pr_info("reserved %zuM\n", reserved_size >> 20); + + return 1; +} +__setup("secretmem=", secretmem_setup);
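As an illustration of the boot-time reservation added in patch 6/6 (a
sketch, not part of the patch): booting with, for example,
"secretmem=512M" hands that string to secretmem_setup(), which sizes the
pool via memparse() and picks PUD alignment for large reservations. The
snippet below mimics that logic in userspace with a simplified stand-in
for memparse(); the PMD_SIZE/PUD_SIZE values assume x86-64 with 4 KiB
pages.

	#include <stdio.h>
	#include <stdlib.h>

	#define PMD_SIZE (2UL << 20)	/* 2 MiB */
	#define PUD_SIZE (1UL << 30)	/* 1 GiB */

	/* simplified stand-in for the kernel's memparse() */
	static unsigned long parse_size(const char *str)
	{
		char *end;
		unsigned long val = strtoul(str, &end, 0);

		switch (*end) {
		case 'G': case 'g': val <<= 10; /* fall through */
		case 'M': case 'm': val <<= 10; /* fall through */
		case 'K': case 'k': val <<= 10;
		}
		return val;
	}

	int main(void)
	{
		/* e.g. booting with secretmem=512M */
		unsigned long reserved_size = parse_size("512M");
		unsigned long align = PMD_SIZE;

		/* same heuristic as secretmem_setup(): PUD alignment for
		 * reservations larger than half a PUD */
		if (reserved_size * 2 > PUD_SIZE)
			align = PUD_SIZE;

		printf("reserve %lu bytes, align %lu\n", reserved_size, align);
		return 0;
	}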