From patchwork Mon Oct 26 08:37:51 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mike Rapoport
X-Patchwork-Id: 11856101
From: Mike Rapoport
To: Andrew Morton
Subject: [PATCH v7 6/7] mm: secretmem: use PMD-size pages to amortize direct
 map fragmentation
Date: Mon, 26 Oct 2020 10:37:51 +0200
Message-Id: <20201026083752.13267-7-rppt@kernel.org>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20201026083752.13267-1-rppt@kernel.org>
References: <20201026083752.13267-1-rppt@kernel.org>
Cc: Mark Rutland, David Hildenbrand, Peter Zijlstra, Catalin Marinas,
 Dave Hansen, linux-mm@kvack.org, linux-kselftest@vger.kernel.org,
 "H. Peter Anvin", Christopher Lameter, Shuah Khan, Thomas Gleixner,
 Elena Reshetova, linux-arch@vger.kernel.org, Tycho Andersen,
 linux-nvdimm@lists.01.org, Will Deacon, x86@kernel.org, Matthew Wilcox,
 Mike Rapoport, Ingo Molnar, Michael Kerrisk, Arnd Bergmann,
 James Bottomley, Borislav Petkov, Alexander Viro, Andy Lutomirski,
 Paul Walmsley, "Kirill A. Shutemov", Dan Williams,
 linux-arm-kernel@lists.infradead.org, linux-api@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org,
 Palmer Dabbelt, linux-fsdevel@vger.kernel.org, Rick Edgecombe,
 Mike Rapoport

From: Mike Rapoport

Removing a PAGE_SIZE page from the direct map every time such page is
allocated for a secret memory mapping will cause severe fragmentation of
the direct map. This fragmentation can be reduced by using PMD-size pages
as a pool for small pages for secret memory mappings.

Add a gen_pool per secretmem inode and lazily populate this pool with
PMD-size pages.

Signed-off-by: Mike Rapoport
---
 mm/secretmem.c | 124 ++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 107 insertions(+), 17 deletions(-)

diff --git a/mm/secretmem.c b/mm/secretmem.c
index 2a63db2ed132..4f9e07d212be 100644
--- a/mm/secretmem.c
+++ b/mm/secretmem.c
@@ -12,8 +12,10 @@
 #include
 #include
 #include
+#include
 #include
 #include
+#include
 #include
 #include
 
@@ -40,24 +42,80 @@
 #define SECRETMEM_FLAGS_MASK	SECRETMEM_MODE_MASK
 
 struct secretmem_ctx {
+	struct gen_pool *pool;
 	unsigned int mode;
 };
 
-static struct page *secretmem_alloc_page(gfp_t gfp)
+static int secretmem_pool_increase(struct secretmem_ctx *ctx, gfp_t gfp)
 {
+	unsigned long nr_pages = (1 << PMD_PAGE_ORDER);
+	struct gen_pool *pool = ctx->pool;
+	unsigned long addr;
+	struct page *page;
+	int err;
+
+	page = alloc_pages(gfp | __GFP_ACCOUNT, PMD_PAGE_ORDER);
+	if (!page)
+		return -ENOMEM;
+
+	addr = (unsigned long)page_address(page);
+
+	err = set_direct_map_invalid_noflush(page, nr_pages);
+	if (err)
+		goto err_free_pages;
+
+	err = gen_pool_add(pool, addr, PMD_SIZE, NUMA_NO_NODE);
+	if (err)
+		goto err_set_direct_map;
+
+	split_page(page, PMD_PAGE_ORDER);
+	flush_tlb_kernel_range(addr, addr + PMD_SIZE);
+
+	return 0;
+
+err_set_direct_map:
 	/*
-	 * FIXME: use a cache of large pages to reduce the direct map
-	 * fragmentation
+	 * If a split of PUD-size page was required, it already happened
+	 * when we made the pages invalid which guarantees that this call
+	 * won't fail
 	 */
-	return alloc_page(gfp);
+	set_direct_map_default_noflush(page, nr_pages);
+
+err_free_pages:
+	__free_pages(page, PMD_PAGE_ORDER);
+	return err;
+}
+
+static struct page *secretmem_alloc_page(struct secretmem_ctx *ctx,
+					 gfp_t gfp)
+{
+	struct gen_pool *pool = ctx->pool;
+	unsigned long addr;
+	struct page *page;
+	int err;
+
+	if (gen_pool_avail(pool) < PAGE_SIZE) {
+		err = secretmem_pool_increase(ctx, gfp);
+		if (err)
+			return NULL;
+	}
+
+	addr = gen_pool_alloc(pool, PAGE_SIZE);
+	if (!addr)
+		return NULL;
+
+	page = virt_to_page(addr);
+	get_page(page);
+
+	return page;
 }
 
 static vm_fault_t secretmem_fault(struct vm_fault *vmf)
 {
+	struct secretmem_ctx *ctx = vmf->vma->vm_file->private_data;
 	struct address_space *mapping = vmf->vma->vm_file->f_mapping;
 	struct inode *inode = file_inode(vmf->vma->vm_file);
 	pgoff_t offset = vmf->pgoff;
-	unsigned long addr;
 	struct page *page;
 	int ret = 0;
 
@@ -66,22 +124,22 @@ static vm_fault_t secretmem_fault(struct vm_fault *vmf)
 
 	page = find_get_entry(mapping, offset);
 	if (!page) {
-		page = secretmem_alloc_page(vmf->gfp_mask);
+		page = secretmem_alloc_page(ctx, vmf->gfp_mask);
 		if (!page)
 			return vmf_error(-ENOMEM);
 
+		/*
+		 * add_to_page_cache() calls mem_cgroup_charge(), so we
+		 * need to uncharge here to avoid double accounting
+		 */
+		memcg_kmem_uncharge_page(page, 0);
+
 		ret = add_to_page_cache(page, mapping, offset, vmf->gfp_mask);
 		if (unlikely(ret))
 			goto err_put_page;
 
-		ret = set_direct_map_invalid_noflush(page, 1);
-		if (ret)
-			goto err_del_page_cache;
-
-		addr = (unsigned long)page_address(page);
-		flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
-
 		__SetPageUptodate(page);
+		set_page_private(page, (unsigned long)ctx);
 
 		ret = VM_FAULT_LOCKED;
 	}
@@ -89,8 +147,6 @@ static vm_fault_t secretmem_fault(struct vm_fault *vmf)
 	vmf->page = page;
 	return ret;
 
-err_del_page_cache:
-	delete_from_page_cache(page);
 err_put_page:
 	put_page(page);
 	return vmf_error(ret);
@@ -143,7 +199,11 @@ static int secretmem_migratepage(struct address_space *mapping,
 
 static void secretmem_freepage(struct page *page)
 {
-	set_direct_map_default_noflush(page, 1);
+	unsigned long addr = (unsigned long)page_address(page);
+	struct secretmem_ctx *ctx = (struct secretmem_ctx *)page_private(page);
+	struct gen_pool *pool = ctx->pool;
+
+	gen_pool_free(pool, addr, PAGE_SIZE);
 }
 
 static const struct address_space_operations secretmem_aops = {
@@ -178,13 +238,18 @@ static struct file *secretmem_file_create(unsigned long flags)
 	if (!ctx)
 		goto err_free_inode;
 
+	ctx->pool = gen_pool_create(PAGE_SHIFT, NUMA_NO_NODE);
+	if (!ctx->pool)
+		goto err_free_ctx;
+
 	file = alloc_file_pseudo(inode, secretmem_mnt, "secretmem",
 				 O_RDWR, &secretmem_fops);
 	if (IS_ERR(file))
-		goto err_free_ctx;
+		goto err_free_pool;
 
 	mapping_set_unevictable(inode->i_mapping);
 
+	inode->i_private = ctx;
 	inode->i_mapping->private_data = ctx;
 	inode->i_mapping->a_ops = &secretmem_aops;
 
@@ -198,6 +263,8 @@ static struct file *secretmem_file_create(unsigned long flags)
 
 	return file;
 
+err_free_pool:
+	gen_pool_destroy(ctx->pool);
 err_free_ctx:
 	kfree(ctx);
 err_free_inode:
@@ -236,11 +303,34 @@ SYSCALL_DEFINE1(memfd_secret, unsigned long, flags)
 	return err;
 }
 
+static void secretmem_cleanup_chunk(struct gen_pool *pool,
+				    struct gen_pool_chunk *chunk, void *data)
+{
+	unsigned long start = chunk->start_addr;
+	unsigned long end = chunk->end_addr;
+	unsigned long nr_pages, addr;
+
+	nr_pages = (end - start + 1) / PAGE_SIZE;
+	__kernel_map_pages(virt_to_page(start), nr_pages, 1);
+
+	for (addr = start; addr < end; addr += PAGE_SIZE)
+		put_page(virt_to_page(addr));
+}
+
+static void secretmem_cleanup_pool(struct secretmem_ctx *ctx)
+{
+	struct gen_pool *pool = ctx->pool;
+
+	gen_pool_for_each_chunk(pool, secretmem_cleanup_chunk, ctx);
+	gen_pool_destroy(pool);
+}
+
 static void secretmem_evict_inode(struct inode *inode)
 {
 	struct secretmem_ctx *ctx = inode->i_private;
 
 	truncate_inode_pages_final(&inode->i_data);
+	secretmem_cleanup_pool(ctx);
 	clear_inode(inode);
 	kfree(ctx);
 }