From patchwork Fri Feb 12 00:27:16 2016
X-Patchwork-Submitter: "Kani, Toshi"
X-Patchwork-Id: 8285581
From: Toshi Kani
To: tglx@linutronix.de, mingo@redhat.com, hpa@zytor.com, bp@alien8.de
Cc: linux-mm@kvack.org, henning.schild@siemens.com,
 linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org
Subject: [PATCH v2] x86/mm/vmfault: Make vmalloc_fault() handle large pages
Date: Thu, 11 Feb 2016 17:27:16 -0700
Message-Id: <1455236836-24579-1-git-send-email-toshi.kani@hpe.com>

The following oops was observed when a read syscall was made to a pmem
device after a large amount (> 512GB) of vmalloc ranges had been
allocated by ioremap() on an x86_64 system.

 BUG: unable to handle kernel paging request at ffff880840000ff8
 IP: [] vmalloc_fault+0x1be/0x300
 PGD c7f03a067 PUD 0
 Oops: 0000 [#1] SMP
 :
 Call Trace:
  [] __do_page_fault+0x285/0x3e0
  [] do_page_fault+0x2f/0x80
  [] ? put_prev_entity+0x35/0x7a0
  [] page_fault+0x28/0x30
  [] ? memcpy_erms+0x6/0x10
  [] ? schedule+0x35/0x80
  [] ? pmem_rw_bytes+0x6a/0x190 [nd_pmem]
  [] ? schedule_timeout+0x183/0x240
  [] btt_log_read+0x63/0x140 [nd_btt]
  :
  [] ? __symbol_put+0x60/0x60
  [] ? kernel_read+0x50/0x80
  [] SyS_finit_module+0xb9/0xf0
  [] entry_SYSCALL_64_fastpath+0x1a/0xa4

Since 4.1, ioremap() supports large page (pud/pmd) mappings on x86_64
and PAE.
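For reference, a huge-page-aware walk of the kernel page tables looks
roughly like the sketch below. This is only an illustration, not part
of the patch; the function name is made up, and it uses the generic
page-table accessors plus pud_huge()/pmd_huge() from <linux/hugetlb.h>.
The key point is that a huge pud/pmd entry is a terminal mapping, so
the walk must stop there instead of descending to the pte level:

	/*
	 * Sketch: walk the kernel page tables for a vmalloc/ioremap
	 * address.  A pud_huge() or pmd_huge() entry is a terminal
	 * 1GB/2MB mapping, so there is no lower level left to inspect.
	 */
	static int walk_kernel_address(unsigned long address)
	{
		pgd_t *pgd = pgd_offset_k(address);
		pud_t *pud;
		pmd_t *pmd;
		pte_t *pte;

		if (pgd_none(*pgd))
			return -1;		/* not mapped */

		pud = pud_offset(pgd, address);
		if (pud_none(*pud))
			return -1;
		if (pud_huge(*pud))		/* 1GB page: done */
			return 0;

		pmd = pmd_offset(pud, address);
		if (pmd_none(*pmd))
			return -1;
		if (pmd_huge(*pmd))		/* 2MB page: done */
			return 0;

		pte = pte_offset_kernel(pmd, address);
		if (!pte_present(*pte))
			return -1;

		return 0;			/* 4KB pte mapping */
	}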
vmalloc_fault(), however, assumes that the vmalloc range is limited to
pte mappings.

vmalloc faults do not normally happen in ioremap'd ranges because
ioremap() sets up the kernel page tables, which are shared with user
processes: pgd_ctor() copies the kernel's pgd entries into a user
process's pgd during fork(). When the allocation of vmalloc ranges
crosses a 512GB boundary, though, ioremap() allocates a new pud table
and updates the kernel pgd entry to point to it. If a user process's
pgd entry does not have this update yet, a read/write syscall to the
range causes a vmalloc fault, which hits the oops above because
vmalloc_fault() does not handle large pages properly.

The following changes are made to vmalloc_fault().

64-bit:

 - No change to the pgd sync operation, as it already handles large
   pages.
 - Add pud_huge() and pmd_huge() checks to the validation code to
   handle large pages.
 - Change pud_page_vaddr() to pud_pfn() since an ioremap range is not
   directly mapped (while the if-statement still works with a bogus
   addr).  See the note after the patch.
 - Change pmd_page() to pmd_pfn() since an ioremap range is not backed
   by struct page (while the if-statement still works with a bogus
   addr).

32-bit:

 - No change to the sync operation since the index3 pgd entry covers
   the entire vmalloc range, which is always valid.  (A separate change
   to sync the pgd entry is necessary if this memory layout is ever
   changed, regardless of the page size.)
 - Add a pmd_huge() check to the validation code to handle large pages.
   This is for completeness, since vmalloc faults won't happen in
   ioremap'd ranges there as the pgd entry is always valid.

Reported-by: Henning Schild
Signed-off-by: Toshi Kani
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: Borislav Petkov
---
When this patch is accepted, please copy to stable up to 4.1.

v2: Add more descriptions about the issue in the change log.
---
 arch/x86/mm/fault.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index eef44d9..e830c71 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -287,6 +287,9 @@ static noinline int vmalloc_fault(unsigned long address)
 	if (!pmd_k)
 		return -1;
 
+	if (pmd_huge(*pmd_k))
+		return 0;
+
 	pte_k = pte_offset_kernel(pmd_k, address);
 	if (!pte_present(*pte_k))
 		return -1;
@@ -360,8 +363,6 @@ void vmalloc_sync_all(void)
  * 64-bit:
  *
  * Handle a fault on the vmalloc area
- *
- * This assumes no large pages in there.
  */
 static noinline int vmalloc_fault(unsigned long address)
 {
@@ -403,17 +404,23 @@ static noinline int vmalloc_fault(unsigned long address)
 	if (pud_none(*pud_ref))
 		return -1;
 
-	if (pud_none(*pud) || pud_page_vaddr(*pud) != pud_page_vaddr(*pud_ref))
+	if (pud_none(*pud) || pud_pfn(*pud) != pud_pfn(*pud_ref))
 		BUG();
 
+	if (pud_huge(*pud))
+		return 0;
+
 	pmd = pmd_offset(pud, address);
 	pmd_ref = pmd_offset(pud_ref, address);
 	if (pmd_none(*pmd_ref))
 		return -1;
 
-	if (pmd_none(*pmd) || pmd_page(*pmd) != pmd_page(*pmd_ref))
+	if (pmd_none(*pmd) || pmd_pfn(*pmd) != pmd_pfn(*pmd_ref))
 		BUG();
 
+	if (pmd_huge(*pmd))
+		return 0;
+
 	pte_ref = pte_offset_kernel(pmd_ref, address);
 	if (!pte_present(*pte_ref))
 		return -1;
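A note on the pud_page_vaddr()/pmd_page() to pud_pfn()/pmd_pfn()
changes above, with a simplified sketch.  The helper name below is made
up; the real accessors live in arch/x86/include/asm/pgtable.h and also
mask out the large-page PAT bit, while plain PTE_PFN_MASK is used here
for brevity:

	/*
	 * pud_page_vaddr() converts the entry's physical address into a
	 * direct-map virtual address via __va().  For a huge pud set up
	 * by ioremap() the target is not in the direct map, so the
	 * result is a bogus vaddr (the equality test happened to work
	 * anyway).  Comparing pfns avoids that conversion and is
	 * meaningful for both page-table pointers and huge mappings:
	 */
	static bool pud_same_mapping(pud_t a, pud_t b)
	{
		return ((pud_val(a) & PTE_PFN_MASK) >> PAGE_SHIFT) ==
		       ((pud_val(b) & PTE_PFN_MASK) >> PAGE_SHIFT);
	}

The same reasoning applies to the pmd level: pmd_page() assumes the
mapped range is backed by struct page, which an ioremap range is not,
so the pfn comparison is used there as well.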