From patchwork Fri Feb 5 10:32:58 2021
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 12069861
From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: jgg@ziepe.ca, linux-mm@kvack.org, Andrew Morton, dan.j.williams@intel.com
Subject: [PATCH 1/2] mm: provide a sane PTE walking API for modules
Date: Fri, 5 Feb 2021 05:32:58 -0500
Message-Id: <20210205103259.42866-2-pbonzini@redhat.com>
In-Reply-To: <20210205103259.42866-1-pbonzini@redhat.com>
References: <20210205103259.42866-1-pbonzini@redhat.com>

Currently, the follow_pfn function is exported for modules but
follow_pte is not.  However, follow_pfn is very easy to misuse,
because it does not provide protection information (so most of its
callers simply assume the page is writable!) and because it returns
only after the page table lock has already been dropped.  Provide
instead a simplified version of follow_pte that does not have the
pmdpp and range arguments.  The older version survives as
follow_invalidate_pte() for use by fs/dax.c.
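
As an illustration (a minimal sketch, not part of the patch; vma, addr
and the surrounding error handling are assumed to come from the caller,
and the pattern mirrors the KVM conversion in patch 2), a module-side
lookup with the new entry point reduces to:

	pte_t *ptep;
	spinlock_t *ptl;
	unsigned long pfn;
	int r;

	r = follow_pte(vma->vm_mm, addr, &ptep, &ptl);
	if (r)
		return r;
	/* *ptep is stable only while ptl is held. */
	pfn = pte_pfn(*ptep);
	pte_unmap_unlock(ptep, ptl);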
Signed-off-by: Paolo Bonzini
Reviewed-by: Jason Gunthorpe
---
 arch/s390/pci/pci_mmio.c |  2 +-
 fs/dax.c                 |  5 +++--
 include/linux/mm.h       |  6 ++++--
 mm/memory.c              | 35 ++++++++++++++++++++++++++++++-----
 4 files changed, 38 insertions(+), 10 deletions(-)

diff --git a/arch/s390/pci/pci_mmio.c b/arch/s390/pci/pci_mmio.c
index 18f2d10c3176..5922dce94bfd 100644
--- a/arch/s390/pci/pci_mmio.c
+++ b/arch/s390/pci/pci_mmio.c
@@ -170,7 +170,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
 	if (!(vma->vm_flags & VM_WRITE))
 		goto out_unlock_mmap;
 
-	ret = follow_pte(vma->vm_mm, mmio_addr, NULL, &ptep, NULL, &ptl);
+	ret = follow_pte(vma->vm_mm, mmio_addr, &ptep, &ptl);
 	if (ret)
 		goto out_unlock_mmap;
 
@@ -311,7 +311,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
 	if (!(vma->vm_flags & VM_WRITE))
 		goto out_unlock_mmap;
 
-	ret = follow_pte(vma->vm_mm, mmio_addr, NULL, &ptep, NULL, &ptl);
+	ret = follow_pte(vma->vm_mm, mmio_addr, &ptep, &ptl);
 	if (ret)
 		goto out_unlock_mmap;
 
diff --git a/fs/dax.c b/fs/dax.c
index 26d5dcd2d69e..b14874c7b983 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -810,11 +810,12 @@ static void dax_entry_mkclean(struct address_space *mapping, pgoff_t index,
 		address = pgoff_address(index, vma);
 
 		/*
-		 * Note because we provide range to follow_pte it will call
+		 * follow_invalidate_pte() will use the range to call
 		 * mmu_notifier_invalidate_range_start() on our behalf before
 		 * taking any lock.
 		 */
-		if (follow_pte(vma->vm_mm, address, &range, &ptep, &pmdp, &ptl))
+		if (follow_invalidate_pte(vma->vm_mm, address, &range, &ptep,
+					  &pmdp, &ptl))
 			continue;
 
 		/*
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ecdf8a8cd6ae..90b527260edf 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1658,9 +1658,11 @@ void free_pgd_range(struct mmu_gather *tlb, unsigned long addr,
 		unsigned long end, unsigned long floor, unsigned long ceiling);
 int
 copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma);
+int follow_invalidate_pte(struct mm_struct *mm, unsigned long address,
+	       struct mmu_notifier_range *range, pte_t **ptepp, pmd_t **pmdpp,
+	       spinlock_t **ptlp);
 int follow_pte(struct mm_struct *mm, unsigned long address,
-	       struct mmu_notifier_range *range, pte_t **ptepp, pmd_t **pmdpp,
-	       spinlock_t **ptlp);
+	       pte_t **ptepp, spinlock_t **ptlp);
 int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 	unsigned long *pfn);
 int follow_phys(struct vm_area_struct *vma, unsigned long address,
diff --git a/mm/memory.c b/mm/memory.c
index feff48e1465a..b247d296931c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4709,9 +4709,9 @@ int __pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address)
 }
 #endif /* __PAGETABLE_PMD_FOLDED */
 
-int follow_pte(struct mm_struct *mm, unsigned long address,
-	       struct mmu_notifier_range *range, pte_t **ptepp, pmd_t **pmdpp,
-	       spinlock_t **ptlp)
+int follow_invalidate_pte(struct mm_struct *mm, unsigned long address,
+			  struct mmu_notifier_range *range, pte_t **ptepp,
+			  pmd_t **pmdpp, spinlock_t **ptlp)
 {
 	pgd_t *pgd;
 	p4d_t *p4d;
@@ -4776,6 +4776,31 @@ int follow_pte(struct mm_struct *mm, unsigned long address,
 	return -EINVAL;
 }
 
+/**
+ * follow_pte - look up PTE at a user virtual address
+ * @mm: the mm_struct of the target address space
+ * @address: user virtual address
+ * @ptepp: location to store found PTE
+ * @ptlp: location to store the lock for the PTE
+ *
+ * On a successful return, the pointer to the PTE is stored in @ptepp;
+ * the corresponding lock is taken and its location is stored in @ptlp.
+ * The contents of the PTE are only stable until @ptlp is released;
+ * any further use, if any, must be protected against invalidation
+ * with MMU notifiers.
+ *
+ * Only IO mappings and raw PFN mappings are allowed.  The mmap semaphore
+ * should be taken for read.
+ *
+ * Return: zero on success, -ve otherwise.
+ */
+int follow_pte(struct mm_struct *mm, unsigned long address,
+	       pte_t **ptepp, spinlock_t **ptlp)
+{
+	return follow_invalidate_pte(mm, address, NULL, ptepp, NULL, ptlp);
+}
+EXPORT_SYMBOL_GPL(follow_pte);
+
 /**
  * follow_pfn - look up PFN at a user virtual address
  * @vma: memory mapping
@@ -4796,7 +4821,7 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
 		return ret;
 
-	ret = follow_pte(vma->vm_mm, address, NULL, &ptep, NULL, &ptl);
+	ret = follow_pte(vma->vm_mm, address, &ptep, &ptl);
 	if (ret)
 		return ret;
 	*pfn = pte_pfn(*ptep);
@@ -4817,7 +4842,7 @@ int follow_phys(struct vm_area_struct *vma,
 	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
 		goto out;
 
-	if (follow_pte(vma->vm_mm, address, NULL, &ptep, NULL, &ptl))
+	if (follow_pte(vma->vm_mm, address, &ptep, &ptl))
 		goto out;
 	pte = *ptep;
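
As an aside (illustrative only, not part of the patch): callers that
still need MMU notifier protection keep the old contract through
follow_invalidate_pte().  A minimal sketch, modeled on the
dax_entry_mkclean() hunk above, with the surrounding loop and error
handling omitted:

	struct mmu_notifier_range range;
	pte_t *ptep;
	pmd_t *pmdp;
	spinlock_t *ptl;

	if (follow_invalidate_pte(vma->vm_mm, address, &range,
				  &ptep, &pmdp, &ptl))
		return;

	/*
	 * follow_invalidate_pte() has already called
	 * mmu_notifier_invalidate_range_start() on our behalf.
	 */
	if (pmdp)
		spin_unlock(ptl);		/* huge PMD mapping */
	else
		pte_unmap_unlock(ptep, ptl);	/* regular PTE mapping */

	mmu_notifier_invalidate_range_end(&range);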
From patchwork Fri Feb 5 10:32:59 2021
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 12069847
From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: jgg@ziepe.ca, linux-mm@kvack.org, Andrew Morton, dan.j.williams@intel.com
Subject: [PATCH 2/2] KVM: do not assume PTE is writable after follow_pfn
Date: Fri, 5 Feb 2021 05:32:59 -0500
Message-Id: <20210205103259.42866-3-pbonzini@redhat.com>
In-Reply-To: <20210205103259.42866-1-pbonzini@redhat.com>
References: <20210205103259.42866-1-pbonzini@redhat.com>

In order to convert an HVA to a PFN, KVM usually tries to use the
get_user_pages family of functions.  This is, however, not possible
for VM_IO vmas; in that case, KVM instead uses follow_pfn.

In doing so, however, KVM loses the information on whether the PFN
is writable.  That is usually not a problem, because the main use of
VM_IO vmas with KVM is for BARs in PCI device assignment; still, it
is a bug.  To fix it, use follow_pte and check pte_write while under
the protection of the PTE lock.  The information can then be used to
fail hva_to_pfn_remapped or be passed back to the caller via
*writable.

Usage of follow_pfn was introduced in commit add6a0cd1c5b ("KVM: MMU:
try to fix up page faults before giving up", 2016-07-05); however,
even older versions have the same issue, all the way back to commit
2e2e3738af33 ("KVM: Handle vma regions with no backing page",
2008-07-20), as they also did not check whether the PFN was writable.
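
For contrast, this is the pattern being removed, condensed from the
old hva_to_pfn_remapped() (not part of the patch; note that nothing
checks the PTE protection bits, and the PTE lock has already been
dropped by the time the PFN is consumed):

	r = follow_pfn(vma, addr, &pfn);	/* PTE lock already dropped */
	if (r)
		return r;
	if (writable)
		*writable = true;		/* unfounded: PTE may be read-only */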
Fixes: 2e2e3738af33 ("KVM: Handle vma regions with no backing page")
Reported-by: David Stevens
Cc: 3pvd@google.com
Cc: Jann Horn
Cc: Jason Gunthorpe
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini
Reported-by: kernel test robot
---
 virt/kvm/kvm_main.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f34a85b93c2d..ee4ac2618ec5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1907,9 +1907,11 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 			       kvm_pfn_t *p_pfn)
 {
 	unsigned long pfn;
+	pte_t *ptep;
+	spinlock_t *ptl;
 	int r;
 
-	r = follow_pfn(vma, addr, &pfn);
+	r = follow_pte(vma->vm_mm, addr, &ptep, &ptl);
 	if (r) {
 		/*
 		 * get_user_pages fails for VM_IO and VM_PFNMAP vmas and does
@@ -1924,14 +1926,19 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 		if (r)
 			return r;
 
-		r = follow_pfn(vma, addr, &pfn);
+		r = follow_pte(vma->vm_mm, addr, &ptep, &ptl);
 		if (r)
 			return r;
+	}
 
+	if (write_fault && !pte_write(*ptep)) {
+		pfn = KVM_PFN_ERR_RO_FAULT;
+		goto out;
 	}
 
 	if (writable)
-		*writable = true;
+		*writable = pte_write(*ptep);
+	pfn = pte_pfn(*ptep);
 
 	/*
 	 * Get a reference here because callers of *hva_to_pfn* and
@@ -1946,6 +1953,8 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 	 */
 	kvm_get_pfn(pfn);
 
+out:
+	pte_unmap_unlock(ptep, ptl);
 	*p_pfn = pfn;
 	return 0;
 }
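
As a caller-visible consequence (an illustrative sketch, not part of
the patch): for VM_IO/VM_PFNMAP mappings, the writable out-parameter
of the gfn_to_pfn family now reflects the real PTE permissions instead
of being unconditionally true.  map_gfn_readonly_aware() below is a
made-up helper standing in for the arch MMU code; gfn_to_pfn_prot()
and is_error_pfn() are existing KVM APIs:

	bool writable;
	kvm_pfn_t pfn;

	pfn = gfn_to_pfn_prot(kvm, gfn, false /* write_fault */, &writable);
	if (is_error_pfn(pfn))
		return -EFAULT;
	/* writable is accurate even for VM_IO mappings after this fix */
	return map_gfn_readonly_aware(kvm, gfn, pfn, writable);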