
KVM: arm64: Skip CMOs when updating a PTE pointing to non-memory

Message ID: 20210426103605.616908-1-maz@kernel.org (mailing list archive)
State: New, archived
Series: KVM: arm64: Skip CMOs when updating a PTE pointing to non-memory

Commit Message

Marc Zyngier April 26, 2021, 10:36 a.m. UTC
Sumit Gupta and Krishna Reddy both reported that for MMIO regions
mapped into userspace using VFIO, a PTE update can trigger an MMU
notifier reaching kvm_set_spte_hva().

There is an assumption baked into kvm_set_spte_hva() that it only
deals with memory pages, and not MMIO: it performs a cache clean
of the potentially newly mapped page. However, for an MMIO range
this explodes, as there is no linear mapping for the range (and
doing cache maintenance on it would make little sense anyway).

Checking the validity of the page before performing the CMO
addresses the problem.

Reported-by: Krishna Reddy <vdumpa@nvidia.com>
Reported-by: Sumit Gupta <sumitg@nvidia.com>
Tested-by: Sumit Gupta <sumitg@nvidia.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/5a8825bc-286e-b316-515f-3bd3c9c70a80@nvidia.com
---
 arch/arm64/kvm/mmu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
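
For context, here is a minimal sketch of the two helpers involved,
paraphrased from v5.12 (asm/kvm_mmu.h and arch/arm64/kvm/mmu.c); treat
the exact bodies as an assumption rather than the literal kernel source:

static bool kvm_is_device_pfn(kvm_pfn_t pfn)
{
	/* No struct page backing => not RAM, so treat it as a device pfn. */
	return !pfn_valid(pfn);
}

static inline void __clean_dcache_guest_page(kvm_pfn_t pfn, unsigned long size)
{
	/*
	 * page_address() returns the linear-map alias of the page. An MMIO
	 * pfn has no such alias, so the resulting pointer is junk and the
	 * cache maintenance on it faults, hence the check added by this
	 * patch.
	 */
	void *va = page_address(pfn_to_page(pfn));

	kvm_flush_dcache_to_poc(va, size);
}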

Comments

Marc Zyngier April 26, 2021, 10:41 a.m. UTC | #1
On 2021-04-26 11:36, Marc Zyngier wrote:
> [...]

FWIW, I've locally added:

Fixes: 694556d54f35 ("KVM: arm/arm64: Clean dcache to PoC when changing PTE due to CoW")
Cc: stable@vger.kernel.org

         M.
Alexandru Elisei April 27, 2021, 2:52 p.m. UTC | #2
Hi,

I've been trying to reproduce the panic, but I haven't had any success.

With a known working PCI passthrough device, this is how I changed kvmtool:

diff --git a/vfio/core.c b/vfio/core.c
index 3ff2c0b..b4ee7e9 100644
--- a/vfio/core.c
+++ b/vfio/core.c
@@ -261,6 +261,9 @@ int vfio_map_region(struct kvm *kvm, struct vfio_device *vdev,
                return ret;
        }
 
+       char c = *(char *)base;
+       fprintf(stderr, "c = %c\n", c);
+
        return 0;
 }
 
The change reads from the BAR region after it has been mmap'ed into
userspace. I can see that the read hits vfio_pci_mmap_fault(), which
calls io_remap_pfn_range(), but I can't figure out how to trigger the
MMU notifiers. Any suggestions?

The comment [1] suggested that the panic is triggered during page aging.
vfio_pci_mmap() sets VM_PFNMAP on the VMA, and I see in the Documentation
that pages with VM_PFNMAP are added to the unevictable LRU list; doesn't
that mean the mapping is not subject to page aging? I feel like I'm
missing something.

[1]
https://lore.kernel.org/kvm/BY5PR12MB37642B9AC7E5D907F5A664F6B3459@BY5PR12MB3764.namprd12.prod.outlook.com/

Thanks,

Alex

On 4/26/21 11:36 AM, Marc Zyngier wrote:
> [...]
Jean-Philippe Brucker April 27, 2021, 3:23 p.m. UTC | #3
On Tue, Apr 27, 2021 at 03:52:46PM +0100, Alexandru Elisei wrote:
> The comment [1] suggested that the panic is triggered during page aging.

I think that only happens with an out-of-tree patch applied:
https://jpbrucker.net/git/linux/commit/?h=sva/2021-03-01&id=d32d8baaf293aaefef8a1c9b8a4508ab2ec46c61
which probably is not going upstream.

Thanks,
Jean

> vfio_pci_mmap() sets VM_PFNMAP on the VMA, and I see in the Documentation
> that pages with VM_PFNMAP are added to the unevictable LRU list; doesn't
> that mean the mapping is not subject to page aging? I feel like I'm
> missing something.
> 
> [1]
> https://lore.kernel.org/kvm/BY5PR12MB37642B9AC7E5D907F5A664F6B3459@BY5PR12MB3764.namprd12.prod.outlook.com/
> 
> Thanks,
> 
> Alex
Alexandru Elisei April 29, 2021, 10:32 a.m. UTC | #4
Hi Jean,

On 4/27/21 4:23 PM, Jean-Philippe Brucker wrote:
> On Tue, Apr 27, 2021 at 03:52:46PM +0100, Alexandru Elisei wrote:
>> The comment [1] suggested that the panic is triggered during page aging.
> I think that only happens with an out-of-tree patch applied:
> https://jpbrucker.net/git/linux/commit/?h=sva/2021-03-01&id=d32d8baaf293aaefef8a1c9b8a4508ab2ec46c61
> which probably is not going upstream.

Thanks for that; it explains why I wasn't able to trigger the notification.

I did a grep for all the places where mmu_notifier_change_pte() and
set_pte_at_notify() are used in the kernel, and from what I can tell they
are only called for a new pte backed by a struct page: the notifiers fire
from KSM (which deals with physical memory), from swap migration (so
still pages in memory), and on copy-on-write.
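
For reference, set_pte_at_notify() in this era is a thin macro that
raises the change_pte notifier just before installing the new pte; a
sketch paraphrased from include/linux/mmu_notifier.h (treat the exact
body as an assumption):

#define set_pte_at_notify(__mm, __address, __ptep, __pte)		\
({									\
	struct mm_struct *___mm = __mm;					\
	unsigned long ___address = __address;				\
	pte_t ___pte = __pte;						\
									\
	mmu_notifier_change_pte(___mm, ___address, ___pte);		\
	set_pte_at(___mm, ___address, __ptep, ___pte);			\
})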

On Linux v5.12, I tried to trigger the copy-on-write notification by
forking kvmtool right after the BAR region is mapped and then reading
from the userspace BAR address, but the new pte (the one the notifier
is called for) was valid.
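
A minimal sketch of that experiment, with bar_base standing in for the
mmap'ed BAR address in kvmtool (the helper name and plumbing are
hypothetical):

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork right after the BAR is mapped, then touch the mapping from the
 * parent so that any CoW break goes through wp_page_copy(), which is
 * where set_pte_at_notify() is called from. */
static void cow_poke(volatile char *bar_base)
{
	pid_t pid = fork();

	if (pid < 0) {
		perror("fork");
		return;
	}
	if (pid == 0)
		_exit(0);	/* the child only exists to share the mm briefly */

	fprintf(stderr, "c = %c\n", *bar_base);	/* parent reads the BAR */
	waitpid(pid, NULL, 0);
}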

I also looked at what x86 does, but I couldn't find where cache maintenance is
performed (wouldn't surprise me if it's not necessary at all).

So I guess my question is what kind of pfns the MMU notifiers for the secondary
MMUs are required to handle. If the requirement is that they should handle both
device and struct page backed pfns, then the patch looks correct to me
(user_mem_abort() also uses kvm_is_device_pfn() to decide if dcache maintenance is
needed).
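
For comparison, the gating in user_mem_abort() can be condensed to
something like the following (a paraphrased, heavily trimmed sketch of
the v5.12 logic; the wrapper function is hypothetical):

/* Only struct-page-backed pfns get their dcache cleaned to PoC. */
static void fault_path_cmo_sketch(kvm_pfn_t pfn, unsigned long vma_pagesize,
				  bool is_perm_fault)
{
	bool device = kvm_is_device_pfn(pfn);

	if (!is_perm_fault && !device)
		clean_dcache_guest_page(pfn, vma_pagesize);
}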

Thanks,

Alex

> [...]

Patch

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index cd4d51ae3d4a..564a0f7fcd05 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1236,7 +1236,8 @@ int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
 	 * We've moved a page around, probably through CoW, so let's treat it
 	 * just like a translation fault and clean the cache to the PoC.
 	 */
-	clean_dcache_guest_page(pfn, PAGE_SIZE);
+	if (!kvm_is_device_pfn(pfn))
+		clean_dcache_guest_page(pfn, PAGE_SIZE);
 	handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler, &pfn);
 	return 0;
 }