Message ID | 20210426103605.616908-1-maz@kernel.org (mailing list archive)
---|---
State | New, archived
Series | KVM: arm64: Skip CMOs when updating a PTE pointing to non-memory
On 2021-04-26 11:36, Marc Zyngier wrote:
> Sumit Gupta and Krishna Reddy both reported that for MMIO regions
> mapped into userspace using VFIO, a PTE update can trigger a MMU
> notifier reaching kvm_set_spte_hva().
>
> There is an assumption baked in kvm_set_spte_hva() that it only
> deals with memory pages, and not MMIO. For this purpose, it
> performs a cache cleaning of the potentially newly mapped page.
> However, for a MMIO range, this explodes as there is no linear
> mapping for this range (and doing cache maintenance on it would
> make little sense anyway).
>
> Check for the validity of the page before performing the CMO
> addresses the problem.
>
> Reported-by: Krishna Reddy <vdumpa@nvidia.com>
> Reported-by: Sumit Gupta <sumitg@nvidia.com>,
> Tested-by: Sumit Gupta <sumitg@nvidia.com>,
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> Link: https://lore.kernel.org/r/5a8825bc-286e-b316-515f-3bd3c9c70a80@nvidia.com

FWIW, I've locally added:

Fixes: 694556d54f35 ("KVM: arm/arm64: Clean dcache to PoC when changing PTE due to CoW")
Cc: stable@vger.kernel.org

M.
Hi,

I've been trying to reproduce the panic, but I haven't had any success. With a
known working PCI passthrough device, this is how I changed kvmtool:

diff --git a/vfio/core.c b/vfio/core.c
index 3ff2c0b..b4ee7e9 100644
--- a/vfio/core.c
+++ b/vfio/core.c
@@ -261,6 +261,9 @@ int vfio_map_region(struct kvm *kvm, struct vfio_device *vdev,
 		return ret;
 	}

+	char c = *(char *)base;
+	fprintf(stderr, "c = %c\n", c);
+
 	return 0;
 }

What the change does is read from the BAR region after it has been mmap'ed
into userspace. I can see that the read hits vfio_pci_mmap_fault(), which
calls io_remap_pfn_range(), but I can't figure out how I can trigger the MMU
notifiers. Any suggestions?

The comment [1] suggested that the panic is triggered during page aging.
vfio_pci_mmap() sets VM_PFNMAP on the VMA, and I see in the documentation
that pages with VM_PFNMAP are added to the unevictable LRU list; doesn't that
mean they are not subject to page aging? I feel like there's something I'm
missing.

[1] https://lore.kernel.org/kvm/BY5PR12MB37642B9AC7E5D907F5A664F6B3459@BY5PR12MB3764.namprd12.prod.outlook.com/

Thanks,
Alex

On 4/26/21 11:36 AM, Marc Zyngier wrote:
> Sumit Gupta and Krishna Reddy both reported that for MMIO regions
> mapped into userspace using VFIO, a PTE update can trigger a MMU
> notifier reaching kvm_set_spte_hva().
>
> There is an assumption baked in kvm_set_spte_hva() that it only
> deals with memory pages, and not MMIO. For this purpose, it
> performs a cache cleaning of the potentially newly mapped page.
> However, for a MMIO range, this explodes as there is no linear
> mapping for this range (and doing cache maintenance on it would
> make little sense anyway).
>
> Check for the validity of the page before performing the CMO
> addresses the problem.
>
> Reported-by: Krishna Reddy <vdumpa@nvidia.com>
> Reported-by: Sumit Gupta <sumitg@nvidia.com>,
> Tested-by: Sumit Gupta <sumitg@nvidia.com>,
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> Link: https://lore.kernel.org/r/5a8825bc-286e-b316-515f-3bd3c9c70a80@nvidia.com
> ---
>  arch/arm64/kvm/mmu.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index cd4d51ae3d4a..564a0f7fcd05 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1236,7 +1236,8 @@ int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
>  	 * We've moved a page around, probably through CoW, so let's treat it
>  	 * just like a translation fault and clean the cache to the PoC.
>  	 */
> -	clean_dcache_guest_page(pfn, PAGE_SIZE);
> +	if (!kvm_is_device_pfn(pfn))
> +		clean_dcache_guest_page(pfn, PAGE_SIZE);
>  	handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler, &pfn);
>  	return 0;
>  }
On Tue, Apr 27, 2021 at 03:52:46PM +0100, Alexandru Elisei wrote:
> The comment [1] suggested that the panic is triggered during page aging.

I think only with an out-of-tree patch applied
https://jpbrucker.net/git/linux/commit/?h=sva/2021-03-01&id=d32d8baaf293aaefef8a1c9b8a4508ab2ec46c61
which probably is not going upstream.

Thanks,
Jean

> vfio_pci_mmap() sets the VM_PFNMAP for the VMA and I see in the Documentation that
> pages with VM_PFNMAP are added to the unevictable LRU list, doesn't that mean it's
> not subject the page aging? I feel like there's something I'm missing.
>
> [1]
> https://lore.kernel.org/kvm/BY5PR12MB37642B9AC7E5D907F5A664F6B3459@BY5PR12MB3764.namprd12.prod.outlook.com/
>
> Thanks,
>
> Alex
Hi Jean,

On 4/27/21 4:23 PM, Jean-Philippe Brucker wrote:
> On Tue, Apr 27, 2021 at 03:52:46PM +0100, Alexandru Elisei wrote:
>> The comment [1] suggested that the panic is triggered during page aging.
> I think only with an out-of-tree patch applied
> https://jpbrucker.net/git/linux/commit/?h=sva/2021-03-01&id=d32d8baaf293aaefef8a1c9b8a4508ab2ec46c61
> which probably is not going upstream.

Thanks for that, that explains why I wasn't able to trigger the notification.

I did a grep for all the places where mmu_notifier_change_pte() and
set_pte_at_notify() are used in the kernel, and from what I can tell they are
only called for a new pte which has a struct page. From my investigation, the
notifiers are called from ksm (which deals with physical memory), swap
migration (so still pages in memory) and on copy-on-write.

On Linux v5.12, I tried to trigger the copy-on-write notification by forking
kvmtool right after the BAR region is mapped and then reading from the
userspace BAR address, but the new pte (for which the notifier is called) is
valid. I also looked at what x86 does, but I couldn't find where cache
maintenance is performed (wouldn't surprise me if it's not necessary at all).

So I guess my question is what kind of pfns the MMU notifiers for the
secondary MMUs are required to handle. If the requirement is that they should
handle both device and struct page backed pfns, then the patch looks correct
to me (user_mem_abort() also uses kvm_is_device_pfn() to decide if dcache
maintenance is needed).

Thanks,
Alex

> Thanks,
> Jean
>
>> vfio_pci_mmap() sets the VM_PFNMAP for the VMA and I see in the Documentation that
>> pages with VM_PFNMAP are added to the unevictable LRU list, doesn't that mean it's
>> not subject the page aging? I feel like there's something I'm missing.
>>
>> [1]
>> https://lore.kernel.org/kvm/BY5PR12MB37642B9AC7E5D907F5A664F6B3459@BY5PR12MB3764.namprd12.prod.outlook.com/
>>
>> Thanks,
>>
>> Alex
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index cd4d51ae3d4a..564a0f7fcd05 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1236,7 +1236,8 @@ int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
 	 * We've moved a page around, probably through CoW, so let's treat it
 	 * just like a translation fault and clean the cache to the PoC.
 	 */
-	clean_dcache_guest_page(pfn, PAGE_SIZE);
+	if (!kvm_is_device_pfn(pfn))
+		clean_dcache_guest_page(pfn, PAGE_SIZE);
 	handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler, &pfn);
 	return 0;
 }