Message ID | 20211002124012.18186-1-ajaygargnsit@gmail.com (mailing list archive)
---|---
State | New, archived
Series | iommu: intel: remove flooding of non-error logs, when new-DMA-PTE is the same as old-DMA-PTE.
On 2021/10/2 20:40, Ajay Garg wrote:
> Taking a SD-MMC controller (over PCI) as an example, the following is
> an example sequencing where the log-flooding happened:
>
> 0. We have a host and a guest, both running latest x86_64 kernels.
>
> 1. The host-machine is booted up (with intel_iommu=on), and the
>    DMA-PTEs are set up for the controller (on the host) for the
>    first time.
>
> 2. The SD-controller device is added to a (L1) guest on a KVM-VM
>    (via virt-manager).

Shouldn't the domain be switched from a default domain to an unmanaged
domain when the device is assigned to the guest?

Even if you want to re-setup the same mappings, you need to unmap all
existing mappings first, right?

Best regards,
baolu

>
> 3. The KVM-VM is booted up.
>
> 4. The above triggers a re-setup of DMA-PTEs on the host, for a
>    second time.
>
> It is observed that the new PTEs formed (on the host) are the same as
> the original PTEs, and thus the following logs, accompanied by
> stacktraces, overwhelm the kernel log:
>
> ......
> DMAR: ERROR: DMA PTE for vPFN 0x428ec already set (to 3f6ec003 not 3f6ec003)
> DMAR: ERROR: DMA PTE for vPFN 0x428ed already set (to 3f6ed003 not 3f6ed003)
> DMAR: ERROR: DMA PTE for vPFN 0x428ee already set (to 3f6ee003 not 3f6ee003)
> DMAR: ERROR: DMA PTE for vPFN 0x428ef already set (to 3f6ef003 not 3f6ef003)
> DMAR: ERROR: DMA PTE for vPFN 0x428f0 already set (to 3f6f0003 not 3f6f0003)
> ......
>
> As the PTEs are the same, there is no cause for concern, and we can
> easily avoid the log-flood for this non-error case.
>
> Signed-off-by: Ajay Garg <ajaygargnsit@gmail.com>
> ---
>  drivers/iommu/intel/iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index d75f59ae28e6..8bea8b4e3ff9 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -2370,7 +2370,7 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
>  		 * touches the iova range
>  		 */
>  		tmp = cmpxchg64_local(&pte->val, 0ULL, pteval);
> -		if (tmp) {
> +		if (tmp && (tmp != pteval)) {
>  			static int dumps = 5;
>  			pr_crit("ERROR: DMA PTE for vPFN 0x%lx already set (to %llx not %llx)\n",
>  				iov_pfn, tmp, (unsigned long long)pteval);
Thanks Lu for the reply.

> Shouldn't the domain be switched from a default domain to an
> unmanaged domain when the device is assigned to the guest?
>
> Even if you want to re-setup the same mappings, you need to unmap all
> existing mappings first, right?

Hmm, I guess that's a (design) decision the KVM/QEMU/VFIO communities
need to take. Maybe the patch could suppress the flooding till then?

Thanks and Regards,
Ajay
On Sat, 2 Oct 2021 22:48:24 +0530
Ajay Garg <ajaygargnsit@gmail.com> wrote:

> Thanks Lu for the reply.
>
> > Shouldn't the domain be switched from a default domain to an
> > unmanaged domain when the device is assigned to the guest?
> >
> > Even if you want to re-setup the same mappings, you need to unmap
> > all existing mappings first, right?
>
> Hmm, I guess that's a (design) decision the KVM/QEMU/VFIO communities
> need to take. Maybe the patch could suppress the flooding till then?

No, this is wrong. The PTE values should not exist; it doesn't matter
that they're the same. Is the host driver failing to remove mappings,
and somehow they persist in the new vfio-owned domain? There's
definitely a bug beyond logging going on here. Thanks,

Alex
Thanks Alex for the reply.

Lu, Alex: I got my diagnosis regarding the host-driver wrong, my
apologies. There is no issue with the PCI device's host-driver
(confirmed by preventing the loading of the host-driver at host
bootup). Thus, there is nothing to be fixed on the host-driver side.

Rather, it seems some DMA mapping/unmapping inconsistency happens when
kvm/qemu boots up with the PCI device attached to the guest.

I put debug logs in the "vfio_iommu_type1_ioctl" method in
"vfio_iommu_type1.c" (on the host-machine). When the guest boots up,
repeated DMA mappings are observed for the same address in the
host-machine's logs (without a corresponding DMA unmapping first):

##########################################################################################
ajay@ajay-Latitude-E6320:~$ tail -f /var/log/syslog | grep "ajay: "
Oct 7 14:12:32 ajay-Latitude-E6320 kernel: [  146.202297] ajay: _MAP_DMA for [0x7ffe724a8670] status [0]
Oct 7 14:12:32 ajay-Latitude-E6320 kernel: [  146.583179] ajay: _MAP_DMA for [0x7ffe724a8670] status [0]
Oct 7 14:12:32 ajay-Latitude-E6320 kernel: [  146.583253] ajay: _MAP_DMA for [0x7ffe724a8670] status [0]
Oct 7 14:12:36 ajay-Latitude-E6320 kernel: [  150.105584] ajay: _MAP_DMA for [0x7ffe724a8670] status [0]
Oct 7 14:13:07 ajay-Latitude-E6320 kernel: [  180.986499] ajay: _UNMAP_DMA for [0x7ffe724a9840] status [0]
Oct 7 14:13:07 ajay-Latitude-E6320 kernel: [  180.986559] ajay: _MAP_DMA for [0x7ffe724a97d0] status [0]
Oct 7 14:13:07 ajay-Latitude-E6320 kernel: [  180.986638] ajay: _MAP_DMA for [0x7ffe724a97d0] status [0]
Oct 7 14:13:07 ajay-Latitude-E6320 kernel: [  181.087359] ajay: _MAP_DMA for [0x7ffe724a97d0] status [0]
Oct 7 14:13:13 ajay-Latitude-E6320 kernel: [  187.271232] ajay: _UNMAP_DMA for [0x7fde7b7fcfa0] status [0]
Oct 7 14:13:13 ajay-Latitude-E6320 kernel: [  187.271320] ajay: _UNMAP_DMA for [0x7fde7b7fcfa0] status [0]
....
##########################################################################################

I'll try and backtrack to the userspace process that is sending these
ioctls.

Thanks and Regards,
Ajay

On Tue, Oct 5, 2021 at 4:01 AM Alex Williamson
<alex.williamson@redhat.com> wrote:
>
> On Sat, 2 Oct 2021 22:48:24 +0530
> Ajay Garg <ajaygargnsit@gmail.com> wrote:
>
> > Thanks Lu for the reply.
> >
> > > Shouldn't the domain be switched from a default domain to an
> > > unmanaged domain when the device is assigned to the guest?
> > >
> > > Even if you want to re-setup the same mappings, you need to unmap
> > > all existing mappings first, right?
> >
> > Hmm, I guess that's a (design) decision the KVM/QEMU/VFIO communities
> > need to take.
> > Maybe the patch could suppress the flooding till then?
>
> No, this is wrong. The PTE values should not exist; it doesn't matter
> that they're the same. Is the host driver failing to remove mappings,
> and somehow they persist in the new vfio-owned domain? There's
> definitely a bug beyond logging going on here. Thanks,
>
> Alex
>
> I'll try and backtrack to the userspace process that is sending these
> ioctls.

The userspace process is qemu.

I compiled qemu from the latest source, installed it via "sudo make
install" on the host-machine, rebooted the host-machine, and booted up
the guest-machine on the host-machine. Now, no kernel-flooding is seen
on the host-machine.

For me, the issue is thus closed-invalid; admins may take the
necessary action to officially mark it so ;)

Thanks and Regards,
Ajay
The flooding was seen again today, after I booted the host-machine in
the morning. Need to look at what the heck is going on ...

On Sun, Oct 10, 2021 at 11:45 AM Ajay Garg <ajaygargnsit@gmail.com> wrote:
>
> > I'll try and backtrack to the userspace process that is sending
> > these ioctls.
>
> The userspace process is qemu.
>
> I compiled qemu from the latest source, installed it via "sudo make
> install" on the host-machine, rebooted the host-machine, and booted
> up the guest-machine on the host-machine. Now, no kernel-flooding is
> seen on the host-machine.
>
> For me, the issue is thus closed-invalid; admins may take the
> necessary action to officially mark it so ;)
>
> Thanks and Regards,
> Ajay
On Mon, 11 Oct 2021 11:49:33 +0530
Ajay Garg <ajaygargnsit@gmail.com> wrote:

> The flooding was seen again today, after I booted the host-machine in
> the morning.
> Need to look at what the heck is going on ...
>
> On Sun, Oct 10, 2021 at 11:45 AM Ajay Garg <ajaygargnsit@gmail.com> wrote:
> >
> > > I'll try and backtrack to the userspace process that is sending
> > > these ioctls.
> >
> > The userspace process is qemu.
> >
> > I compiled qemu from the latest source, installed it via "sudo make
> > install" on the host-machine, rebooted the host-machine, and booted
> > up the guest-machine on the host-machine. Now, no kernel-flooding is
> > seen on the host-machine.
> >
> > For me, the issue is thus closed-invalid; admins may take the
> > necessary action to officially mark it so ;)

Even this QEMU explanation doesn't make a lot of sense; vfio tracks
userspace mappings and will return an -EEXIST error for duplicate or
overlapping IOVA entries. We expect to have an entirely empty IOMMU
domain when a device is assigned, so it seems the only way userspace
could trigger duplicate PTEs would be if mappings already exist, or if
we have a bug somewhere.

If the most recent instance is purely on bare metal, then it seems the
host itself has conflicting mappings. I can only speculate with the
limited data presented, but I'm suspicious there's something happening
with RMRRs here (though that should also entirely preclude assignment).
dmesg, lspci -vvv, and the VM configuration would be useful. Thanks,

Alex
Thanks Alex for your time.

I think I may have found the issue. Right now, when doing a
DMA unmapping, we do a "soft unmapping" only, as the PTE values
themselves are not cleared in the unlinked pagetable-frame.

I have made the (simple) changes, and things are looking good as of
now (almost an hour now). However, this time I will give it a day ;)

If there is not a single flooding observed in the next 24 hours, I
will float the v2 patch for review.

Thanks again for your time and patience.

Thanks and Regards,
Ajay

> Even this QEMU explanation doesn't make a lot of sense; vfio tracks
> userspace mappings and will return an -EEXIST error for duplicate or
> overlapping IOVA entries. We expect to have an entirely empty IOMMU
> domain when a device is assigned, so it seems the only way userspace
> could trigger duplicate PTEs would be if mappings already exist, or
> if we have a bug somewhere.
>
> If the most recent instance is purely on bare metal, then it seems
> the host itself has conflicting mappings. I can only speculate with
> the limited data presented, but I'm suspicious there's something
> happening with RMRRs here (though that should also entirely preclude
> assignment). dmesg, lspci -vvv, and the VM configuration would be
> useful. Thanks,
>
> Alex
Hi Alex, Lu.

Posted the v2 patch at:
https://lists.linuxfoundation.org/pipermail/iommu/2021-October/059955.html

Kindly review, and let's continue on that thread now.

Thanks and Regards,
Ajay

On Mon, Oct 11, 2021 at 11:43 PM Ajay Garg <ajaygargnsit@gmail.com> wrote:
>
> Thanks Alex for your time.
>
> I think I may have found the issue. Right now, when doing a
> DMA unmapping, we do a "soft unmapping" only, as the PTE values
> themselves are not cleared in the unlinked pagetable-frame.
>
> I have made the (simple) changes, and things are looking good as of
> now (almost an hour now).
> However, this time I will give it a day ;)
>
> If there is not a single flooding observed in the next 24 hours, I
> will float the v2 patch for review.
>
> Thanks again for your time and patience.
>
> Thanks and Regards,
> Ajay
>
> >
> > Even this QEMU explanation doesn't make a lot of sense; vfio tracks
> > userspace mappings and will return an -EEXIST error for duplicate
> > or overlapping IOVA entries. We expect to have an entirely empty
> > IOMMU domain when a device is assigned, so it seems the only way
> > userspace could trigger duplicate PTEs would be if mappings already
> > exist, or if we have a bug somewhere.
> >
> > If the most recent instance is purely on bare metal, then it seems
> > the host itself has conflicting mappings. I can only speculate with
> > the limited data presented, but I'm suspicious there's something
> > happening with RMRRs here (though that should also entirely
> > preclude assignment). dmesg, lspci -vvv, and the VM configuration
> > would be useful. Thanks,
> >
> > Alex
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d75f59ae28e6..8bea8b4e3ff9 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2370,7 +2370,7 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
 		 * touches the iova range
 		 */
 		tmp = cmpxchg64_local(&pte->val, 0ULL, pteval);
-		if (tmp) {
+		if (tmp && (tmp != pteval)) {
 			static int dumps = 5;
 			pr_crit("ERROR: DMA PTE for vPFN 0x%lx already set (to %llx not %llx)\n",
 				iov_pfn, tmp, (unsigned long long)pteval);
Taking a SD-MMC controller (over PCI) as an example, the following is
an example sequencing where the log-flooding happened:

0. We have a host and a guest, both running latest x86_64 kernels.

1. The host-machine is booted up (with intel_iommu=on), and the
   DMA-PTEs are set up for the controller (on the host) for the
   first time.

2. The SD-controller device is added to a (L1) guest on a KVM-VM
   (via virt-manager).

3. The KVM-VM is booted up.

4. The above triggers a re-setup of DMA-PTEs on the host, for a
   second time.

It is observed that the new PTEs formed (on the host) are the same as
the original PTEs, and thus the following logs, accompanied by
stacktraces, overwhelm the kernel log:

......
DMAR: ERROR: DMA PTE for vPFN 0x428ec already set (to 3f6ec003 not 3f6ec003)
DMAR: ERROR: DMA PTE for vPFN 0x428ed already set (to 3f6ed003 not 3f6ed003)
DMAR: ERROR: DMA PTE for vPFN 0x428ee already set (to 3f6ee003 not 3f6ee003)
DMAR: ERROR: DMA PTE for vPFN 0x428ef already set (to 3f6ef003 not 3f6ef003)
DMAR: ERROR: DMA PTE for vPFN 0x428f0 already set (to 3f6f0003 not 3f6f0003)
......

As the PTEs are the same, there is no cause for concern, and we can
easily avoid the log-flood for this non-error case.

Signed-off-by: Ajay Garg <ajaygargnsit@gmail.com>
---
 drivers/iommu/intel/iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)