Message ID: 0-v2-621370057090+91fec-smmuv3_nesting_jgg@nvidia.com
Series: Initial support for SMMUv3 nested translation
On Tue, Aug 27, 2024 at 12:51:30PM -0300, Jason Gunthorpe wrote:
> This brings support for the IOMMUFD ioctls:
>
> - IOMMU_GET_HW_INFO
> - IOMMU_HWPT_ALLOC_NEST_PARENT
> - IOMMU_DOMAIN_NESTED
> - ops->enforce_cache_coherency()
>
> This is quite straightforward as the nested STE can just be built in the
> special NESTED domain op and fed through the generic update machinery.
>
> The design allows the user-provided STE fragment to control several
> aspects of the translation, including putting the STE into a "virtual
> bypass" or an aborting state. This duplicates functionality available by
> other means, but it allows trivially preserving the VMID in the STE as we
> eventually move towards the VIOMMU owning the VMID.
>
> Nesting support requires the system to either support S2FWB or the
> stronger CANWBS ACPI flag. This is to ensure the VM cannot bypass the
> cache and view incoherent data; currently VFIO lacks any cache flushing
> that would make this safe.
>
> Yan has a series to add some of the needed infrastructure for VFIO cache
> flushing here:
>
> https://lore.kernel.org/linux-iommu/20240507061802.20184-1-yan.y.zhao@intel.com/
>
> Which may someday allow relaxing this further.
>
> Remove VFIO_TYPE1_NESTING_IOMMU since it was never used and is superseded
> by this.
>
> This is the first series in what will be several to complete nesting
> support. At least:
>  - IOMMU_RESV_SW_MSI related fixups
>    https://lore.kernel.org/linux-iommu/cover.1722644866.git.nicolinc@nvidia.com/
>  - VIOMMU object support to allow ATS and CD invalidations
>    https://lore.kernel.org/linux-iommu/cover.1723061377.git.nicolinc@nvidia.com/
>  - vCMDQ hypervisor support for direct invalidation queue assignment
>    https://lore.kernel.org/linux-iommu/cover.1712978212.git.nicolinc@nvidia.com/
>  - KVM pinned VMID using VIOMMU for vBTM
>    https://lore.kernel.org/linux-iommu/20240208151837.35068-1-shameerali.kolothum.thodi@huawei.com/
>  - Cross instance S2 sharing
>  - Virtual Machine Structure using VIOMMU (for vMPAM?)
>  - Fault forwarding support through IOMMUFD's fault fd for vSVA
>
> The VIOMMU series is essential to allow the invalidations to be processed
> for the CD as well.
>
> It is enough to allow qemu work to progress.
>
> This is on github: https://github.com/jgunthorpe/linux/commits/smmuv3_nesting
>
> v2:

As mentioned above, the VIOMMU series would be required to test the
entire nesting feature, which now has a v2 rebasing on this series. I
tested it with a pairing QEMU branch. Please refer to:
https://lore.kernel.org/linux-iommu/cover.1724776335.git.nicolinc@nvidia.com/

Also, there is another new VIRQ series on top of the VIOMMU one and this
nesting series, and I tested that too. Please refer to:
https://lore.kernel.org/linux-iommu/cover.1724777091.git.nicolinc@nvidia.com/

With that,

Tested-by: Nicolin Chen <nicolinc@nvidia.com>
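[Editor's note: for readers new to the iommufd side, here is a minimal
userspace sketch of how the four items in the cover letter fit together.
It assumes an already-opened iommufd, a bound device (dev_id) and an IOAS
(ioas_id); setup_nesting() and its parameters are placeholders. The struct,
flag and ioctl names follow the iommufd uAPI headers added by this and the
follow-on series, but the hard-coded vSTE words (a "virtual bypass" STE,
i.e. V=1 with a bypass Config) and the use of the parent HWPT as pt_id for
the nested allocation are illustrative assumptions, not the definitive uAPI
of this posting.]

----------------------------------------------------------------------
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/iommufd.h>

static int setup_nesting(int iommufd, __u32 dev_id, __u32 ioas_id)
{
	struct iommu_hw_info_arm_smmuv3 smmu_info = {};
	struct iommu_hw_info info = {
		.size = sizeof(info),
		.dev_id = dev_id,
		.data_len = sizeof(smmu_info),
		.data_uptr = (uintptr_t)&smmu_info,
	};
	struct iommu_hwpt_alloc s2 = {
		.size = sizeof(s2),
		.flags = IOMMU_HWPT_ALLOC_NEST_PARENT,
		.dev_id = dev_id,
		.pt_id = ioas_id,	/* stage-2 translates guest PA -> host PA */
	};
	/* Assumed "virtual bypass" vSTE (V=1, Config=bypass); a real VMM
	 * would forward the guest-written STE words here instead. */
	struct iommu_hwpt_arm_smmuv3 vste = { .ste = { 0x9, 0x0 } };
	struct iommu_hwpt_alloc nested = {
		.size = sizeof(nested),
		.dev_id = dev_id,
		.data_type = IOMMU_HWPT_DATA_ARM_SMMUV3,
		.data_len = sizeof(vste),
		.data_uptr = (uintptr_t)&vste,
	};

	/* IOMMU_GET_HW_INFO: read the physical SMMU IDR/IIDR/AIDR values
	 * so the VMM can emulate a compatible vSMMU for the guest. */
	if (ioctl(iommufd, IOMMU_GET_HW_INFO, &info))
		return -1;
	printf("host SMMU_IDR0 = 0x%x\n", smmu_info.idr[0]);

	/* IOMMU_HWPT_ALLOC_NEST_PARENT: the S2 domain that owns the VMID */
	if (ioctl(iommufd, IOMMU_HWPT_ALLOC, &s2))
		return -1;

	/* IOMMU_DOMAIN_NESTED: the user STE fragment layered on the parent */
	nested.pt_id = s2.out_hwpt_id;
	if (ioctl(iommufd, IOMMU_HWPT_ALLOC, &nested))
		return -1;

	/* nested.out_hwpt_id can now be attached to the device */
	return (int)nested.out_hwpt_id;
}
----------------------------------------------------------------------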
Hi Nicolin, > -----Original Message----- > From: Nicolin Chen <nicolinc@nvidia.com> > Sent: Tuesday, August 27, 2024 10:31 PM > To: Jason Gunthorpe <jgg@nvidia.com> > Cc: acpica-devel@lists.linux.dev; Guohanjun (Hanjun Guo) > <guohanjun@huawei.com>; iommu@lists.linux.dev; Joerg Roedel > <joro@8bytes.org>; Kevin Tian <kevin.tian@intel.com>; > kvm@vger.kernel.org; Len Brown <lenb@kernel.org>; linux- > acpi@vger.kernel.org; linux-arm-kernel@lists.infradead.org; Lorenzo Pieralisi > <lpieralisi@kernel.org>; Rafael J. Wysocki <rafael@kernel.org>; Robert > Moore <robert.moore@intel.com>; Robin Murphy > <robin.murphy@arm.com>; Sudeep Holla <sudeep.holla@arm.com>; Will > Deacon <will@kernel.org>; Alex Williamson <alex.williamson@redhat.com>; > Eric Auger <eric.auger@redhat.com>; Jean-Philippe Brucker <jean- > philippe@linaro.org>; Moritz Fischer <mdf@kernel.org>; Michael Shavit > <mshavit@google.com>; patches@lists.linux.dev; Shameerali Kolothum > Thodi <shameerali.kolothum.thodi@huawei.com>; Mostafa Saleh > <smostafa@google.com> > Subject: Re: [PATCH v2 0/8] Initial support for SMMUv3 nested translation > > As mentioned above, the VIOMMU series would be required to test the > entire nesting feature, which now has a v2 rebasing on this series. I tested it > with a paring QEMU branch. Please refer to: > https://lore.kernel.org/linux- > iommu/cover.1724776335.git.nicolinc@nvidia.com/ Thanks for this. I haven't gone through the viommu and its Qemu branch yet. The way we present nested-smmuv3/iommufd to the Qemu seems to have changed with the above Qemu branch(multiple nested SMMUs). The old Qemu command line for nested setup doesn't work anymore. Could you please share an example Qemu command line to verify this series(Sorry, if I missed it in the links/git). Thanks, Shameer
Hi Shameer, On Wed, Aug 28, 2024 at 04:31:36PM +0000, Shameerali Kolothum Thodi wrote: > Hi Nicolin, > > > -----Original Message----- > > From: Nicolin Chen <nicolinc@nvidia.com> > > Sent: Tuesday, August 27, 2024 10:31 PM > > To: Jason Gunthorpe <jgg@nvidia.com> > > Cc: acpica-devel@lists.linux.dev; Guohanjun (Hanjun Guo) > > <guohanjun@huawei.com>; iommu@lists.linux.dev; Joerg Roedel > > <joro@8bytes.org>; Kevin Tian <kevin.tian@intel.com>; > > kvm@vger.kernel.org; Len Brown <lenb@kernel.org>; linux- > > acpi@vger.kernel.org; linux-arm-kernel@lists.infradead.org; Lorenzo Pieralisi > > <lpieralisi@kernel.org>; Rafael J. Wysocki <rafael@kernel.org>; Robert > > Moore <robert.moore@intel.com>; Robin Murphy > > <robin.murphy@arm.com>; Sudeep Holla <sudeep.holla@arm.com>; Will > > Deacon <will@kernel.org>; Alex Williamson <alex.williamson@redhat.com>; > > Eric Auger <eric.auger@redhat.com>; Jean-Philippe Brucker <jean- > > philippe@linaro.org>; Moritz Fischer <mdf@kernel.org>; Michael Shavit > > <mshavit@google.com>; patches@lists.linux.dev; Shameerali Kolothum > > Thodi <shameerali.kolothum.thodi@huawei.com>; Mostafa Saleh > > <smostafa@google.com> > > Subject: Re: [PATCH v2 0/8] Initial support for SMMUv3 nested translation > > > > > As mentioned above, the VIOMMU series would be required to test the > > entire nesting feature, which now has a v2 rebasing on this series. I tested it > > with a paring QEMU branch. Please refer to: > > https://lore.kernel.org/linux- > > iommu/cover.1724776335.git.nicolinc@nvidia.com/ > > Thanks for this. I haven't gone through the viommu and its Qemu branch > yet. The way we present nested-smmuv3/iommufd to the Qemu seems to > have changed with the above Qemu branch(multiple nested SMMUs). > The old Qemu command line for nested setup doesn't work anymore. > > Could you please share an example Qemu command line to verify this > series(Sorry, if I missed it in the links/git). My bad. I updated those two "for_iommufd_" QEMU branches with a README commit on top of each for the reference command. By the way, I wonder how many SMMUv3 instances there are on the platforms that SMMUv3 developers here are running on -- if some one is also working on a chip that has multiple instances? Thanks Nicolin
> -----Original Message----- > From: Nicolin Chen <nicolinc@nvidia.com> > Sent: Wednesday, August 28, 2024 6:15 PM > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> > Cc: Jason Gunthorpe <jgg@nvidia.com>; acpica-devel@lists.linux.dev; > Guohanjun (Hanjun Guo) <guohanjun@huawei.com>; > iommu@lists.linux.dev; Joerg Roedel <joro@8bytes.org>; Kevin Tian > <kevin.tian@intel.com>; kvm@vger.kernel.org; Len Brown > <lenb@kernel.org>; linux-acpi@vger.kernel.org; linux-arm- > kernel@lists.infradead.org; Lorenzo Pieralisi <lpieralisi@kernel.org>; Rafael J. > Wysocki <rafael@kernel.org>; Robert Moore <robert.moore@intel.com>; > Robin Murphy <robin.murphy@arm.com>; Sudeep Holla > <sudeep.holla@arm.com>; Will Deacon <will@kernel.org>; Alex Williamson > <alex.williamson@redhat.com>; Eric Auger <eric.auger@redhat.com>; Jean- > Philippe Brucker <jean-philippe@linaro.org>; Moritz Fischer > <mdf@kernel.org>; Michael Shavit <mshavit@google.com>; > patches@lists.linux.dev; Mostafa Saleh <smostafa@google.com> > Subject: Re: [PATCH v2 0/8] Initial support for SMMUv3 nested translation > > Hi Shameer, > > On Wed, Aug 28, 2024 at 04:31:36PM +0000, Shameerali Kolothum Thodi > wrote: > > Hi Nicolin, > > > > > -----Original Message----- > > > From: Nicolin Chen <nicolinc@nvidia.com> > > > Sent: Tuesday, August 27, 2024 10:31 PM > > > To: Jason Gunthorpe <jgg@nvidia.com> > > > Cc: acpica-devel@lists.linux.dev; Guohanjun (Hanjun Guo) > > > <guohanjun@huawei.com>; iommu@lists.linux.dev; Joerg Roedel > > > <joro@8bytes.org>; Kevin Tian <kevin.tian@intel.com>; > > > kvm@vger.kernel.org; Len Brown <lenb@kernel.org>; linux- > > > acpi@vger.kernel.org; linux-arm-kernel@lists.infradead.org; Lorenzo > > > Pieralisi <lpieralisi@kernel.org>; Rafael J. Wysocki > > > <rafael@kernel.org>; Robert Moore <robert.moore@intel.com>; Robin > > > Murphy <robin.murphy@arm.com>; Sudeep Holla > <sudeep.holla@arm.com>; > > > Will Deacon <will@kernel.org>; Alex Williamson > > > <alex.williamson@redhat.com>; Eric Auger <eric.auger@redhat.com>; > > > Jean-Philippe Brucker <jean- philippe@linaro.org>; Moritz Fischer > > > <mdf@kernel.org>; Michael Shavit <mshavit@google.com>; > > > patches@lists.linux.dev; Shameerali Kolothum Thodi > > > <shameerali.kolothum.thodi@huawei.com>; Mostafa Saleh > > > <smostafa@google.com> > > > Subject: Re: [PATCH v2 0/8] Initial support for SMMUv3 nested > > > translation > > > > > > > > As mentioned above, the VIOMMU series would be required to test the > > > entire nesting feature, which now has a v2 rebasing on this series. > > > I tested it with a paring QEMU branch. Please refer to: > > > https://lore.kernel.org/linux- > > > iommu/cover.1724776335.git.nicolinc@nvidia.com/ > > > > Thanks for this. I haven't gone through the viommu and its Qemu branch > > yet. The way we present nested-smmuv3/iommufd to the Qemu seems to > > have changed with the above Qemu branch(multiple nested SMMUs). > > The old Qemu command line for nested setup doesn't work anymore. > > > > Could you please share an example Qemu command line to verify this > > series(Sorry, if I missed it in the links/git). > > My bad. I updated those two "for_iommufd_" QEMU branches with a > README commit on top of each for the reference command. Thanks. 
I did give it a go and this is my command line based on the above:

./qemu-system-aarch64-nicolin-viommu -object iommufd,id=iommufd0 \
 -machine hmat=on \
 -machine virt,accel=kvm,gic-version=3,iommu=nested-smmuv3,ras=on \
 -cpu host -smp cpus=61 -m size=16G,slots=4,maxmem=256G -nographic \
 -object memory-backend-ram,size=8G,id=m0 \
 -object memory-backend-ram,size=8G,id=m1 \
 -numa node,memdev=m0,cpus=0-60,nodeid=0 -numa node,memdev=m1,nodeid=1 \
 -device vfio-pci-nohotplug,host=0000:75:00.1,iommufd=iommufd0 \
 -bios QEMU_EFI.fd \
 -drive if=none,file=ubuntu-18.04-old.img,id=fs \
 -device virtio-blk-device,drive=fs \
 -kernel Image \
 -append "rdinit=init console=ttyAMA0 root=/dev/vda rw earlycon=pl011,0x9000000 kpti=off" \
 -nographic

But it fails to boot very early:

root@ubuntu:/home/shameer/qemu-test# ./qemu_run-simple-iommufd-nicolin-2
qemu-system-aarch64-nicolin-viommu: Illegal numa node 2

Any idea what I am missing? Do you have any special config enabled while building Qemu?

Thanks,
Shameer
On Wed, Aug 28, 2024 at 06:06:36PM +0000, Shameerali Kolothum Thodi wrote:
> > > > As mentioned above, the VIOMMU series would be required to test the
> > > > entire nesting feature, which now has a v2 rebasing on this series.
> > > > I tested it with a paring QEMU branch. Please refer to:
> > > > https://lore.kernel.org/linux-iommu/cover.1724776335.git.nicolinc@nvidia.com/
> > >
> > > Thanks for this. I haven't gone through the viommu and its Qemu branch
> > > yet. The way we present nested-smmuv3/iommufd to the Qemu seems to
> > > have changed with the above Qemu branch(multiple nested SMMUs).
> > > The old Qemu command line for nested setup doesn't work anymore.
> > >
> > > Could you please share an example Qemu command line to verify this
> > > series(Sorry, if I missed it in the links/git).
> >
> > My bad. I updated those two "for_iommufd_" QEMU branches with a
> > README commit on top of each for the reference command.
>
> Thanks. I did give it a go and this is my command line based on above,
>
> But it fails to boot very early:
>
> root@ubuntu:/home/shameer/qemu-test# ./qemu_run-simple-iommufd-nicolin-2
> qemu-system-aarch64-nicolin-viommu: Illegal numa node 2
>
> Any idea what am I missing? Do you any special config enabled while building Qemu?

Looks like you are running on a multi-SMMU platform :)

Would you please try syncing your local branch? That should work,
as the update also had a small change to the virt code:

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 161a28a311..a782909016 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1640,7 +1640,7 @@ static PCIBus *create_pcie_expander_bridge(VirtMachineState *vms, uint8_t idx)
     }
 
     qdev_prop_set_uint8(dev, "bus_nr", bus_nr);
-    qdev_prop_set_uint16(dev, "numa_node", idx);
+    qdev_prop_set_uint16(dev, "numa_node", 0);
     qdev_realize_and_unref(dev, BUS(bus), &error_fatal);
 
     /* Get the pxb bus */

Thanks
Nicolin
> -----Original Message----- > From: Nicolin Chen <nicolinc@nvidia.com> > Sent: Wednesday, August 28, 2024 7:13 PM > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> > Cc: Jason Gunthorpe <jgg@nvidia.com>; acpica-devel@lists.linux.dev; > Guohanjun (Hanjun Guo) <guohanjun@huawei.com>; > iommu@lists.linux.dev; Joerg Roedel <joro@8bytes.org>; Kevin Tian > <kevin.tian@intel.com>; kvm@vger.kernel.org; Len Brown > <lenb@kernel.org>; linux-acpi@vger.kernel.org; linux-arm- > kernel@lists.infradead.org; Lorenzo Pieralisi <lpieralisi@kernel.org>; Rafael J. > Wysocki <rafael@kernel.org>; Robert Moore <robert.moore@intel.com>; > Robin Murphy <robin.murphy@arm.com>; Sudeep Holla > <sudeep.holla@arm.com>; Will Deacon <will@kernel.org>; Alex Williamson > <alex.williamson@redhat.com>; Eric Auger <eric.auger@redhat.com>; Jean- > Philippe Brucker <jean-philippe@linaro.org>; Moritz Fischer > <mdf@kernel.org>; Michael Shavit <mshavit@google.com>; > patches@lists.linux.dev; Mostafa Saleh <smostafa@google.com> > Subject: Re: [PATCH v2 0/8] Initial support for SMMUv3 nested translation > > On Wed, Aug 28, 2024 at 06:06:36PM +0000, Shameerali Kolothum Thodi > wrote: > > > > > As mentioned above, the VIOMMU series would be required to test > the > > > > > entire nesting feature, which now has a v2 rebasing on this series. > > > > > I tested it with a paring QEMU branch. Please refer to: > > > > > https://lore.kernel.org/linux- > > > > > iommu/cover.1724776335.git.nicolinc@nvidia.com/ > > > > > > > > Thanks for this. I haven't gone through the viommu and its Qemu > branch > > > > yet. The way we present nested-smmuv3/iommufd to the Qemu seems > to > > > > have changed with the above Qemu branch(multiple nested SMMUs). > > > > The old Qemu command line for nested setup doesn't work anymore. > > > > > > > > Could you please share an example Qemu command line to verify this > > > > series(Sorry, if I missed it in the links/git). > > > > > > My bad. I updated those two "for_iommufd_" QEMU branches with a > > > README commit on top of each for the reference command. > > > > Thanks. I did give it a go and this is my command line based on above, > > > But it fails to boot very early: > > > > root@ubuntu:/home/shameer/qemu-test# ./qemu_run-simple-iommufd- > nicolin-2 > > qemu-system-aarch64-nicolin-viommu: Illegal numa node 2 > > > > Any idea what am I missing? Do you any special config enabled while > building Qemu? > > Looks like you are running on a multi-SMMU platform :) > > Would you please try syncing your local branch? That should work, > as the update also had a small change to the virt code: > > diff --git a/hw/arm/virt.c b/hw/arm/virt.c > index 161a28a311..a782909016 100644 > --- a/hw/arm/virt.c > +++ b/hw/arm/virt.c > @@ -1640,7 +1640,7 @@ static PCIBus > *create_pcie_expander_bridge(VirtMachineState *vms, uint8_t idx) > } > > qdev_prop_set_uint8(dev, "bus_nr", bus_nr); > - qdev_prop_set_uint16(dev, "numa_node", idx); > + qdev_prop_set_uint16(dev, "numa_node", 0); > qdev_realize_and_unref(dev, BUS(bus), &error_fatal); That makes some progress. But still I am not seeing the assigned dev in Guest. -device vfio-pci-nohotplug,host=0000:75:00.1,iommufd=iommufd0 root@ubuntu:/# lspci -tv# root@ubuntu:/# lspci -tv -+-[0000:ca]---00.0-[cb]-- \-[0000:00]-+-00.0 Red Hat, Inc. QEMU PCIe Host bridge +-01.0 Red Hat, Inc Virtio network device +-02.0 Red Hat, Inc. QEMU PCIe Expander bridge +-03.0 Red Hat, Inc. QEMU PCIe Expander bridge +-04.0 Red Hat, Inc. QEMU PCIe Expander bridge +-05.0 Red Hat, Inc. 
QEMU PCIe Expander bridge +-06.0 Red Hat, Inc. QEMU PCIe Expander bridge +-07.0 Red Hat, Inc. QEMU PCIe Expander bridge +-08.0 Red Hat, Inc. QEMU PCIe Expander bridge \-09.0 Red Hat, Inc. QEMU PCIe Expander bridge The new root port is created, but no device attached. But without iommufd, -device vfio-pci-nohotplug,host=0000:75:00.1 root@ubuntu:/# lspci -tv -[0000:00]-+-00.0 Red Hat, Inc. QEMU PCIe Host bridge +-01.0 Red Hat, Inc Virtio network device +-02.0 Red Hat, Inc. QEMU PCIe Expander bridge +-03.0 Red Hat, Inc. QEMU PCIe Expander bridge +-04.0 Red Hat, Inc. QEMU PCIe Expander bridge +-05.0 Red Hat, Inc. QEMU PCIe Expander bridge +-06.0 Red Hat, Inc. QEMU PCIe Expander bridge +-07.0 Red Hat, Inc. QEMU PCIe Expander bridge +-08.0 Red Hat, Inc. QEMU PCIe Expander bridge +-09.0 Red Hat, Inc. QEMU PCIe Expander bridge \-0a.0 Huawei Technologies Co., Ltd. Device a251 We can see dev a251. And yes the setup has multiple SMMUs(8). Thanks, Shameer
> -----Original Message----- > From: Shameerali Kolothum Thodi > Sent: Thursday, August 29, 2024 2:15 PM > To: 'Nicolin Chen' <nicolinc@nvidia.com> > Cc: Jason Gunthorpe <jgg@nvidia.com>; acpica-devel@lists.linux.dev; > Guohanjun (Hanjun Guo) <guohanjun@huawei.com>; > iommu@lists.linux.dev; Joerg Roedel <joro@8bytes.org>; Kevin Tian > <kevin.tian@intel.com>; kvm@vger.kernel.org; Len Brown > <lenb@kernel.org>; linux-acpi@vger.kernel.org; linux-arm- > kernel@lists.infradead.org; Lorenzo Pieralisi <lpieralisi@kernel.org>; Rafael J. > Wysocki <rafael@kernel.org>; Robert Moore <robert.moore@intel.com>; > Robin Murphy <robin.murphy@arm.com>; Sudeep Holla > <sudeep.holla@arm.com>; Will Deacon <will@kernel.org>; Alex Williamson > <alex.williamson@redhat.com>; Eric Auger <eric.auger@redhat.com>; Jean- > Philippe Brucker <jean-philippe@linaro.org>; Moritz Fischer > <mdf@kernel.org>; Michael Shavit <mshavit@google.com>; > patches@lists.linux.dev; Mostafa Saleh <smostafa@google.com> > Subject: RE: [PATCH v2 0/8] Initial support for SMMUv3 nested translation > > That makes some progress. But still I am not seeing the assigned dev in > Guest. > > -device vfio-pci-nohotplug,host=0000:75:00.1,iommufd=iommufd0 > > root@ubuntu:/# lspci -tv# > > root@ubuntu:/# lspci -tv > -+-[0000:ca]---00.0-[cb]-- > \-[0000:00]-+-00.0 Red Hat, Inc. QEMU PCIe Host bridge > +-01.0 Red Hat, Inc Virtio network device > +-02.0 Red Hat, Inc. QEMU PCIe Expander bridge > +-03.0 Red Hat, Inc. QEMU PCIe Expander bridge > +-04.0 Red Hat, Inc. QEMU PCIe Expander bridge > +-05.0 Red Hat, Inc. QEMU PCIe Expander bridge > +-06.0 Red Hat, Inc. QEMU PCIe Expander bridge > +-07.0 Red Hat, Inc. QEMU PCIe Expander bridge > +-08.0 Red Hat, Inc. QEMU PCIe Expander bridge > \-09.0 Red Hat, Inc. QEMU PCIe Expander bridge > > The new root port is created, but no device attached. It looks like Guest finds the config invalid: [ 0.283618] PCI host bridge to bus 0000:ca [ 0.284064] ACPI BIOS Error (bug): \_SB.PCF7.PCEE.PCE5.PCDC.PCD3.PCCA._DSM: Excess arguments - ASL declared 5, ACPI requires 4 (20240322/nsarguments-162) [ 0.285533] pci_bus 0000:ca: root bus resource [bus ca] [ 0.286214] pci 0000:ca:00.0: [1b36:000c] type 01 class 0x060400 PCIe Root Port [ 0.287717] pci 0000:ca:00.0: BAR 0 [mem 0x00000000-0x00000fff] [ 0.288431] pci 0000:ca:00.0: PCI bridge to [bus 00] [ 0.290649] pci 0000:ca:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring [ 0.292476] pci_bus 0000:cb: busn_res: can not insert [bus cb-ca] under [bus ca] (conflicts with (null) [bus ca]) [ 0.293597] pci_bus 0000:cb: busn_res: [bus cb-ca] end is updated to cb [ 0.294300] pci_bus 0000:cb: busn_res: can not insert [bus cb] under [bus ca] (conflicts with (null) [bus ca]) Let me know if you have any clue. Thanks, Shameer
On Thu, Aug 29, 2024 at 02:52:23PM +0000, Shameerali Kolothum Thodi wrote:
> > That makes some progress. But still I am not seeing the assigned dev in
> > Guest.
> >
> > -device vfio-pci-nohotplug,host=0000:75:00.1,iommufd=iommufd0
> >
> > root@ubuntu:/# lspci -tv
> > -+-[0000:ca]---00.0-[cb]--
> >  \-[0000:00]-+-00.0 Red Hat, Inc. QEMU PCIe Host bridge
> >              +-01.0 Red Hat, Inc Virtio network device
> >              +-02.0 Red Hat, Inc. QEMU PCIe Expander bridge
> >              +-03.0 Red Hat, Inc. QEMU PCIe Expander bridge
> >              +-04.0 Red Hat, Inc. QEMU PCIe Expander bridge
> >              +-05.0 Red Hat, Inc. QEMU PCIe Expander bridge
> >              +-06.0 Red Hat, Inc. QEMU PCIe Expander bridge
> >              +-07.0 Red Hat, Inc. QEMU PCIe Expander bridge
> >              +-08.0 Red Hat, Inc. QEMU PCIe Expander bridge
> >              \-09.0 Red Hat, Inc. QEMU PCIe Expander bridge

Hmm, the tree looks correct..

> > The new root port is created, but no device attached.

> It looks like Guest finds the config invalid:
>
> [    0.283618] PCI host bridge to bus 0000:ca
> [    0.284064] ACPI BIOS Error (bug): \_SB.PCF7.PCEE.PCE5.PCDC.PCD3.PCCA._DSM: Excess arguments - ASL declared 5, ACPI requires 4 (20240322/nsarguments-162)

Looks like the DSM change wasn't clean. Yet, this might not be the
root cause, as mine could boot with it.

Here is mine (I added a print to that conflict part, for success):

[    0.340733] ACPI BIOS Error (bug): \_SB.PCF7.PCEE.PCE5.PCDC._DSM: Excess arguments - ASL declared 5, ACPI requires 4 (20230628/nsarguments-162)
[    0.341776] pci 0000:dc:00.0: [1b36:000c] type 01 class 0x060400 PCIe Root Port
[    0.344895] pci 0000:dc:00.0: BAR 0 [mem 0x10400000-0x10400fff]
[    0.347935] pci 0000:dc:00.0: PCI bridge to [bus dd]
[    0.348410] pci 0000:dc:00.0:   bridge window [mem 0x10200000-0x103fffff]
[    0.349483] pci 0000:dc:00.0:   bridge window [mem 0x42000000000-0x44080ffffff 64bit pref]
[    0.351459] pci_bus 0000:dd: busn_res: insert [bus dd] under [bus dc-dd]

In my case:
[root bus (00)] <---[pxb (dc)] <--- [root-port (dd)] <--- dev

In your case:
[root bus (00)] <---[pxb (ca)] <--- [root-port (cb)] <--- dev

> [    0.285533] pci_bus 0000:ca: root bus resource [bus ca]
> [    0.286214] pci 0000:ca:00.0: [1b36:000c] type 01 class 0x060400 PCIe Root Port
> [    0.287717] pci 0000:ca:00.0: BAR 0 [mem 0x00000000-0x00000fff]
> [    0.288431] pci 0000:ca:00.0: PCI bridge to [bus 00]

This starts to diff. Somehow the link is reversed? It should be:
[    0.288431] pci 0000:ca:00.0: PCI bridge to [bus cb]

> [    0.290649] pci 0000:ca:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
> [    0.292476] pci_bus 0000:cb: busn_res: can not insert [bus cb-ca] under [bus ca] (conflicts with (null) [bus ca])
> [    0.293597] pci_bus 0000:cb: busn_res: [bus cb-ca] end is updated to cb
> [    0.294300] pci_bus 0000:cb: busn_res: can not insert [bus cb] under [bus ca] (conflicts with (null) [bus ca])

And then everything went south...

Would you please try adding some prints?
----------------------------------------------------------------------
@@ -1556,6 +1556,7 @@ static char *create_new_pcie_port(VirtNestedSmmu *nested_smmu, Error **errp)
     uint32_t bus_nr = pci_bus_num(nested_smmu->pci_bus);
     DeviceState *dev;
     char *name_port;
+    bool ret;
 
     /* Create a root port */
     dev = qdev_new("pcie-root-port");
@@ -1571,7 +1572,9 @@ static char *create_new_pcie_port(VirtNestedSmmu *nested_smmu, Error **errp)
     qdev_prop_set_uint32(dev, "chassis", chassis_nr);
     qdev_prop_set_uint32(dev, "slot", port_nr);
     qdev_prop_set_uint64(dev, "io-reserve", 0);
-    qdev_realize_and_unref(dev, BUS(nested_smmu->pci_bus), &error_fatal);
+    ret = qdev_realize_and_unref(dev, BUS(nested_smmu->pci_bus), &error_fatal);
+    fprintf(stderr, "ret=%d, pcie-root-port ID: %s, added to pxb_bus num: %x, chassis: %d\n",
+            ret, name_port, pci_bus_num(nested_smmu->pci_bus), chassis_nr);
     return name_port;
 }
----------------------------------------------------------------------

We should make sure that the 'bus_nr' and 'bus' are set correctly and
that all the realize() calls returned true.

Thanks
Nicolin
> -----Original Message----- > From: Nicolin Chen <nicolinc@nvidia.com> > Sent: Thursday, August 29, 2024 5:10 PM > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> > Cc: Jason Gunthorpe <jgg@nvidia.com>; acpica-devel@lists.linux.dev; > Guohanjun (Hanjun Guo) <guohanjun@huawei.com>; iommu@lists.linux.dev; > Joerg Roedel <joro@8bytes.org>; Kevin Tian <kevin.tian@intel.com>; > kvm@vger.kernel.org; Len Brown <lenb@kernel.org>; linux- > acpi@vger.kernel.org; linux-arm-kernel@lists.infradead.org; Lorenzo Pieralisi > <lpieralisi@kernel.org>; Rafael J. Wysocki <rafael@kernel.org>; Robert Moore > <robert.moore@intel.com>; Robin Murphy <robin.murphy@arm.com>; Sudeep > Holla <sudeep.holla@arm.com>; Will Deacon <will@kernel.org>; Alex > Williamson <alex.williamson@redhat.com>; Eric Auger > <eric.auger@redhat.com>; Jean-Philippe Brucker <jean-philippe@linaro.org>; > Moritz Fischer <mdf@kernel.org>; Michael Shavit <mshavit@google.com>; > patches@lists.linux.dev; Mostafa Saleh <smostafa@google.com> > Subject: Re: [PATCH v2 0/8] Initial support for SMMUv3 nested translation > > On Thu, Aug 29, 2024 at 02:52:23PM +0000, Shameerali Kolothum Thodi wrote: > > > That makes some progress. But still I am not seeing the assigned dev in > > > Guest. > > > > > > -device vfio-pci-nohotplug,host=0000:75:00.1,iommufd=iommufd0 > > > > > > root@ubuntu:/# lspci -tv# > > > > > > root@ubuntu:/# lspci -tv > > > -+-[0000:ca]---00.0-[cb]-- > > > \-[0000:00]-+-00.0 Red Hat, Inc. QEMU PCIe Host bridge > > > +-01.0 Red Hat, Inc Virtio network device > > > +-02.0 Red Hat, Inc. QEMU PCIe Expander bridge > > > +-03.0 Red Hat, Inc. QEMU PCIe Expander bridge > > > +-04.0 Red Hat, Inc. QEMU PCIe Expander bridge > > > +-05.0 Red Hat, Inc. QEMU PCIe Expander bridge > > > +-06.0 Red Hat, Inc. QEMU PCIe Expander bridge > > > +-07.0 Red Hat, Inc. QEMU PCIe Expander bridge > > > +-08.0 Red Hat, Inc. QEMU PCIe Expander bridge > > > \-09.0 Red Hat, Inc. QEMU PCIe Expander bridge > > Hmm, the tree looks correct.. > > > > The new root port is created, but no device attached. > > It looks like Guest finds the config invalid: > > > > [ 0.283618] PCI host bridge to bus 0000:ca > > [ 0.284064] ACPI BIOS Error (bug): > \_SB.PCF7.PCEE.PCE5.PCDC.PCD3.PCCA._DSM: Excess arguments - ASL declared > 5, ACPI requires 4 (20240322/nsarguments-162) > > Looks like the DSM change wasn't clean. Yet, this might not be the > root cause, as mine could boot with it. Yes. 
This is not the culprit in this case and was reported earlier as well, https://patchew.org/QEMU/20211005085313.493858-1-eric.auger@redhat.com/20211005085313.493858-2-eric.auger@redhat.com/ > Here is mine (I added a print to that conflict part, for success): > > [ 0.340733] ACPI BIOS Error (bug): \_SB.PCF7.PCEE.PCE5.PCDC._DSM: Excess > arguments - ASL declared 5, ACPI requires 4 (20230628/nsarguments-162) > [ 0.341776] pci 0000:dc:00.0: [1b36:000c] type 01 class 0x060400 PCIe Root > Port > [ 0.344895] pci 0000:dc:00.0: BAR 0 [mem 0x10400000-0x10400fff] > [ 0.347935] pci 0000:dc:00.0: PCI bridge to [bus dd] > [ 0.348410] pci 0000:dc:00.0: bridge window [mem 0x10200000-0x103fffff] > [ 0.349483] pci 0000:dc:00.0: bridge window [mem 0x42000000000- > 0x44080ffffff 64bit pref] > [ 0.351459] pci_bus 0000:dd: busn_res: insert [bus dd] under [bus dc-dd] > > In my case: > [root bus (00)] <---[pxb (dc)] <--- [root-port (dd)] <--- dev > > In your case: > [root bus (00)] <---[pxb (ca)] <--- [root-port (cb)] <--- dev > > > [ 0.285533] pci_bus 0000:ca: root bus resource [bus ca] > > [ 0.286214] pci 0000:ca:00.0: [1b36:000c] type 01 class 0x060400 PCIe Root > Port > > [ 0.287717] pci 0000:ca:00.0: BAR 0 [mem 0x00000000-0x00000fff] > > [ 0.288431] pci 0000:ca:00.0: PCI bridge to [bus 00] > > This starts to diff. Somehow the link is reversed? It should be: > [ 0.288431] pci 0000:ca:00.0: PCI bridge to [bus cb] > > > [ 0.290649] pci 0000:ca:00.0: bridge configuration invalid ([bus 00-00]), > reconfiguring > > [ 0.292476] pci_bus 0000:cb: busn_res: can not insert [bus cb-ca] under [bus > ca] (conflicts with (null) [bus ca]) > > [ 0.293597] pci_bus 0000:cb: busn_res: [bus cb-ca] end is updated to cb > > [ 0.294300] pci_bus 0000:cb: busn_res: can not insert [bus cb] under [bus ca] > (conflicts with (null) [bus ca]) > > And then everything went south... > > Would you please try adding some prints? > ---------------------------------------------------------------------- > @@ -1556,6 +1556,7 @@ static char *create_new_pcie_port(VirtNestedSmmu > *nested_smmu, Error **errp) > uint32_t bus_nr = pci_bus_num(nested_smmu->pci_bus); > DeviceState *dev; > char *name_port; > + bool ret; > > /* Create a root port */ > dev = qdev_new("pcie-root-port"); > @@ -1571,7 +1572,9 @@ static char *create_new_pcie_port(VirtNestedSmmu > *nested_smmu, Error **errp) > qdev_prop_set_uint32(dev, "chassis", chassis_nr); > qdev_prop_set_uint32(dev, "slot", port_nr); > qdev_prop_set_uint64(dev, "io-reserve", 0); > - qdev_realize_and_unref(dev, BUS(nested_smmu->pci_bus), &error_fatal); > + ret = qdev_realize_and_unref(dev, BUS(nested_smmu->pci_bus), > &error_fatal); > + fprintf(stderr, "ret=%d, pcie-root-port ID: %s, added to pxb_bus num: %x, > chassis: %d\n", > + ret, name_port, pci_bus_num(nested_smmu->pci_bus), chassis_nr); > return name_port; > } Print shows everything fine: create_new_pcie_port: name_port smmu_bus0xca_port0, bus_nr 0xca chassis_nr 0xfd, nested_smmu->index 0x2, pci_bus_num 0xca, ret 1 It looks like a problem with old QEMU_EFI.fd(2022 build and before). I tried with 2023 QEMU_EFI.fd and with that it looks fine. root@ubuntu:/# lspci -tv -+-[0000:ca]---00.0-[cb]----00.0 Huawei Technologies Co., Ltd. Device a251 \-[0000:00]-+-00.0 Red Hat, Inc. QEMU PCIe Host bridge +-01.0 Red Hat, Inc Virtio network device +-02.0 Red Hat, Inc. QEMU PCIe Expander bridge +-03.0 Red Hat, Inc. QEMU PCIe Expander bridge +-04.0 Red Hat, Inc. QEMU PCIe Expander bridge +-05.0 Red Hat, Inc. QEMU PCIe Expander bridge +-06.0 Red Hat, Inc. 
QEMU PCIe Expander bridge +-07.0 Red Hat, Inc. QEMU PCIe Expander bridge +-08.0 Red Hat, Inc. QEMU PCIe Expander bridge \-09.0 Red Hat, Inc. QEMU PCIe Expander bridge So for now, I can proceed. Thanks, Shameer
On Fri, Aug 30, 2024 at 09:07:16AM +0000, Shameerali Kolothum Thodi wrote: > Print shows everything fine: > create_new_pcie_port: name_port smmu_bus0xca_port0, bus_nr 0xca chassis_nr 0xfd, nested_smmu->index 0x2, pci_bus_num 0xca, ret 1 > > It looks like a problem with old QEMU_EFI.fd(2022 build and before). > I tried with 2023 QEMU_EFI.fd and with that it looks fine. > > root@ubuntu:/# lspci -tv > -+-[0000:ca]---00.0-[cb]----00.0 Huawei Technologies Co., Ltd. Device a251 > \-[0000:00]-+-00.0 Red Hat, Inc. QEMU PCIe Host bridge > +-01.0 Red Hat, Inc Virtio network device > +-02.0 Red Hat, Inc. QEMU PCIe Expander bridge > +-03.0 Red Hat, Inc. QEMU PCIe Expander bridge > +-04.0 Red Hat, Inc. QEMU PCIe Expander bridge > +-05.0 Red Hat, Inc. QEMU PCIe Expander bridge > +-06.0 Red Hat, Inc. QEMU PCIe Expander bridge > +-07.0 Red Hat, Inc. QEMU PCIe Expander bridge > +-08.0 Red Hat, Inc. QEMU PCIe Expander bridge > \-09.0 Red Hat, Inc. QEMU PCIe Expander bridge > > So for now, I can proceed. Nice! That's a relief, for now :) Thanks Nicolin
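[Editor's note: for anyone reproducing this, the per-SMMU pxb/root-port
chain discussed above ([root bus] <- [pxb] <- [root-port] <- dev) is
roughly what one would get by wiring the topology up by hand with stock
QEMU options along the following lines. This is illustrative only: the IDs,
the bus_nr value and the device placement are made up here, while the
nested-smmuv3 machine option on the posted QEMU branch creates the chain
automatically and needs none of this on the command line.]

 -object iommufd,id=iommufd0 \
 -device pxb-pcie,id=pxb1,bus_nr=202,bus=pcie.0 \
 -device pcie-root-port,id=port1,chassis=1,slot=0,bus=pxb1 \
 -device vfio-pci-nohotplug,host=0000:75:00.1,bus=port1,iommufd=iommufd0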
Hi, Nico On Wed, 28 Aug 2024 at 05:32, Nicolin Chen <nicolinc@nvidia.com> wrote: > > On Tue, Aug 27, 2024 at 12:51:30PM -0300, Jason Gunthorpe wrote: > > This brings support for the IOMMFD ioctls: > > > > - IOMMU_GET_HW_INFO > > - IOMMU_HWPT_ALLOC_NEST_PARENT > > - IOMMU_DOMAIN_NESTED > > - ops->enforce_cache_coherency() > > > > This is quite straightforward as the nested STE can just be built in the > > special NESTED domain op and fed through the generic update machinery. > > > > The design allows the user provided STE fragment to control several > > aspects of the translation, including putting the STE into a "virtual > > bypass" or a aborting state. This duplicates functionality available by > > other means, but it allows trivially preserving the VMID in the STE as we > > eventually move towards the VIOMMU owning the VMID. > > > > Nesting support requires the system to either support S2FWB or the > > stronger CANWBS ACPI flag. This is to ensure the VM cannot bypass the > > cache and view incoherent data, currently VFIO lacks any cache flushing > > that would make this safe. > > > > Yan has a series to add some of the needed infrastructure for VFIO cache > > flushing here: > > > > https://lore.kernel.org/linux-iommu/20240507061802.20184-1-yan.y.zhao@intel.com/ > > > > Which may someday allow relaxing this further. > > > > Remove VFIO_TYPE1_NESTING_IOMMU since it was never used and superseded by > > this. > > > > This is the first series in what will be several to complete nesting > > support. At least: > > - IOMMU_RESV_SW_MSI related fixups > > https://lore.kernel.org/linux-iommu/cover.1722644866.git.nicolinc@nvidia.com/ > > - VIOMMU object support to allow ATS and CD invalidations > > https://lore.kernel.org/linux-iommu/cover.1723061377.git.nicolinc@nvidia.com/ > > - vCMDQ hypervisor support for direct invalidation queue assignment > > https://lore.kernel.org/linux-iommu/cover.1712978212.git.nicolinc@nvidia.com/ > > - KVM pinned VMID using VIOMMU for vBTM > > https://lore.kernel.org/linux-iommu/20240208151837.35068-1-shameerali.kolothum.thodi@huawei.com/ > > - Cross instance S2 sharing > > - Virtual Machine Structure using VIOMMU (for vMPAM?) > > - Fault forwarding support through IOMMUFD's fault fd for vSVA > > > > The VIOMMU series is essential to allow the invalidations to be processed > > for the CD as well. > > > > It is enough to allow qemu work to progress. > > > > This is on github: https://github.com/jgunthorpe/linux/commits/smmuv3_nesting > > > > v2: > > As mentioned above, the VIOMMU series would be required to test > the entire nesting feature, which now has a v2 rebasing on this > series. I tested it with a paring QEMU branch. Please refer to: > https://lore.kernel.org/linux-iommu/cover.1724776335.git.nicolinc@nvidia.com/ > Also, there is another new VIRQ series on top of the VIOMMU one > and this nesting series. And I tested it too. Please refer to: > https://lore.kernel.org/linux-iommu/cover.1724777091.git.nicolinc@nvidia.com/ > > With that, > > Tested-by: Nicolin Chen <nicolinc@nvidia.com> > Have you tested the user page fault? I got an issue, when a user page fault happens, group->attach_handle = iommu_attach_handle_get(pasid) return NULL. A bit confused here, only find IOMMU_NO_PASID is used when attaching __fault_domain_replace_dev ret = iommu_replace_group_handle(idev->igroup->group, hwpt->domain, &handle->handle); curr = xa_store(&group->pasid_array, IOMMU_NO_PASID, handle, GFP_KERNEL); not find where the code attach user pasid with the attach_handle. Thanks
On Thu, Sep 12, 2024 at 11:42:43AM +0800, Zhangfei Gao wrote:
> > > The VIOMMU series is essential to allow the invalidations to be processed
> > > for the CD as well.
> > >
> > > It is enough to allow qemu work to progress.
> > >
> > > This is on github: https://github.com/jgunthorpe/linux/commits/smmuv3_nesting
> > >
> > > v2:
> >
> > As mentioned above, the VIOMMU series would be required to test
> > the entire nesting feature, which now has a v2 rebasing on this
> > series. I tested it with a paring QEMU branch. Please refer to:
> > https://lore.kernel.org/linux-iommu/cover.1724776335.git.nicolinc@nvidia.com/
> > Also, there is another new VIRQ series on top of the VIOMMU one
> > and this nesting series. And I tested it too. Please refer to:
> > https://lore.kernel.org/linux-iommu/cover.1724777091.git.nicolinc@nvidia.com/
> >
> > With that,
> >
> > Tested-by: Nicolin Chen <nicolinc@nvidia.com>
>
> Have you tested the user page fault?

No, I don't have hardware to test PRI, so I have little experience with
the IOPF path and its counterpart in QEMU. I recall that Shameer has a
series of changes in QEMU.

> I got an issue, when a user page fault happens,
> group->attach_handle = iommu_attach_handle_get(pasid)
> return NULL.
>
> A bit confused here, only find IOMMU_NO_PASID is used when attaching
>
> __fault_domain_replace_dev
>     ret = iommu_replace_group_handle(idev->igroup->group, hwpt->domain,
>                                      &handle->handle);
>     curr = xa_store(&group->pasid_array, IOMMU_NO_PASID, handle, GFP_KERNEL);
>
> not find where the code attach user pasid with the attach_handle.

In the SMMUv3 case (the latest design), a DOMAIN_NESTED is a CD/PASID
table, so one single attach_handle (domain/idev) should be enough to
cover the entire thing. Please refer to:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/iommu/io-pgfault.c?h=v6.11-rc7#n127

Thanks
Nicolin
On 9/12/24 11:42 AM, Zhangfei Gao wrote: > On Wed, 28 Aug 2024 at 05:32, Nicolin Chen<nicolinc@nvidia.com> wrote: >> On Tue, Aug 27, 2024 at 12:51:30PM -0300, Jason Gunthorpe wrote: >>> This brings support for the IOMMFD ioctls: >>> >>> - IOMMU_GET_HW_INFO >>> - IOMMU_HWPT_ALLOC_NEST_PARENT >>> - IOMMU_DOMAIN_NESTED >>> - ops->enforce_cache_coherency() >>> >>> This is quite straightforward as the nested STE can just be built in the >>> special NESTED domain op and fed through the generic update machinery. >>> >>> The design allows the user provided STE fragment to control several >>> aspects of the translation, including putting the STE into a "virtual >>> bypass" or a aborting state. This duplicates functionality available by >>> other means, but it allows trivially preserving the VMID in the STE as we >>> eventually move towards the VIOMMU owning the VMID. >>> >>> Nesting support requires the system to either support S2FWB or the >>> stronger CANWBS ACPI flag. This is to ensure the VM cannot bypass the >>> cache and view incoherent data, currently VFIO lacks any cache flushing >>> that would make this safe. >>> >>> Yan has a series to add some of the needed infrastructure for VFIO cache >>> flushing here: >>> >>> https://lore.kernel.org/linux-iommu/20240507061802.20184-1-yan.y.zhao@intel.com/ >>> >>> Which may someday allow relaxing this further. >>> >>> Remove VFIO_TYPE1_NESTING_IOMMU since it was never used and superseded by >>> this. >>> >>> This is the first series in what will be several to complete nesting >>> support. At least: >>> - IOMMU_RESV_SW_MSI related fixups >>> https://lore.kernel.org/linux-iommu/cover.1722644866.git.nicolinc@nvidia.com/ >>> - VIOMMU object support to allow ATS and CD invalidations >>> https://lore.kernel.org/linux-iommu/cover.1723061377.git.nicolinc@nvidia.com/ >>> - vCMDQ hypervisor support for direct invalidation queue assignment >>> https://lore.kernel.org/linux-iommu/cover.1712978212.git.nicolinc@nvidia.com/ >>> - KVM pinned VMID using VIOMMU for vBTM >>> https://lore.kernel.org/linux-iommu/20240208151837.35068-1-shameerali.kolothum.thodi@huawei.com/ >>> - Cross instance S2 sharing >>> - Virtual Machine Structure using VIOMMU (for vMPAM?) >>> - Fault forwarding support through IOMMUFD's fault fd for vSVA >>> >>> The VIOMMU series is essential to allow the invalidations to be processed >>> for the CD as well. >>> >>> It is enough to allow qemu work to progress. >>> >>> This is on github:https://github.com/jgunthorpe/linux/commits/smmuv3_nesting >>> >>> v2: >> As mentioned above, the VIOMMU series would be required to test >> the entire nesting feature, which now has a v2 rebasing on this >> series. I tested it with a paring QEMU branch. Please refer to: >> https://lore.kernel.org/linux-iommu/cover.1724776335.git.nicolinc@nvidia.com/ >> Also, there is another new VIRQ series on top of the VIOMMU one >> and this nesting series. And I tested it too. Please refer to: >> https://lore.kernel.org/linux-iommu/cover.1724777091.git.nicolinc@nvidia.com/ >> >> With that, >> >> Tested-by: Nicolin Chen<nicolinc@nvidia.com> >> > Have you tested the user page fault? > > I got an issue, when a user page fault happens, > group->attach_handle = iommu_attach_handle_get(pasid) > return NULL. 
> > A bit confused here, only find IOMMU_NO_PASID is used when attaching > > __fault_domain_replace_dev > ret = iommu_replace_group_handle(idev->igroup->group, hwpt->domain, > &handle->handle); > curr = xa_store(&group->pasid_array, IOMMU_NO_PASID, handle, GFP_KERNEL); > > not find where the code attach user pasid with the attach_handle. Have you set iommu_ops::user_pasid_table for SMMUv3 driver? Thanks, baolu
On Thu, 12 Sept 2024 at 12:29, Baolu Lu <baolu.lu@linux.intel.com> wrote: > > On 9/12/24 11:42 AM, Zhangfei Gao wrote: > > On Wed, 28 Aug 2024 at 05:32, Nicolin Chen<nicolinc@nvidia.com> wrote: > >> On Tue, Aug 27, 2024 at 12:51:30PM -0300, Jason Gunthorpe wrote: > >>> This brings support for the IOMMFD ioctls: > >>> > >>> - IOMMU_GET_HW_INFO > >>> - IOMMU_HWPT_ALLOC_NEST_PARENT > >>> - IOMMU_DOMAIN_NESTED > >>> - ops->enforce_cache_coherency() > >>> > >>> This is quite straightforward as the nested STE can just be built in the > >>> special NESTED domain op and fed through the generic update machinery. > >>> > >>> The design allows the user provided STE fragment to control several > >>> aspects of the translation, including putting the STE into a "virtual > >>> bypass" or a aborting state. This duplicates functionality available by > >>> other means, but it allows trivially preserving the VMID in the STE as we > >>> eventually move towards the VIOMMU owning the VMID. > >>> > >>> Nesting support requires the system to either support S2FWB or the > >>> stronger CANWBS ACPI flag. This is to ensure the VM cannot bypass the > >>> cache and view incoherent data, currently VFIO lacks any cache flushing > >>> that would make this safe. > >>> > >>> Yan has a series to add some of the needed infrastructure for VFIO cache > >>> flushing here: > >>> > >>> https://lore.kernel.org/linux-iommu/20240507061802.20184-1-yan.y.zhao@intel.com/ > >>> > >>> Which may someday allow relaxing this further. > >>> > >>> Remove VFIO_TYPE1_NESTING_IOMMU since it was never used and superseded by > >>> this. > >>> > >>> This is the first series in what will be several to complete nesting > >>> support. At least: > >>> - IOMMU_RESV_SW_MSI related fixups > >>> https://lore.kernel.org/linux-iommu/cover.1722644866.git.nicolinc@nvidia.com/ > >>> - VIOMMU object support to allow ATS and CD invalidations > >>> https://lore.kernel.org/linux-iommu/cover.1723061377.git.nicolinc@nvidia.com/ > >>> - vCMDQ hypervisor support for direct invalidation queue assignment > >>> https://lore.kernel.org/linux-iommu/cover.1712978212.git.nicolinc@nvidia.com/ > >>> - KVM pinned VMID using VIOMMU for vBTM > >>> https://lore.kernel.org/linux-iommu/20240208151837.35068-1-shameerali.kolothum.thodi@huawei.com/ > >>> - Cross instance S2 sharing > >>> - Virtual Machine Structure using VIOMMU (for vMPAM?) > >>> - Fault forwarding support through IOMMUFD's fault fd for vSVA > >>> > >>> The VIOMMU series is essential to allow the invalidations to be processed > >>> for the CD as well. > >>> > >>> It is enough to allow qemu work to progress. > >>> > >>> This is on github:https://github.com/jgunthorpe/linux/commits/smmuv3_nesting > >>> > >>> v2: > >> As mentioned above, the VIOMMU series would be required to test > >> the entire nesting feature, which now has a v2 rebasing on this > >> series. I tested it with a paring QEMU branch. Please refer to: > >> https://lore.kernel.org/linux-iommu/cover.1724776335.git.nicolinc@nvidia.com/ > >> Also, there is another new VIRQ series on top of the VIOMMU one > >> and this nesting series. And I tested it too. Please refer to: > >> https://lore.kernel.org/linux-iommu/cover.1724777091.git.nicolinc@nvidia.com/ > >> > >> With that, > >> > >> Tested-by: Nicolin Chen<nicolinc@nvidia.com> > >> > > Have you tested the user page fault? > > > > I got an issue, when a user page fault happens, > > group->attach_handle = iommu_attach_handle_get(pasid) > > return NULL. 
> > A bit confused here, only find IOMMU_NO_PASID is used when attaching
> >
> > __fault_domain_replace_dev
> >     ret = iommu_replace_group_handle(idev->igroup->group, hwpt->domain,
> >                                      &handle->handle);
> >     curr = xa_store(&group->pasid_array, IOMMU_NO_PASID, handle, GFP_KERNEL);
> >
> > not find where the code attach user pasid with the attach_handle.
>
> Have you set iommu_ops::user_pasid_table for SMMUv3 driver?

Thanks Baolu, Nico

Yes, after

	arm_smmu_ops = {
+		.user_pasid_table = 1,

find_fault_handler() can get inside, the
attach_handle = iommu_attach_handle_get(IOMMU_NO_PASID) lookup succeeds,
and the QEMU handler also gets called. But the hardware reports errors
and needs a reset; still checking:

[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0

Thanks
On Thu, 12 Sept 2024 at 12:29, Baolu Lu <baolu.lu@linux.intel.com> wrote:
> > Have you tested the user page fault?
> >
> > I got an issue, when a user page fault happens,
> > group->attach_handle = iommu_attach_handle_get(pasid)
> > return NULL.
> >
> > A bit confused here, only find IOMMU_NO_PASID is used when attaching
> >
> > __fault_domain_replace_dev
> >     ret = iommu_replace_group_handle(idev->igroup->group, hwpt->domain,
> >                                      &handle->handle);
> >     curr = xa_store(&group->pasid_array, IOMMU_NO_PASID, handle, GFP_KERNEL);
> >
> > not find where the code attach user pasid with the attach_handle.
>
> Have you set iommu_ops::user_pasid_table for SMMUv3 driver?

Thanks Baolu

Can we send a patch to make it the default?

+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3570,6 +3570,7 @@ static struct iommu_ops arm_smmu_ops = {
 	.viommu_alloc = arm_vsmmu_alloc,
 	.pgsize_bitmap = -1UL, /* Restricted during device attach */
 	.owner = THIS_MODULE,
+	.user_pasid_table = 1,

Thanks
On Tue, Oct 15, 2024 at 11:21:54AM +0800, Zhangfei Gao wrote:
> On Thu, 12 Sept 2024 at 12:29, Baolu Lu <baolu.lu@linux.intel.com> wrote:
> >
> > > Have you tested the user page fault?
> > >
> > > I got an issue, when a user page fault happens,
> > > group->attach_handle = iommu_attach_handle_get(pasid)
> > > return NULL.
> > >
> > > A bit confused here, only find IOMMU_NO_PASID is used when attaching
> > >
> > > __fault_domain_replace_dev
> > >     ret = iommu_replace_group_handle(idev->igroup->group, hwpt->domain,
> > >                                      &handle->handle);
> > >     curr = xa_store(&group->pasid_array, IOMMU_NO_PASID, handle, GFP_KERNEL);
> > >
> > > not find where the code attach user pasid with the attach_handle.
> >
> > Have you set iommu_ops::user_pasid_table for SMMUv3 driver?
>
> Thanks Baolu
>
> Can we send a patch to make it as default?
>
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -3570,6 +3570,7 @@ static struct iommu_ops arm_smmu_ops = {
>  	.viommu_alloc = arm_vsmmu_alloc,
>  	.pgsize_bitmap = -1UL, /* Restricted during device attach */
>  	.owner = THIS_MODULE,
> +	.user_pasid_table = 1,

You shouldn't need this right now as smmu3 doesn't support nesting
domains yet.

	if (!ops->user_pasid_table)
		return NULL;
	/*
	 * The iommu driver for this device supports user-
	 * managed PASID table. Therefore page faults for
	 * any PASID should go through the NESTING domain
	 * attached to the device RID.
	 */
	attach_handle = iommu_attach_handle_get(
		dev->iommu_group, IOMMU_NO_PASID,
		IOMMU_DOMAIN_NESTED);
	if (IS_ERR(attach_handle))
	    ^^^^^^^^^^^^^^^^^^^^^ Will always fail

But I will add it to the patch that adds IOMMU_DOMAIN_NESTED

Jason
Hi, Jason On Tue, 27 Aug 2024 at 23:51, Jason Gunthorpe <jgg@nvidia.com> wrote: > > This brings support for the IOMMFD ioctls: > > - IOMMU_GET_HW_INFO > - IOMMU_HWPT_ALLOC_NEST_PARENT > - IOMMU_DOMAIN_NESTED > - ops->enforce_cache_coherency() > > This is quite straightforward as the nested STE can just be built in the > special NESTED domain op and fed through the generic update machinery. > > The design allows the user provided STE fragment to control several > aspects of the translation, including putting the STE into a "virtual > bypass" or a aborting state. This duplicates functionality available by > other means, but it allows trivially preserving the VMID in the STE as we > eventually move towards the VIOMMU owning the VMID. > > Nesting support requires the system to either support S2FWB or the > stronger CANWBS ACPI flag. This is to ensure the VM cannot bypass the > cache and view incoherent data, currently VFIO lacks any cache flushing > that would make this safe. What if the system does not support S2FWB or CANWBS, any workaround to passthrough? Currently I am testing nesting by ignoring this check. Thanks
On Wed, Oct 16, 2024 at 10:23:49AM +0800, Zhangfei Gao wrote: > > Nesting support requires the system to either support S2FWB or the > > stronger CANWBS ACPI flag. This is to ensure the VM cannot bypass the > > cache and view incoherent data, currently VFIO lacks any cache flushing > > that would make this safe. > > What if the system does not support S2FWB or CANWBS, any workaround to > passthrough? Eventually we can add the required cache flushing to VFIO, but that would have to be a followup. > Currently I am testing nesting by ignoring this check. This is probably OK, but I wouldn't run it as a production environment with a hostile VM. Jason
On Tue, 15 Oct 2024 at 21:09, Jason Gunthorpe <jgg@nvidia.com> wrote: > > On Tue, Oct 15, 2024 at 11:21:54AM +0800, Zhangfei Gao wrote: > > On Thu, 12 Sept 2024 at 12:29, Baolu Lu <baolu.lu@linux.intel.com> wrote: > > > > > > Have you tested the user page fault? > > > > > > > > I got an issue, when a user page fault happens, > > > > group->attach_handle = iommu_attach_handle_get(pasid) > > > > return NULL. > > > > > > > > A bit confused here, only find IOMMU_NO_PASID is used when attaching > > > > > > > > __fault_domain_replace_dev > > > > ret = iommu_replace_group_handle(idev->igroup->group, hwpt->domain, > > > > &handle->handle); > > > > curr = xa_store(&group->pasid_array, IOMMU_NO_PASID, handle, GFP_KERNEL); > > > > > > > > not find where the code attach user pasid with the attach_handle. > > > > > > Have you set iommu_ops::user_pasid_table for SMMUv3 driver? > > > > Thanks Baolu > > > > Can we send a patch to make it as default? > > > > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > > @@ -3570,6 +3570,7 @@ static struct iommu_ops arm_smmu_ops = { > > .viommu_alloc = arm_vsmmu_alloc, > > .pgsize_bitmap = -1UL, /* Restricted during device attach */ > > .owner = THIS_MODULE, > > + .user_pasid_table = 1, > > You shouldn't need this right now as smmu3 doesn't support nesting > domains yet. I am testing with .user_pasid_table = 1 and IOMMU_NO_PASID It works for user page faults. > > if (!ops->user_pasid_table) > return NULL; > /* > * The iommu driver for this device supports user- > * managed PASID table. Therefore page faults for > * any PASID should go through the NESTING domain > * attached to the device RID. > */ > attach_handle = iommu_attach_handle_get( > dev->iommu_group, IOMMU_NO_PASID, > IOMMU_DOMAIN_NESTED); > if (IS_ERR(attach_handle)) > ^^^^^^^^^^^^^^^^^^^^^ Will always fail > > > But I will add it to the patch that adds IOMMU_DOMAIN_NESTED OK, cool. Thanks
On Thu, Oct 17, 2024 at 09:53:22AM +0800, Zhangfei Gao wrote: > On Tue, 15 Oct 2024 at 21:09, Jason Gunthorpe <jgg@nvidia.com> wrote: > > > > On Tue, Oct 15, 2024 at 11:21:54AM +0800, Zhangfei Gao wrote: > > > On Thu, 12 Sept 2024 at 12:29, Baolu Lu <baolu.lu@linux.intel.com> wrote: > > > > > > > > Have you tested the user page fault? > > > > > > > > > > I got an issue, when a user page fault happens, > > > > > group->attach_handle = iommu_attach_handle_get(pasid) > > > > > return NULL. > > > > > > > > > > A bit confused here, only find IOMMU_NO_PASID is used when attaching > > > > > > > > > > __fault_domain_replace_dev > > > > > ret = iommu_replace_group_handle(idev->igroup->group, hwpt->domain, > > > > > &handle->handle); > > > > > curr = xa_store(&group->pasid_array, IOMMU_NO_PASID, handle, GFP_KERNEL); > > > > > > > > > > not find where the code attach user pasid with the attach_handle. > > > > > > > > Have you set iommu_ops::user_pasid_table for SMMUv3 driver? > > > > > > Thanks Baolu > > > > > > Can we send a patch to make it as default? > > > > > > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > > > @@ -3570,6 +3570,7 @@ static struct iommu_ops arm_smmu_ops = { > > > .viommu_alloc = arm_vsmmu_alloc, > > > .pgsize_bitmap = -1UL, /* Restricted during device attach */ > > > .owner = THIS_MODULE, > > > + .user_pasid_table = 1, > > > > You shouldn't need this right now as smmu3 doesn't support nesting > > domains yet. > > I am testing with .user_pasid_table = 1 and IOMMU_NO_PASID > It works for user page faults. You shouldn't need user_pasid_table for that case, it is only necessary if you are doing nesting. Jason