Message ID | 20240614142156.29420-5-zong.li@sifive.com (mailing list archive) |
---|---|
State | RFC |
Headers | show |
Series | RISC-V IOMMU HPM and nested IOMMU support | expand |
Context | Check | Description |
---|---|---|
conchuod/vmtest-fixes-PR | fail | merge-conflict |
On 6/14/24 10:21 PM, Zong Li wrote: > Add iotlb_sync_map operation for flush IOTLB. Software must > flush the IOTLB after each page table. > > Signed-off-by: Zong Li<zong.li@sifive.com> > --- > drivers/iommu/riscv/Makefile | 1 + > drivers/iommu/riscv/iommu.c | 11 +++++++++++ > 2 files changed, 12 insertions(+) > > diff --git a/drivers/iommu/riscv/Makefile b/drivers/iommu/riscv/Makefile > index d36625a1fd08..f02ce6ebfbd0 100644 > --- a/drivers/iommu/riscv/Makefile > +++ b/drivers/iommu/riscv/Makefile > @@ -1,3 +1,4 @@ > # SPDX-License-Identifier: GPL-2.0-only > obj-$(CONFIG_RISCV_IOMMU) += iommu.o iommu-platform.o iommu-pmu.o > obj-$(CONFIG_RISCV_IOMMU_PCI) += iommu-pci.o > +obj-$(CONFIG_SIFIVE_IOMMU) += iommu-sifive.o > diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c > index 9aeb4b20c145..df7aeb2571ae 100644 > --- a/drivers/iommu/riscv/iommu.c > +++ b/drivers/iommu/riscv/iommu.c > @@ -1115,6 +1115,16 @@ static void riscv_iommu_iotlb_sync(struct iommu_domain *iommu_domain, > riscv_iommu_iotlb_inval(domain, gather->start, gather->end); > } > > +static int riscv_iommu_iotlb_sync_map(struct iommu_domain *iommu_domain, > + unsigned long iova, size_t size) > +{ > + struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain); > + > + riscv_iommu_iotlb_inval(domain, iova, iova + size - 1); Does the RISC-V IOMMU architecture always cache the non-present or erroneous translation entries? If so, can you please provide more context in the commit message? If not, why do you want to flush the cache when building a new translation? Best regards, baolu
On Sat, Jun 15, 2024 at 11:17 AM Baolu Lu <baolu.lu@linux.intel.com> wrote: > > On 6/14/24 10:21 PM, Zong Li wrote: > > Add iotlb_sync_map operation for flush IOTLB. Software must > > flush the IOTLB after each page table. > > > > Signed-off-by: Zong Li<zong.li@sifive.com> > > --- > > drivers/iommu/riscv/Makefile | 1 + > > drivers/iommu/riscv/iommu.c | 11 +++++++++++ > > 2 files changed, 12 insertions(+) > > > > diff --git a/drivers/iommu/riscv/Makefile b/drivers/iommu/riscv/Makefile > > index d36625a1fd08..f02ce6ebfbd0 100644 > > --- a/drivers/iommu/riscv/Makefile > > +++ b/drivers/iommu/riscv/Makefile > > @@ -1,3 +1,4 @@ > > # SPDX-License-Identifier: GPL-2.0-only > > obj-$(CONFIG_RISCV_IOMMU) += iommu.o iommu-platform.o iommu-pmu.o > > obj-$(CONFIG_RISCV_IOMMU_PCI) += iommu-pci.o > > +obj-$(CONFIG_SIFIVE_IOMMU) += iommu-sifive.o > > diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c > > index 9aeb4b20c145..df7aeb2571ae 100644 > > --- a/drivers/iommu/riscv/iommu.c > > +++ b/drivers/iommu/riscv/iommu.c > > @@ -1115,6 +1115,16 @@ static void riscv_iommu_iotlb_sync(struct iommu_domain *iommu_domain, > > riscv_iommu_iotlb_inval(domain, gather->start, gather->end); > > } > > > > +static int riscv_iommu_iotlb_sync_map(struct iommu_domain *iommu_domain, > > + unsigned long iova, size_t size) > > +{ > > + struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain); > > + > > + riscv_iommu_iotlb_inval(domain, iova, iova + size - 1); > > Does the RISC-V IOMMU architecture always cache the non-present or > erroneous translation entries? If so, can you please provide more > context in the commit message? > > If not, why do you want to flush the cache when building a new > translation? > It seems to me that we can indeed remove this operation, because it may be too aggressive given the following situation. I added it for updating the MSI mapping when we change the irq affinity of a pass-through device to another vCPU. The RISC-V IOMMU spec allows MSI translation to go through the MSI flat table, MRIF, or the normal page table. In the case of the normal page table, the MSI mapping is created in the second-stage page table, mapping the GPA of the guest's supervisor interrupt file to the HPA of host's guest interrupt file. This MSI mapping needs to be updated when the HPA of host's guest interrupt file is changed. I think we can invalidate the cache after updating the MSI mapping, rather than adding the iotlb_sync_map() operation for every mapping created. Does it also make sense to you? If so, I will remove it in the next version. Thanks. > Best regards, > baolu
On Mon, Jun 17, 2024 at 09:43:35PM +0800, Zong Li wrote: > I added it for updating the MSI mapping when we change the irq > affinity of a pass-through device to another vCPU. The RISC-V IOMMU > spec allows MSI translation to go through the MSI flat table, MRIF, or > the normal page table. In the case of the normal page table, the MSI > mapping is created in the second-stage page table, mapping the GPA of > the guest's supervisor interrupt file to the HPA of host's guest > interrupt file. This MSI mapping needs to be updated when the HPA of > host's guest interrupt file is changed. It sounds like more thought is needed for the MSI architecture, having the host read the guest page table to mirror weird MSI stuff seems kind of wrong.. The S2 really needs to have the correct physical MSI pages statically at boot time. Jason
On Mon, Jun 17, 2024 at 10:39 PM Jason Gunthorpe <jgg@ziepe.ca> wrote: > > On Mon, Jun 17, 2024 at 09:43:35PM +0800, Zong Li wrote: > > > I added it for updating the MSI mapping when we change the irq > > affinity of a pass-through device to another vCPU. The RISC-V IOMMU > > spec allows MSI translation to go through the MSI flat table, MRIF, or > > the normal page table. In the case of the normal page table, the MSI > > mapping is created in the second-stage page table, mapping the GPA of > > the guest's supervisor interrupt file to the HPA of host's guest > > interrupt file. This MSI mapping needs to be updated when the HPA of > > host's guest interrupt file is changed. > > It sounds like more thought is needed for the MSI architecture, having > the host read the guest page table to mirror weird MSI stuff seems > kind of wrong.. > Perhaps I should rephrase it. Host doesn't read the guest page table. In a RISC-V system, MSIs are directed to a specific privilege level of a specific hart, including a specific virtual hart. In a hart's IMSIC (Incoming MSI Controller), it contains some 'interrupt files' for these specific privilege level harts. For instance, if the target address of MSI is the address of the interrupt file which is for a specific supervisor level hart, then that hart's supervisor mode will receive this MSI. Furthermore, when a hart implements the hypervisor extension, its IMSIC will have interrupt files for virtual harts, called 'guest interrupt files'. We will create the MSI mapping in S2 page table at boot time firstly, the mapping would be GPA of the interrupt file for supervisor level (in guest view, it thinks it use a supervisor level interrupt file) to HPA of the 'guest interrupt file' (in host view, the device should actually use a guest interrupt file). When the vCPU is migrated to another physical hart, the 'guest interrupt files' should be switched to another physical hart's IMSIC's 'guest interrupt file', it means that the HPA of this MSI mapping in S2 page table needs to be updated. > The S2 really needs to have the correct physical MSI pages statically > at boot time. > > Jason
On Tue, Jun 18, 2024 at 11:01:48AM +0800, Zong Li wrote: > On Mon, Jun 17, 2024 at 10:39 PM Jason Gunthorpe <jgg@ziepe.ca> wrote: > > > > On Mon, Jun 17, 2024 at 09:43:35PM +0800, Zong Li wrote: > > > > > I added it for updating the MSI mapping when we change the irq > > > affinity of a pass-through device to another vCPU. The RISC-V IOMMU > > > spec allows MSI translation to go through the MSI flat table, MRIF, or > > > the normal page table. In the case of the normal page table, the MSI > > > mapping is created in the second-stage page table, mapping the GPA of > > > the guest's supervisor interrupt file to the HPA of host's guest > > > interrupt file. This MSI mapping needs to be updated when the HPA of > > > host's guest interrupt file is changed. > > > > It sounds like more thought is needed for the MSI architecture, having > > the host read the guest page table to mirror weird MSI stuff seems > > kind of wrong.. > > Perhaps I should rephrase it. Host doesn't read the guest page table. > In a RISC-V system, MSIs are directed to a specific privilege level of > a specific hart, including a specific virtual hart. In a hart's IMSIC > (Incoming MSI Controller), it contains some 'interrupt files' for > these specific privilege level harts. For instance, if the target > address of MSI is the address of the interrupt file which is for a > specific supervisor level hart, then that hart's supervisor mode will > receive this MSI. Furthermore, when a hart implements the hypervisor > extension, its IMSIC will have interrupt files for virtual harts, > called 'guest interrupt files'. > We will create the MSI mapping in S2 page table at boot time firstly, > the mapping would be GPA of the interrupt file for supervisor level > (in guest view, it thinks it use a supervisor level interrupt file) to > HPA of the 'guest interrupt file' (in host view, the device should > actually use a guest interrupt file). When the vCPU is migrated to > another physical hart, the 'guest interrupt files' should be switched > to another physical hart's IMSIC's 'guest interrupt file', it means > that the HPA of this MSI mapping in S2 page table needs to be updated. I am vaugely aware of these details, but it is good to hear them again. However, none of that really explains why this is messing with invalidation logic.. If you need to replace MSI pages in the S2 atomicaly as you migrate vCPUs then you need a proper replace operation for the io page table. map is supposed to fail if there are already mappings at that address, you can't use it to replace existing mappings with something else. Jason
diff --git a/drivers/iommu/riscv/Makefile b/drivers/iommu/riscv/Makefile index d36625a1fd08..f02ce6ebfbd0 100644 --- a/drivers/iommu/riscv/Makefile +++ b/drivers/iommu/riscv/Makefile @@ -1,3 +1,4 @@ # SPDX-License-Identifier: GPL-2.0-only obj-$(CONFIG_RISCV_IOMMU) += iommu.o iommu-platform.o iommu-pmu.o obj-$(CONFIG_RISCV_IOMMU_PCI) += iommu-pci.o +obj-$(CONFIG_SIFIVE_IOMMU) += iommu-sifive.o diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c index 9aeb4b20c145..df7aeb2571ae 100644 --- a/drivers/iommu/riscv/iommu.c +++ b/drivers/iommu/riscv/iommu.c @@ -1115,6 +1115,16 @@ static void riscv_iommu_iotlb_sync(struct iommu_domain *iommu_domain, riscv_iommu_iotlb_inval(domain, gather->start, gather->end); } +static int riscv_iommu_iotlb_sync_map(struct iommu_domain *iommu_domain, + unsigned long iova, size_t size) +{ + struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain); + + riscv_iommu_iotlb_inval(domain, iova, iova + size - 1); + + return 0; +} + static inline size_t get_page_size(size_t size) { if (size >= IOMMU_PAGE_SIZE_512G) @@ -1396,6 +1406,7 @@ static const struct iommu_domain_ops riscv_iommu_paging_domain_ops = { .unmap_pages = riscv_iommu_unmap_pages, .iova_to_phys = riscv_iommu_iova_to_phys, .iotlb_sync = riscv_iommu_iotlb_sync, + .iotlb_sync_map = riscv_iommu_iotlb_sync_map, .flush_iotlb_all = riscv_iommu_iotlb_flush_all, };
Add iotlb_sync_map operation for flush IOTLB. Software must flush the IOTLB after each page table. Signed-off-by: Zong Li <zong.li@sifive.com> --- drivers/iommu/riscv/Makefile | 1 + drivers/iommu/riscv/iommu.c | 11 +++++++++++ 2 files changed, 12 insertions(+)