
[RFC,v2,04/10] iommu/riscv: add iotlb_sync_map operation support

Message ID 20240614142156.29420-5-zong.li@sifive.com (mailing list archive)
State RFC
Series RISC-V IOMMU HPM and nested IOMMU support

Checks

Context Check Description
conchuod/vmtest-fixes-PR fail merge-conflict

Commit Message

Zong Li June 14, 2024, 2:21 p.m. UTC
Add the iotlb_sync_map operation to flush the IOTLB. Software must
flush the IOTLB after each page table update.

Signed-off-by: Zong Li <zong.li@sifive.com>
---
 drivers/iommu/riscv/Makefile |  1 +
 drivers/iommu/riscv/iommu.c  | 11 +++++++++++
 2 files changed, 12 insertions(+)

Comments

Baolu Lu June 15, 2024, 3:14 a.m. UTC | #1
On 6/14/24 10:21 PM, Zong Li wrote:
> Add iotlb_sync_map operation for flush IOTLB. Software must
> flush the IOTLB after each page table.
> 
> Signed-off-by: Zong Li<zong.li@sifive.com>
> ---
>   drivers/iommu/riscv/Makefile |  1 +
>   drivers/iommu/riscv/iommu.c  | 11 +++++++++++
>   2 files changed, 12 insertions(+)
> 
> diff --git a/drivers/iommu/riscv/Makefile b/drivers/iommu/riscv/Makefile
> index d36625a1fd08..f02ce6ebfbd0 100644
> --- a/drivers/iommu/riscv/Makefile
> +++ b/drivers/iommu/riscv/Makefile
> @@ -1,3 +1,4 @@
>   # SPDX-License-Identifier: GPL-2.0-only
>   obj-$(CONFIG_RISCV_IOMMU) += iommu.o iommu-platform.o iommu-pmu.o
>   obj-$(CONFIG_RISCV_IOMMU_PCI) += iommu-pci.o
> +obj-$(CONFIG_SIFIVE_IOMMU) += iommu-sifive.o
> diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
> index 9aeb4b20c145..df7aeb2571ae 100644
> --- a/drivers/iommu/riscv/iommu.c
> +++ b/drivers/iommu/riscv/iommu.c
> @@ -1115,6 +1115,16 @@ static void riscv_iommu_iotlb_sync(struct iommu_domain *iommu_domain,
>   	riscv_iommu_iotlb_inval(domain, gather->start, gather->end);
>   }
>   
> +static int riscv_iommu_iotlb_sync_map(struct iommu_domain *iommu_domain,
> +				      unsigned long iova, size_t size)
> +{
> +	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
> +
> +	riscv_iommu_iotlb_inval(domain, iova, iova + size - 1);

Does the RISC-V IOMMU architecture always cache the non-present or
erroneous translation entries? If so, can you please provide more
context in the commit message?

If not, why do you want to flush the cache when building a new
translation?

Best regards,
baolu
Zong Li June 17, 2024, 1:43 p.m. UTC | #2
On Sat, Jun 15, 2024 at 11:17 AM Baolu Lu <baolu.lu@linux.intel.com> wrote:
>
> On 6/14/24 10:21 PM, Zong Li wrote:
> > Add iotlb_sync_map operation for flush IOTLB. Software must
> > flush the IOTLB after each page table.
> >
> > Signed-off-by: Zong Li<zong.li@sifive.com>
> > ---
> >   drivers/iommu/riscv/Makefile |  1 +
> >   drivers/iommu/riscv/iommu.c  | 11 +++++++++++
> >   2 files changed, 12 insertions(+)
> >
> > diff --git a/drivers/iommu/riscv/Makefile b/drivers/iommu/riscv/Makefile
> > index d36625a1fd08..f02ce6ebfbd0 100644
> > --- a/drivers/iommu/riscv/Makefile
> > +++ b/drivers/iommu/riscv/Makefile
> > @@ -1,3 +1,4 @@
> >   # SPDX-License-Identifier: GPL-2.0-only
> >   obj-$(CONFIG_RISCV_IOMMU) += iommu.o iommu-platform.o iommu-pmu.o
> >   obj-$(CONFIG_RISCV_IOMMU_PCI) += iommu-pci.o
> > +obj-$(CONFIG_SIFIVE_IOMMU) += iommu-sifive.o
> > diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
> > index 9aeb4b20c145..df7aeb2571ae 100644
> > --- a/drivers/iommu/riscv/iommu.c
> > +++ b/drivers/iommu/riscv/iommu.c
> > @@ -1115,6 +1115,16 @@ static void riscv_iommu_iotlb_sync(struct iommu_domain *iommu_domain,
> >       riscv_iommu_iotlb_inval(domain, gather->start, gather->end);
> >   }
> >
> > +static int riscv_iommu_iotlb_sync_map(struct iommu_domain *iommu_domain,
> > +                                   unsigned long iova, size_t size)
> > +{
> > +     struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
> > +
> > +     riscv_iommu_iotlb_inval(domain, iova, iova + size - 1);
>
> Does the RISC-V IOMMU architecture always cache the non-present or
> erroneous translation entries? If so, can you please provide more
> context in the commit message?
>
> If not, why do you want to flush the cache when building a new
> translation?
>

It seems to me that we can indeed remove this operation, because it
may be too aggressive for the situation described below.

I added it to update the MSI mapping when the irq affinity of a
pass-through device is changed to another vCPU. The RISC-V IOMMU
spec allows MSI translation to go through the MSI flat table, MRIF, or
the normal page table. In the case of the normal page table, the MSI
mapping is created in the second-stage page table, mapping the GPA of
the guest's supervisor interrupt file to the HPA of the host's guest
interrupt file. This MSI mapping needs to be updated when the HPA of
the host's guest interrupt file changes.

I think we can invalidate the cache after updating the MSI mapping,
rather than adding the iotlb_sync_map() operation for every mapping
created. Does that also make sense to you? If so, I will remove it in
the next version. Thanks.
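
As a rough, untested sketch of that alternative (riscv_iommu_msi_update()
and the PTE rewrite step are hypothetical; only riscv_iommu_iotlb_inval()
and iommu_domain_to_riscv() exist in the driver today):

/*
 * Hypothetical helper: after the hypervisor moves a vCPU's guest
 * interrupt file, rewrite only the affected S2 leaf entry and then
 * invalidate just that MSI window, instead of flushing after every
 * iommu_map() via iotlb_sync_map().
 */
static void riscv_iommu_msi_update(struct iommu_domain *iommu_domain,
                                   unsigned long msi_iova, size_t size)
{
        struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);

        /* ... rewrite the S2 leaf PTE to the new guest interrupt file HPA ... */

        riscv_iommu_iotlb_inval(domain, msi_iova, msi_iova + size - 1);
}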

> Best regards,
> baolu
Jason Gunthorpe June 17, 2024, 2:39 p.m. UTC | #3
On Mon, Jun 17, 2024 at 09:43:35PM +0800, Zong Li wrote:

> I added it for updating the MSI mapping when we change the irq
> affinity of a pass-through device to another vCPU. The RISC-V IOMMU
> spec allows MSI translation to go through the MSI flat table, MRIF, or
> the normal page table. In the case of the normal page table, the MSI
> mapping is created in the second-stage page table, mapping the GPA of
> the guest's supervisor interrupt file to the HPA of host's guest
> interrupt file. This MSI mapping needs to be updated when the HPA of
> host's guest interrupt file is changed.

It sounds like more thought is needed for the MSI architecture, having
the host read the guest page table to mirror weird MSI stuff seems
kind of wrong..

The S2 really needs to have the correct physical MSI pages statically
at boot time.

Jason
Zong Li June 18, 2024, 3:01 a.m. UTC | #4
On Mon, Jun 17, 2024 at 10:39 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Mon, Jun 17, 2024 at 09:43:35PM +0800, Zong Li wrote:
>
> > I added it for updating the MSI mapping when we change the irq
> > affinity of a pass-through device to another vCPU. The RISC-V IOMMU
> > spec allows MSI translation to go through the MSI flat table, MRIF, or
> > the normal page table. In the case of the normal page table, the MSI
> > mapping is created in the second-stage page table, mapping the GPA of
> > the guest's supervisor interrupt file to the HPA of host's guest
> > interrupt file. This MSI mapping needs to be updated when the HPA of
> > host's guest interrupt file is changed.
>
> It sounds like more thought is needed for the MSI architecture, having
> the host read the guest page table to mirror weird MSI stuff seems
> kind of wrong..
>

Perhaps I should rephrase it. The host doesn't read the guest page
table. In a RISC-V system, MSIs are directed to a specific privilege
level of a specific hart, including a specific virtual hart. A hart's
IMSIC (Incoming MSI Controller) contains 'interrupt files' for those
privilege levels. For instance, if the target address of an MSI is
the address of the interrupt file for a specific hart's supervisor
level, then that hart's supervisor mode will receive the MSI.
Furthermore, when a hart implements the hypervisor extension, its
IMSIC also has interrupt files for virtual harts, called 'guest
interrupt files'.

We first create the MSI mapping in the S2 page table at boot time.
The mapping goes from the GPA of the supervisor-level interrupt file
(in the guest's view, it thinks it uses a supervisor-level interrupt
file) to the HPA of a 'guest interrupt file' (in the host's view, the
device should actually use a guest interrupt file). When the vCPU is
migrated to another physical hart, the 'guest interrupt file' must be
switched to one in that hart's IMSIC, which means the HPA in this MSI
mapping in the S2 page table needs to be updated.

> The S2 really needs to have the correct physical MSI pages statically
> at boot time.
>
> Jason
Jason Gunthorpe June 18, 2024, 1:31 p.m. UTC | #5
On Tue, Jun 18, 2024 at 11:01:48AM +0800, Zong Li wrote:
> On Mon, Jun 17, 2024 at 10:39 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Mon, Jun 17, 2024 at 09:43:35PM +0800, Zong Li wrote:
> >
> > > I added it for updating the MSI mapping when we change the irq
> > > affinity of a pass-through device to another vCPU. The RISC-V IOMMU
> > > spec allows MSI translation to go through the MSI flat table, MRIF, or
> > > the normal page table. In the case of the normal page table, the MSI
> > > mapping is created in the second-stage page table, mapping the GPA of
> > > the guest's supervisor interrupt file to the HPA of host's guest
> > > interrupt file. This MSI mapping needs to be updated when the HPA of
> > > host's guest interrupt file is changed.
> >
> > It sounds like more thought is needed for the MSI architecture, having
> > the host read the guest page table to mirror weird MSI stuff seems
> > kind of wrong..
> 
> Perhaps I should rephrase it. Host doesn't read the guest page table.
> In a RISC-V system, MSIs are directed to a specific privilege level of
> a specific hart, including a specific virtual hart. In a hart's IMSIC
> (Incoming MSI Controller), it contains some 'interrupt files' for
> these specific privilege level harts. For instance, if the target
> address of MSI is the address of the interrupt file which is for a
> specific supervisor level hart, then that hart's supervisor mode will
> receive this MSI. Furthermore, when a hart implements the hypervisor
> extension, its IMSIC will have interrupt files for virtual harts,
> called 'guest interrupt files'.
> We will create the MSI mapping in S2 page table at boot time firstly,
> the mapping would be GPA of the interrupt file for supervisor level
> (in guest view, it thinks it use a supervisor level interrupt file) to
> HPA of the 'guest interrupt file' (in host view, the device should
> actually use a guest interrupt file). When the vCPU is migrated to
> another physical hart, the 'guest interrupt files' should be switched
> to another physical hart's IMSIC's 'guest interrupt file', it means
> that the HPA of this MSI mapping in S2 page table needs to be updated.

I am vaguely aware of these details, but it is good to hear them again.

However, none of that really explains why this is messing with
invalidation logic..

If you need to replace MSI pages in the S2 atomically as you migrate
vCPUs, then you need a proper replace operation for the io page table.

map is supposed to fail if there are already mappings at that address;
you can't use it to replace existing mappings with something else.
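
Purely as an illustration of that replace semantic (none of these
helpers exist today; riscv_iommu_find_leaf() is hypothetical and the
PTE encoding is omitted):

/*
 * Sketch only: atomically swap the leaf S2 PTE for the MSI page and
 * then invalidate, so there is never a window where the MSI address is
 * unmapped, unlike an unmap_pages() + map_pages() sequence.
 */
static int riscv_iommu_msi_replace(struct riscv_iommu_domain *domain,
                                   unsigned long iova, unsigned long new_pte)
{
        unsigned long *ptep;

        /* hypothetical: walk the S2 table to the leaf entry for this IOVA */
        ptep = riscv_iommu_find_leaf(domain, iova);
        if (!ptep)
                return -ENOENT;

        /* atomic swap: the old translation stays reachable until this point */
        xchg(ptep, new_pte);

        /* flush so the device stops using the cached old HPA */
        riscv_iommu_iotlb_inval(domain, iova, iova + PAGE_SIZE - 1);

        return 0;
}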

Jason

Patch

diff --git a/drivers/iommu/riscv/Makefile b/drivers/iommu/riscv/Makefile
index d36625a1fd08..f02ce6ebfbd0 100644
--- a/drivers/iommu/riscv/Makefile
+++ b/drivers/iommu/riscv/Makefile
@@ -1,3 +1,4 @@ 
 # SPDX-License-Identifier: GPL-2.0-only
 obj-$(CONFIG_RISCV_IOMMU) += iommu.o iommu-platform.o iommu-pmu.o
 obj-$(CONFIG_RISCV_IOMMU_PCI) += iommu-pci.o
+obj-$(CONFIG_SIFIVE_IOMMU) += iommu-sifive.o
diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index 9aeb4b20c145..df7aeb2571ae 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -1115,6 +1115,16 @@  static void riscv_iommu_iotlb_sync(struct iommu_domain *iommu_domain,
 	riscv_iommu_iotlb_inval(domain, gather->start, gather->end);
 }
 
+static int riscv_iommu_iotlb_sync_map(struct iommu_domain *iommu_domain,
+				      unsigned long iova, size_t size)
+{
+	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
+
+	riscv_iommu_iotlb_inval(domain, iova, iova + size - 1);
+
+	return 0;
+}
+
 static inline size_t get_page_size(size_t size)
 {
 	if (size >= IOMMU_PAGE_SIZE_512G)
@@ -1396,6 +1406,7 @@  static const struct iommu_domain_ops riscv_iommu_paging_domain_ops = {
 	.unmap_pages = riscv_iommu_unmap_pages,
 	.iova_to_phys = riscv_iommu_iova_to_phys,
 	.iotlb_sync = riscv_iommu_iotlb_sync,
+	.iotlb_sync_map = riscv_iommu_iotlb_sync_map,
 	.flush_iotlb_all = riscv_iommu_iotlb_flush_all,
 };
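
For context, the core calls iotlb_sync_map once per iommu_map() after
the whole requested range has been mapped. A simplified paraphrase of
that flow (not the actual code in drivers/iommu/iommu.c):

/*
 * Simplified paraphrase of the IOMMU core's map path: the driver's
 * iotlb_sync_map() hook, when present, runs once after all pages in
 * the requested range were mapped, so per-map flushes can be batched.
 */
static int example_map_and_sync(struct iommu_domain *domain,
                                unsigned long iova, phys_addr_t paddr,
                                size_t size, int prot, gfp_t gfp)
{
        const struct iommu_domain_ops *ops = domain->ops;
        int ret;

        ret = __iommu_map(domain, iova, paddr, size, prot, gfp);
        if (ret == 0 && ops->iotlb_sync_map)
                ret = ops->iotlb_sync_map(domain, iova, size);

        return ret;
}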