diff mbox series

[2/6] system/physmem: IOMMU: Invoke the translate_size function if it is implemented

Message ID 20231025051430.493079-3-ethan84@andestech.com (mailing list archive)
State New, archived
Series Support RISC-V IOPMP | expand

Commit Message

Ethan Chen Oct. 25, 2023, 5:14 a.m. UTC
Signed-off-by: Ethan Chen <ethan84@andestech.com>
---
 system/physmem.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

Peter Xu Oct. 25, 2023, 3:14 p.m. UTC | #1
On Wed, Oct 25, 2023 at 01:14:26PM +0800, Ethan Chen wrote:
> Signed-off-by: Ethan Chen <ethan84@andestech.com>
> ---
>  system/physmem.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/system/physmem.c b/system/physmem.c
> index fc2b0fee01..53b6ab735c 100644
> --- a/system/physmem.c
> +++ b/system/physmem.c
> @@ -432,8 +432,13 @@ static MemoryRegionSection address_space_translate_iommu(IOMMUMemoryRegion *iomm
>              iommu_idx = imrc->attrs_to_index(iommu_mr, attrs);
>          }
>  
> -        iotlb = imrc->translate(iommu_mr, addr, is_write ?
> -                                IOMMU_WO : IOMMU_RO, iommu_idx);
> +        if (imrc->translate_size) {
> +            iotlb = imrc->translate_size(iommu_mr, addr, *plen_out, is_write ?
> +                                         IOMMU_WO : IOMMU_RO, iommu_idx);
> +        } else {
> +            iotlb = imrc->translate(iommu_mr, addr, is_write ?
> +                                    IOMMU_WO : IOMMU_RO, iommu_idx);
> +        }

Currently the translation size is encoded in iotlb.addr_mask.  Can riscv do
the same?

For example, look up addr in the match_entry_md() ranges and report the size
back into iotlb.addr_mask, rather than enforcing that the *plen_out range
always resides in one translation only.

IMHO it's actually legal if *plen_out covers more than one IOMMU
translation.  QEMU memory core should have taken care of that by
separately translating the ranges and applying RW on top.  With the current
proposal of translate_size() I think it'll fail instead, which is not wanted.
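[Editor's note: the addr_mask approach Peter suggests can be illustrated with a standalone sketch. This is not QEMU code; the Entry layout and helper names are invented for illustration. The one real convention carried over is that addr_mask describes a naturally aligned power-of-two region containing the translated address.]

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model of one IOMMU entry and a translation result. */
typedef struct { uint64_t base, limit; } Entry;        /* [base, limit] inclusive */
typedef struct { uint64_t xlat; uint64_t addr_mask; } IOTLBEntry;

/* Report the largest naturally aligned power-of-two block that contains
 * addr and stays inside the matching entry, encoded as addr_mask
 * (block = [addr & ~mask, (addr & ~mask) + mask], end inclusive). */
static IOTLBEntry translate_model(const Entry *e, uint64_t addr)
{
    uint64_t mask = UINT64_MAX;          /* start from the whole space */
    while (mask != 0) {
        uint64_t lo = addr & ~mask;
        uint64_t hi = lo + mask;         /* inclusive end of the block */
        if (lo >= e->base && hi <= e->limit) {
            break;                       /* aligned block fits the entry */
        }
        mask >>= 1;                      /* halve the block and retry */
    }
    return (IOTLBEntry){ .xlat = addr, .addr_mask = mask };
}
```

With this shape, a caller that needs more than one block simply translates again past the reported boundary instead of the whole request failing.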

Thanks,
Ethan Chen Oct. 26, 2023, 6:48 a.m. UTC | #2
On Wed, Oct 25, 2023 at 11:14:42AM -0400, Peter Xu wrote:
> On Wed, Oct 25, 2023 at 01:14:26PM +0800, Ethan Chen wrote:
> > Signed-off-by: Ethan Chen <ethan84@andestech.com>
> > ---
> >  system/physmem.c | 9 +++++++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/system/physmem.c b/system/physmem.c
> > index fc2b0fee01..53b6ab735c 100644
> > --- a/system/physmem.c
> > +++ b/system/physmem.c
> > @@ -432,8 +432,13 @@ static MemoryRegionSection address_space_translate_iommu(IOMMUMemoryRegion *iomm
> >              iommu_idx = imrc->attrs_to_index(iommu_mr, attrs);
> >          }
> >  
> > -        iotlb = imrc->translate(iommu_mr, addr, is_write ?
> > -                                IOMMU_WO : IOMMU_RO, iommu_idx);
> > +        if (imrc->translate_size) {
> > +            iotlb = imrc->translate_size(iommu_mr, addr, *plen_out, is_write ?
> > +                                         IOMMU_WO : IOMMU_RO, iommu_idx);
> > +        } else {
> > +            iotlb = imrc->translate(iommu_mr, addr, is_write ?
> > +                                    IOMMU_WO : IOMMU_RO, iommu_idx);
> > +        }
> 
> Currently the translation size is encoded in iotlb.addr_mask.  Can riscv do
> the same?
RISC-V does the same, so the translation size may be reduced by iotlb.addr_mask.
>
> For example, lookup addr in match_entry_md() ranges, report size back into
> iotlb.addr_mask, rather than enforcing *plen_out range always resides in
> one translation only.
>
> IMHO it's actually legal if *plen_out covers more than one IOMMU
> translations.  QEMU memory core should have taken care of that by
> separately translate the ranges and apply RW on top.  With current proposal
> of translate_size() I think it'll fail instead, which is not wanted.
>
My target is to support the IOPMP partially-hit error. IOPMP checks that the
whole memory access region is within the same entry. If not, it rejects the
access instead of modifying the access size.

Because most IOPMP permission-checking features can be implemented with the
current IOMMU class, I added this function to the IOMMU class. There may be
other, more suitable ways to support the partially-hit error.
> Thanks,
> 
> -- 
> Peter Xu
> 
Thanks,
Ethan Chen
Peter Xu Oct. 26, 2023, 2:20 p.m. UTC | #3
On Thu, Oct 26, 2023 at 02:48:14PM +0800, Ethan Chen wrote:
> My target is to support IOPMP partially hit error. IOPMP checks whole memory 
> access region is in the same entry. If not, reject the access instead of modify
> the access size.

Could you elaborate why is that important?  In what use case?

Consider IOVA mapped for address range iova=[0, 4K] only, here we have a
DMA request with range=[0, 8K].  Now my understanding is what you want to
achieve is don't trigger the DMA to [0, 4K] and fail the whole [0, 8K]
request.

Can we just fail at the latter DMA [4K, 8K] when it happens?  After all,
IIUC a device can split the 0-8K DMA into two smaller DMAs, then the 1st
chunk can succeed then if it falls in 0-4K.  Some further explanation of
the failure use case could be helpful.

Thanks,
Ethan Chen Oct. 27, 2023, 3:28 a.m. UTC | #4
On Thu, Oct 26, 2023 at 10:20:41AM -0400, Peter Xu wrote:
> Could you elaborate why is that important?  In what use case?
I was not involved in the formulation of the IOPMP specification, but I'll try
to explain my perspective. IOPMP uses the same idea as PMP: "The matching
PMP entry must match all bytes of an access, or the access fails."

> 
> Consider IOVA mapped for address range iova=[0, 4K] only, here we have a
> DMA request with range=[0, 8K].  Now my understanding is what you want to
> achieve is don't trigger the DMA to [0, 4K] and fail the whole [0, 8K]
> request.
> 
> Can we just fail at the latter DMA [4K, 8K] when it happens?  After all,
> IIUC a device can split the 0-8K DMA into two smaller DMAs, then the 1st
> chunk can succeed then if it falls in 0-4K.  Some further explanation of
> the failure use case could be helpful.

IOPMP can only detect a partial hit within a single access. A DMA device will
split a large DMA transfer into smaller DMA transfers based on the target and
the DMA transfer width, so the partially-hit error only happens when an access
crosses an entry boundary. But ensuring that an access stays within one entry
is still important. For example, an entry may represent the permissions of a
device memory region. We do not want one DMA transfer to be able to access
multiple devices, even though the DMA has permissions from multiple entries.
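[Editor's note: the PMP-style rule quoted above, "the matching entry must match all bytes of an access, or the access fails", can be modelled standalone as below. The entry layout and names are invented for illustration, not taken from the IOPMP implementation.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint64_t base, limit; } IopmpEntry;   /* [base, limit] inclusive */

/* Return true only if every byte of [addr, addr + len) lies inside a
 * single entry; an access that straddles an entry boundary is rejected,
 * even when the remainder would be covered by another entry. */
static bool iopmp_check_model(const IopmpEntry *tbl, int n,
                              uint64_t addr, uint64_t len)
{
    uint64_t last = addr + len - 1;
    for (int i = 0; i < n; i++) {
        if (addr >= tbl[i].base && last <= tbl[i].limit) {
            return true;                 /* fully contained: allow */
        }
    }
    return false;                        /* partial hit or miss: reject */
}
```

Note how two adjacent entries that each permit half of an access still cause the whole access to fail, which is exactly the partially-hit behaviour under discussion.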

Thanks,
Ethan Chen
Peter Xu Oct. 27, 2023, 4:02 p.m. UTC | #5
On Fri, Oct 27, 2023 at 11:28:36AM +0800, Ethan Chen wrote:
> On Thu, Oct 26, 2023 at 10:20:41AM -0400, Peter Xu wrote:
> > Could you elaborate why is that important?  In what use case?
> I was not involved in the formulation of the IOPMP specification, but I'll try
> to explain my perspective. IOPMP use the same the idea as PMP. "The matching 
> PMP entry must match all bytes of an access, or the access fails."
> 
> > 
> > Consider IOVA mapped for address range iova=[0, 4K] only, here we have a
> > DMA request with range=[0, 8K].  Now my understanding is what you want to
> > achieve is don't trigger the DMA to [0, 4K] and fail the whole [0, 8K]
> > request.
> > 
> > Can we just fail at the latter DMA [4K, 8K] when it happens?  After all,
> > IIUC a device can split the 0-8K DMA into two smaller DMAs, then the 1st
> > chunk can succeed then if it falls in 0-4K.  Some further explanation of
> > the failure use case could be helpful.
> 
> IOPMP can only detect partially hit in an access. DMA device will split a 
> large DMA transfer to small DMA transfers base on target and DMA transfer 
> width, so partially hit error only happens when an access cross the boundary.
> But to ensure that an access is only within one entry is still important. 
> For example, an entry may mean permission of a device memory region. We do 
> not want to see one DMA transfer can access mutilple devices, although DMA 
> have permissions from multiple entries.

I was expecting a DMA request can be fulfilled successfully as long as the
DMA translations are valid for the whole range of the request, even if the
requested range may include two separate translated targets or more, each
pointing to a different place (either RAM, or another device's MMIO regions).

AFAIK currently QEMU memory model will automatically split that large
request into two or more smaller requests, and fulfill them separately by
two/more IOMMU translations, with its memory access dispatched to the
specific memory regions.
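[Editor's note: the splitting behaviour Peter describes can be sketched as a toy loop. This is a model only, with fixed 4K translation windows as an assumption; the real memory core derives each chunk size from the IOMMU's reported translation.]

```c
#include <assert.h>
#include <stdint.h>

/* Model of how the memory core walks a large DMA request: each iteration
 * takes as much of the remaining range as one translation window covers,
 * dispatches that chunk, then continues from the window boundary. */
static int dispatch_model(uint64_t addr, uint64_t len)
{
    int chunks = 0;
    while (len > 0) {
        uint64_t span = 0x1000 - (addr & 0xFFF); /* stub: 4K windows */
        if (span > len) {
            span = len;
        }
        /* ...perform the access for [addr, addr + span) here... */
        addr += span;
        len -= span;
        chunks++;
    }
    return chunks;
}
```

Each chunk reaches translate() on its own, which is why a size-checking callback never sees the original request length.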

The example you provided doesn't seem to be RISCV specific.  Do you mean it
is a generic requirement from PCI/PCIe POV, or is it only a restriction of
IOPMP?  If it's a valid PCI restriction, does it mean that all the rest
IOMMU implementations in QEMU currently are broken?

[copy Michael and Igor]

Thanks,
Peter Xu Oct. 27, 2023, 4:13 p.m. UTC | #6
Add cc list.

On Fri, Oct 27, 2023 at 12:02:24PM -0400, Peter Xu wrote:
> On Fri, Oct 27, 2023 at 11:28:36AM +0800, Ethan Chen wrote:
> > On Thu, Oct 26, 2023 at 10:20:41AM -0400, Peter Xu wrote:
> > > Could you elaborate why is that important?  In what use case?
> > I was not involved in the formulation of the IOPMP specification, but I'll try
> > to explain my perspective. IOPMP use the same the idea as PMP. "The matching 
> > PMP entry must match all bytes of an access, or the access fails."
> > 
> > > 
> > > Consider IOVA mapped for address range iova=[0, 4K] only, here we have a
> > > DMA request with range=[0, 8K].  Now my understanding is what you want to
> > > achieve is don't trigger the DMA to [0, 4K] and fail the whole [0, 8K]
> > > request.
> > > 
> > > Can we just fail at the latter DMA [4K, 8K] when it happens?  After all,
> > > IIUC a device can split the 0-8K DMA into two smaller DMAs, then the 1st
> > > chunk can succeed then if it falls in 0-4K.  Some further explanation of
> > > the failure use case could be helpful.
> > 
> > IOPMP can only detect partially hit in an access. DMA device will split a 
> > large DMA transfer to small DMA transfers base on target and DMA transfer 
> > width, so partially hit error only happens when an access cross the boundary.
> > But to ensure that an access is only within one entry is still important. 
> > For example, an entry may mean permission of a device memory region. We do 
> > not want to see one DMA transfer can access mutilple devices, although DMA 
> > have permissions from multiple entries.
> 
> I was expecting a DMA request can be fulfilled successfully as long as the
> DMA translations are valid for the whole range of the request, even if the
> requested range may include two separate translated targets or more, each
> point to different places (either RAM, or other devicie's MMIO regions).
> 
> AFAIK currently QEMU memory model will automatically split that large
> request into two or more smaller requests, and fulfill them separately by
> two/more IOMMU translations, with its memory access dispatched to the
> specific memory regions.
> 
> The example you provided doesn't seem to be RISCV specific.  Do you mean it
> is a generic requirement from PCI/PCIe POV, or is it only a restriction of
> IOPMP?  If it's a valid PCI restriction, does it mean that all the rest
> IOMMU implementations in QEMU currently are broken?
> 
> [copy Michael and Igor]
> 
> Thanks,
> 
> -- 
> Peter Xu
Ethan Chen Oct. 30, 2023, 6 a.m. UTC | #7
On Fri, Oct 27, 2023 at 12:13:50PM -0400, Peter Xu wrote:
> Add cc list.
> 
> On Fri, Oct 27, 2023 at 12:02:24PM -0400, Peter Xu wrote:
> > On Fri, Oct 27, 2023 at 11:28:36AM +0800, Ethan Chen wrote:
> > > On Thu, Oct 26, 2023 at 10:20:41AM -0400, Peter Xu wrote:
> > > > Could you elaborate why is that important?  In what use case?
> > > I was not involved in the formulation of the IOPMP specification, but I'll try
> > > to explain my perspective. IOPMP use the same the idea as PMP. "The matching 
> > > PMP entry must match all bytes of an access, or the access fails."
> > > 
> > > > 
> > > > Consider IOVA mapped for address range iova=[0, 4K] only, here we have a
> > > > DMA request with range=[0, 8K].  Now my understanding is what you want to
> > > > achieve is don't trigger the DMA to [0, 4K] and fail the whole [0, 8K]
> > > > request.
> > > > 
> > > > Can we just fail at the latter DMA [4K, 8K] when it happens?  After all,
> > > > IIUC a device can split the 0-8K DMA into two smaller DMAs, then the 1st
> > > > chunk can succeed then if it falls in 0-4K.  Some further explanation of
> > > > the failure use case could be helpful.
> > > 
> > > IOPMP can only detect partially hit in an access. DMA device will split a 
> > > large DMA transfer to small DMA transfers base on target and DMA transfer 
> > > width, so partially hit error only happens when an access cross the boundary.
> > > But to ensure that an access is only within one entry is still important. 
> > > For example, an entry may mean permission of a device memory region. We do 
> > > not want to see one DMA transfer can access mutilple devices, although DMA 
> > > have permissions from multiple entries.
> > 
> > I was expecting a DMA request can be fulfilled successfully as long as the
> > DMA translations are valid for the whole range of the request, even if the
> > requested range may include two separate translated targets or more, each
> > point to different places (either RAM, or other devicie's MMIO regions).

IOPMP is used to check whether a DMA translation is valid or not. In the IOPMP
specification, a translation that accesses more than one entry is not valid.
Though it is not recommended, a user can create an IOPMP entry that contains
multiple places to make this kind of translation valid.

> > 
> > AFAIK currently QEMU memory model will automatically split that large
> > request into two or more smaller requests, and fulfill them separately by
> > two/more IOMMU translations, with its memory access dispatched to the
> > specific memory regions.

Because requests may be split, I need a method to pass the original request
information to the IOPMP.

> > 
> > The example you provided doesn't seem to be RISCV specific.  Do you mean it
> > is a generic requirement from PCI/PCIe POV, or is it only a restriction of
> > IOPMP?  If it's a valid PCI restriction, does it mean that all the rest
> > IOMMU implementations in QEMU currently are broken?
> > 

It is only a restriction of the IOPMP.

Thanks,
Ethan Chen
Ethan Chen Oct. 31, 2023, 8:52 a.m. UTC | #8
On Mon, Oct 30, 2023 at 11:02:30AM -0400, Peter Xu wrote:
> On Mon, Oct 30, 2023 at 02:00:54PM +0800, Ethan Chen wrote:
> > On Fri, Oct 27, 2023 at 12:13:50PM -0400, Peter Xu wrote:
> > > Add cc list.
> > > 
> > > On Fri, Oct 27, 2023 at 12:02:24PM -0400, Peter Xu wrote:
> > > > On Fri, Oct 27, 2023 at 11:28:36AM +0800, Ethan Chen wrote:
> > > > > On Thu, Oct 26, 2023 at 10:20:41AM -0400, Peter Xu wrote:
> > > > > > Could you elaborate why is that important?  In what use case?
> > > > > I was not involved in the formulation of the IOPMP specification, but I'll try
> > > > > to explain my perspective. IOPMP use the same the idea as PMP. "The matching 
> > > > > PMP entry must match all bytes of an access, or the access fails."
> > > > > 
> > > > > > 
> > > > > > Consider IOVA mapped for address range iova=[0, 4K] only, here we have a
> > > > > > DMA request with range=[0, 8K].  Now my understanding is what you want to
> > > > > > achieve is don't trigger the DMA to [0, 4K] and fail the whole [0, 8K]
> > > > > > request.
> > > > > > 
> > > > > > Can we just fail at the latter DMA [4K, 8K] when it happens?  After all,
> > > > > > IIUC a device can split the 0-8K DMA into two smaller DMAs, then the 1st
> > > > > > chunk can succeed then if it falls in 0-4K.  Some further explanation of
> > > > > > the failure use case could be helpful.
> > > > > 
> > > > > IOPMP can only detect partially hit in an access. DMA device will split a 
> > > > > large DMA transfer to small DMA transfers base on target and DMA transfer 
> > > > > width, so partially hit error only happens when an access cross the boundary.
> > > > > But to ensure that an access is only within one entry is still important. 
> > > > > For example, an entry may mean permission of a device memory region. We do 
> > > > > not want to see one DMA transfer can access mutilple devices, although DMA 
> > > > > have permissions from multiple entries.
> > > > 
> > > > I was expecting a DMA request can be fulfilled successfully as long as the
> > > > DMA translations are valid for the whole range of the request, even if the
> > > > requested range may include two separate translated targets or more, each
> > > > point to different places (either RAM, or other devicie's MMIO regions).
> > 
> > IOPMP is used to check DMA translation is vaild or not. In IOPMP specification
> > , a translation access more than one entry is not invalid.
> > Though it is not recommand, user can create an IOPMP entry contains mutiple
> > places to make this kind translations valid.
> > 
> > > > 
> > > > AFAIK currently QEMU memory model will automatically split that large
> > > > request into two or more smaller requests, and fulfill them separately by
> > > > two/more IOMMU translations, with its memory access dispatched to the
> > > > specific memory regions.
> > 
> > Because of requests may be split, I need a method to take the original request
> > information to IOPMP.
> 
> I'm not sure whether translate() is the "original request" either.  The
> problem is QEMU can split the request for various reasons already, afaict.
> 
> For example, address_space_translate_internal() has this:
> 
>     if (memory_region_is_ram(mr)) {
>         diff = int128_sub(section->size, int128_make64(addr));
>         *plen = int128_get64(int128_min(diff, int128_make64(*plen)));
>     }
> 
> Which can already shrink the request size from the caller before reaching
> translate().  So the length passed into translate() can already be
> modified.
> 
> Another thing is, we have two other common call sites for translate():
> 
>         memory_region_iommu_replay
>         address_space_translate_for_iotlb
> 
> I'm not sure whether you've looked into them and think they don't need to
> be trapped: at least memory_region_iommu_replay() looks all fine in this
> regard because it always translate in min page size granule.  But I think
> the restriction should apply to all translate()s.
> 
> translate_size() is weird on its own. If the only purpose is to pass the
> length into translate(), another option is to add that parameter into
> current translate(), allowing the implementation to ignore it.  I think
> that'll be better, but even if so, I'm not 100% sure it'll always do what
> you wanted as discussed above.

It seems there are too many things that my current method has not considered.
I am working on a revision that adds no new translation function, but instead
adds the start address and end address to MemTxAttrs.

Since attrs_to_index() only returns one integer, the IOPMP attrs_to_index() will
copy the address range into its device state and then handle the translate().
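[Editor's note: a hypothetical sketch of the revision described above. The field and type names are assumptions invented here; the real MemTxAttrs layout would be settled on the list.]

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shape of the proposal: the original (pre-split) access
 * range rides along with the transaction attributes. */
typedef struct {
    unsigned requester_id; /* stand-in for the usual attribute fields */
    uint64_t orig_start;   /* first byte of the original DMA request */
    uint64_t orig_end;     /* last byte of the original DMA request  */
} MemTxAttrsModel;

typedef struct {
    uint64_t pending_start, pending_end;  /* stashed for translate() */
} IopmpStateModel;

/* attrs_to_index() can only return an integer, so the address range is
 * copied into the device state before the subsequent translate() call. */
static int iopmp_attrs_to_index_model(IopmpStateModel *s, MemTxAttrsModel attrs)
{
    s->pending_start = attrs.orig_start;
    s->pending_end = attrs.orig_end;
    return (int)attrs.requester_id;       /* the usual integer index */
}
```

The design relies on attrs_to_index() always running immediately before translate() for the same access, which is why the range can be parked in device state rather than returned.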

Thanks,
Ethan Chen

Patch

diff --git a/system/physmem.c b/system/physmem.c
index fc2b0fee01..53b6ab735c 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -432,8 +432,13 @@  static MemoryRegionSection address_space_translate_iommu(IOMMUMemoryRegion *iomm
             iommu_idx = imrc->attrs_to_index(iommu_mr, attrs);
         }
 
-        iotlb = imrc->translate(iommu_mr, addr, is_write ?
-                                IOMMU_WO : IOMMU_RO, iommu_idx);
+        if (imrc->translate_size) {
+            iotlb = imrc->translate_size(iommu_mr, addr, *plen_out, is_write ?
+                                         IOMMU_WO : IOMMU_RO, iommu_idx);
+        } else {
+            iotlb = imrc->translate(iommu_mr, addr, is_write ?
+                                    IOMMU_WO : IOMMU_RO, iommu_idx);
+        }
 
         if (!(iotlb.perm & (1 << is_write))) {
             goto unassigned;