diff mbox series

vhost: Unbreak SMMU and virtio-iommu on dev-iotlb support

Message ID 20210204191228.187550-1-peterx@redhat.com (mailing list archive)
State New, archived
Headers show
Series vhost: Unbreak SMMU and virtio-iommu on dev-iotlb support | expand

Commit Message

Peter Xu Feb. 4, 2021, 7:12 p.m. UTC
Previous work on dev-iotlb message broke vhost on either SMMU or virtio-iommu
since dev-iotlb (or PCIe ATS) is not yet supported for those archs.

An initial idea is that we can let IOMMU to export this information to vhost so
that vhost would know whether the vIOMMU would support dev-iotlb, then vhost
can conditionally register to dev-iotlb or the old iotlb way.  We can work
based on some previous patch to introduce PCIIOMMUOps as Yi Liu proposed [1].

However it's not as easy as I thought since vhost_iommu_region_add() does not
have a PCIDevice context at all since it's completely a backend.  It seems
non-trivial to pass over a PCI device to the backend during init.  E.g. when
the IOMMU notifier registered hdev->vdev is still NULL.

To make the fix smaller and easier, this patch goes the other way to leverage
the flag_changed() hook of vIOMMUs so that SMMU and virtio-iommu can trap the
dev-iotlb registration and fail it.  Then vhost could try the fallback solution
as using UNMAP invalidation for it's translations.

[1] https://lore.kernel.org/qemu-devel/1599735398-6829-4-git-send-email-yi.l.liu@intel.com/

Reported-by: Eric Auger <eric.auger@redhat.com>
Fixes: b68ba1ca57677acf870d5ab10579e6105c1f5338
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/arm/smmuv3.c          |  5 +++++
 hw/virtio/vhost.c        | 13 +++++++++++--
 hw/virtio/virtio-iommu.c |  5 +++++
 3 files changed, 21 insertions(+), 2 deletions(-)

Comments

Jason Wang Feb. 5, 2021, 3:16 a.m. UTC | #1
On 2021/2/5 上午3:12, Peter Xu wrote:
> Previous work on dev-iotlb message broke vhost on either SMMU


Have a quick git grep and it looks to me v3 support ATS and have command 
for device iotlb (ATC) invalidation.


> or virtio-iommu
> since dev-iotlb (or PCIe ATS)


We may need to add this in the future.


> is not yet supported for those archs.


Rethink about this, it looks to me the point is that we should make 
vhost work when ATS is disabled like what ATS spec defined:

"""

ATS is enabled through a new Capability and associated configuration 
structure.  To enable 15 ATS, software must detect this Capability and 
enable the Function to issue ATS TLP.  If a Function is not enabled, the 
Function is required not to issue ATS Translation Requests and is 
required to issue all DMA Read and Write Requests with the TLP AT field 
set to “untranslated.”

"""

Maybe we can add this in the commit log.


>
> An initial idea is that we can let IOMMU to export this information to vhost so
> that vhost would know whether the vIOMMU would support dev-iotlb, then vhost
> can conditionally register to dev-iotlb or the old iotlb way.  We can work
> based on some previous patch to introduce PCIIOMMUOps as Yi Liu proposed [1].
>
> However it's not as easy as I thought since vhost_iommu_region_add() does not
> have a PCIDevice context at all since it's completely a backend.  It seems
> non-trivial to pass over a PCI device to the backend during init.  E.g. when
> the IOMMU notifier registered hdev->vdev is still NULL.
>
> To make the fix smaller and easier, this patch goes the other way to leverage
> the flag_changed() hook of vIOMMUs so that SMMU and virtio-iommu can trap the
> dev-iotlb registration and fail it.  Then vhost could try the fallback solution
> as using UNMAP invalidation for it's translations.
>
> [1] https://lore.kernel.org/qemu-devel/1599735398-6829-4-git-send-email-yi.l.liu@intel.com/
>
> Reported-by: Eric Auger <eric.auger@redhat.com>
> Fixes: b68ba1ca57677acf870d5ab10579e6105c1f5338
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
> Tested-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>   hw/arm/smmuv3.c          |  5 +++++
>   hw/virtio/vhost.c        | 13 +++++++++++--
>   hw/virtio/virtio-iommu.c |  5 +++++
>   3 files changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
> index 98b99d4fe8e..bd1f97000d9 100644
> --- a/hw/arm/smmuv3.c
> +++ b/hw/arm/smmuv3.c
> @@ -1497,6 +1497,11 @@ static int smmuv3_notify_flag_changed(IOMMUMemoryRegion *iommu,
>       SMMUv3State *s3 = sdev->smmu;
>       SMMUState *s = &(s3->smmu_state);
>   
> +    if (new & IOMMU_NOTIFIER_DEVIOTLB_UNMAP) {
> +        error_setg(errp, "SMMUv3 does not support dev-iotlb yet");
> +        return -EINVAL;
> +    }
> +
>       if (new & IOMMU_NOTIFIER_MAP) {
>           error_setg(errp,
>                      "device %02x.%02x.%x requires iommu MAP notifier which is "
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 28c7d781721..6e17d631f77 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -704,6 +704,7 @@ static void vhost_iommu_region_add(MemoryListener *listener,
>       Int128 end;
>       int iommu_idx;
>       IOMMUMemoryRegion *iommu_mr;
> +    int ret;
>   
>       if (!memory_region_is_iommu(section->mr)) {
>           return;
> @@ -726,8 +727,16 @@ static void vhost_iommu_region_add(MemoryListener *listener,
>       iommu->iommu_offset = section->offset_within_address_space -
>                             section->offset_within_region;
>       iommu->hdev = dev;
> -    memory_region_register_iommu_notifier(section->mr, &iommu->n,
> -                                          &error_fatal);
> +    ret = memory_region_register_iommu_notifier(section->mr, &iommu->n, NULL);
> +    if (ret) {
> +        /*
> +         * Some vIOMMUs do not support dev-iotlb yet.  If so, try to use the
> +         * UNMAP legacy message
> +         */
> +        iommu->n.notifier_flags = IOMMU_NOTIFIER_UNMAP;
> +        memory_region_register_iommu_notifier(section->mr, &iommu->n,
> +                                              &error_fatal);
> +    }
>       QLIST_INSERT_HEAD(&dev->iommu_list, iommu, iommu_next);
>       /* TODO: can replay help performance here? */
>   }
> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> index 6b9ef7f6b2b..c2883a2f6c8 100644
> --- a/hw/virtio/virtio-iommu.c
> +++ b/hw/virtio/virtio-iommu.c
> @@ -893,6 +893,11 @@ static int virtio_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu_mr,
>                                               IOMMUNotifierFlag new,
>                                               Error **errp)
>   {
> +    if (new & IOMMU_NOTIFIER_DEVIOTLB_UNMAP) {
> +        error_setg(errp, "Virtio-iommu does not support dev-iotlb yet");
> +        return -EINVAL;
> +    }
> +
>       if (old == IOMMU_NOTIFIER_NONE) {
>           trace_virtio_iommu_notify_flag_add(iommu_mr->parent_obj.name);
>       } else if (new == IOMMU_NOTIFIER_NONE) {


Patch looks good. I wonder whether we should fix intel when ATS is disabled.

Thanks
Eric Auger Feb. 5, 2021, 8:33 a.m. UTC | #2
Hi,

On 2/5/21 4:16 AM, Jason Wang wrote:
> 
> On 2021/2/5 上午3:12, Peter Xu wrote:
>> Previous work on dev-iotlb message broke vhost on either SMMU
> 
> 
> Have a quick git grep and it looks to me v3 support ATS and have command
> for device iotlb (ATC) invalidation.


Yes I will do that. Should not be a big deal.
> 
> 
>> or virtio-iommu
>> since dev-iotlb (or PCIe ATS)
> 
> 
> We may need to add this in the future.
added Jean-Philippe in CC
> 
> 
>> is not yet supported for those archs.
> 
> 
> Rethink about this, it looks to me the point is that we should make
> vhost work when ATS is disabled like what ATS spec defined:
> 
> """
> 
> ATS is enabled through a new Capability and associated configuration
> structure.  To enable 15 ATS, software must detect this Capability and
> enable the Function to issue ATS TLP.  If a Function is not enabled, the
> Function is required not to issue ATS Translation Requests and is
> required to issue all DMA Read and Write Requests with the TLP AT field
> set to “untranslated.”
> 
> """
> 
> Maybe we can add this in the commit log.
> 
> 
>>
>> An initial idea is that we can let IOMMU to export this information to
>> vhost so
>> that vhost would know whether the vIOMMU would support dev-iotlb, then
>> vhost
>> can conditionally register to dev-iotlb or the old iotlb way.  We can
>> work
>> based on some previous patch to introduce PCIIOMMUOps as Yi Liu
>> proposed [1].
>>
>> However it's not as easy as I thought since vhost_iommu_region_add()
>> does not
>> have a PCIDevice context at all since it's completely a backend.  It
>> seems
>> non-trivial to pass over a PCI device to the backend during init. 
>> E.g. when
>> the IOMMU notifier registered hdev->vdev is still NULL.
>>
>> To make the fix smaller and easier, this patch goes the other way to
>> leverage
>> the flag_changed() hook of vIOMMUs so that SMMU and virtio-iommu can
>> trap the
>> dev-iotlb registration and fail it.  Then vhost could try the fallback
>> solution
>> as using UNMAP invalidation for it's translations.
>>
>> [1]
>> https://lore.kernel.org/qemu-devel/1599735398-6829-4-git-send-email-yi.l.liu@intel.com/
>>
>>
>> Reported-by: Eric Auger <eric.auger@redhat.com>
>> Fixes: b68ba1ca57677acf870d5ab10579e6105c1f5338
>> Reviewed-by: Eric Auger <eric.auger@redhat.com>
>> Tested-by: Eric Auger <eric.auger@redhat.com>
>> Signed-off-by: Peter Xu <peterx@redhat.com>
>> ---
>>   hw/arm/smmuv3.c          |  5 +++++
>>   hw/virtio/vhost.c        | 13 +++++++++++--
>>   hw/virtio/virtio-iommu.c |  5 +++++
>>   3 files changed, 21 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
>> index 98b99d4fe8e..bd1f97000d9 100644
>> --- a/hw/arm/smmuv3.c
>> +++ b/hw/arm/smmuv3.c
>> @@ -1497,6 +1497,11 @@ static int
>> smmuv3_notify_flag_changed(IOMMUMemoryRegion *iommu,
>>       SMMUv3State *s3 = sdev->smmu;
>>       SMMUState *s = &(s3->smmu_state);
>>   +    if (new & IOMMU_NOTIFIER_DEVIOTLB_UNMAP) {
>> +        error_setg(errp, "SMMUv3 does not support dev-iotlb yet");
>> +        return -EINVAL;
>> +    }
>> +
>>       if (new & IOMMU_NOTIFIER_MAP) {
>>           error_setg(errp,
>>                      "device %02x.%02x.%x requires iommu MAP notifier
>> which is "
>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>> index 28c7d781721..6e17d631f77 100644
>> --- a/hw/virtio/vhost.c
>> +++ b/hw/virtio/vhost.c
>> @@ -704,6 +704,7 @@ static void vhost_iommu_region_add(MemoryListener
>> *listener,
>>       Int128 end;
>>       int iommu_idx;
>>       IOMMUMemoryRegion *iommu_mr;
>> +    int ret;
>>         if (!memory_region_is_iommu(section->mr)) {
>>           return;
>> @@ -726,8 +727,16 @@ static void vhost_iommu_region_add(MemoryListener
>> *listener,
>>       iommu->iommu_offset = section->offset_within_address_space -
>>                             section->offset_within_region;
>>       iommu->hdev = dev;
>> -    memory_region_register_iommu_notifier(section->mr, &iommu->n,
>> -                                          &error_fatal);
>> +    ret = memory_region_register_iommu_notifier(section->mr,
>> &iommu->n, NULL);
>> +    if (ret) {
>> +        /*
>> +         * Some vIOMMUs do not support dev-iotlb yet.  If so, try to
>> use the
>> +         * UNMAP legacy message
>> +         */
>> +        iommu->n.notifier_flags = IOMMU_NOTIFIER_UNMAP;
>> +        memory_region_register_iommu_notifier(section->mr, &iommu->n,
>> +                                              &error_fatal);
>> +    }
>>       QLIST_INSERT_HEAD(&dev->iommu_list, iommu, iommu_next);
>>       /* TODO: can replay help performance here? */
>>   }
>> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
>> index 6b9ef7f6b2b..c2883a2f6c8 100644
>> --- a/hw/virtio/virtio-iommu.c
>> +++ b/hw/virtio/virtio-iommu.c
>> @@ -893,6 +893,11 @@ static int
>> virtio_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu_mr,
>>                                               IOMMUNotifierFlag new,
>>                                               Error **errp)
>>   {
>> +    if (new & IOMMU_NOTIFIER_DEVIOTLB_UNMAP) {
>> +        error_setg(errp, "Virtio-iommu does not support dev-iotlb yet");
>> +        return -EINVAL;
>> +    }
>> +
>>       if (old == IOMMU_NOTIFIER_NONE) {
>>           trace_virtio_iommu_notify_flag_add(iommu_mr->parent_obj.name);
>>       } else if (new == IOMMU_NOTIFIER_NONE) {
> 
> 
> Patch looks good. I wonder whether we should fix intel when ATS is
> disabled.
good point

Thanks

Eric
> 
> Thanks
>
Peter Xu Feb. 5, 2021, 3:31 p.m. UTC | #3
On Fri, Feb 05, 2021 at 09:33:29AM +0100, Auger Eric wrote:
> Hi,
> 
> On 2/5/21 4:16 AM, Jason Wang wrote:
> > 
> > On 2021/2/5 上午3:12, Peter Xu wrote:
> >> Previous work on dev-iotlb message broke vhost on either SMMU
> > 
> > 
> > Have a quick git grep and it looks to me v3 support ATS and have command
> > for device iotlb (ATC) invalidation.
> 
> 
> Yes I will do that. Should not be a big deal.

Great, thanks.

> > 
> > 
> >> or virtio-iommu
> >> since dev-iotlb (or PCIe ATS)
> > 
> > 
> > We may need to add this in the future.
> added Jean-Philippe in CC

So that's the part I'm unsure about..  Since everybody is cced so maybe good
time to ask. :)

The thing is I'm still not clear on whether dev-iotlb is useful for a full
emulation environment and how that should differ from a normal iotlb, since
after all normal iotlb will be attached with device information too.

For real hardwares, they make sense because they ask for two things: iotlb is
for IOMMU, but dev-iotlb is for the device cache.  For emulation environment
(virtio-iommu is the case) do we really need that complexity?

Note that even if there're assigned devices under virtio-iommu in the future,
we can still isolate that and iiuc we can easily convert an iotlb (from
virtio-iommu) into a hardware IOMMU dev-iotlb no matter what type of IOMMU is
underneath the vIOMMU.

> > 
> > 
> >> is not yet supported for those archs.
> > 
> > 
> > Rethink about this, it looks to me the point is that we should make
> > vhost work when ATS is disabled like what ATS spec defined:
> > 
> > """
> > 
> > ATS is enabled through a new Capability and associated configuration
> > structure.  To enable 15 ATS, software must detect this Capability and
> > enable the Function to issue ATS TLP.  If a Function is not enabled, the
> > Function is required not to issue ATS Translation Requests and is
> > required to issue all DMA Read and Write Requests with the TLP AT field
> > set to “untranslated.”
> > 
> > """
> > 
> > Maybe we can add this in the commit log.

I saw Michael was super fast on handling this patch and already got it in a
pull, so I may not directly post a new version.  But I'll add it if I'll post a
new version.

[...]

> > Patch looks good. I wonder whether we should fix intel when ATS is
> > disabled.
> good point

I'm not sure I remember it right, but we seem to have similar discussion
previously on "what if the user didn't specify ats=on" - I think at that time
the conclusion was that we ignore the failure since that's not a valid
configuration for qemu.

But I agree it would be nicer to be able to fallback.

The other issue I'm worried is (I think I mentioned it somewhere, but just to
double confirm): I'd like to make sure SMMU and virtio-iommu are the only IOMMU
platform that will use vhost.  Otherwise IIUC we need to fix those vIOMMUs too.

Thanks,
Tian, Kevin Feb. 7, 2021, 9:04 a.m. UTC | #4
> From: Peter Xu
> Sent: Friday, February 5, 2021 11:31 PM
> 
> > >
> > >
> > >> or virtio-iommu
> > >> since dev-iotlb (or PCIe ATS)
> > >
> > >
> > > We may need to add this in the future.
> > added Jean-Philippe in CC
> 
> So that's the part I'm unsure about..  Since everybody is cced so maybe good
> time to ask. :)
> 
> The thing is I'm still not clear on whether dev-iotlb is useful for a full
> emulation environment and how that should differ from a normal iotlb, since
> after all normal iotlb will be attached with device information too.

dev-iotlb is useful in two manners. First, it's a functional prerequisite for
supporting I/O page faults. Second, it has performance benefit as you don't
need to contend the lock of global iotlb.

> 
> For real hardwares, they make sense because they ask for two things: iotlb is
> for IOMMU, but dev-iotlb is for the device cache.  For emulation
> environment
> (virtio-iommu is the case) do we really need that complexity?
> 
> Note that even if there're assigned devices under virtio-iommu in the future,
> we can still isolate that and iiuc we can easily convert an iotlb (from
> virtio-iommu) into a hardware IOMMU dev-iotlb no matter what type of
> IOMMU is
> underneath the vIOMMU.
> 

Didn't get this point. Hardware dev-iotlb is updated by hardware (between
the device and the IOMMU). How could software convert a virtual iotlb
entry into hardware dev-iotlb?

Thanks
Kevin
Peter Xu Feb. 7, 2021, 2:47 p.m. UTC | #5
Hi, Kevin,

On Sun, Feb 07, 2021 at 09:04:55AM +0000, Tian, Kevin wrote:
> > From: Peter Xu
> > Sent: Friday, February 5, 2021 11:31 PM
> > 
> > > >
> > > >
> > > >> or virtio-iommu
> > > >> since dev-iotlb (or PCIe ATS)
> > > >
> > > >
> > > > We may need to add this in the future.
> > > added Jean-Philippe in CC
> > 
> > So that's the part I'm unsure about..  Since everybody is cced so maybe good
> > time to ask. :)
> > 
> > The thing is I'm still not clear on whether dev-iotlb is useful for a full
> > emulation environment and how that should differ from a normal iotlb, since
> > after all normal iotlb will be attached with device information too.
> 
> dev-iotlb is useful in two manners. First, it's a functional prerequisite for
> supporting I/O page faults.

Is this also a hard requirement for virtio-iommu, which is not a real hardware
after all?

> Second, it has performance benefit as you don't
> need to contend the lock of global iotlb.

Hmm.. are you talking about e.g. vt-d driver or virtio-iommu?

Assuming it's about vt-d, qi_flush_dev_iotlb() will still call qi_submit_sync()
and taking the same global QI lock, as I see it, or I could be wrong somewhere.
I don't see where dev-iotlb has a standalone channel for delivery.

For virtio-iommu, we haven't defined dev-iotlb, right?  Sorry I missed things
when I completely didn't follow virtio-iommu recently - let's say if
virtio-iommu in the future can support per-dev dev-iotlb queue so it doesn't
need a global lock, what if we make it still per-device but still delivering
iotlb message?  Again, it's still a bit unclear to me why a full emulation
iommu would need that definition of "iotlb" and "dev-iotlb".

> 
> > 
> > For real hardwares, they make sense because they ask for two things: iotlb is
> > for IOMMU, but dev-iotlb is for the device cache.  For emulation
> > environment
> > (virtio-iommu is the case) do we really need that complexity?
> > 
> > Note that even if there're assigned devices under virtio-iommu in the future,
> > we can still isolate that and iiuc we can easily convert an iotlb (from
> > virtio-iommu) into a hardware IOMMU dev-iotlb no matter what type of
> > IOMMU is
> > underneath the vIOMMU.
> > 
> 
> Didn't get this point. Hardware dev-iotlb is updated by hardware (between
> the device and the IOMMU). How could software convert a virtual iotlb
> entry into hardware dev-iotlb?

I mean if virtio-iommu must be run in a guest, then we can trap that message
first, right?  If there're assigned device in the guest, we must convert that
invalidation to whatever message required for the host, that seems to not
require the virtio-iommu to have dev-iotlb knowledge, still?

Thanks,
Jason Wang Feb. 8, 2021, 3:21 a.m. UTC | #6
On 2021/2/5 下午11:31, Peter Xu wrote:
> On Fri, Feb 05, 2021 at 09:33:29AM +0100, Auger Eric wrote:
>> Hi,
>>
>> On 2/5/21 4:16 AM, Jason Wang wrote:
>>> On 2021/2/5 上午3:12, Peter Xu wrote:
>>>> Previous work on dev-iotlb message broke vhost on either SMMU
>>>
>>> Have a quick git grep and it looks to me v3 support ATS and have command
>>> for device iotlb (ATC) invalidation.
>>
>> Yes I will do that. Should not be a big deal.
> Great, thanks.
>
>>>
>>>> or virtio-iommu
>>>> since dev-iotlb (or PCIe ATS)
>>>
>>> We may need to add this in the future.
>> added Jean-Philippe in CC
> So that's the part I'm unsure about..  Since everybody is cced so maybe good
> time to ask. :)
>
> The thing is I'm still not clear on whether dev-iotlb is useful for a full
> emulation environment and how that should differ from a normal iotlb, since
> after all normal iotlb will be attached with device information too.


I think vhost is a good example with device IOTLB? It solves the issue 
exactly the bottleneck of a centralized IOTLB when everything is cached 
in the vhost.


>
> For real hardwares, they make sense because they ask for two things: iotlb is
> for IOMMU, but dev-iotlb is for the device cache.  For emulation environment
> (virtio-iommu is the case) do we really need that complexity?


I think the answer is yes it virtio-iommu is the only choice for some 
platform/archs.


>
> Note that even if there're assigned devices under virtio-iommu in the future,
> we can still isolate that and iiuc we can easily convert an iotlb (from
> virtio-iommu) into a hardware IOMMU dev-iotlb no matter what type of IOMMU is
> underneath the vIOMMU.


Looks like another topic (e.g if we need to expose ATS to guest for 
assigned device)?


>
>>>
>>>> is not yet supported for those archs.
>>>
>>> Rethink about this, it looks to me the point is that we should make
>>> vhost work when ATS is disabled like what ATS spec defined:
>>>
>>> """
>>>
>>> ATS is enabled through a new Capability and associated configuration
>>> structure.  To enable 15 ATS, software must detect this Capability and
>>> enable the Function to issue ATS TLP.  If a Function is not enabled, the
>>> Function is required not to issue ATS Translation Requests and is
>>> required to issue all DMA Read and Write Requests with the TLP AT field
>>> set to “untranslated.”
>>>
>>> """
>>>
>>> Maybe we can add this in the commit log.
> I saw Michael was super fast on handling this patch and already got it in a
> pull, so I may not directly post a new version.  But I'll add it if I'll post a
> new version.
>
> [...]


Right.


>
>>> Patch looks good. I wonder whether we should fix intel when ATS is
>>> disabled.
>> good point
> I'm not sure I remember it right, but we seem to have similar discussion
> previously on "what if the user didn't specify ats=on" - I think at that time
> the conclusion was that we ignore the failure since that's not a valid
> configuration for qemu.


Yes, but I think I was wrong at that time.


>
> But I agree it would be nicer to be able to fallback.


So see my reply quoted from ATS spec. My understanding is that the 
device should behave correctly if ATS is disabled.


>
> The other issue I'm worried is (I think I mentioned it somewhere, but just to
> double confirm): I'd like to make sure SMMU and virtio-iommu are the only IOMMU
> platform that will use vhost.


For upstream, it won't be easy :)


>    Otherwise IIUC we need to fix those vIOMMUs too.


Right, last time I check AMD IOMMU emulation, it simply trigger device 
IOTLB invalidation during IOTLB invalidation which looks wrong.

Thanks


>
> Thanks,
>
Tian, Kevin Feb. 8, 2021, 7:03 a.m. UTC | #7
> From: Peter Xu <peterx@redhat.com>
> Sent: Sunday, February 7, 2021 10:47 PM
> 
> Hi, Kevin,
> 
> On Sun, Feb 07, 2021 at 09:04:55AM +0000, Tian, Kevin wrote:
> > > From: Peter Xu
> > > Sent: Friday, February 5, 2021 11:31 PM
> > >
> > > > >
> > > > >
> > > > >> or virtio-iommu
> > > > >> since dev-iotlb (or PCIe ATS)
> > > > >
> > > > >
> > > > > We may need to add this in the future.
> > > > added Jean-Philippe in CC
> > >
> > > So that's the part I'm unsure about..  Since everybody is cced so maybe
> good
> > > time to ask. :)
> > >
> > > The thing is I'm still not clear on whether dev-iotlb is useful for a full
> > > emulation environment and how that should differ from a normal iotlb,
> since
> > > after all normal iotlb will be attached with device information too.
> >
> > dev-iotlb is useful in two manners. First, it's a functional prerequisite for
> > supporting I/O page faults.
> 
> Is this also a hard requirement for virtio-iommu, which is not a real hardware
> after all?

Not exactly but why do we want to use a non-standard way in the virtual
platform when PCI ATS is already in place?

> 
> > Second, it has performance benefit as you don't
> > need to contend the lock of global iotlb.
> 
> Hmm.. are you talking about e.g. vt-d driver or virtio-iommu?

It is a general iommu concept.

> 
> Assuming it's about vt-d, qi_flush_dev_iotlb() will still call qi_submit_sync()
> and taking the same global QI lock, as I see it, or I could be wrong
> somewhere.
> I don't see where dev-iotlb has a standalone channel for delivery.

What I referred to is about lookup, instead of invalidation. 

> 
> For virtio-iommu, we haven't defined dev-iotlb, right?  Sorry I missed things
> when I completely didn't follow virtio-iommu recently - let's say if
> virtio-iommu in the future can support per-dev dev-iotlb queue so it doesn't
> need a global lock, what if we make it still per-device but still delivering
> iotlb message?  Again, it's still a bit unclear to me why a full emulation
> iommu would need that definition of "iotlb" and "dev-iotlb".

well, my view of definition of "iotlb" vs. "dev-iotlb" is from guest software
p.o.v. From this angle they have distinct meaning and semantics which
must be properly emulated according to the spec, regardless of whether 
they are maintained in separate or same data structure in the virtual platform.

> 
> >
> > >
> > > For real hardwares, they make sense because they ask for two things:
> iotlb is
> > > for IOMMU, but dev-iotlb is for the device cache.  For emulation
> > > environment
> > > (virtio-iommu is the case) do we really need that complexity?
> > >
> > > Note that even if there're assigned devices under virtio-iommu in the
> future,
> > > we can still isolate that and iiuc we can easily convert an iotlb (from
> > > virtio-iommu) into a hardware IOMMU dev-iotlb no matter what type of
> > > IOMMU is
> > > underneath the vIOMMU.
> > >
> >
> > Didn't get this point. Hardware dev-iotlb is updated by hardware (between
> > the device and the IOMMU). How could software convert a virtual iotlb
> > entry into hardware dev-iotlb?
> 
> I mean if virtio-iommu must be run in a guest, then we can trap that message
> first, right?  If there're assigned device in the guest, we must convert that
> invalidation to whatever message required for the host, that seems to not
> require the virtio-iommu to have dev-iotlb knowledge, still?
> 

It really depends on the definition of dev-iotlb in this context. To me the
fact that virtio-iommu needs to notify the kernel for updating split cache
is already sort of dev-iotlb semantics, regardless of whether it's delivered 
through a iotlb message or dev-iotlb message in a specific implementation. 
Peter Xu Feb. 8, 2021, 6:26 p.m. UTC | #8
Kevin,

On Mon, Feb 08, 2021 at 07:03:08AM +0000, Tian, Kevin wrote:
> It really depends on the definition of dev-iotlb in this context. To me the
> fact that virtio-iommu needs to notify the kernel for updating split cache
> is already sort of dev-iotlb semantics, regardless of whether it's delivered 
> through a iotlb message or dev-iotlb message in a specific implementation. 
Peter Xu Feb. 8, 2021, 6:37 p.m. UTC | #9
On Mon, Feb 08, 2021 at 11:21:23AM +0800, Jason Wang wrote:

[...]

> > I'm not sure I remember it right, but we seem to have similar discussion
> > previously on "what if the user didn't specify ats=on" - I think at that time
> > the conclusion was that we ignore the failure since that's not a valid
> > configuration for qemu.
> 
> 
> Yes, but I think I was wrong at that time.

I can't say you're wrong - I actually still agree with you that at least
there's a priority of things we'd do, and this one is not extremely important
if that's not a major use case (say, if you will 100% always suggest an user to
use ats=on for a viommu enabled vhost).

> > 
> > The other issue I'm worried is (I think I mentioned it somewhere, but just to
> > double confirm): I'd like to make sure SMMU and virtio-iommu are the only IOMMU
> > platform that will use vhost.
> 
> 
> For upstream, it won't be easy :)

Sorry I definitely didn't make myself clear... :)

To be explicit, does ppc use vhost kernel too?  Since I know at least ppc has
its own translation unit and its iommu notifier in qemu, so I'm unsure whether
the same patch would break ppc too, because vhost could also ignore all UNMAP
sent by the ppc vIOMMU.

> 
> 
> >    Otherwise IIUC we need to fix those vIOMMUs too.
> 
> 
> Right, last time I check AMD IOMMU emulation, it simply trigger device IOTLB
> invalidation during IOTLB invalidation which looks wrong.

I did quickly grep IOMMU_NOTIFIER_UNMAP in amd_iommu.c and saw nothing. It
seems amd iommu is not ready for any kind of IOMMU notifiers yet.

Thanks,
Eric Auger Feb. 8, 2021, 8:23 p.m. UTC | #10
Hi,

On 2/7/21 3:47 PM, Peter Xu wrote:
> Hi, Kevin,
> 
> On Sun, Feb 07, 2021 at 09:04:55AM +0000, Tian, Kevin wrote:
>>> From: Peter Xu
>>> Sent: Friday, February 5, 2021 11:31 PM
>>>
>>>>>
>>>>>
>>>>>> or virtio-iommu
>>>>>> since dev-iotlb (or PCIe ATS)
>>>>>
>>>>>
>>>>> We may need to add this in the future.
>>>> added Jean-Philippe in CC
>>>
>>> So that's the part I'm unsure about..  Since everybody is cced so maybe good
>>> time to ask. :)
>>>
>>> The thing is I'm still not clear on whether dev-iotlb is useful for a full
>>> emulation environment and how that should differ from a normal iotlb, since
>>> after all normal iotlb will be attached with device information too.
>>
>> dev-iotlb is useful in two manners.First, it's a functional prerequisite for
>> supporting I/O page faults.
If I understand correctly, the stall model of the ARM SMMU allows IOPF I
guess without dev-iotlb (ATS). However indeed PRI requires ATS.
> 
> Is this also a hard requirement for virtio-iommu, which is not a real hardware
> after all?
> 
>> Second, it has performance benefit as you don't
>> need to contend the lock of global iotlb.
> 
> Hmm.. are you talking about e.g. vt-d driver or virtio-iommu?
> 
> Assuming it's about vt-d, qi_flush_dev_iotlb() will still call qi_submit_sync()
> and taking the same global QI lock, as I see it, or I could be wrong somewhere.
> I don't see where dev-iotlb has a standalone channel for delivery.
> 
> For virtio-iommu, we haven't defined dev-iotlb, right?
no there is no such feature at the moment. If my understanding is
correct this would only make sense when protecting a HW device. In that
case the underlying physical IOMMU would be programmed for ATS.

When protecting a virtio device (inc. vhost) what would be the adventage
over the current standard unmap notifier?

Thanks

Eric
  Sorry I missed things
> when I completely didn't follow virtio-iommu recently - let's say if
> virtio-iommu in the future can support per-dev dev-iotlb queue so it doesn't
> need a global lock, what if we make it still per-device but still delivering
> iotlb message?  Again, it's still a bit unclear to me why a full emulation
> iommu would need that definition of "iotlb" and "dev-iotlb".
> 
>>
>>>
>>> For real hardwares, they make sense because they ask for two things: iotlb is
>>> for IOMMU, but dev-iotlb is for the device cache.  For emulation
>>> environment
>>> (virtio-iommu is the case) do we really need that complexity?
>>>
>>> Note that even if there're assigned devices under virtio-iommu in the future,
>>> we can still isolate that and iiuc we can easily convert an iotlb (from
>>> virtio-iommu) into a hardware IOMMU dev-iotlb no matter what type of
>>> IOMMU is
>>> underneath the vIOMMU.
>>>
>>
>> Didn't get this point. Hardware dev-iotlb is updated by hardware (between
>> the device and the IOMMU). How could software convert a virtual iotlb
>> entry into hardware dev-iotlb?
> 
> I mean if virtio-iommu must be run in a guest, then we can trap that message
> first, right?  If there're assigned device in the guest, we must convert that
> invalidation to whatever message required for the host, that seems to not
> require the virtio-iommu to have dev-iotlb knowledge, still?
> 
> Thanks,
>
Eric Auger Feb. 8, 2021, 8:30 p.m. UTC | #11
Hi,

[Adding David and Greg in CC]


On 2/8/21 7:37 PM, Peter Xu wrote:
> On Mon, Feb 08, 2021 at 11:21:23AM +0800, Jason Wang wrote:
> 
> [...]
> 
>>> I'm not sure I remember it right, but we seem to have similar discussion
>>> previously on "what if the user didn't specify ats=on" - I think at that time
>>> the conclusion was that we ignore the failure since that's not a valid
>>> configuration for qemu.
>>
>>
>> Yes, but I think I was wrong at that time.
> 
> I can't say you're wrong - I actually still agree with you that at least
> there's a priority of things we'd do, and this one is not extremely important
> if that's not a major use case (say, if you will 100% always suggest an user to
> use ats=on for a viommu enabled vhost).
> 
>>>
>>> The other issue I'm worried is (I think I mentioned it somewhere, but just to
>>> double confirm): I'd like to make sure SMMU and virtio-iommu are the only IOMMU
>>> platform that will use vhost.
>>
>>
>> For upstream, it won't be easy :)
> 
> Sorry I definitely didn't make myself clear... :)
> 
> To be explicit, does ppc use vhost kernel too?  Since I know at least ppc has
> its own translation unit and its iommu notifier in qemu, so I'm unsure whether
> the same patch would break ppc too, because vhost could also ignore all UNMAP
> sent by the ppc vIOMMU.
> 
>>
>>
>>>    Otherwise IIUC we need to fix those vIOMMUs too.
>>
>>
>> Right, last time I check AMD IOMMU emulation, it simply trigger device IOTLB
>> invalidation during IOTLB invalidation which looks wrong.
> 
> I did quickly grep IOMMU_NOTIFIER_UNMAP in amd_iommu.c and saw nothing. It
> seems amd iommu is not ready for any kind of IOMMU notifiers yet.

for context, we experienced a regression with vsmmuv3/vhost and
virtio-iommu/vhost integration. We wondered whether the ppc viommu is
able to protect vhost devices and if this relies on legacy
IOMMU_NOTIFIER_UNMAP notifiers. ie. vhost does not register this
notifier anymore but instead register dev-iotlb unmap notifier.

Thanks

Eric
> 
> Thanks,
>
Jason Wang Feb. 9, 2021, 3:12 a.m. UTC | #12
On 2021/2/9 上午2:37, Peter Xu wrote:
> On Mon, Feb 08, 2021 at 11:21:23AM +0800, Jason Wang wrote:
>
> [...]
>
>>> I'm not sure I remember it right, but we seem to have similar discussion
>>> previously on "what if the user didn't specify ats=on" - I think at that time
>>> the conclusion was that we ignore the failure since that's not a valid
>>> configuration for qemu.
>>
>> Yes, but I think I was wrong at that time.
> I can't say you're wrong - I actually still agree with you that at least
> there's a priority of things we'd do, and this one is not extremely important
> if that's not a major use case (say, if you will 100% always suggest an user to
> use ats=on for a viommu enabled vhost).


Right, but it depends on e.g how libvirt use that. As far as I know, 
they do enable ATS. But it would still an issue if libvirt want to 
support vIOMMUs other than intel.


>
>>> The other issue I'm worried is (I think I mentioned it somewhere, but just to
>>> double confirm): I'd like to make sure SMMU and virtio-iommu are the only IOMMU
>>> platform that will use vhost.
>>
>> For upstream, it won't be easy :)
> Sorry I definitely didn't make myself clear... :)
>
> To be explicit, does ppc use vhost kernel too?


I think the answer is yes.


>   Since I know at least ppc has
> its own translation unit and its iommu notifier in qemu, so I'm unsure whether
> the same patch would break ppc too, because vhost could also ignore all UNMAP
> sent by the ppc vIOMMU.


If this is true, we probably need to fix that.


>
>>
>>>     Otherwise IIUC we need to fix those vIOMMUs too.
>>
>> Right, last time I check AMD IOMMU emulation, it simply trigger device IOTLB
>> invalidation during IOTLB invalidation which looks wrong.
> I did quickly grep IOMMU_NOTIFIER_UNMAP in amd_iommu.c and saw nothing. It
> seems amd iommu is not ready for any kind of IOMMU notifiers yet.
>
> Thanks,


Right.

Thanks


>
Eric Auger Feb. 9, 2021, 5:15 p.m. UTC | #13
Hi,

On 2/9/21 4:12 AM, Jason Wang wrote:
> 
> On 2021/2/9 上午2:37, Peter Xu wrote:
>> On Mon, Feb 08, 2021 at 11:21:23AM +0800, Jason Wang wrote:
>>
>> [...]
>>
>>>> I'm not sure I remember it right, but we seem to have similar
>>>> discussion
>>>> previously on "what if the user didn't specify ats=on" - I think at
>>>> that time
>>>> the conclusion was that we ignore the failure since that's not a valid
>>>> configuration for qemu.
>>>
>>> Yes, but I think I was wrong at that time.
>> I can't say you're wrong - I actually still agree with you that at least
>> there's a priority of things we'd do, and this one is not extremely
>> important
>> if that's not a major use case (say, if you will 100% always suggest
>> an user to
>> use ats=on for a viommu enabled vhost).
> 
> 
> Right, but it depends on e.g how libvirt use that. As far as I know,
> they do enable ATS. But it would still an issue if libvirt want to
> support vIOMMUs other than intel.
> 
> 
>>
>>>> The other issue I'm worried is (I think I mentioned it somewhere,
>>>> but just to
>>>> double confirm): I'd like to make sure SMMU and virtio-iommu are the
>>>> only IOMMU
>>>> platform that will use vhost.
>>>
>>> For upstream, it won't be easy :)
>> Sorry I definitely didn't make myself clear... :)
>>
>> To be explicit, does ppc use vhost kernel too?
> 
> 
> I think the answer is yes.
> 
> 
>>   Since I know at least ppc has
>> its own translation unit and its iommu notifier in qemu, so I'm unsure
>> whether
>> the same patch would break ppc too, because vhost could also ignore
>> all UNMAP
>> sent by the ppc vIOMMU.
> 
> 
> If this is true, we probably need to fix that.
> 
> 
>>
>>>
>>>>     Otherwise IIUC we need to fix those vIOMMUs too.
>>>
>>> Right, last time I check AMD IOMMU emulation, it simply trigger
>>> device IOTLB
>>> invalidation during IOTLB invalidation which looks wrong.
>> I did quickly grep IOMMU_NOTIFIER_UNMAP in amd_iommu.c and saw
>> nothing. It
>> seems amd iommu is not ready for any kind of IOMMU notifiers yet.
>>
>> Thanks,
> 
> 
> Right.
> 
> Thanks
> 
> 
>>
> 
> 
I just noted that the vhost fix now breaks virtio-iommu/vfio integration
because VFIO registers IOMMU_NOTIFIER_ALL which includes the DEV-IOTLB
that is now rejected by virtio-iommu virtio_iommu_notify_flag_changed().
Is it safe to replace IOMMU_NOTIFIER_ALL by IOMMU_NOTIFIER_IOTLB_EVENTS
in vfio_listener_region_add (hw/vfio/common.c) or shall we also do the
2-step registration? After your confirmation, I can send the patch.

Thanks

Eric
Peter Xu Feb. 9, 2021, 7:46 p.m. UTC | #14
On Tue, Feb 09, 2021 at 06:15:11PM +0100, Auger Eric wrote:
> I just noted that the vhost fix now breaks virtio-iommu/vfio integration
> because VFIO registers IOMMU_NOTIFIER_ALL which includes the DEV-IOTLB
> that is now rejected by virtio-iommu virtio_iommu_notify_flag_changed().
> Is it safe to replace IOMMU_NOTIFIER_ALL by IOMMU_NOTIFIER_IOTLB_EVENTS
> in vfio_listener_region_add (hw/vfio/common.c) or shall we also do the
> 2-step registration? After your confirmation, I can send the patch.

Thanks for noticing this, Eric.  Indeed there're a bunch of things that we'd
overlooked.

I think IOMMU_NOTIFIER_IOTLB_EVENTS should suffice with vfio.

If you post that patch, would you mind post a similar fix for PPC too which
will need the two-step thing?  Assuming they can be in the same series to fix
the breakage of that same patch.

CC Alex and David Gibson.

Thanks,
Jason Wang Feb. 10, 2021, 4:05 a.m. UTC | #15
On 2021/2/9 上午2:26, Peter Xu wrote:
> Kevin,
>
> On Mon, Feb 08, 2021 at 07:03:08AM +0000, Tian, Kevin wrote:
>> It really depends on the definition of dev-iotlb in this context. To me the
>> fact that virtio-iommu needs to notify the kernel for updating split cache
>> is already sort of dev-iotlb semantics, regardless of whether it's delivered
>> through a iotlb message or dev-iotlb message in a specific implementation. 
diff mbox series

Patch

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 98b99d4fe8e..bd1f97000d9 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1497,6 +1497,11 @@  static int smmuv3_notify_flag_changed(IOMMUMemoryRegion *iommu,
     SMMUv3State *s3 = sdev->smmu;
     SMMUState *s = &(s3->smmu_state);
 
+    if (new & IOMMU_NOTIFIER_DEVIOTLB_UNMAP) {
+        error_setg(errp, "SMMUv3 does not support dev-iotlb yet");
+        return -EINVAL;
+    }
+
     if (new & IOMMU_NOTIFIER_MAP) {
         error_setg(errp,
                    "device %02x.%02x.%x requires iommu MAP notifier which is "
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 28c7d781721..6e17d631f77 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -704,6 +704,7 @@  static void vhost_iommu_region_add(MemoryListener *listener,
     Int128 end;
     int iommu_idx;
     IOMMUMemoryRegion *iommu_mr;
+    int ret;
 
     if (!memory_region_is_iommu(section->mr)) {
         return;
@@ -726,8 +727,16 @@  static void vhost_iommu_region_add(MemoryListener *listener,
     iommu->iommu_offset = section->offset_within_address_space -
                           section->offset_within_region;
     iommu->hdev = dev;
-    memory_region_register_iommu_notifier(section->mr, &iommu->n,
-                                          &error_fatal);
+    ret = memory_region_register_iommu_notifier(section->mr, &iommu->n, NULL);
+    if (ret) {
+        /*
+         * Some vIOMMUs do not support dev-iotlb yet.  If so, try to use the
+         * UNMAP legacy message
+         */
+        iommu->n.notifier_flags = IOMMU_NOTIFIER_UNMAP;
+        memory_region_register_iommu_notifier(section->mr, &iommu->n,
+                                              &error_fatal);
+    }
     QLIST_INSERT_HEAD(&dev->iommu_list, iommu, iommu_next);
     /* TODO: can replay help performance here? */
 }
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 6b9ef7f6b2b..c2883a2f6c8 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -893,6 +893,11 @@  static int virtio_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu_mr,
                                             IOMMUNotifierFlag new,
                                             Error **errp)
 {
+    if (new & IOMMU_NOTIFIER_DEVIOTLB_UNMAP) {
+        error_setg(errp, "Virtio-iommu does not support dev-iotlb yet");
+        return -EINVAL;
+    }
+
     if (old == IOMMU_NOTIFIER_NONE) {
         trace_virtio_iommu_notify_flag_add(iommu_mr->parent_obj.name);
     } else if (new == IOMMU_NOTIFIER_NONE) {