
[v2,1/1] virtio: fix the condition for iommu_platform not supported

Message ID 20220117120238.2519239-1-pasic@linux.ibm.com (mailing list archive)
State New, archived
Series [v2,1/1] virtio: fix the condition for iommu_platform not supported

Commit Message

Halil Pasic Jan. 17, 2022, 12:02 p.m. UTC
The commit 04ceb61a40 ("virtio: Fail if iommu_platform is requested, but
unsupported") claims to fail the device hotplug when iommu_platform
is requested, but not supported by the (vhost) device. At first
glance the condition for detecting that situation looks perfect, but
because of a certain peculiarity of iommu_platform it ain't.

In fact the aforementioned commit introduces a regression. It breaks
virtio-fs support for Secure Execution, and most likely also for AMD SEV
or any other confidential guest scenario that relies on encrypted guest
memory. The same also applies to any other vhost device that does not
support _F_ACCESS_PLATFORM.

The peculiarity is that iommu_platform and _F_ACCESS_PLATFORM conflate
"the device can not access all of the guest RAM" and "iova != gpa, thus
the device needs to translate iova".

Confidential guest technologies currently rely on the device/hypervisor
offering _F_ACCESS_PLATFORM, so that, after the feature has been
negotiated, the guest grants access to the portions of memory the
device needs to see. So for confidential guests, generally,
_F_ACCESS_PLATFORM is about restricted access to memory, but not
about the addresses used being something other than guest physical
addresses.

This is the very reason for commit f7ef7e6e3b ("vhost: correctly
turn on VIRTIO_F_IOMMU_PLATFORM"), which fences _F_ACCESS_PLATFORM
from the vhost device that does not need it, because on the vhost
interface it only means "I/O address translation is needed".

This patch takes inspiration from f7ef7e6e3b ("vhost: correctly turn on
VIRTIO_F_IOMMU_PLATFORM") and uses the same condition to detect the
situation where _F_ACCESS_PLATFORM is requested, but no I/O translation
by the device, and thus no device capability, is needed. In this
situation claiming that the device does not support iommu_platform=on
is counter-productive. So let us stop doing that!

Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Reported-by: Jakob Naucke <Jakob.Naucke@ibm.com>
Fixes: 04ceb61a40 ("virtio: Fail if iommu_platform is requested, but unsupported")
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-stable@nongnu.org

---

v1->v2: 
* Commit message tweaks. Most notably fixed commit SHA (Michael)

---
 hw/virtio/virtio-bus.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)


base-commit: 6621441db50d5bae7e34dbd04bf3c57a27a71b32
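
In short, the change moves the check in virtio_bus_device_plugged() to after
vdev->dma_as has been assigned and adds a dma_as condition; a paraphrase of
the hunks quoted in the replies below (not the verbatim patch) looks like
this:

    /* before: reject whenever the device lacks _F_ACCESS_PLATFORM */
    if (has_iommu && !virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
        error_setg(errp, "iommu_platform=true is not supported by the device");
        return;
    }

    /* after: reject only if the device would actually have to translate
     * I/O addresses, i.e. its DMA address space is not plain guest memory */
    if (has_iommu && vdev->dma_as != &address_space_memory
                  && !virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
        error_setg(errp, "iommu_platform=true is not supported by the device");
        return;
    }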

Comments

Halil Pasic Jan. 25, 2022, 10:21 a.m. UTC | #1
ping

On Mon, 17 Jan 2022 13:02:38 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> The commit 04ceb61a40 ("virtio: Fail if iommu_platform is requested, but
> unsupported") claims to fail the device hotplug when iommu_platform
> is requested, but not supported by the (vhost) device. On the first
> glance the condition for detecting that situation looks perfect, but
> because a certain peculiarity of virtio_platform it ain't.
> 
> In fact the aforementioned commit introduces a regression. It breaks
> virtio-fs support for Secure Execution, and most likely also for AMD SEV
> or any other confidential guest scenario that relies encrypted guest
> memory.  The same also applies to any other vhost device that does not
> support _F_ACCESS_PLATFORM.
> 
> The peculiarity is that iommu_platform and _F_ACCESS_PLATFORM collates
> "device can not access all of the guest RAM" and "iova != gpa, thus
> device needs to translate iova".
> 
> Confidential guest technologies currently rely on the device/hypervisor
> offering _F_ACCESS_PLATFORM, so that, after the feature has been
> negotiated, the guest  grants access to the portions of memory the
> device needs to see. So in for confidential guests, generally,
> _F_ACCESS_PLATFORM is about the restricted access to memory, but not
> about the addresses used being something else than guest physical
> addresses.
> 
> This is the very reason for which commit f7ef7e6e3b ("vhost: correctly
> turn on VIRTIO_F_IOMMU_PLATFORM") for, which fences _F_ACCESS_PLATFORM
> form the vhost device that does not need it, because on the vhost
> interface it only means "I/O address translation is needed".
> 
> This patch takes inspiration from f7ef7e6e3b ("vhost: correctly turn on
> VIRTIO_F_IOMMU_PLATFORM"), and uses the same condition for detecting the
> situation when _F_ACCESS_PLATFORM is requested, but no I/O translation
> by the device, and thus no device capability is needed. In this
> situation claiming that the device does not support iommu_plattform=on
> is counter-productive. So let us stop doing that!
> 
> Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
> Reported-by: Jakob Naucke <Jakob.Naucke@ibm.com>
> Fixes: 04ceb61a40 ("virtio: Fail if iommu_platform is requested, but
> unsupported")
> Cc: Kevin Wolf <kwolf@redhat.com>
> Cc: qemu-stable@nongnu.org
> 
> ---
> 
> v1->v2: 
> * Commit message tweaks. Most notably fixed commit SHA (Michael)
> 
> ---
>  hw/virtio/virtio-bus.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
> index d23db98c56..c1578f3de2 100644
> --- a/hw/virtio/virtio-bus.c
> +++ b/hw/virtio/virtio-bus.c
> @@ -69,11 +69,6 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
>          return;
>      }
>  
> -    if (has_iommu && !virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
> -        error_setg(errp, "iommu_platform=true is not supported by the device");
> -        return;
> -    }
> -
>      if (klass->device_plugged != NULL) {
>          klass->device_plugged(qbus->parent, &local_err);
>      }
> @@ -88,6 +83,12 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
>      } else {
>          vdev->dma_as = &address_space_memory;
>      }
> +
> +    if (has_iommu && vdev->dma_as != &address_space_memory
> +                  && !virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
> +        error_setg(errp, "iommu_platform=true is not supported by the device");
> +        return;
> +    }
>  }
>  
>  /* Reset the virtio_bus */
> 
> base-commit: 6621441db50d5bae7e34dbd04bf3c57a27a71b32
Halil Pasic Jan. 27, 2022, 1:28 p.m. UTC | #2
ping^2

Also adding Brijesh and Daniel, as I believe you guys should be
interested in this, and I have yet to receive a review.

@Brijesh, Daniel: Can you confirm that AMD (SEV) and Power are affected
too, and that the fix works for your platforms as well?

Regards,
Halil

On Tue, 25 Jan 2022 11:21:12 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> ping
> 
> On Mon, 17 Jan 2022 13:02:38 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> > The commit 04ceb61a40 ("virtio: Fail if iommu_platform is requested, but
> > unsupported") claims to fail the device hotplug when iommu_platform
> > is requested, but not supported by the (vhost) device. On the first
> > glance the condition for detecting that situation looks perfect, but
> > because a certain peculiarity of virtio_platform it ain't.
> > 
> > In fact the aforementioned commit introduces a regression. It breaks
> > virtio-fs support for Secure Execution, and most likely also for AMD SEV
> > or any other confidential guest scenario that relies encrypted guest
> > memory.  The same also applies to any other vhost device that does not
> > support _F_ACCESS_PLATFORM.
> > 
> > The peculiarity is that iommu_platform and _F_ACCESS_PLATFORM collates
> > "device can not access all of the guest RAM" and "iova != gpa, thus
> > device needs to translate iova".
> > 
> > Confidential guest technologies currently rely on the device/hypervisor
> > offering _F_ACCESS_PLATFORM, so that, after the feature has been
> > negotiated, the guest  grants access to the portions of memory the
> > device needs to see. So in for confidential guests, generally,
> > _F_ACCESS_PLATFORM is about the restricted access to memory, but not
> > about the addresses used being something else than guest physical
> > addresses.
> > 
> > This is the very reason for which commit f7ef7e6e3b ("vhost: correctly
> > turn on VIRTIO_F_IOMMU_PLATFORM") for, which fences _F_ACCESS_PLATFORM
> > form the vhost device that does not need it, because on the vhost
> > interface it only means "I/O address translation is needed".
> > 
> > This patch takes inspiration from f7ef7e6e3b ("vhost: correctly turn on
> > VIRTIO_F_IOMMU_PLATFORM"), and uses the same condition for detecting the
> > situation when _F_ACCESS_PLATFORM is requested, but no I/O translation
> > by the device, and thus no device capability is needed. In this
> > situation claiming that the device does not support iommu_plattform=on
> > is counter-productive. So let us stop doing that!
> > 
> > Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
> > Reported-by: Jakob Naucke <Jakob.Naucke@ibm.com>
> > Fixes: 04ceb61a40 ("virtio: Fail if iommu_platform is requested, but
> > unsupported")
> > Cc: Kevin Wolf <kwolf@redhat.com>
> > Cc: qemu-stable@nongnu.org
> > 
> > ---
> > 
> > v1->v2: 
> > * Commit message tweaks. Most notably fixed commit SHA (Michael)
> > 
> > ---
> >  hw/virtio/virtio-bus.c | 11 ++++++-----
> >  1 file changed, 6 insertions(+), 5 deletions(-)
> > 
> > diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
> > index d23db98c56..c1578f3de2 100644
> > --- a/hw/virtio/virtio-bus.c
> > +++ b/hw/virtio/virtio-bus.c
> > @@ -69,11 +69,6 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
> >          return;
> >      }
> >  
> > -    if (has_iommu && !virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
> > -        error_setg(errp, "iommu_platform=true is not supported by the device");
> > -        return;
> > -    }
> > -
> >      if (klass->device_plugged != NULL) {
> >          klass->device_plugged(qbus->parent, &local_err);
> >      }
> > @@ -88,6 +83,12 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
> >      } else {
> >          vdev->dma_as = &address_space_memory;
> >      }
> > +
> > +    if (has_iommu && vdev->dma_as != &address_space_memory
> > +                  && !virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
> > +        error_setg(errp, "iommu_platform=true is not supported by the device");
> > +        return;
> > +    }
> >  }
> >  
> >  /* Reset the virtio_bus */
> > 
> > base-commit: 6621441db50d5bae7e34dbd04bf3c57a27a71b32  
>
Brijesh Singh Jan. 27, 2022, 7:17 p.m. UTC | #3
On 1/27/22 7:28 AM, Halil Pasic wrote:
> ping^2
> 
> Also adding Brijesh and Daniel, as I believe you guys should be
> interested in this, and I'm yet to receive review.
> 
> @Brijesh, Daniel: Can you confirm that AMD (SEV) and Power are affected
> too, and that the fix works for your platforms as well?
> 

Thanks for looping me in. I can confirm that SEV virtio-fs device
support was *broken* on the latest qemu, and your patch fixes it.


Tested-by: Brijesh Singh <brijesh.singh@amd.com>

> Regards,
> Halil
> 
> On Tue, 25 Jan 2022 11:21:12 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
>> ping
>>
>> On Mon, 17 Jan 2022 13:02:38 +0100
>> Halil Pasic <pasic@linux.ibm.com> wrote:
>>
>>> The commit 04ceb61a40 ("virtio: Fail if iommu_platform is requested, but
>>> unsupported") claims to fail the device hotplug when iommu_platform
>>> is requested, but not supported by the (vhost) device. On the first
>>> glance the condition for detecting that situation looks perfect, but
>>> because a certain peculiarity of virtio_platform it ain't.
>>>
>>> In fact the aforementioned commit introduces a regression. It breaks
>>> virtio-fs support for Secure Execution, and most likely also for AMD SEV
>>> or any other confidential guest scenario that relies encrypted guest
>>> memory.  The same also applies to any other vhost device that does not
>>> support _F_ACCESS_PLATFORM.
>>>
>>> The peculiarity is that iommu_platform and _F_ACCESS_PLATFORM collates
>>> "device can not access all of the guest RAM" and "iova != gpa, thus
>>> device needs to translate iova".
>>>
>>> Confidential guest technologies currently rely on the device/hypervisor
>>> offering _F_ACCESS_PLATFORM, so that, after the feature has been
>>> negotiated, the guest  grants access to the portions of memory the
>>> device needs to see. So in for confidential guests, generally,
>>> _F_ACCESS_PLATFORM is about the restricted access to memory, but not
>>> about the addresses used being something else than guest physical
>>> addresses.
>>>
>>> This is the very reason for which commit f7ef7e6e3b ("vhost: correctly
>>> turn on VIRTIO_F_IOMMU_PLATFORM") for, which fences _F_ACCESS_PLATFORM
>>> form the vhost device that does not need it, because on the vhost
>>> interface it only means "I/O address translation is needed".
>>>
>>> This patch takes inspiration from f7ef7e6e3b ("vhost: correctly turn on
>>> VIRTIO_F_IOMMU_PLATFORM"), and uses the same condition for detecting the
>>> situation when _F_ACCESS_PLATFORM is requested, but no I/O translation
>>> by the device, and thus no device capability is needed. In this
>>> situation claiming that the device does not support iommu_plattform=on
>>> is counter-productive. So let us stop doing that!
>>>
>>> Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
>>> Reported-by: Jakob Naucke <Jakob.Naucke@ibm.com>
>>> Fixes: 04ceb61a40 ("virtio: Fail if iommu_platform is requested, but
>>> unsupported")
>>> Cc: Kevin Wolf <kwolf@redhat.com>
>>> Cc: qemu-stable@nongnu.org
>>>
>>> ---
>>>
>>> v1->v2:
>>> * Commit message tweaks. Most notably fixed commit SHA (Michael)
>>>
>>> ---
>>>   hw/virtio/virtio-bus.c | 11 ++++++-----
>>>   1 file changed, 6 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
>>> index d23db98c56..c1578f3de2 100644
>>> --- a/hw/virtio/virtio-bus.c
>>> +++ b/hw/virtio/virtio-bus.c
>>> @@ -69,11 +69,6 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
>>>           return;
>>>       }
>>>   
>>> -    if (has_iommu && !virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
>>> -        error_setg(errp, "iommu_platform=true is not supported by the device");
>>> -        return;
>>> -    }
>>> -
>>>       if (klass->device_plugged != NULL) {
>>>           klass->device_plugged(qbus->parent, &local_err);
>>>       }
>>> @@ -88,6 +83,12 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
>>>       } else {
>>>           vdev->dma_as = &address_space_memory;
>>>       }
>>> +
>>> +    if (has_iommu && vdev->dma_as != &address_space_memory
>>> +                  && !virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
>>> +        error_setg(errp, "iommu_platform=true is not supported by the device");
>>> +        return;
>>> +    }
>>>   }
>>>   
>>>   /* Reset the virtio_bus */
>>>
>>> base-commit: 6621441db50d5bae7e34dbd04bf3c57a27a71b32
>>
>
Daniel Henrique Barboza Jan. 27, 2022, 9:34 p.m. UTC | #4
On 1/27/22 10:28, Halil Pasic wrote:
> ping^2
> 
> Also adding Brijesh and Daniel, as I believe you guys should be
> interested in this, and I'm yet to receive review.
> 
> @Brijesh, Daniel: Can you confirm that AMD (SEV) and Power are affected
> too, and that the fix works for your platforms as well?

I failed to find a host that has Power secure execution support. I'll keep looking.


Meanwhile I have to mention that this patch re-introduced the problem that Kevin's
commit fixed.


With current upstream, if you start a regular guest with the following command line:

qemu-system-ppc64 (....)
-chardev socket,id=char0,path=/tmp/vhostqemu
-device vhost-user-fs-pci,chardev=char0,tag=myfs,iommu_platform=on

i.e. a guest with a vhost-user-fs-pci device that claims to have iommu
support it doesn't actually have, you get this error message:


qemu-system-ppc64: -device vhost-user-fs-pci,chardev=char0,tag=myfs,iommu_platform=on: iommu_platform=true is not supported by the device


With this patch, that command line above starts the guest, but virtiofsd fails during boot:

sudo ~/qemu/build/tools/virtiofsd/virtiofsd --socket-path=/tmp/vhostqemu -o source=~/linux-L1
[sudo] password for danielhb:
virtio_session_mount: Waiting for vhost-user socket connection...
virtio_session_mount: Received vhost-user socket connection
virtio_loop: Entry
fv_panic: libvhost-user: Invalid vring_addr message


And inside the guest, if you attempt to mount and use the virtiofs filesystem, the guest
hangs:

[root@localhost ~]# mount -t virtiofs myfs /mnt
[root@localhost ~]# cd /mnt

(hangs)

Exiting QEMU throws several vhost related errors:


QEMU 6.2.50 monitor - type 'help' for more information
(qemu) quit
qemu-system-ppc64: Failed to set msg fds.
qemu-system-ppc64: vhost VQ 0 ring restore failed: -22: Invalid argument (22)
qemu-system-ppc64: Failed to set msg fds.
qemu-system-ppc64: vhost VQ 1 ring restore failed: -22: Invalid argument (22)
qemu-system-ppc64: Failed to set msg fds.
qemu-system-ppc64: vhost_set_vring_call failed: Invalid argument (22)
qemu-system-ppc64: Failed to set msg fds.
qemu-system-ppc64: vhost_set_vring_call failed: Invalid argument (22)



I made a little experiment with upstream plus a revert of Kevin's patch, and the result is
the same, meaning that this is the original bug [1] that Kevin fixed back then. Note that [1]
was reported on x86, so this particular issue seems to be arch agnostic.


My point here is that your patch fixes the situation for s390x, and Brijesh already chimed
in confirming that it fixes AMD SEV as well, but it reintroduced a bug. I believe you should
include this vhost-user test case in your testing to figure out a way to fix what
is needed without adding this particular regression.


In fact, I have a feeling that this is not the first time this kind of situation has been
discussed around here. It reminds me of [2] and a discussion about the order in which
virtiofs features are negotiated versus when/how QEMU initializes the devices.



[1] https://bugzilla.redhat.com/show_bug.cgi?id=1935019
[2] https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg05644.html


Thanks,


Daniel


> 
> Regards,
> Halil
> 
> On Tue, 25 Jan 2022 11:21:12 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
>> ping
>>
>> On Mon, 17 Jan 2022 13:02:38 +0100
>> Halil Pasic <pasic@linux.ibm.com> wrote:
>>
>>> The commit 04ceb61a40 ("virtio: Fail if iommu_platform is requested, but
>>> unsupported") claims to fail the device hotplug when iommu_platform
>>> is requested, but not supported by the (vhost) device. On the first
>>> glance the condition for detecting that situation looks perfect, but
>>> because a certain peculiarity of virtio_platform it ain't.
>>>
>>> In fact the aforementioned commit introduces a regression. It breaks
>>> virtio-fs support for Secure Execution, and most likely also for AMD SEV
>>> or any other confidential guest scenario that relies encrypted guest
>>> memory.  The same also applies to any other vhost device that does not
>>> support _F_ACCESS_PLATFORM.
>>>
>>> The peculiarity is that iommu_platform and _F_ACCESS_PLATFORM collates
>>> "device can not access all of the guest RAM" and "iova != gpa, thus
>>> device needs to translate iova".
>>>
>>> Confidential guest technologies currently rely on the device/hypervisor
>>> offering _F_ACCESS_PLATFORM, so that, after the feature has been
>>> negotiated, the guest  grants access to the portions of memory the
>>> device needs to see. So in for confidential guests, generally,
>>> _F_ACCESS_PLATFORM is about the restricted access to memory, but not
>>> about the addresses used being something else than guest physical
>>> addresses.
>>>
>>> This is the very reason for which commit f7ef7e6e3b ("vhost: correctly
>>> turn on VIRTIO_F_IOMMU_PLATFORM") for, which fences _F_ACCESS_PLATFORM
>>> form the vhost device that does not need it, because on the vhost
>>> interface it only means "I/O address translation is needed".
>>>
>>> This patch takes inspiration from f7ef7e6e3b ("vhost: correctly turn on
>>> VIRTIO_F_IOMMU_PLATFORM"), and uses the same condition for detecting the
>>> situation when _F_ACCESS_PLATFORM is requested, but no I/O translation
>>> by the device, and thus no device capability is needed. In this
>>> situation claiming that the device does not support iommu_plattform=on
>>> is counter-productive. So let us stop doing that!
>>>
>>> Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
>>> Reported-by: Jakob Naucke <Jakob.Naucke@ibm.com>
>>> Fixes: 04ceb61a40 ("virtio: Fail if iommu_platform is requested, but
>>> unsupported")
>>> Cc: Kevin Wolf <kwolf@redhat.com>
>>> Cc: qemu-stable@nongnu.org
>>>
>>> ---
>>>
>>> v1->v2:
>>> * Commit message tweaks. Most notably fixed commit SHA (Michael)
>>>
>>> ---
>>>   hw/virtio/virtio-bus.c | 11 ++++++-----
>>>   1 file changed, 6 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
>>> index d23db98c56..c1578f3de2 100644
>>> --- a/hw/virtio/virtio-bus.c
>>> +++ b/hw/virtio/virtio-bus.c
>>> @@ -69,11 +69,6 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
>>>           return;
>>>       }
>>>   
>>> -    if (has_iommu && !virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
>>> -        error_setg(errp, "iommu_platform=true is not supported by the device");
>>> -        return;
>>> -    }
>>> -
>>>       if (klass->device_plugged != NULL) {
>>>           klass->device_plugged(qbus->parent, &local_err);
>>>       }
>>> @@ -88,6 +83,12 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
>>>       } else {
>>>           vdev->dma_as = &address_space_memory;
>>>       }
>>> +
>>> +    if (has_iommu && vdev->dma_as != &address_space_memory
>>> +                  && !virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
>>> +        error_setg(errp, "iommu_platform=true is not supported by the device");
>>> +        return;
>>> +    }
>>>   }
>>>   
>>>   /* Reset the virtio_bus */
>>>
>>> base-commit: 6621441db50d5bae7e34dbd04bf3c57a27a71b32
>>
> 
>
Halil Pasic Jan. 28, 2022, 2:29 a.m. UTC | #5
On Thu, 27 Jan 2022 18:34:23 -0300
Daniel Henrique Barboza <danielhb413@gmail.com> wrote:

> On 1/27/22 10:28, Halil Pasic wrote:
> > ping^2
> > 
> > Also adding Brijesh and Daniel, as I believe you guys should be
> > interested in this, and I'm yet to receive review.
> > 
> > @Brijesh, Daniel: Can you confirm that AMD (SEV) and Power are affected
> > too, and that the fix works for your platforms as well?  
> 
> I failed to find a host that has Power secure execution support. I'll keep looking.
> 
> 
> Meanwhile I have to mention that this patch re-introduced the problem that Kevin's
> commit fixed.
> 
> 
> With current upstream, if you start a regular guest with the following command line:
> 
> qemu-system-ppc64 (....)
> -chardev socket,id=char0,path=/tmp/vhostqemu
> -device vhost-user-fs-pci,chardev=char0,tag=myfs,iommu_platform=on
> 
> i.e. a guest with a vhost-user-fs-pci device that claims to have iommu support,
> but it doesn't, this is the error message:
> 
> 
> qemu-system-ppc64: -device vhost-user-fs-pci,chardev=char0,tag=myfs,iommu_platform=on: iommu_platform=true is not supported by the device
> 
> 
> With this patch, that command line above starts the guest. 
> virtiofsd fails during boot:
> 
> sudo ~/qemu/build/tools/virtiofsd/virtiofsd --socket-path=/tmp/vhostqemu -o source=~/linux-L1
> [sudo] password for danielhb:
> virtio_session_mount: Waiting for vhost-user socket connection...
> virtio_session_mount: Received vhost-user socket connection
> virtio_loop: Entry
> fv_panic: libvhost-user: Invalid vring_addr message
> 
> 
> And inside the guest, if you attempt to mount and use the virtiofs filesystem, the guest
> hangs:
> 
> [root@localhost ~]# mount -t virtiofs myfs /mnt
> [root@localhost ~]# cd /mnt
> 
> (hangs)
> 
> Exiting QEMU throws several vhost related errors:
> 
> 
> QEMU 6.2.50 monitor - type 'help' for more information
> (qemu) quit
> qemu-system-ppc64: Failed to set msg fds.
> qemu-system-ppc64: vhost VQ 0 ring restore failed: -22: Invalid argument (22)
> qemu-system-ppc64: Failed to set msg fds.
> qemu-system-ppc64: vhost VQ 1 ring restore failed: -22: Invalid argument (22)
> qemu-system-ppc64: Failed to set msg fds.
> qemu-system-ppc64: vhost_set_vring_call failed: Invalid argument (22)
> qemu-system-ppc64: Failed to set msg fds.
> qemu-system-ppc64: vhost_set_vring_call failed: Invalid argument (22)
> 
> 


Does your VM have an IOMMU, and does your guest see it? If yes, does
vdev->dma_as != &address_space_memory hold for your virtio device? If not, why not?

My understanding is that your guest wants to use translated addresses
because it sees the ACCESS_PLATFORM feature and, I assume, thinks that
your device is indeed behind an IOMMU; at the very least it sees that
there is an IOMMU. But then I would expect your virtio device
to have its vdev->dma_as set to something other than
&address_space_memory. Conversely, if your DMA address space is
address_space_memory, then you don't need address translation, because
your DMA addresses are the same as your guest physical addresses.
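
For reference, ->dma_as is assigned earlier in the same function the patch
touches, roughly like this (a paraphrase based on the context of the patched
function; the exact code may differ slightly from the tree under discussion):

    if (klass->get_dma_as != NULL && has_iommu) {
        /* the transport (e.g. virtio-pci) supplies a translated address space */
        vdev->dma_as = klass->get_dma_as(qbus->parent);
    } else {
        /* no translation: DMA addresses are guest physical addresses */
        vdev->dma_as = &address_space_memory;
    }

So a device whose guest genuinely needs translation should end up with a
dma_as different from address_space_memory.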

> 
> I made a little experiment with upstream and reverting Kevin's patch and the result is
> the same, meaning that this is the original bug [1] Kevin fixed back then. Note that [1]
> was reported on x86, meaning that this particular issue seems to be arch agnostic.

We don't have this problem on s390, so it ain't entirely arch agnostic.

> 
> 
> My point here is that your patch fixes the situation for s390x, and Brijesh already chimed
> in claiming that it fixed for AMD SEV, but it reintroduced a bug. I believe you should
> include this test case with vhost-user in your testing to figure out a way to fix what
> is needed without adding this particular regression.

Can you help me with this? IMHO the big problem is that iommu_platform
is used for two distinct things. I've described that in the commit
message.

We may be able to differentiate between the two using ->dma_as, but for
that it needs to be set up correctly: whenever you require translation,
it should be something other than address_space_memory. The question
is why you require translation but don't have your ->dma_as set up
accordingly. It can be a guest thing, i.e. the guest just assumes it has
to use bus addresses while it actually does not have to, or we indeed do
have an IOMMU which polices the device's access to guest memory, but for
some strange reason we failed to set up ->dma_as to reflect that.
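
As a sketch, the two meanings currently folded into a single knob could be
expressed as separate predicates (illustrative helpers only; these functions
do not exist in QEMU):

    /* "the device must not touch guest memory it was not granted access to" */
    static bool restricts_memory_access(VirtIODevice *vdev)
    {
        return virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
    }

    /* "iova != gpa, so the device has to translate I/O addresses" */
    static bool needs_iova_translation(VirtIODevice *vdev)
    {
        return vdev->dma_as != &address_space_memory;
    }

The patch's condition rejects the plug only when translation would be needed
but the (vhost) device does not offer _F_ACCESS_PLATFORM.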

@Michael: what is your opinion?

> 
> 
> In fact, I have a feeling that this is not the first time this kind of situation is discussed
> around here. This reminds me of [2] and a discussion about the order virtiofs features
> are negotiated versus when/how QEMU inits the devices.
> 
> 
> 
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1935019
> [2] https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg05644.html
> 
> 
> Thanks,
> 
> 
> Daniel
> 
> 
> > 
> > Regards,
> > Halil
> > 
> > On Tue, 25 Jan 2022 11:21:12 +0100
> > Halil Pasic <pasic@linux.ibm.com> wrote:
> >   
> >> ping
> >>
> >> On Mon, 17 Jan 2022 13:02:38 +0100
> >> Halil Pasic <pasic@linux.ibm.com> wrote:
> >>  
> >>> The commit 04ceb61a40 ("virtio: Fail if iommu_platform is requested, but
> >>> unsupported") claims to fail the device hotplug when iommu_platform
> >>> is requested, but not supported by the (vhost) device. On the first
> >>> glance the condition for detecting that situation looks perfect, but
> >>> because a certain peculiarity of virtio_platform it ain't.
> >>>
> >>> In fact the aforementioned commit introduces a regression. It breaks
> >>> virtio-fs support for Secure Execution, and most likely also for AMD SEV
> >>> or any other confidential guest scenario that relies encrypted guest
> >>> memory.  The same also applies to any other vhost device that does not
> >>> support _F_ACCESS_PLATFORM.
> >>>
> >>> The peculiarity is that iommu_platform and _F_ACCESS_PLATFORM collates
> >>> "device can not access all of the guest RAM" and "iova != gpa, thus
> >>> device needs to translate iova".
> >>>
> >>> Confidential guest technologies currently rely on the device/hypervisor
> >>> offering _F_ACCESS_PLATFORM, so that, after the feature has been
> >>> negotiated, the guest  grants access to the portions of memory the
> >>> device needs to see. So in for confidential guests, generally,
> >>> _F_ACCESS_PLATFORM is about the restricted access to memory, but not
> >>> about the addresses used being something else than guest physical
> >>> addresses.
> >>>
> >>> This is the very reason for which commit f7ef7e6e3b ("vhost: correctly
> >>> turn on VIRTIO_F_IOMMU_PLATFORM") for, which fences _F_ACCESS_PLATFORM
> >>> form the vhost device that does not need it, because on the vhost
> >>> interface it only means "I/O address translation is needed".
> >>>
> >>> This patch takes inspiration from f7ef7e6e3b ("vhost: correctly turn on
> >>> VIRTIO_F_IOMMU_PLATFORM"), and uses the same condition for detecting the
> >>> situation when _F_ACCESS_PLATFORM is requested, but no I/O translation
> >>> by the device, and thus no device capability is needed. In this
> >>> situation claiming that the device does not support iommu_plattform=on
> >>> is counter-productive. So let us stop doing that!
> >>>
> >>> Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
> >>> Reported-by: Jakob Naucke <Jakob.Naucke@ibm.com>
> >>> Fixes: 04ceb61a40 ("virtio: Fail if iommu_platform is requested, but
> >>> unsupported")
> >>> Cc: Kevin Wolf <kwolf@redhat.com>
> >>> Cc: qemu-stable@nongnu.org
> >>>
> >>> ---
> >>>
> >>> v1->v2:
> >>> * Commit message tweaks. Most notably fixed commit SHA (Michael)
> >>>
> >>> ---
> >>>   hw/virtio/virtio-bus.c | 11 ++++++-----
> >>>   1 file changed, 6 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
> >>> index d23db98c56..c1578f3de2 100644
> >>> --- a/hw/virtio/virtio-bus.c
> >>> +++ b/hw/virtio/virtio-bus.c
> >>> @@ -69,11 +69,6 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
> >>>           return;
> >>>       }
> >>>   
> >>> -    if (has_iommu && !virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
> >>> -        error_setg(errp, "iommu_platform=true is not supported by the device");
> >>> -        return;
> >>> -    }
> >>> -
> >>>       if (klass->device_plugged != NULL) {
> >>>           klass->device_plugged(qbus->parent, &local_err);
> >>>       }
> >>> @@ -88,6 +83,12 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
> >>>       } else {
> >>>           vdev->dma_as = &address_space_memory;
> >>>       }
> >>> +
> >>> +    if (has_iommu && vdev->dma_as != &address_space_memory
> >>> +                  && !virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
> >>> +        error_setg(errp, "iommu_platform=true is not supported by the device");
> >>> +        return;
> >>> +    }
> >>>   }
> >>>   
> >>>   /* Reset the virtio_bus */
> >>>
> >>> base-commit: 6621441db50d5bae7e34dbd04bf3c57a27a71b32  
> >>  
> > 
> >   
>
Michael S. Tsirkin Jan. 28, 2022, 9:48 a.m. UTC | #6
On Fri, Jan 28, 2022 at 03:29:11AM +0100, Halil Pasic wrote:
> On Thu, 27 Jan 2022 18:34:23 -0300
> Daniel Henrique Barboza <danielhb413@gmail.com> wrote:
> 
> > On 1/27/22 10:28, Halil Pasic wrote:
> > > ping^2
> > > 
> > > Also adding Brijesh and Daniel, as I believe you guys should be
> > > interested in this, and I'm yet to receive review.
> > > 
> > > @Brijesh, Daniel: Can you confirm that AMD (SEV) and Power are affected
> > > too, and that the fix works for your platforms as well?  
> > 
> > I failed to find a host that has Power secure execution support. I'll keep looking.
> > 
> > 
> > Meanwhile I have to mention that this patch re-introduced the problem that Kevin's
> > commit fixed.
> > 
> > 
> > With current upstream, if you start a regular guest with the following command line:
> > 
> > qemu-system-ppc64 (....)
> > -chardev socket,id=char0,path=/tmp/vhostqemu
> > -device vhost-user-fs-pci,chardev=char0,tag=myfs,iommu_platform=on
> > 
> > i.e. a guest with a vhost-user-fs-pci device that claims to have iommu support,
> > but it doesn't, this is the error message:
> > 
> > 
> > qemu-system-ppc64: -device vhost-user-fs-pci,chardev=char0,tag=myfs,iommu_platform=on: iommu_platform=true is not supported by the device
> > 
> > 
> > With this patch, that command line above starts the guest. 
> > virtiofsd fails during boot:
> > 
> > sudo ~/qemu/build/tools/virtiofsd/virtiofsd --socket-path=/tmp/vhostqemu -o source=~/linux-L1
> > [sudo] password for danielhb:
> > virtio_session_mount: Waiting for vhost-user socket connection...
> > virtio_session_mount: Received vhost-user socket connection
> > virtio_loop: Entry
> > fv_panic: libvhost-user: Invalid vring_addr message
> > 
> > 
> > And inside the guest, if you attempt to mount and use the virtiofs filesystem, the guest
> > hangs:
> > 
> > [root@localhost ~]# mount -t virtiofs myfs /mnt
> > [root@localhost ~]# cd /mnt
> > 
> > (hangs)
> > 
> > Exiting QEMU throws several vhost related errors:
> > 
> > 
> > QEMU 6.2.50 monitor - type 'help' for more information
> > (qemu) quit
> > qemu-system-ppc64: Failed to set msg fds.
> > qemu-system-ppc64: vhost VQ 0 ring restore failed: -22: Invalid argument (22)
> > qemu-system-ppc64: Failed to set msg fds.
> > qemu-system-ppc64: vhost VQ 1 ring restore failed: -22: Invalid argument (22)
> > qemu-system-ppc64: Failed to set msg fds.
> > qemu-system-ppc64: vhost_set_vring_call failed: Invalid argument (22)
> > qemu-system-ppc64: Failed to set msg fds.
> > qemu-system-ppc64: vhost_set_vring_call failed: Invalid argument (22)
> > 
> > 
> 
> 
> Does your VM have an IOMMU and does your guest see it? If yes does
> vdev->dma_as != &address_space_memory hold for your virtio device? If no why not?
> 
> My understanding is that your guest wants to do translated addresses,
> because it sees the ACCESS_PLATFORM feature, and probably thinks that
> your device is indeed behind an IOMMU, from what I assume, at least it
> sees that there is an IOMMU. But then I would expect your virtio device
> to have its vdev->dma_as set to something different than
> &address_space_memory. Conversely if your dma address space is
> address_space_memory, then you don't need address translation because
> your dma addresses are the same  as your guest physical addresses.
> 
> > 
> > I made a little experiment with upstream and reverting Kevin's patch and the result is
> > the same, meaning that this is the original bug [1] Kevin fixed back then. Note that [1]
> > was reported on x86, meaning that this particular issue seems to be arch agnostic.
> 
> We don't have this problem on s390, so it ain't entirely arch agnostic.
> 
> > 
> > 
> > My point here is that your patch fixes the situation for s390x, and Brijesh already chimed
> > in claiming that it fixed for AMD SEV, but it reintroduced a bug. I believe you should
> > include this test case with vhost-user in your testing to figure out a way to fix what
> > is needed without adding this particular regression.
> 
> Can you help me with this? IMHO the big problem is that iommu_platform
> is used for two distinct things. I've described that in the commit
> message.
> 
> We may be able to differentiate between the two using ->dma_as, but for
> that it needs to be set up correctly: whenever you require translation
> it should be something different than address_space_memory. The question
> is why do you require translation but don't have your ->dma_as set up
> properly? It can be a guest thing, i.e. guest just assumes it has to do
> bus addresses, while it actually does not have to, or we indeed do have
> an IOMMU which polices the devices access to the guest memory, but for
> some strange reason we failed to set up ->dma_as to reflect that.
> 
> @Michael: what is your opinion?

Right, I am puzzled too.

> > 
> > 
> > In fact, I have a feeling that this is not the first time this kind of situation is discussed
> > around here. This reminds me of [2] and a discussion about the order virtiofs features
> > are negotiated versus when/how QEMU inits the devices.
> > 
> > 
> > 
> > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1935019
> > [2] https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg05644.html
> > 
> > 
> > Thanks,
> > 
> > 
> > Daniel
> > 
> > 
> > > 
> > > Regards,
> > > Halil
> > > 
> > > On Tue, 25 Jan 2022 11:21:12 +0100
> > > Halil Pasic <pasic@linux.ibm.com> wrote:
> > >   
> > >> ping
> > >>
> > >> On Mon, 17 Jan 2022 13:02:38 +0100
> > >> Halil Pasic <pasic@linux.ibm.com> wrote:
> > >>  
> > >>> The commit 04ceb61a40 ("virtio: Fail if iommu_platform is requested, but
> > >>> unsupported") claims to fail the device hotplug when iommu_platform
> > >>> is requested, but not supported by the (vhost) device. On the first
> > >>> glance the condition for detecting that situation looks perfect, but
> > >>> because a certain peculiarity of virtio_platform it ain't.
> > >>>
> > >>> In fact the aforementioned commit introduces a regression. It breaks
> > >>> virtio-fs support for Secure Execution, and most likely also for AMD SEV
> > >>> or any other confidential guest scenario that relies encrypted guest
> > >>> memory.  The same also applies to any other vhost device that does not
> > >>> support _F_ACCESS_PLATFORM.
> > >>>
> > >>> The peculiarity is that iommu_platform and _F_ACCESS_PLATFORM collates
> > >>> "device can not access all of the guest RAM" and "iova != gpa, thus
> > >>> device needs to translate iova".
> > >>>
> > >>> Confidential guest technologies currently rely on the device/hypervisor
> > >>> offering _F_ACCESS_PLATFORM, so that, after the feature has been
> > >>> negotiated, the guest  grants access to the portions of memory the
> > >>> device needs to see. So in for confidential guests, generally,
> > >>> _F_ACCESS_PLATFORM is about the restricted access to memory, but not
> > >>> about the addresses used being something else than guest physical
> > >>> addresses.
> > >>>
> > >>> This is the very reason for which commit f7ef7e6e3b ("vhost: correctly
> > >>> turn on VIRTIO_F_IOMMU_PLATFORM") for, which fences _F_ACCESS_PLATFORM
> > >>> form the vhost device that does not need it, because on the vhost
> > >>> interface it only means "I/O address translation is needed".
> > >>>
> > >>> This patch takes inspiration from f7ef7e6e3b ("vhost: correctly turn on
> > >>> VIRTIO_F_IOMMU_PLATFORM"), and uses the same condition for detecting the
> > >>> situation when _F_ACCESS_PLATFORM is requested, but no I/O translation
> > >>> by the device, and thus no device capability is needed. In this
> > >>> situation claiming that the device does not support iommu_plattform=on
> > >>> is counter-productive. So let us stop doing that!
> > >>>
> > >>> Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
> > >>> Reported-by: Jakob Naucke <Jakob.Naucke@ibm.com>
> > >>> Fixes: 04ceb61a40 ("virtio: Fail if iommu_platform is requested, but
> > >>> unsupported")
> > >>> Cc: Kevin Wolf <kwolf@redhat.com>
> > >>> Cc: qemu-stable@nongnu.org
> > >>>
> > >>> ---
> > >>>
> > >>> v1->v2:
> > >>> * Commit message tweaks. Most notably fixed commit SHA (Michael)
> > >>>
> > >>> ---
> > >>>   hw/virtio/virtio-bus.c | 11 ++++++-----
> > >>>   1 file changed, 6 insertions(+), 5 deletions(-)
> > >>>
> > >>> diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
> > >>> index d23db98c56..c1578f3de2 100644
> > >>> --- a/hw/virtio/virtio-bus.c
> > >>> +++ b/hw/virtio/virtio-bus.c
> > >>> @@ -69,11 +69,6 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
> > >>>           return;
> > >>>       }
> > >>>   
> > >>> -    if (has_iommu && !virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
> > >>> -        error_setg(errp, "iommu_platform=true is not supported by the device");
> > >>> -        return;
> > >>> -    }
> > >>> -
> > >>>       if (klass->device_plugged != NULL) {
> > >>>           klass->device_plugged(qbus->parent, &local_err);
> > >>>       }
> > >>> @@ -88,6 +83,12 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
> > >>>       } else {
> > >>>           vdev->dma_as = &address_space_memory;
> > >>>       }
> > >>> +
> > >>> +    if (has_iommu && vdev->dma_as != &address_space_memory
> > >>> +                  && !virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
> > >>> +        error_setg(errp, "iommu_platform=true is not supported by the device");
> > >>> +        return;
> > >>> +    }
> > >>>   }
> > >>>   
> > >>>   /* Reset the virtio_bus */
> > >>>
> > >>> base-commit: 6621441db50d5bae7e34dbd04bf3c57a27a71b32  
> > >>  
> > > 
> > >   
> >
Daniel Henrique Barboza Jan. 28, 2022, 11:02 a.m. UTC | #7
On 1/27/22 23:29, Halil Pasic wrote:
> On Thu, 27 Jan 2022 18:34:23 -0300
> Daniel Henrique Barboza <danielhb413@gmail.com> wrote:
> 
>> On 1/27/22 10:28, Halil Pasic wrote:
>>> ping^2
>>>
>>> Also adding Brijesh and Daniel, as I believe you guys should be
>>> interested in this, and I'm yet to receive review.
>>>
>>> @Brijesh, Daniel: Can you confirm that AMD (SEV) and Power are affected
>>> too, and that the fix works for your platforms as well?
>>
>> I failed to find a host that has Power secure execution support. I'll keep looking.
>>
>>
>> Meanwhile I have to mention that this patch re-introduced the problem that Kevin's
>> commit fixed.

[...]

>>
>> I made a little experiment with upstream and reverting Kevin's patch and the result is
>> the same, meaning that this is the original bug [1] Kevin fixed back then. Note that [1]
>> was reported on x86, meaning that this particular issue seems to be arch agnostic.
> 
> We don't have this problem on s390, so it ain't entirely arch agnostic.

It is arch agnostic in the sense that it depends on iommu_platform being enabled for this
specific device (vhost-user-fs-pci) rather than on some particularity of the machine.

> 
>>
>>
>> My point here is that your patch fixes the situation for s390x, and Brijesh already chimed
>> in claiming that it fixed for AMD SEV, but it reintroduced a bug. I believe you should
>> include this test case with vhost-user in your testing to figure out a way to fix what
>> is needed without adding this particular regression.
> 
> Can you help me with this? IMHO the big problem is that iommu_platform
> is used for two distinct things. I've described that in the commit
> message.
> 
> We may be able to differentiate between the two using ->dma_as, but for
> that it needs to be set up correctly: whenever you require translation
> it should be something different than address_space_memory. The question
> is why do you require translation but don't have your ->dma_as set up
> properly? It can be a guest thing, i.e. guest just assumes it has to do
> bus addresses, while it actually does not have to, or we indeed do have
> an IOMMU which polices the devices access to the guest memory, but for
> some strange reason we failed to set up ->dma_as to reflect that.


I have two suggestions. The first is to separate how we interpret iommu_platform. I find it
hard to do this properly without creating a new flag/command-line option.


My second suggestion is, well ... it seems proven that s390x-PV and AMD SEV are
impacted (and probably Power secure guests as well), so why not check for
confidential guest support and skip that check entirely? Something like this patch:


diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
index d23db98c56..4305fdd1b7 100644
--- a/hw/virtio/virtio-bus.c
+++ b/hw/virtio/virtio-bus.c
@@ -29,6 +29,7 @@
  #include "hw/virtio/virtio-bus.h"
  #include "hw/virtio/virtio.h"
  #include "exec/address-spaces.h"
+#include "hw/boards.h"
  
  /* #define DEBUG_VIRTIO_BUS */
  
@@ -42,6 +43,7 @@ do { printf("virtio_bus: " fmt , ## __VA_ARGS__); } while (0)
  /* A VirtIODevice is being plugged */
  void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
  {
+    MachineState *machine = MACHINE(qdev_get_machine());
      DeviceState *qdev = DEVICE(vdev);
      BusState *qbus = BUS(qdev_get_parent_bus(qdev));
      VirtioBusState *bus = VIRTIO_BUS(qbus);
@@ -69,7 +71,18 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
          return;
      }
  
-    if (has_iommu && !virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
+    /*
+     * Confidential guest technologies such as AMD SEV and s390x-PV rely
+     * on device/hypervisor offering _F_ACCESS_PLATFORM so the guest grants
+     * access to the portions of memory the device needs to see. For these
+     * guests, _F_ACCESS_PLATFORM is about the restricted access to memory,
+     * but not about inferring whether iommu_platform is supported by the
+     * device.
+     *
+     * Skip this check for these guests by checking machine->cgs.
+     */
+    if (!machine->cgs && has_iommu &&
+        !virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
          error_setg(errp, "iommu_platform=true is not supported by the device");
          return;
      }
Halil Pasic Jan. 28, 2022, 11:48 a.m. UTC | #8
On Fri, 28 Jan 2022 08:02:39 -0300
Daniel Henrique Barboza <danielhb413@gmail.com> wrote:

> > We may be able to differentiate between the two using ->dma_as, but for
> > that it needs to be set up correctly: whenever you require translation
> > it should be something different than address_space_memory. The question
> > is why do you require translation but don't have your ->dma_as set up
> > properly? It can be a guest thing, i.e. guest just assumes it has to do
> > bus addresses, while it actually does not have to, or we indeed do have
> > an IOMMU which polices the devices access to the guest memory, but for
> > some strange reason we failed to set up ->dma_as to reflect that.  
> 
> 
> I have 2 suggestions. First is to separate how we interpret iommu_platform. I find it
> hard to do this properly without creating a new flag/command line option.
> 

A new command line option looks problematic to me because of the
existing setups. We could tie that to a compat machine, but it looks
ugly and also a little wrong from where I stand.
> 
> My second suggestion is, well .... I think it's proved that s390x-PV and AMD SEV are
> being impacted (and probably Power secure guests as well), so why not check for
> confidential guest support to skip that check entirely? Something like this patch:
> 

This is not acceptable for s390x, and it should not be acceptable for SEV
or Power secure guests either, because s390x Secure Execution support predates
the confidential guest support patches and "->cgs", and thus you don't
have to turn on CGS to use SE. Just providing iommu_platform=on
manually on each device is perfectly fine! It should be the same for SEV.

[..]
> +    if (!machine->cgs && has_iommu &&
> +        !virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
>           error_setg(errp, "iommu_platform=true is not supported by the device");
>           return;
>       }
[..]

> This will not break anything for non-secure guests and, granted that machine->cgs is already
> set at this point, this will fix the problem for s390x-PV and AMD SEV. And we won't have to
> dive deep into a virtio-bus feature negotiation saga because of something that can be easily
> handled for machine->cgs guests only.

Your assumption does not hold; see above. Unfortunately, my assumption that
->dma_as == &address_space_memory implies no translation is needed does
not hold either. But IMHO we should really get to the bottom of
that, because it just does not make sense.

> 
> If this patch works for you and Brijesh I believe this is a good option.

I don't believe it is a good option. @Brijesh: can you confirm that SEV
has the same problem with this approach as s390x does, and that it would
break existing setups?

I have another idea, but my problem is that I don't understand enough of
the Power and PCI stuff. Anyway, if on your platform iommu_platform=on
devices cannot work in a VM that does not have an IOMMU, you could
error out on that. You could express that via a machine property, and
then make sure your DMA address space is not address_space_memory when
that machine property is set.
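
A rough sketch of that idea (purely hypothetical; the property check and
helper name below are invented for illustration and do not exist in QEMU):

    /* hypothetical machine property: "iommu_platform=on implies a vIOMMU" */
    if (has_iommu && machine_requires_viommu_for_iommu_platform(machine) &&
        vdev->dma_as == &address_space_memory) {
        error_setg(errp, "iommu_platform=true requires a vIOMMU on this machine");
        return;
    }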

Regards,
Halil
Michael S. Tsirkin Jan. 28, 2022, 11:52 a.m. UTC | #9
On Fri, Jan 28, 2022 at 08:02:39AM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 1/27/22 23:29, Halil Pasic wrote:
> > On Thu, 27 Jan 2022 18:34:23 -0300
> > Daniel Henrique Barboza <danielhb413@gmail.com> wrote:
> > 
> > > On 1/27/22 10:28, Halil Pasic wrote:
> > > > ping^2
> > > > 
> > > > Also adding Brijesh and Daniel, as I believe you guys should be
> > > > interested in this, and I'm yet to receive review.
> > > > 
> > > > @Brijesh, Daniel: Can you confirm that AMD (SEV) and Power are affected
> > > > too, and that the fix works for your platforms as well?
> > > 
> > > I failed to find a host that has Power secure execution support. I'll keep looking.
> > > 
> > > 
> > > Meanwhile I have to mention that this patch re-introduced the problem that Kevin's
> > > commit fixed.
> 
> [...]
> 
> > > 
> > > I made a little experiment with upstream and reverting Kevin's patch and the result is
> > > the same, meaning that this is the original bug [1] Kevin fixed back then. Note that [1]
> > > was reported on x86, meaning that this particular issue seems to be arch agnostic.
> > 
> > We don't have this problem on s390, so it ain't entirely arch agnostic.
> 
> It is arch agnostic in a way that it relies on iommu_platform support being true to this
> specific device (vhost-user-fs-pci) instead of some particularity of the machine.

But it is specific to VMs with an IOMMU or other ways to restrict access,
such as cgs, right?
Without a vIOMMU or cgs in the VM, the ACCESS_PLATFORM flag is a nop for
the guest; it doesn't affect anything except slowing things down
somewhat, right?

> > 
> > > 
> > > 
> > > My point here is that your patch fixes the situation for s390x, and Brijesh already chimed
> > > in claiming that it fixed for AMD SEV, but it reintroduced a bug. I believe you should
> > > include this test case with vhost-user in your testing to figure out a way to fix what
> > > is needed without adding this particular regression.
> > 
> > Can you help me with this? IMHO the big problem is that iommu_platform
> > is used for two distinct things. I've described that in the commit
> > message.
> > 
> > We may be able to differentiate between the two using ->dma_as, but for
> > that it needs to be set up correctly: whenever you require translation
> > it should be something different than address_space_memory. The question
> > is why do you require translation but don't have your ->dma_as set up
> > properly? It can be a guest thing, i.e. guest just assumes it has to do
> > bus addresses, while it actually does not have to, or we indeed do have
> > an IOMMU which polices the devices access to the guest memory, but for
> > some strange reason we failed to set up ->dma_as to reflect that.
> 
> 
> I have 2 suggestions. First is to separate how we interpret iommu_platform. I find it
> hard to do this properly without creating a new flag/command line option.

We do want to switch to calling it access_platform at some point anyway.
When we do, we can make it mean only the guest flag.


> 
> My second suggestion is, well .... I think it's proved that s390x-PV and AMD SEV are
> being impacted (and probably Power secure guests as well), so why not check for
> confidential guest support to skip that check entirely? Something like this patch:
> 
> diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
> index d23db98c56..4305fdd1b7 100644
> --- a/hw/virtio/virtio-bus.c
> +++ b/hw/virtio/virtio-bus.c
> @@ -29,6 +29,7 @@
>  #include "hw/virtio/virtio-bus.h"
>  #include "hw/virtio/virtio.h"
>  #include "exec/address-spaces.h"
> +#include "hw/boards.h"
>  /* #define DEBUG_VIRTIO_BUS */
> @@ -42,6 +43,7 @@ do { printf("virtio_bus: " fmt , ## __VA_ARGS__); } while (0)
>  /* A VirtIODevice is being plugged */
>  void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
>  {
> +    MachineState *machine = MACHINE(qdev_get_machine());
>      DeviceState *qdev = DEVICE(vdev);
>      BusState *qbus = BUS(qdev_get_parent_bus(qdev));
>      VirtioBusState *bus = VIRTIO_BUS(qbus);
> @@ -69,7 +71,18 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
>          return;
>      }
> -    if (has_iommu && !virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
> +    /*
> +     * Confidential guest technologies such as AMD SEV and s390x-PV relies
> +     * on device/hypervisor offering _F_ACCESS_PLATFORM so the guest grants
> +     * access to the portions of memory the device needs to see. For these
> +     * guests, _F_ACCESS_PLATFORM is about the restricted access to memory,
> +     * but not about infering whether iommu_platform is supported in the
> +     * device.
> +     *
> +     * Skip this check for these guests by checking machine->cgs.
> +     */
> +    if (!machine->cgs && has_iommu &&
> +        !virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
>          error_setg(errp, "iommu_platform=true is not supported by the device");
>          return;
>      }
> -- 
> 2.34.1

In fact, I proposed setting _F_ACCESS_PLATFORM automatically in the past.



> 
> This will not break anything for non-secure guests and, granted that machine->cgs is already
> set at this point, this will fix the problem for s390x-PV and AMD SEV. And we won't have to
> dive deep into a virtio-bus feature negotiation saga because of something that can be easily
> handled for machine->cgs guests only.
> 
> If this patch works for you and Brijesh I believe this is a good option.
> 
> 
> 
> Thanks,
> 
> 
> Daniel
> 
> 
> 
> > 
> > @Michael: what is your opinion?
> > 
> > > 
> > > 
> > > In fact, I have a feeling that this is not the first time this kind of situation is discussed
> > > around here. This reminds me of [2] and a discussion about the order virtiofs features
> > > are negotiated versus when/how QEMU inits the devices.
> > > 
> > > 
> > > 
> > > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1935019
> > > [2] https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg05644.html
> > > 
> > > 
> > > Thanks,
> > > 
> > > 
> > > Daniel
> > > 
> > > 
> > > > 
> > > > Regards,
> > > > Halil
> > > > 
> > > > On Tue, 25 Jan 2022 11:21:12 +0100
> > > > Halil Pasic <pasic@linux.ibm.com> wrote:
> > > > > ping
> > > > > 
> > > > > On Mon, 17 Jan 2022 13:02:38 +0100
> > > > > Halil Pasic <pasic@linux.ibm.com> wrote:
> > > > > > [..]
> > > > 
> > > 
> >
Daniel Henrique Barboza Jan. 28, 2022, 12:12 p.m. UTC | #10
On 1/28/22 08:48, Halil Pasic wrote:
> On Fri, 28 Jan 2022 08:02:39 -0300
> Daniel Henrique Barboza <danielhb413@gmail.com> wrote:
> 
>>> We may be able to differentiate between the two using ->dma_as, but for
>>> that it needs to be set up correctly: whenever you require translation,
>>> it should be something other than address_space_memory. The question
>>> is why you would require translation but not have your ->dma_as set up
>>> properly. It can be a guest thing, i.e. the guest just assumes it has to
>>> use bus addresses while it actually does not have to, or we indeed have
>>> an IOMMU which polices the device's access to guest memory, but for
>>> some strange reason we failed to set up ->dma_as to reflect that.
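A minimal sketch of the differentiation described above, written as a helper; the function name is invented, but the condition mirrors the hunk in the patch at the bottom of this page:

    /*
     * iommu_platform=on only needs VIRTIO_F_IOMMU_PLATFORM from the device
     * when the device's DMA address space is something other than plain
     * guest memory, i.e. when I/O addresses actually have to be translated.
     */
    static bool iommu_platform_needs_device_support(VirtIODevice *vdev,
                                                    bool has_iommu)
    {
        bool needs_translation = vdev->dma_as != &address_space_memory;

        return has_iommu && needs_translation &&
               !virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
    }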
>>
>>
>> I have two suggestions. The first is to separate how we interpret iommu_platform; I find it
>> hard to do this properly without creating a new flag/command-line option.
>>
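For illustration only, such a flag could keep iommu_platform for "restricted access" and add a separate knob for "I/O addresses need translation"; the property and field below are invented, and as the reply right after this notes, a new command-line option is problematic for existing setups:

    /* hypothetical property, shown only to make the suggestion concrete */
    DEFINE_PROP_BOOL("iommu_translate", VirtIODevice, iommu_translate, false),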
> 
> A new command line option looks problematic to me because of the
> existing setups. We could tie that to a compat machine, but it looks
> ugly and also a little wrong from where I stand.
>>
>> My second suggestion is, well... I think it's been shown that s390x-PV and AMD SEV are
>> being impacted (and probably Power secure guests as well), so why not check for
>> confidential guest support and skip that check entirely? Something like this patch:
>>
> 
> This is not acceptable for s390x, and it should not be acceptable for SEV
> or Power secure guests, because s390x Secure Execution support predates
> the confidential guest support patches and "->cgs", and thus you don't
> have to turn on CGS to use SE. Just providing iommu_platform=on
> manually on each device is perfectly fine! It should be the same for SEV.

Hm, that's unfortunate. Checking machine->cgs would be an easy way out.

> 
> [..]
>> +    if (!machine->cgs && has_iommu &&
>> +        !virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
>>            error_setg(errp, "iommu_platform=true is not supported by the device");
>>            return;
>>        }
> [..]
> 
>> This will not break anything for non-secure guests and, granted that machine->cgs is already
>> set at this point, this will fix the problem for s390x-PV and AMD SEV. And we won't have to
>> dive deep into a virtio-bus feature negotiation saga because of something that can be easily
>> handled for machine->cgs guests only.
> 
> Your assumption does not hold. See above. Unfortunately, my assumption
> that ->dma_as == &address_space_memory implies no translation is needed
> does not hold either. But IMHO we should really get to the bottom of
> that, because it just does not make sense.
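For context, a condensed sketch of how ->dma_as ends up being assigned in virtio_bus_device_plugged(), pieced together from the hunks quoted in this thread (the code in the tree may differ slightly):

    /*
     * has_iommu reflects whether iommu_platform was requested on the
     * transport. The transport's get_dma_as() decides the DMA address
     * space; plain guest memory is only the fallback. The relocated check
     * in the patch has to run after this block, because it looks at the
     * resulting vdev->dma_as.
     */
    if (klass->get_dma_as != NULL && has_iommu) {
        virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
        vdev->dma_as = klass->get_dma_as(qbus->parent);
    } else {
        vdev->dma_as = &address_space_memory;
    }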


I'll make an attempt to understand the logic on the Power side.

> 
>>
>> If this patch works for you and Brijesh I believe this is a good option.
> 
> I don't believe it is a good option. @Brijesh, can you confirm that SEV
> has the same problem with this approach as s390x does, and that it would
> break existing setups?
> 
> I have another idea, but my problem is that I don't understand enough of
> the Power and PCI stuff. Anyway, if on your platform iommu_platform=on
> devices cannot work in a VM that does not have an IOMMU, you could
> error out on that. You could express that via a machine property, and
> then make sure your DMA address space is not address_space_memory if
> that machine property is set.
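A rough sketch of that machine-property idea; the helper and the error message are invented here, purely to show the shape of the check:

    /*
     * Hypothetical: a machine that knows iommu_platform=on devices cannot
     * work without real I/O translation advertises that, and the transport
     * then refuses a DMA address space of plain guest memory.
     */
    if (has_iommu && machine_requires_iommu_translation() && /* invented */
        vdev->dma_as == &address_space_memory) {
        error_setg(errp, "iommu_platform=true requires an IOMMU on this machine");
        return;
    }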


Bear in mind that the root problem I reported up there isn't something that's
just Power specific. Any arch in which vhost-user-fs-pci does not support iommu_platform
will have the problem as well (e.g. x86 and the RH bug Kevin fixed).


What I mean is that I can fix my side using the PowerPC PCI specifications and be done
with it, but that would not help x86, for example. I believe a better way is to use the
PowerPC case to understand where the overall common logic can be improved for everyone.


Thanks,


Daniel



> 
> Regards,
> Halil
diff mbox series

Patch

diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
index d23db98c56..c1578f3de2 100644
--- a/hw/virtio/virtio-bus.c
+++ b/hw/virtio/virtio-bus.c
@@ -69,11 +69,6 @@  void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
         return;
     }
 
-    if (has_iommu && !virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
-        error_setg(errp, "iommu_platform=true is not supported by the device");
-        return;
-    }
-
     if (klass->device_plugged != NULL) {
         klass->device_plugged(qbus->parent, &local_err);
     }
@@ -88,6 +83,12 @@  void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
     } else {
         vdev->dma_as = &address_space_memory;
     }
+
+    if (has_iommu && vdev->dma_as != &address_space_memory
+                  && !virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
+        error_setg(errp, "iommu_platform=true is not supported by the device");
+        return;
+    }
 }
 
 /* Reset the virtio_bus */