diff mbox series

[v2,1/1] virtio-ccw: auto-manage VIRTIO_F_IOMMU_PLATFORM if PV

Message ID 20200514221155.32079-1-pasic@linux.ibm.com (mailing list archive)
State New, archived
Headers show
Series [v2,1/1] virtio-ccw: auto-manage VIRTIO_F_IOMMU_PLATFORM if PV | expand

Commit Message

Halil Pasic May 14, 2020, 10:11 p.m. UTC
The virtio specification tells that the device is to present
VIRTIO_F_ACCESS_PLATFORM (a.k.a. VIRTIO_F_IOMMU_PLATFORM) when the
device "can only access certain memory addresses with said access
specified and/or granted by the platform". This is the case for a
protected VMs, as the device can access only memory addresses that are
in pages that are currently shared (only the guest can share/unsare its
pages).

No VM, however, starts out as a protected VM, but some VMs may be
converted to protected VMs if the guest decides so.

Making the end user explicitly manage the VIRTIO_F_ACCESS_PLATFORM via
the property iommu_on is a minor disaster. Since the correctness of the
paravirtualized virtio devices depends (and thus in a sense the
correctness of the hypervisor) it, then the hypervisor should have the
last word about whether VIRTIO_F_ACCESS_PLATFORM is to be presented or
not.

Currently presenting a PV guest with a (paravirtualized) virtio-ccw
device has catastrophic consequences for the VM (after the hypervisors
access to protected memory). This is especially grave in case of device
hotplug (because in this case the guest is more likely to be in the
middle of something important).

Let us manage the VIRTIO_F_ACCESS_PLATFORM virtio feature automatically
for virtio-ccw devices, i.e. force it before we start the protected VM.
If the VM should cease to be protected, the original value is restored.

Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
---

NOTES:

* Doing more system_resets() is a big hack.  We should look into this.
* The user interface implications of this patch are also an ugly can of
worms. We need to discuss them.


v1 --> v2:
* Use the default or user supplied iommu_on flag when when !PV
* Use virtio functions for feature manipulation

Link to v1:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg683775.html

Unfortunately the v1 did not see much discussion because we had more
pressing issues.

---
 hw/s390x/s390-virtio-ccw.c |  2 ++
 hw/s390x/virtio-ccw.c      | 14 ++++++++++++++
 hw/s390x/virtio-ccw.h      |  1 +
 3 files changed, 17 insertions(+)


base-commit: 0ffd3d64bd1bb8b84950e52159a0062fdab34628

Comments

Halil Pasic May 20, 2020, 12:16 p.m. UTC | #1
On Fri, 15 May 2020 00:11:55 +0200
Halil Pasic <pasic@linux.ibm.com> wrote:

> The virtio specification tells that the device is to present
> VIRTIO_F_ACCESS_PLATFORM (a.k.a. VIRTIO_F_IOMMU_PLATFORM) when the
> device "can only access certain memory addresses with said access
> specified and/or granted by the platform". This is the case for a
> protected VMs, as the device can access only memory addresses that are
> in pages that are currently shared (only the guest can share/unsare its
> pages).
> 
> No VM, however, starts out as a protected VM, but some VMs may be
> converted to protected VMs if the guest decides so.
> 
> Making the end user explicitly manage the VIRTIO_F_ACCESS_PLATFORM via
> the property iommu_on is a minor disaster. Since the correctness of the
> paravirtualized virtio devices depends (and thus in a sense the
> correctness of the hypervisor) it, then the hypervisor should have the
> last word about whether VIRTIO_F_ACCESS_PLATFORM is to be presented or
> not.
> 
> Currently presenting a PV guest with a (paravirtualized) virtio-ccw
> device has catastrophic consequences for the VM (after the hypervisors
> access to protected memory). This is especially grave in case of device
> hotplug (because in this case the guest is more likely to be in the
> middle of something important).
> 
> Let us manage the VIRTIO_F_ACCESS_PLATFORM virtio feature automatically
> for virtio-ccw devices, i.e. force it before we start the protected VM.
> If the VM should cease to be protected, the original value is restored.
> 
> Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
> ---
> 
> NOTES:
> 
> * Doing more system_resets() is a big hack.  We should look into this.
> * The user interface implications of this patch are also an ugly can of
> worms. We need to discuss them.
> 
> 
> v1 --> v2:
> * Use the default or user supplied iommu_on flag when when !PV
> * Use virtio functions for feature manipulation
> 
> Link to v1:
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg683775.html
> 
> Unfortunately the v1 did not see much discussion because we had more
> pressing issues.
> 
>

polite ping
Michael S. Tsirkin May 20, 2020, 4:23 p.m. UTC | #2
On Fri, May 15, 2020 at 12:11:55AM +0200, Halil Pasic wrote:
> The virtio specification tells that the device is to present
> VIRTIO_F_ACCESS_PLATFORM (a.k.a. VIRTIO_F_IOMMU_PLATFORM) when the
> device "can only access certain memory addresses with said access
> specified and/or granted by the platform". This is the case for a
> protected VMs, as the device can access only memory addresses that are
> in pages that are currently shared (only the guest can share/unsare its
> pages).
> 
> No VM, however, starts out as a protected VM, but some VMs may be
> converted to protected VMs if the guest decides so.
> 
> Making the end user explicitly manage the VIRTIO_F_ACCESS_PLATFORM via
> the property iommu_on is a minor disaster. Since the correctness of the
> paravirtualized virtio devices depends (and thus in a sense the
> correctness of the hypervisor) it, then the hypervisor should have the
> last word about whether VIRTIO_F_ACCESS_PLATFORM is to be presented or
> not.

So, how about this: switch iommu to on/off/auto.  Add a property with a
reasonable name "allow protected"?  If set allow switch to protected
memory and also set iommu auto to on by default.  If not set then don't.

This will come handy for other things like migrating to hosts without
protected memory support.


Also, virtio now calls this PLATFORM_ACCESS, maybe we should rename
the property (keeping old one around for compat)?
I feel this will address lots of complaints ...

> Currently presenting a PV guest with a (paravirtualized) virtio-ccw
> device has catastrophic consequences for the VM (after the hypervisors
> access to protected memory). This is especially grave in case of device
> hotplug (because in this case the guest is more likely to be in the
> middle of something important).
> 
> Let us manage the VIRTIO_F_ACCESS_PLATFORM virtio feature automatically
> for virtio-ccw devices, i.e. force it before we start the protected VM.
> If the VM should cease to be protected, the original value is restored.
> 
> Signed-off-by: Halil Pasic <pasic@linux.ibm.com>


I don't really understand things fully but it looks like you are
changing features of a device.  If so this bothers me, resets
happen at random times while driver is active, and we never
expect features to change.


> ---
> 
> NOTES:
> 
> * Doing more system_resets() is a big hack.  We should look into this.
> * The user interface implications of this patch are also an ugly can of
> worms. We need to discuss them.
> 
> 
> v1 --> v2:
> * Use the default or user supplied iommu_on flag when when !PV
> * Use virtio functions for feature manipulation
> 
> Link to v1:
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg683775.html
> 
> Unfortunately the v1 did not see much discussion because we had more
> pressing issues.
> 
> ---
>  hw/s390x/s390-virtio-ccw.c |  2 ++
>  hw/s390x/virtio-ccw.c      | 14 ++++++++++++++
>  hw/s390x/virtio-ccw.h      |  1 +
>  3 files changed, 17 insertions(+)
> 
> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> index f660070d22..705e6b153a 100644
> --- a/hw/s390x/s390-virtio-ccw.c
> +++ b/hw/s390x/s390-virtio-ccw.c
> @@ -330,6 +330,7 @@ static void s390_machine_unprotect(S390CcwMachineState *ms)
>      migrate_del_blocker(pv_mig_blocker);
>      error_free_or_abort(&pv_mig_blocker);
>      qemu_balloon_inhibit(false);
> +    subsystem_reset();
>  }
>  
>  static int s390_machine_protect(S390CcwMachineState *ms)
> @@ -382,6 +383,7 @@ static int s390_machine_protect(S390CcwMachineState *ms)
>      if (rc) {
>          goto out_err;
>      }
> +    subsystem_reset();
>      return rc;
>  
>  out_err:
> diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
> index 64f928fc7d..67d5bc68ba 100644
> --- a/hw/s390x/virtio-ccw.c
> +++ b/hw/s390x/virtio-ccw.c
> @@ -874,6 +874,20 @@ static void virtio_ccw_reset(DeviceState *d)
>      VirtioCcwDevice *dev = VIRTIO_CCW_DEVICE(d);
>      VirtIODevice *vdev = virtio_bus_get_device(&dev->bus);
>      VirtIOCCWDeviceClass *vdc = VIRTIO_CCW_DEVICE_GET_CLASS(dev);
> +    S390CcwMachineState *ms = S390_CCW_MACHINE(qdev_get_machine());
> +
> +    /*
> +     * An attempt to use a paravirt device without VIRTIO_F_IOMMU_PLATFORM
> +     * in PV, has catastrophic consequences for the VM. Let's force
> +     * VIRTIO_F_IOMMU_PLATFORM not already specified.
> +     */
> +    if (ms->pv && !virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
> +        virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
> +        dev->forced_iommu_platform = true;
> +    } else if (!ms->pv && dev->forced_iommu_platform) {
> +        virtio_clear_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
> +        dev->forced_iommu_platform = false;
> +    }
>  
>      virtio_ccw_reset_virtio(dev, vdev);
>      if (vdc->parent_reset) {
> diff --git a/hw/s390x/virtio-ccw.h b/hw/s390x/virtio-ccw.h
> index 3453aa1f98..34ff7b0b4e 100644
> --- a/hw/s390x/virtio-ccw.h
> +++ b/hw/s390x/virtio-ccw.h
> @@ -99,6 +99,7 @@ struct VirtioCcwDevice {
>      IndAddr *summary_indicator;
>      uint64_t ind_bit;
>      bool force_revision_1;
> +    bool forced_iommu_platform;
>  };
>  
>  /* The maximum virtio revision we support. */
> 

This seems unused. Why is it helpful?


> base-commit: 0ffd3d64bd1bb8b84950e52159a0062fdab34628
> -- 
> 2.17.1
Halil Pasic May 22, 2020, 9:04 p.m. UTC | #3
On Wed, 20 May 2020 12:23:24 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Fri, May 15, 2020 at 12:11:55AM +0200, Halil Pasic wrote:
> > The virtio specification tells that the device is to present
> > VIRTIO_F_ACCESS_PLATFORM (a.k.a. VIRTIO_F_IOMMU_PLATFORM) when the
> > device "can only access certain memory addresses with said access
> > specified and/or granted by the platform". This is the case for a
> > protected VMs, as the device can access only memory addresses that are
> > in pages that are currently shared (only the guest can share/unsare its
> > pages).
> > 
> > No VM, however, starts out as a protected VM, but some VMs may be
> > converted to protected VMs if the guest decides so.
> > 
> > Making the end user explicitly manage the VIRTIO_F_ACCESS_PLATFORM via
> > the property iommu_on is a minor disaster. Since the correctness of the
> > paravirtualized virtio devices depends (and thus in a sense the
> > correctness of the hypervisor) it, then the hypervisor should have the
> > last word about whether VIRTIO_F_ACCESS_PLATFORM is to be presented or
> > not.
> 
> So, how about this: switch iommu to on/off/auto.

Many thanks for the reveiw, and sorry about the delay on my side. We
have holidays here in Germany and I was not motivated enough up until
now to check on my mails.


I've actually played  with the thought of switching iommu_platform to 
'on/off/auto', but I didn't find an easy way to do it. I will look
again. This would be the first property of this kind in QEMU, or?

The 'on/off/auto' would be certainly much cleaner form user-interface
perspective. The downsides are that it is more invasive, and more
complicated. I'm afraid that it would also leave more possibilities for
user error.

>  Add a property with a
> reasonable name "allow protected"?  If set allow switch to protected
> memory and also set iommu auto to on by default.  If not set then don't.
>

I think we have "allow protected" already expressed via cpu models. I'm
also not sure how libvirt would react to the idea of a new machine
property for this. You did mean "allow protected" as machine property,
or?

AFAIU "allow protected" would be required for the !PV to PV switch, and
we would have to reject paravirtualized devices with iommu_platform='off'
on VM construction or hotplug (iommu_platform='auto/on' would be fine).

Could you please confirm that I understood this correctly?


> This will come handy for other things like migrating to hosts without
> protected memory support.
> 

This is already covered by cpu model AFAIK.

> 
> Also, virtio now calls this PLATFORM_ACCESS, maybe we should rename
> the property (keeping old one around for compat)?

You mean the like rename 'iommu_platform' to 'platform_access'? I like
the idea, but I'm not sure libvirt will like it as well. Boris any
opinions?

> I feel this will address lots of complaints ...
> 
> > Currently presenting a PV guest with a (paravirtualized) virtio-ccw
> > device has catastrophic consequences for the VM (after the hypervisors
> > access to protected memory). This is especially grave in case of device
> > hotplug (because in this case the guest is more likely to be in the
> > middle of something important).
> > 
> > Let us manage the VIRTIO_F_ACCESS_PLATFORM virtio feature automatically
> > for virtio-ccw devices, i.e. force it before we start the protected VM.
> > If the VM should cease to be protected, the original value is restored.
> > 
> > Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
> 
> 
> I don't really understand things fully but it looks like you are
> changing features of a device.  If so this bothers me, resets
> happen at random times while driver is active, and we never
> expect features to change.
>

Changing the device features is IMHO all right because the features can
change only immediately after a system reset and before the first vCPU
is run. That is ensured by two facts.


First, the feature can only change when ms->pv changes. That is on the
first reset after the VM entered or left the "protected virtualization"
mode of operation. And that switch requires a system reset. Because the
PV switch is initiated by the guest, and the guest is rebooted as a
consequence, the guest will never observe the change in features.

By the way, when switching between PV and !PV the features of the
cpu (model) also change.

Second,  virtio_ccw_reset() -- the function that is modified -- does
not get called on a reset that is initiated via the transport. We have
virtio_ccw_reset_virtio() for that.

[..]

> >      VirtIOCCWDeviceClass *vdc = VIRTIO_CCW_DEVICE_GET_CLASS(dev);
> > +    S390CcwMachineState *ms = S390_CCW_MACHINE(qdev_get_machine());
> > +
> > +    /*
> > +     * An attempt to use a paravirt device without
> > VIRTIO_F_IOMMU_PLATFORM
> > +     * in PV, has catastrophic consequences for the VM. Let's force
> > +     * VIRTIO_F_IOMMU_PLATFORM not already specified.
> > +     */
> > +    if (ms->pv && !virtio_host_has_feature(vdev,
> > VIRTIO_F_IOMMU_PLATFORM)) {
> > +        virtio_add_feature(&vdev->host_features,
> > VIRTIO_F_IOMMU_PLATFORM);
> > +        dev->forced_iommu_platform = true;
> > +    } else if (!ms->pv && dev->forced_iommu_platform) {
> > +        virtio_clear_feature(&vdev->host_features,
> > VIRTIO_F_IOMMU_PLATFORM);
> > +        dev->forced_iommu_platform = false;
> > +    }
> >  
> >      virtio_ccw_reset_virtio(dev, vdev);
> >      if (vdc->parent_reset) {
> > diff --git a/hw/s390x/virtio-ccw.h b/hw/s390x/virtio-ccw.h
> > index 3453aa1f98..34ff7b0b4e 100644
> > --- a/hw/s390x/virtio-ccw.h
> > +++ b/hw/s390x/virtio-ccw.h
> > @@ -99,6 +99,7 @@ struct VirtioCcwDevice {
> >      IndAddr *summary_indicator;
> >      uint64_t ind_bit;
> >      bool force_revision_1;
> > +    bool forced_iommu_platform;
> >  };
> >  
> >  /* The maximum virtio revision we support. */
> > 
> 
> This seems unused. Why is it helpful?
> 

You mean the "base-commit: SHA-1"?

It is what the --base option of git format-patch generates, and it tells
what exact commit the series is based on. Can be useful when it is not
clear against which git subtree was developed, or for comparatively old
series. It hopefully also indicates what code level was tested by the
developer.


Thanks again!

Regards,
Halil

> 
> > base-commit: 0ffd3d64bd1bb8b84950e52159a0062fdab34628
> > -- 
> > 2.17.1
>
Cornelia Huck May 28, 2020, 11:21 a.m. UTC | #4
On Fri, 22 May 2020 23:04:51 +0200
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Wed, 20 May 2020 12:23:24 -0400
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Fri, May 15, 2020 at 12:11:55AM +0200, Halil Pasic wrote:  
> > > The virtio specification tells that the device is to present
> > > VIRTIO_F_ACCESS_PLATFORM (a.k.a. VIRTIO_F_IOMMU_PLATFORM) when the
> > > device "can only access certain memory addresses with said access
> > > specified and/or granted by the platform". This is the case for a
> > > protected VMs, as the device can access only memory addresses that are
> > > in pages that are currently shared (only the guest can share/unsare its
> > > pages).
> > > 
> > > No VM, however, starts out as a protected VM, but some VMs may be
> > > converted to protected VMs if the guest decides so.
> > > 
> > > Making the end user explicitly manage the VIRTIO_F_ACCESS_PLATFORM via
> > > the property iommu_on is a minor disaster. Since the correctness of the
> > > paravirtualized virtio devices depends (and thus in a sense the
> > > correctness of the hypervisor) it, then the hypervisor should have the
> > > last word about whether VIRTIO_F_ACCESS_PLATFORM is to be presented or
> > > not.  
> > 
> > So, how about this: switch iommu to on/off/auto.  
> 
> Many thanks for the reveiw, and sorry about the delay on my side. We
> have holidays here in Germany and I was not motivated enough up until
> now to check on my mails.
> 
> 
> I've actually played  with the thought of switching iommu_platform to 
> 'on/off/auto', but I didn't find an easy way to do it. I will look
> again. This would be the first property of this kind in QEMU, or?

virtio-pci uses it for 'disable-legacy'.

> 
> The 'on/off/auto' would be certainly much cleaner form user-interface
> perspective. The downsides are that it is more invasive, and more
> complicated. I'm afraid that it would also leave more possibilities for
> user error.

To me, on/off/auto sounds like a reasonable thing to do.

What possibilities of 'user error' do you see? Shouldn't we fence off
misconfigurations, if the consequences would be disastrous?

> 
> >  Add a property with a
> > reasonable name "allow protected"?  If set allow switch to protected
> > memory and also set iommu auto to on by default.  If not set then don't.
> >  
> 
> I think we have "allow protected" already expressed via cpu models. I'm
> also not sure how libvirt would react to the idea of a new machine
> property for this. You did mean "allow protected" as machine property,
> or?

"Unpack facility in cpu model" means "guest may transition into pv
mode", right? What does it look like when the guest actually has
transitioned?

> 
> AFAIU "allow protected" would be required for the !PV to PV switch, and
> we would have to reject paravirtualized devices with iommu_platform='off'
> on VM construction or hotplug (iommu_platform='auto/on' would be fine).
> 
> Could you please confirm that I understood this correctly?
> 
> 
> > This will come handy for other things like migrating to hosts without
> > protected memory support.
> >   
> 
> This is already covered by cpu model AFAIK.

I don't think we'd want to migrate between pv and non-pv anyway?

> 
> > 
> > Also, virtio now calls this PLATFORM_ACCESS, maybe we should rename
> > the property (keeping old one around for compat)?  
> 
> You mean the like rename 'iommu_platform' to 'platform_access'? I like
> the idea, but I'm not sure libvirt will like it as well. Boris any
> opinions?
> 
> > I feel this will address lots of complaints ...
> >   
> > > Currently presenting a PV guest with a (paravirtualized) virtio-ccw
> > > device has catastrophic consequences for the VM (after the hypervisors
> > > access to protected memory). This is especially grave in case of device
> > > hotplug (because in this case the guest is more likely to be in the
> > > middle of something important).
> > > 
> > > Let us manage the VIRTIO_F_ACCESS_PLATFORM virtio feature automatically
> > > for virtio-ccw devices, i.e. force it before we start the protected VM.
> > > If the VM should cease to be protected, the original value is restored.
> > > 
> > > Signed-off-by: Halil Pasic <pasic@linux.ibm.com>  
> > 
> > 
> > I don't really understand things fully but it looks like you are
> > changing features of a device.  If so this bothers me, resets
> > happen at random times while driver is active, and we never
> > expect features to change.
> >  
> 
> Changing the device features is IMHO all right because the features can
> change only immediately after a system reset and before the first vCPU
> is run. That is ensured by two facts.
> 
> 
> First, the feature can only change when ms->pv changes. That is on the
> first reset after the VM entered or left the "protected virtualization"
> mode of operation. And that switch requires a system reset. Because the
> PV switch is initiated by the guest, and the guest is rebooted as a
> consequence, the guest will never observe the change in features.

This really needs more comments, as it is not obvious to the casual
reader. (I also stumbled over the resets.)

But I wonder whether we are actually missing those subsystems resets
today?
Janosch Frank May 28, 2020, 2:42 p.m. UTC | #5
On 5/28/20 1:21 PM, Cornelia Huck wrote:
> On Fri, 22 May 2020 23:04:51 +0200
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
>> On Wed, 20 May 2020 12:23:24 -0400
>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>>
>>> On Fri, May 15, 2020 at 12:11:55AM +0200, Halil Pasic wrote:  
>>>> The virtio specification tells that the device is to present
>>>> VIRTIO_F_ACCESS_PLATFORM (a.k.a. VIRTIO_F_IOMMU_PLATFORM) when the
>>>> device "can only access certain memory addresses with said access
>>>> specified and/or granted by the platform". This is the case for a
>>>> protected VMs, as the device can access only memory addresses that are
>>>> in pages that are currently shared (only the guest can share/unsare its
>>>> pages).
>>>>
>>>> No VM, however, starts out as a protected VM, but some VMs may be
>>>> converted to protected VMs if the guest decides so.
>>>>
>>>> Making the end user explicitly manage the VIRTIO_F_ACCESS_PLATFORM via
>>>> the property iommu_on is a minor disaster. Since the correctness of the
>>>> paravirtualized virtio devices depends (and thus in a sense the
>>>> correctness of the hypervisor) it, then the hypervisor should have the
>>>> last word about whether VIRTIO_F_ACCESS_PLATFORM is to be presented or
>>>> not.  
>>>
>>> So, how about this: switch iommu to on/off/auto.  
>>
>> Many thanks for the reveiw, and sorry about the delay on my side. We
>> have holidays here in Germany and I was not motivated enough up until
>> now to check on my mails.
>>
>>
>> I've actually played  with the thought of switching iommu_platform to 
>> 'on/off/auto', but I didn't find an easy way to do it. I will look
>> again. This would be the first property of this kind in QEMU, or?
> 
> virtio-pci uses it for 'disable-legacy'.
> 
>>
>> The 'on/off/auto' would be certainly much cleaner form user-interface
>> perspective. The downsides are that it is more invasive, and more
>> complicated. I'm afraid that it would also leave more possibilities for
>> user error.
> 
> To me, on/off/auto sounds like a reasonable thing to do.
> 
> What possibilities of 'user error' do you see? Shouldn't we fence off
> misconfigurations, if the consequences would be disastrous?
> 
>>
>>>  Add a property with a
>>> reasonable name "allow protected"?  If set allow switch to protected
>>> memory and also set iommu auto to on by default.  If not set then don't.
>>>  
>>
>> I think we have "allow protected" already expressed via cpu models. I'm
>> also not sure how libvirt would react to the idea of a new machine
>> property for this. You did mean "allow protected" as machine property,
>> or?
> 
> "Unpack facility in cpu model" means "guest may transition into pv
> mode", right? What does it look like when the guest actually has
> transitioned?

Well, we don't sync the features that the protected guest has back into
QEMU. So basically the VM doesn't really change except for ms->pv now
being true.



> 
>>
>> AFAIU "allow protected" would be required for the !PV to PV switch, and
>> we would have to reject paravirtualized devices with iommu_platform='off'
>> on VM construction or hotplug (iommu_platform='auto/on' would be fine).
>>
>> Could you please confirm that I understood this correctly?
>>
>>
>>> This will come handy for other things like migrating to hosts without
>>> protected memory support.
>>>   
>>
>> This is already covered by cpu model AFAIK.
> 
> I don't think we'd want to migrate between pv and non-pv anyway?

What exactly do you mean by that?
I'd expect that the VM can either be migrated in PV or non-PV mode and
not in a transition phase.

> 
>>
>>>
>>> Also, virtio now calls this PLATFORM_ACCESS, maybe we should rename
>>> the property (keeping old one around for compat)?  
>>
>> You mean the like rename 'iommu_platform' to 'platform_access'? I like
>> the idea, but I'm not sure libvirt will like it as well. Boris any
>> opinions?
>>
>>> I feel this will address lots of complaints ...
>>>   
>>>> Currently presenting a PV guest with a (paravirtualized) virtio-ccw
>>>> device has catastrophic consequences for the VM (after the hypervisors
>>>> access to protected memory). This is especially grave in case of device
>>>> hotplug (because in this case the guest is more likely to be in the
>>>> middle of something important).
>>>>
>>>> Let us manage the VIRTIO_F_ACCESS_PLATFORM virtio feature automatically
>>>> for virtio-ccw devices, i.e. force it before we start the protected VM.
>>>> If the VM should cease to be protected, the original value is restored.
>>>>
>>>> Signed-off-by: Halil Pasic <pasic@linux.ibm.com>  
>>>
>>>
>>> I don't really understand things fully but it looks like you are
>>> changing features of a device.  If so this bothers me, resets
>>> happen at random times while driver is active, and we never
>>> expect features to change.
>>>  
>>
>> Changing the device features is IMHO all right because the features can
>> change only immediately after a system reset and before the first vCPU
>> is run. That is ensured by two facts.
>>
>>
>> First, the feature can only change when ms->pv changes. That is on the
>> first reset after the VM entered or left the "protected virtualization"
>> mode of operation. And that switch requires a system reset. Because the
>> PV switch is initiated by the guest, and the guest is rebooted as a
>> consequence, the guest will never observe the change in features.
> 
> This really needs more comments, as it is not obvious to the casual
> reader. (I also stumbled over the resets.)
> 
> But I wonder whether we are actually missing those subsystems resets
> today?
> 
>
Halil Pasic May 28, 2020, 5:52 p.m. UTC | #6
On Thu, 28 May 2020 13:21:12 +0200
Cornelia Huck <cohuck@redhat.com> wrote:

> On Fri, 22 May 2020 23:04:51 +0200
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> > On Wed, 20 May 2020 12:23:24 -0400
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
[..]
> > > So, how about this: switch iommu to on/off/auto.  
> > 
> > Many thanks for the reveiw, and sorry about the delay on my side. We
> > have holidays here in Germany and I was not motivated enough up until
> > now to check on my mails.
> > 
> > 
> > I've actually played  with the thought of switching iommu_platform to 
> > 'on/off/auto', but I didn't find an easy way to do it. I will look
> > again. This would be the first property of this kind in QEMU, or?
> 
> virtio-pci uses it for 'disable-legacy'.
> 

Thank you very much! This makes tinging about 'on/off/auto' much easier.

> > 
> > The 'on/off/auto' would be certainly much cleaner form user-interface
> > perspective. The downsides are that it is more invasive, and more
> > complicated. I'm afraid that it would also leave more possibilities for
> > user error.
> 
> To me, on/off/auto sounds like a reasonable thing to do.
> 
> What possibilities of 'user error' do you see? 

I will whip up a prototype first and then come back to you with more
details.

The short answer is if the user isn't very careful about all the whistles
and bells, I understand that the user will end up with a partially or
fully non-PV-compatible VM.

I had an internal bugreport where there was a nic generated by default
that of course did not have iommu_platform='on'.



> Shouldn't we fence off
> misconfigurations, if the consequences would be disastrous?
> 

I fully agree! This is unfortunately currently not the case. My patch
takes the approach of avoiding miss-configuration in the first place,
instead of sapping the user for it.

> > 
> > >  Add a property with a
> > > reasonable name "allow protected"?  If set allow switch to protected
> > > memory and also set iommu auto to on by default.  If not set then don't.
> > >  
> > 
> > I think we have "allow protected" already expressed via cpu models. I'm
> > also not sure how libvirt would react to the idea of a new machine
> > property for this. You did mean "allow protected" as machine property,
> > or?
> 
> "Unpack facility in cpu model" means "guest may transition into pv
> mode", right? What does it look like when the guest actually has
> transitioned?

Janosch has answered these. Will add my thoughts there.

> 
> > 
> > AFAIU "allow protected" would be required for the !PV to PV switch, and
> > we would have to reject paravirtualized devices with iommu_platform='off'
> > on VM construction or hotplug (iommu_platform='auto/on' would be fine).
> > 
> > Could you please confirm that I understood this correctly?
> > 
> > 
> > > This will come handy for other things like migrating to hosts without
> > > protected memory support.
> > >   
> > 
> > This is already covered by cpu model AFAIK.
> 
> I don't think we'd want to migrate between pv and non-pv anyway?
> 

ditto

[..]
> > > 
> > > I don't really understand things fully but it looks like you are
> > > changing features of a device.  If so this bothers me, resets
> > > happen at random times while driver is active, and we never
> > > expect features to change.
> > >  
> > 
> > Changing the device features is IMHO all right because the features can
> > change only immediately after a system reset and before the first vCPU
> > is run. That is ensured by two facts.
> > 
> > 
> > First, the feature can only change when ms->pv changes. That is on the
> > first reset after the VM entered or left the "protected virtualization"
> > mode of operation. And that switch requires a system reset. Because the
> > PV switch is initiated by the guest, and the guest is rebooted as a
> > consequence, the guest will never observe the change in features.
> 
> This really needs more comments, as it is not obvious to the casual
> reader. (I also stumbled over the resets.)

Sorry, where exactly would you like to have those extra comments?

> 
> But I wonder whether we are actually missing those subsystems resets
> today?
> 

If I have to settle for yes or no, my answer is no.

We need at least one subsystem reset during the conversion. Without
my patch applied things look like this

$ git grep -p -B 5 -e subsystem_reset HEAD~1 -- hw/s390x/s390-virtio-ccw.c
HEAD~1:hw/s390x/s390-virtio-ccw.c=static const char *const reset_dev_types[] = {
--
HEAD~1:hw/s390x/s390-virtio-ccw.c-    "s390-sclp-event-facility",
HEAD~1:hw/s390x/s390-virtio-ccw.c-    "s390-flic",
HEAD~1:hw/s390x/s390-virtio-ccw.c-    "diag288",
HEAD~1:hw/s390x/s390-virtio-ccw.c-};
HEAD~1:hw/s390x/s390-virtio-ccw.c-
HEAD~1:hw/s390x/s390-virtio-ccw.c:static void subsystem_reset(void)
--
HEAD~1:hw/s390x/s390-virtio-ccw.c=static void s390_machine_reset(MachineState *machine)
--
HEAD~1:hw/s390x/s390-virtio-ccw.c-    case S390_RESET_MODIFIED_CLEAR:
HEAD~1:hw/s390x/s390-virtio-ccw.c-        /*
HEAD~1:hw/s390x/s390-virtio-ccw.c-         * Susbsystem reset needs to be done before we unshare memory
HEAD~1:hw/s390x/s390-virtio-ccw.c-         * and lose access to VIRTIO structures in guest memory.
HEAD~1:hw/s390x/s390-virtio-ccw.c-         */
HEAD~1:hw/s390x/s390-virtio-ccw.c:        subsystem_reset();
--
HEAD~1:hw/s390x/s390-virtio-ccw.c-    case S390_RESET_LOAD_NORMAL:
HEAD~1:hw/s390x/s390-virtio-ccw.c-        /*
HEAD~1:hw/s390x/s390-virtio-ccw.c-         * Susbsystem reset needs to be done before we unshare memory
HEAD~1:hw/s390x/s390-virtio-ccw.c-         * and lose access to VIRTIO structures in guest memory.
HEAD~1:hw/s390x/s390-virtio-ccw.c-         */
HEAD~1:hw/s390x/s390-virtio-ccw.c:        subsystem_reset();
--
HEAD~1:hw/s390x/s390-virtio-ccw.c-        }
HEAD~1:hw/s390x/s390-virtio-ccw.c-        run_on_cpu(cs, s390_do_cpu_initial_reset, RUN_ON_CPU_NULL);
HEAD~1:hw/s390x/s390-virtio-ccw.c-        run_on_cpu(cs, s390_do_cpu_load_normal, RUN_ON_CPU_NULL);
HEAD~1:hw/s390x/s390-virtio-ccw.c-        break;
HEAD~1:hw/s390x/s390-virtio-ccw.c-    case S390_RESET_PV: /* Subcode 10 */
HEAD~1:hw/s390x/s390-virtio-ccw.c:        subsystem_reset();

That is except for 
hw/s390x/s390-virtio-ccw.c-    case S390_RESET_EXTERNAL:
hw/s390x/s390-virtio-ccw.c-    case S390_RESET_REIPL:
hw/s390x/s390-virtio-ccw.c-        if (s390_is_pv()) {
hw/s390x/s390-virtio-ccw.c-            s390_machine_unprotect(ms);
hw/s390x/s390-virtio-ccw.c-        }
hw/s390x/s390-virtio-ccw.c-
hw/s390x/s390-virtio-ccw.c-        qemu_devices_reset();

Which does a qemu_devices_reset(), we already have a subsystem_reset(),
but for the cases with a PV transition this reset happens before mc->pv
is changed, so I can't react properly in the callback. For my purposes
the qemu_devices_reset() is sufficient, but I'm not sure.

The qemu_devices_reset() seems to come form db3b2566e0 ("s390x: machine
reset function with new ipl cpu handling") authored by David and reviewed by
you.

The subsystem reset from 4e872a3fb0 ("s390: provide I/O subsystem
reset") authored by Christian.

From I quick look, I believe what is done by subsystem_reset() should
be a real subset of what is done by qemu_devices_reset().

Maybe the subsystem_reset() can be just moved and the extra
subsystem_reset() calls added by me can be removed. I didn't look into
that, because it would have been wasted effort if the community rejects this
approach.

I hope this answers your questions!

Thanks for having a look!

Regards,
Halil
Halil Pasic May 28, 2020, 6:49 p.m. UTC | #7
On Thu, 28 May 2020 16:42:49 +0200
Janosch Frank <frankja@linux.ibm.com> wrote:

> On 5/28/20 1:21 PM, Cornelia Huck wrote:
> >> I think we have "allow protected" already expressed via cpu models. I'm
> >> also not sure how libvirt would react to the idea of a new machine
> >> property for this. You did mean "allow protected" as machine property,
> >> or?
> > 
> > "Unpack facility in cpu model" means "guest may transition into pv
> > mode", right? What does it look like when the guest actually has
> > transitioned?
> 
> Well, we don't sync the features that the protected guest has back into
> QEMU. So basically the VM doesn't really change except for ms->pv now
> being true.
> 

The features as observed by the guest do change, some quite drastically,
it is just that the CPU model maintained by QEMU does not change. That
is the changes can not be inspected. 

Unfortunately I'm not very familiar with the details, but my guess is
that
a) the ultravisor does what needs to be done with regards to features
that are obligatory or not prohibited in PV mode.
b) either the initial CPU model determines the CPU model after the
conversion fully, or we will need to express something more via
the QEMU cpu model. But we will have to do a fair amount of work
before we get migration, and I would hate to wait with this until
then.

Important for me is the following. 
1) The user asks for a VM with certain
characteristics including cpu features. E.g. AP and unpack facilities.
2) The specified VM is sane, and gets started.
3) The OS decides to go secure.
4) Certain characteristics of the VM get changed as observed by the OS
(e.g. gains the ability to do uv calls, but also loses stuff).
5) The changes are not reflected via QEMU interfaces.

Compared to this my patch introduces a very similar behavior, in a sense
that the characteristics as observed by the guest change during the
transition, and that in a sense, after the transition the user gets
something different than she has asked for. The differences are that
this change ain't enforced by the ultravisor, and can be inspected
through the QEMU property 'iommu_platform'.

We can IMHO clam that the user opted in for this weird override of
featues with 'unpack' and with DIAG 308 subcode 10. That is what I mean
by 'already expressed': the machine property would be redundant and
add extra complexity. Conny do you agree?

> 
> 
> > 
> >>
> >> AFAIU "allow protected" would be required for the !PV to PV switch, and
> >> we would have to reject paravirtualized devices with iommu_platform='off'
> >> on VM construction or hotplug (iommu_platform='auto/on' would be fine).
> >>
> >> Could you please confirm that I understood this correctly?
> >>
> >>
> >>> This will come handy for other things like migrating to hosts without
> >>> protected memory support.
> >>>   
> >>
> >> This is already covered by cpu model AFAIK.
> > 
> > I don't think we'd want to migrate between pv and non-pv anyway?
> 
> What exactly do you mean by that?
> I'd expect that the VM can either be migrated in PV or non-PV mode and
> not in a transition phase.

> 
I agree. I don't think migrating an in transition VM is practicable.
Currently migration is inhibited. We would probably need to inhibit
migration during transition, and make ms->pv conceptually a part of
the migration state. Both the source and the target would need to do
some things differently if the migration is requested while in PV
mode.

Regards,
Halil
Halil Pasic June 5, 2020, 11:32 p.m. UTC | #8
On Wed, 20 May 2020 12:23:24 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Fri, May 15, 2020 at 12:11:55AM +0200, Halil Pasic wrote:
> > The virtio specification tells that the device is to present
> > VIRTIO_F_ACCESS_PLATFORM (a.k.a. VIRTIO_F_IOMMU_PLATFORM) when the
> > device "can only access certain memory addresses with said access
> > specified and/or granted by the platform". This is the case for a
> > protected VMs, as the device can access only memory addresses that are
> > in pages that are currently shared (only the guest can share/unsare its
> > pages).
> > 
> > No VM, however, starts out as a protected VM, but some VMs may be
> > converted to protected VMs if the guest decides so.
> > 
> > Making the end user explicitly manage the VIRTIO_F_ACCESS_PLATFORM via
> > the property iommu_on is a minor disaster. Since the correctness of the
> > paravirtualized virtio devices depends (and thus in a sense the
> > correctness of the hypervisor) it, then the hypervisor should have the
> > last word about whether VIRTIO_F_ACCESS_PLATFORM is to be presented or
> > not.
> 
> So, how about this: switch iommu to on/off/auto.  Add a property with a
> reasonable name "allow protected"?  If set allow switch to protected
> memory and also set iommu auto to on by default.  If not set then don't.
> 
> This will come handy for other things like migrating to hosts without
> protected memory support.
> 
> 
> Also, virtio now calls this PLATFORM_ACCESS, maybe we should rename
> the property (keeping old one around for compat)?
> I feel this will address lots of complaints ...
> 

I'm not sure I've entirely understood your proposal, but I tried to
do something in that direction. 

Short summary of the changes: 
* added new property "access_platform" as on/off/auto (default auto)
* added alias "iommu_platform" for compatibility
* rewrote this patch to on/off/auto so that we only do the override when
  user specified auto 

Let me list some pros and cons (compared to the previous patch):

PRO:
* Thanks to on/off/auto we don't override what the user specified. From 
user interface perspective preferable. I usually hate software that
thinks its than me and can do the opposite I tell it.

CON:
* It is more code: "4 files changed, 37 insertions(+), 2 deletions(-)"
  against "3 files changed, 17 insertions(+)"
* Unlike the previous one, this one is not fool-proof! The user can
  still specify access_platform=off to lets say a hotplug device, and
  bring down the guest. We could however fence such stuff with an error
  message. Would be even more code though.
* As far as I can tell 'auto' was used to pick a value on initialization
  time. This is a novel, and possibly dodgy use in a sense that the value
  of the property may change during the lifetime of the VM.
* We may need to do something about libvirt.

Further improvements are possible and probably necessary if we want
to go down this path. But I would like to verify that first.

----------------------------8<---------------------------------
From: Halil Pasic <pasic@linux.ibm.com>
Date: Wed, 26 Feb 2020 16:48:21 +0100
Subject: [PATCH v2.5 1/1] virtio-ccw: auto-manage VIRTIO_F_IOMMU_PLATFORM if PV

The virtio specification tells that the device is to present
VIRTIO_F_ACCESS_PLATFORM (a.k.a. VIRTIO_F_IOMMU_PLATFORM) when the
device "can only access certain memory addresses with said access
specified and/or granted by the platform". This is the case for a
protected VMs, as the device can access only memory addresses that are
in pages that are currently shared (only the guest can share/unsare its
pages).

No VM, however, starts out as a protected VM, but some VMs may be
converted to protected VMs if the guest decides so.

Making the end user explicitly manage the VIRTIO_F_ACCESS_PLATFORM via
the property iommu_on is a minor disaster. Since the correctness of the
paravirtualized virtio devices depends (and thus in a sense the
correctness of the hypervisor) it, then the hypervisor should have the
last word about whether VIRTIO_F_ACCESS_PLATFORM is to be presented or
not.

Currently presenting a PV guest with a (paravirtualized) virtio-ccw
device has catastrophic consequences for the VM (after the hypervisors
access to protected memory). This is especially grave in case of device
hotplug (because in this case the guest is more likely to be in the
middle of something important).

Let us add the ability to manage the VIRTIO_F_ACCESS_PLATFORM virtio
feature automatically. This is accomplished  by turning the property
into an 'on/off/auto' property, and for virtio-ccw devices if auto
was specified forcing its value before  we start the protected VM. If
the VM should cease to be protected, the original value is restored.

Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
---
 hw/s390x/s390-virtio-ccw.c |  2 ++
 hw/s390x/virtio-ccw.c      | 14 ++++++++++++++
 hw/virtio/virtio.c         | 19 +++++++++++++++++++
 include/hw/virtio/virtio.h |  4 ++--
 4 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index f660070d22..705e6b153a 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -330,6 +330,7 @@ static void s390_machine_unprotect(S390CcwMachineState *ms)
     migrate_del_blocker(pv_mig_blocker);
     error_free_or_abort(&pv_mig_blocker);
     qemu_balloon_inhibit(false);
+    subsystem_reset();
 }
 
 static int s390_machine_protect(S390CcwMachineState *ms)
@@ -382,6 +383,7 @@ static int s390_machine_protect(S390CcwMachineState *ms)
     if (rc) {
         goto out_err;
     }
+    subsystem_reset();
     return rc;
 
 out_err:
diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index 64f928fc7d..2bb29b57aa 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -874,6 +874,20 @@ static void virtio_ccw_reset(DeviceState *d)
     VirtioCcwDevice *dev = VIRTIO_CCW_DEVICE(d);
     VirtIODevice *vdev = virtio_bus_get_device(&dev->bus);
     VirtIOCCWDeviceClass *vdc = VIRTIO_CCW_DEVICE_GET_CLASS(dev);
+    S390CcwMachineState *ms = S390_CCW_MACHINE(qdev_get_machine());
+
+    /*
+     * An attempt to use a paravirt device without VIRTIO_F_IOMMU_PLATFORM
+     * in PV, has catastrophic consequences for the VM. Let's force
+     * VIRTIO_F_IOMMU_PLATFORM not already specified.
+     */
+    if (vdev->access_platform_auto && ms->pv) {
+        virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
+        vdev->access_platform = ON_OFF_AUTO_ON;
+    } else if (vdev->access_platform_auto) {
+        virtio_clear_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
+        vdev->access_platform = ON_OFF_AUTO_OFF;
+    }
 
     virtio_ccw_reset_virtio(dev, vdev);
     if (vdc->parent_reset) {
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index b6c8ef5bc0..f6bd271f14 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -3232,7 +3232,11 @@ void virtio_instance_init_common(Object *proxy_obj, void *data,
 
     object_initialize_child(proxy_obj, "virtio-backend", vdev, vdev_size,
                             vdev_name, &error_abort, NULL);
+    object_property_add_alias(OBJECT(vdev), "iommu_platform",
+                              OBJECT(vdev), "access_platform", &error_abort);
     qdev_alias_all_properties(vdev, proxy_obj);
+    object_property_add_alias(proxy_obj, "iommu_platform",
+                              OBJECT(vdev), "access_platform", &error_abort);
 }
 
 void virtio_init(VirtIODevice *vdev, const char *name,
@@ -3626,6 +3630,19 @@ static void virtio_device_realize(DeviceState *dev, Error **errp)
         return;
     }
 
+    switch (vdev->access_platform) {
+    case ON_OFF_AUTO_ON:
+        virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
+        break;
+    case ON_OFF_AUTO_AUTO:
+        /* transport code can mange access_platform */
+        vdev->access_platform_auto = true;
+        break;
+    case ON_OFF_AUTO_OFF: /*fall through*/
+    default:
+        vdev->access_platform_auto = false;
+    }
+
     vdev->listener.commit = virtio_memory_listener_commit;
     memory_listener_register(&vdev->listener, vdev->dma_as);
 }
@@ -3681,6 +3698,8 @@ static Property virtio_properties[] = {
     DEFINE_VIRTIO_COMMON_FEATURES(VirtIODevice, host_features),
     DEFINE_PROP_BOOL("use-started", VirtIODevice, use_started, true),
     DEFINE_PROP_BOOL("use-disabled-flag", VirtIODevice, use_disabled_flag, true),
+    DEFINE_PROP_ON_OFF_AUTO("access_platform", VirtIODevice, access_platform,
+                            ON_OFF_AUTO_AUTO),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index b69d517496..b77e1545b4 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -110,6 +110,8 @@ struct VirtIODevice
     uint8_t device_endian;
     bool use_guest_notifier_mask;
     AddressSpace *dma_as;
+    OnOffAuto access_platform;
+    bool access_platform_auto;
     QLIST_HEAD(, VirtQueue) *vector_queues;
 };
 
@@ -289,8 +291,6 @@ typedef struct VirtIORNGConf VirtIORNGConf;
                       VIRTIO_F_NOTIFY_ON_EMPTY, true), \
     DEFINE_PROP_BIT64("any_layout", _state, _field, \
                       VIRTIO_F_ANY_LAYOUT, true), \
-    DEFINE_PROP_BIT64("iommu_platform", _state, _field, \
-                      VIRTIO_F_IOMMU_PLATFORM, false), \
     DEFINE_PROP_BIT64("packed", _state, _field, \
                       VIRTIO_F_RING_PACKED, false)
 

base-commit: 0ffd3d64bd1bb8b84950e52159a0062fdab34628
Cornelia Huck June 8, 2020, 4:14 p.m. UTC | #9
On Sat, 6 Jun 2020 01:32:17 +0200
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Wed, 20 May 2020 12:23:24 -0400
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Fri, May 15, 2020 at 12:11:55AM +0200, Halil Pasic wrote:  
> > > The virtio specification tells that the device is to present
> > > VIRTIO_F_ACCESS_PLATFORM (a.k.a. VIRTIO_F_IOMMU_PLATFORM) when the
> > > device "can only access certain memory addresses with said access
> > > specified and/or granted by the platform". This is the case for a
> > > protected VMs, as the device can access only memory addresses that are
> > > in pages that are currently shared (only the guest can share/unsare its
> > > pages).
> > > 
> > > No VM, however, starts out as a protected VM, but some VMs may be
> > > converted to protected VMs if the guest decides so.
> > > 
> > > Making the end user explicitly manage the VIRTIO_F_ACCESS_PLATFORM via
> > > the property iommu_on is a minor disaster. Since the correctness of the
> > > paravirtualized virtio devices depends (and thus in a sense the
> > > correctness of the hypervisor) it, then the hypervisor should have the
> > > last word about whether VIRTIO_F_ACCESS_PLATFORM is to be presented or
> > > not.  
> > 
> > So, how about this: switch iommu to on/off/auto.  Add a property with a
> > reasonable name "allow protected"?  If set allow switch to protected
> > memory and also set iommu auto to on by default.  If not set then don't.
> > 
> > This will come handy for other things like migrating to hosts without
> > protected memory support.
> > 
> > 
> > Also, virtio now calls this PLATFORM_ACCESS, maybe we should rename
> > the property (keeping old one around for compat)?
> > I feel this will address lots of complaints ...
> >   
> 
> I'm not sure I've entirely understood your proposal, but I tried to
> do something in that direction. 
> 
> Short summary of the changes: 
> * added new property "access_platform" as on/off/auto (default auto)
> * added alias "iommu_platform" for compatibility
> * rewrote this patch to on/off/auto so that we only do the override when
>   user specified auto 
> 
> Let me list some pros and cons (compared to the previous patch):
> 
> PRO:
> * Thanks to on/off/auto we don't override what the user specified. From 
> user interface perspective preferable. I usually hate software that
> thinks its than me and can do the opposite I tell it.

Agreed.

> 
> CON:
> * It is more code: "4 files changed, 37 insertions(+), 2 deletions(-)"
>   against "3 files changed, 17 insertions(+)"
> * Unlike the previous one, this one is not fool-proof! The user can
>   still specify access_platform=off to lets say a hotplug device, and
>   bring down the guest. We could however fence such stuff with an error
>   message. Would be even more code though.

I think trying to hotplug such a device to a guest running in protected
mode should simply fail (and not crash anything.)

> * As far as I can tell 'auto' was used to pick a value on initialization
>   time. This is a novel, and possibly dodgy use in a sense that the value
>   of the property may change during the lifetime of the VM.

You mean that we start the vm once with support for prot virt, and
later without?

> * We may need to do something about libvirt.

I'm also not 100% sure about migration... would it make sense to
discuss all of this in the context of the cross-arch patchset? It seems
power has similar issues.

> 
> Further improvements are possible and probably necessary if we want
> to go down this path. But I would like to verify that first.
> 
> ----------------------------8<---------------------------------
> From: Halil Pasic <pasic@linux.ibm.com>
> Date: Wed, 26 Feb 2020 16:48:21 +0100
> Subject: [PATCH v2.5 1/1] virtio-ccw: auto-manage VIRTIO_F_IOMMU_PLATFORM if PV
> 
> The virtio specification tells that the device is to present
> VIRTIO_F_ACCESS_PLATFORM (a.k.a. VIRTIO_F_IOMMU_PLATFORM) when the
> device "can only access certain memory addresses with said access
> specified and/or granted by the platform". This is the case for a
> protected VMs, as the device can access only memory addresses that are
> in pages that are currently shared (only the guest can share/unsare its
> pages).
> 
> No VM, however, starts out as a protected VM, but some VMs may be
> converted to protected VMs if the guest decides so.
> 
> Making the end user explicitly manage the VIRTIO_F_ACCESS_PLATFORM via
> the property iommu_on is a minor disaster. Since the correctness of the
> paravirtualized virtio devices depends (and thus in a sense the
> correctness of the hypervisor) it, then the hypervisor should have the
> last word about whether VIRTIO_F_ACCESS_PLATFORM is to be presented or
> not.
> 
> Currently presenting a PV guest with a (paravirtualized) virtio-ccw
> device has catastrophic consequences for the VM (after the hypervisors
> access to protected memory). This is especially grave in case of device
> hotplug (because in this case the guest is more likely to be in the
> middle of something important).

You mean for virtio-ccw devices that don't have iommu_on, right?


> 
> Let us add the ability to manage the VIRTIO_F_ACCESS_PLATFORM virtio
> feature automatically. This is accomplished  by turning the property
> into an 'on/off/auto' property, and for virtio-ccw devices if auto
> was specified forcing its value before  we start the protected VM. If
> the VM should cease to be protected, the original value is restored.
> 
> Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
> ---
>  hw/s390x/s390-virtio-ccw.c |  2 ++
>  hw/s390x/virtio-ccw.c      | 14 ++++++++++++++
>  hw/virtio/virtio.c         | 19 +++++++++++++++++++
>  include/hw/virtio/virtio.h |  4 ++--
>  4 files changed, 37 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> index f660070d22..705e6b153a 100644
> --- a/hw/s390x/s390-virtio-ccw.c
> +++ b/hw/s390x/s390-virtio-ccw.c
> @@ -330,6 +330,7 @@ static void s390_machine_unprotect(S390CcwMachineState *ms)
>      migrate_del_blocker(pv_mig_blocker);
>      error_free_or_abort(&pv_mig_blocker);
>      qemu_balloon_inhibit(false);
> +    subsystem_reset();
>  }
>  
>  static int s390_machine_protect(S390CcwMachineState *ms)
> @@ -382,6 +383,7 @@ static int s390_machine_protect(S390CcwMachineState *ms)
>      if (rc) {
>          goto out_err;
>      }
> +    subsystem_reset();
>      return rc;
>  
>  out_err:
> diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
> index 64f928fc7d..2bb29b57aa 100644
> --- a/hw/s390x/virtio-ccw.c
> +++ b/hw/s390x/virtio-ccw.c
> @@ -874,6 +874,20 @@ static void virtio_ccw_reset(DeviceState *d)
>      VirtioCcwDevice *dev = VIRTIO_CCW_DEVICE(d);
>      VirtIODevice *vdev = virtio_bus_get_device(&dev->bus);
>      VirtIOCCWDeviceClass *vdc = VIRTIO_CCW_DEVICE_GET_CLASS(dev);
> +    S390CcwMachineState *ms = S390_CCW_MACHINE(qdev_get_machine());
> +
> +    /*
> +     * An attempt to use a paravirt device without VIRTIO_F_IOMMU_PLATFORM
> +     * in PV, has catastrophic consequences for the VM. Let's force
> +     * VIRTIO_F_IOMMU_PLATFORM not already specified.
> +     */
> +    if (vdev->access_platform_auto && ms->pv) {
> +        virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
> +        vdev->access_platform = ON_OFF_AUTO_ON;
> +    } else if (vdev->access_platform_auto) {
> +        virtio_clear_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
> +        vdev->access_platform = ON_OFF_AUTO_OFF;
> +    }

If the consequences are so dire, we really should disallow adding a
device of IOMMU_PLATFORM off if pv is on.

(Can we disallow transition to pv if it is off? Maybe with the machine
property approach from the cross-arch patchset?)

>  
>      virtio_ccw_reset_virtio(dev, vdev);
>      if (vdc->parent_reset) {
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index b6c8ef5bc0..f6bd271f14 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -3232,7 +3232,11 @@ void virtio_instance_init_common(Object *proxy_obj, void *data,
>  
>      object_initialize_child(proxy_obj, "virtio-backend", vdev, vdev_size,
>                              vdev_name, &error_abort, NULL);
> +    object_property_add_alias(OBJECT(vdev), "iommu_platform",
> +                              OBJECT(vdev), "access_platform", &error_abort);
>      qdev_alias_all_properties(vdev, proxy_obj);
> +    object_property_add_alias(proxy_obj, "iommu_platform",
> +                              OBJECT(vdev), "access_platform", &error_abort);
>  }
>  
>  void virtio_init(VirtIODevice *vdev, const char *name,
> @@ -3626,6 +3630,19 @@ static void virtio_device_realize(DeviceState *dev, Error **errp)
>          return;
>      }
>  
> +    switch (vdev->access_platform) {
> +    case ON_OFF_AUTO_ON:
> +        virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
> +        break;
> +    case ON_OFF_AUTO_AUTO:
> +        /* transport code can mange access_platform */
> +        vdev->access_platform_auto = true;

Can we really make that transport-specific? While ccw implies s390, pci
might be a variety of architectures.

> +        break;
> +    case ON_OFF_AUTO_OFF: /*fall through*/
> +    default:
> +        vdev->access_platform_auto = false;
> +    }
> +
>      vdev->listener.commit = virtio_memory_listener_commit;
>      memory_listener_register(&vdev->listener, vdev->dma_as);
>  }
> @@ -3681,6 +3698,8 @@ static Property virtio_properties[] = {
>      DEFINE_VIRTIO_COMMON_FEATURES(VirtIODevice, host_features),
>      DEFINE_PROP_BOOL("use-started", VirtIODevice, use_started, true),
>      DEFINE_PROP_BOOL("use-disabled-flag", VirtIODevice, use_disabled_flag, true),
> +    DEFINE_PROP_ON_OFF_AUTO("access_platform", VirtIODevice, access_platform,
> +                            ON_OFF_AUTO_AUTO),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index b69d517496..b77e1545b4 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -110,6 +110,8 @@ struct VirtIODevice
>      uint8_t device_endian;
>      bool use_guest_notifier_mask;
>      AddressSpace *dma_as;
> +    OnOffAuto access_platform;
> +    bool access_platform_auto;
>      QLIST_HEAD(, VirtQueue) *vector_queues;
>  };
>  
> @@ -289,8 +291,6 @@ typedef struct VirtIORNGConf VirtIORNGConf;
>                        VIRTIO_F_NOTIFY_ON_EMPTY, true), \
>      DEFINE_PROP_BIT64("any_layout", _state, _field, \
>                        VIRTIO_F_ANY_LAYOUT, true), \
> -    DEFINE_PROP_BIT64("iommu_platform", _state, _field, \
> -                      VIRTIO_F_IOMMU_PLATFORM, false), \

I'm wondering about migration compat.

>      DEFINE_PROP_BIT64("packed", _state, _field, \
>                        VIRTIO_F_RING_PACKED, false)
>  
> 
> base-commit: 0ffd3d64bd1bb8b84950e52159a0062fdab34628
Michael S. Tsirkin June 8, 2020, 4:53 p.m. UTC | #10
On Sat, Jun 06, 2020 at 01:32:17AM +0200, Halil Pasic wrote:
> On Wed, 20 May 2020 12:23:24 -0400
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Fri, May 15, 2020 at 12:11:55AM +0200, Halil Pasic wrote:
> > > The virtio specification tells that the device is to present
> > > VIRTIO_F_ACCESS_PLATFORM (a.k.a. VIRTIO_F_IOMMU_PLATFORM) when the
> > > device "can only access certain memory addresses with said access
> > > specified and/or granted by the platform". This is the case for a
> > > protected VMs, as the device can access only memory addresses that are
> > > in pages that are currently shared (only the guest can share/unsare its
> > > pages).
> > > 
> > > No VM, however, starts out as a protected VM, but some VMs may be
> > > converted to protected VMs if the guest decides so.
> > > 
> > > Making the end user explicitly manage the VIRTIO_F_ACCESS_PLATFORM via
> > > the property iommu_on is a minor disaster. Since the correctness of the
> > > paravirtualized virtio devices depends (and thus in a sense the
> > > correctness of the hypervisor) it, then the hypervisor should have the
> > > last word about whether VIRTIO_F_ACCESS_PLATFORM is to be presented or
> > > not.
> > 
> > So, how about this: switch iommu to on/off/auto.  Add a property with a
> > reasonable name "allow protected"?  If set allow switch to protected
> > memory and also set iommu auto to on by default.  If not set then don't.
> > 
> > This will come handy for other things like migrating to hosts without
> > protected memory support.
> > 
> > 
> > Also, virtio now calls this PLATFORM_ACCESS, maybe we should rename
> > the property (keeping old one around for compat)?
> > I feel this will address lots of complaints ...
> > 
> 
> I'm not sure I've entirely understood your proposal, but I tried to
> do something in that direction. 
> 
> Short summary of the changes: 
> * added new property "access_platform" as on/off/auto (default auto)
> * added alias "iommu_platform" for compatibility
> * rewrote this patch to on/off/auto so that we only do the override when
>   user specified auto 
> 
> Let me list some pros and cons (compared to the previous patch):
> 
> PRO:
> * Thanks to on/off/auto we don't override what the user specified. From 
> user interface perspective preferable. I usually hate software that
> thinks its than me and can do the opposite I tell it.
> 
> CON:
> * It is more code: "4 files changed, 37 insertions(+), 2 deletions(-)"
>   against "3 files changed, 17 insertions(+)"
> * Unlike the previous one, this one is not fool-proof! The user can
>   still specify access_platform=off to lets say a hotplug device, and
>   bring down the guest. We could however fence such stuff with an error
>   message. Would be even more code though.
> * As far as I can tell 'auto' was used to pick a value on initialization
>   time. This is a novel, and possibly dodgy use in a sense that the value
>   of the property may change during the lifetime of the VM.
> * We may need to do something about libvirt.
> 
> Further improvements are possible and probably necessary if we want
> to go down this path. But I would like to verify that first.

I think it should be even simpler. If switching to protected
VM is *allowed* by host then auto should mean on.
No changes of features across reset necessary.



> ----------------------------8<---------------------------------
> From: Halil Pasic <pasic@linux.ibm.com>
> Date: Wed, 26 Feb 2020 16:48:21 +0100
> Subject: [PATCH v2.5 1/1] virtio-ccw: auto-manage VIRTIO_F_IOMMU_PLATFORM if PV
> 
> The virtio specification tells that the device is to present
> VIRTIO_F_ACCESS_PLATFORM (a.k.a. VIRTIO_F_IOMMU_PLATFORM) when the
> device "can only access certain memory addresses with said access
> specified and/or granted by the platform". This is the case for a
> protected VMs, as the device can access only memory addresses that are
> in pages that are currently shared (only the guest can share/unsare its
> pages).
> 
> No VM, however, starts out as a protected VM, but some VMs may be
> converted to protected VMs if the guest decides so.
> 
> Making the end user explicitly manage the VIRTIO_F_ACCESS_PLATFORM via
> the property iommu_on is a minor disaster. Since the correctness of the
> paravirtualized virtio devices depends (and thus in a sense the
> correctness of the hypervisor) it, then the hypervisor should have the
> last word about whether VIRTIO_F_ACCESS_PLATFORM is to be presented or
> not.
> 
> Currently presenting a PV guest with a (paravirtualized) virtio-ccw
> device has catastrophic consequences for the VM (after the hypervisors
> access to protected memory). This is especially grave in case of device
> hotplug (because in this case the guest is more likely to be in the
> middle of something important).
> 
> Let us add the ability to manage the VIRTIO_F_ACCESS_PLATFORM virtio
> feature automatically. This is accomplished  by turning the property
> into an 'on/off/auto' property, and for virtio-ccw devices if auto
> was specified forcing its value before  we start the protected VM. If
> the VM should cease to be protected, the original value is restored.
> 
> Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
> ---
>  hw/s390x/s390-virtio-ccw.c |  2 ++
>  hw/s390x/virtio-ccw.c      | 14 ++++++++++++++
>  hw/virtio/virtio.c         | 19 +++++++++++++++++++
>  include/hw/virtio/virtio.h |  4 ++--
>  4 files changed, 37 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> index f660070d22..705e6b153a 100644
> --- a/hw/s390x/s390-virtio-ccw.c
> +++ b/hw/s390x/s390-virtio-ccw.c
> @@ -330,6 +330,7 @@ static void s390_machine_unprotect(S390CcwMachineState *ms)
>      migrate_del_blocker(pv_mig_blocker);
>      error_free_or_abort(&pv_mig_blocker);
>      qemu_balloon_inhibit(false);
> +    subsystem_reset();
>  }
>  
>  static int s390_machine_protect(S390CcwMachineState *ms)
> @@ -382,6 +383,7 @@ static int s390_machine_protect(S390CcwMachineState *ms)
>      if (rc) {
>          goto out_err;
>      }
> +    subsystem_reset();
>      return rc;
>  
>  out_err:
> diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
> index 64f928fc7d..2bb29b57aa 100644
> --- a/hw/s390x/virtio-ccw.c
> +++ b/hw/s390x/virtio-ccw.c
> @@ -874,6 +874,20 @@ static void virtio_ccw_reset(DeviceState *d)
>      VirtioCcwDevice *dev = VIRTIO_CCW_DEVICE(d);
>      VirtIODevice *vdev = virtio_bus_get_device(&dev->bus);
>      VirtIOCCWDeviceClass *vdc = VIRTIO_CCW_DEVICE_GET_CLASS(dev);
> +    S390CcwMachineState *ms = S390_CCW_MACHINE(qdev_get_machine());
> +
> +    /*
> +     * An attempt to use a paravirt device without VIRTIO_F_IOMMU_PLATFORM
> +     * in PV, has catastrophic consequences for the VM. Let's force
> +     * VIRTIO_F_IOMMU_PLATFORM not already specified.
> +     */
> +    if (vdev->access_platform_auto && ms->pv) {
> +        virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
> +        vdev->access_platform = ON_OFF_AUTO_ON;
> +    } else if (vdev->access_platform_auto) {
> +        virtio_clear_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
> +        vdev->access_platform = ON_OFF_AUTO_OFF;
> +    }
>  
>      virtio_ccw_reset_virtio(dev, vdev);
>      if (vdc->parent_reset) {
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index b6c8ef5bc0..f6bd271f14 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -3232,7 +3232,11 @@ void virtio_instance_init_common(Object *proxy_obj, void *data,
>  
>      object_initialize_child(proxy_obj, "virtio-backend", vdev, vdev_size,
>                              vdev_name, &error_abort, NULL);
> +    object_property_add_alias(OBJECT(vdev), "iommu_platform",
> +                              OBJECT(vdev), "access_platform", &error_abort);
>      qdev_alias_all_properties(vdev, proxy_obj);
> +    object_property_add_alias(proxy_obj, "iommu_platform",
> +                              OBJECT(vdev), "access_platform", &error_abort);
>  }
>  
>  void virtio_init(VirtIODevice *vdev, const char *name,
> @@ -3626,6 +3630,19 @@ static void virtio_device_realize(DeviceState *dev, Error **errp)
>          return;
>      }
>  
> +    switch (vdev->access_platform) {
> +    case ON_OFF_AUTO_ON:
> +        virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
> +        break;
> +    case ON_OFF_AUTO_AUTO:
> +        /* transport code can mange access_platform */
> +        vdev->access_platform_auto = true;
> +        break;
> +    case ON_OFF_AUTO_OFF: /*fall through*/
> +    default:
> +        vdev->access_platform_auto = false;
> +    }
> +
>      vdev->listener.commit = virtio_memory_listener_commit;
>      memory_listener_register(&vdev->listener, vdev->dma_as);
>  }
> @@ -3681,6 +3698,8 @@ static Property virtio_properties[] = {
>      DEFINE_VIRTIO_COMMON_FEATURES(VirtIODevice, host_features),
>      DEFINE_PROP_BOOL("use-started", VirtIODevice, use_started, true),
>      DEFINE_PROP_BOOL("use-disabled-flag", VirtIODevice, use_disabled_flag, true),
> +    DEFINE_PROP_ON_OFF_AUTO("access_platform", VirtIODevice, access_platform,
> +                            ON_OFF_AUTO_AUTO),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index b69d517496..b77e1545b4 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -110,6 +110,8 @@ struct VirtIODevice
>      uint8_t device_endian;
>      bool use_guest_notifier_mask;
>      AddressSpace *dma_as;
> +    OnOffAuto access_platform;
> +    bool access_platform_auto;
>      QLIST_HEAD(, VirtQueue) *vector_queues;
>  };
>  
> @@ -289,8 +291,6 @@ typedef struct VirtIORNGConf VirtIORNGConf;
>                        VIRTIO_F_NOTIFY_ON_EMPTY, true), \
>      DEFINE_PROP_BIT64("any_layout", _state, _field, \
>                        VIRTIO_F_ANY_LAYOUT, true), \
> -    DEFINE_PROP_BIT64("iommu_platform", _state, _field, \
> -                      VIRTIO_F_IOMMU_PLATFORM, false), \
>      DEFINE_PROP_BIT64("packed", _state, _field, \
>                        VIRTIO_F_RING_PACKED, false)
>  
> 
> base-commit: 0ffd3d64bd1bb8b84950e52159a0062fdab34628
> -- 
> 2.17.1
> 
>
Halil Pasic June 8, 2020, 5 p.m. UTC | #11
[..]
> > Let me list some pros and cons (compared to the previous patch):
> > 
> > PRO:
> > * Thanks to on/off/auto we don't override what the user specified. From 
> > user interface perspective preferable. I usually hate software that
> > thinks its than me and can do the opposite I tell it.
> 
> Agreed.
> 
> > 
> > CON:
> > * It is more code: "4 files changed, 37 insertions(+), 2 deletions(-)"
> >   against "3 files changed, 17 insertions(+)"
> > * Unlike the previous one, this one is not fool-proof! The user can
> >   still specify access_platform=off to lets say a hotplug device, and
> >   bring down the guest. We could however fence such stuff with an error
> >   message. Would be even more code though.
> 
> I think trying to hotplug such a device to a guest running in protected
> mode should simply fail (and not crash anything.)

Yes, if on/off/auto with a similar semantics like here is the path
we are going to walk, I will definitely have to throw in some code that
fails the hotplug of such devices.

> 
> > * As far as I can tell 'auto' was used to pick a value on initialization
> >   time. This is a novel, and possibly dodgy use in a sense that the value
> >   of the property may change during the lifetime of the VM.
> 
> You mean that we start the vm once with support for prot virt, and
> later without?

No, this patch does not care if VM supports prot virt or not, it only
cares about the mode we are running in (prot virt or not). That is, I
still might add F_ACCESS_PLATFORM when the VM gets transitioned to a
prot virt VM. And this is something other uses of OnOffAuto don't seem
to do. 

> 
> > * We may need to do something about libvirt.
> 
> I'm also not 100% sure about migration... would it make sense to
> discuss all of this in the context of the cross-arch patchset? It seems
> power has similar issues.
> 

I'm going to definitely have a good look at that. What I think special
about s390 is that F_ACCESS_PLATFORM is hurting us because all IO needs
to go through ZONE_DMA (this is a problem of the implementation that
stemming form a limitation of the DMA API, upstream didn't let me
fix it). 

> > 
> > Further improvements are possible and probably necessary if we want
> > to go down this path. But I would like to verify that first.
> > 
> > ----------------------------8<---------------------------------
> > From: Halil Pasic <pasic@linux.ibm.com>
> > Date: Wed, 26 Feb 2020 16:48:21 +0100
> > Subject: [PATCH v2.5 1/1] virtio-ccw: auto-manage VIRTIO_F_IOMMU_PLATFORM if PV
> > 
> > The virtio specification tells that the device is to present
> > VIRTIO_F_ACCESS_PLATFORM (a.k.a. VIRTIO_F_IOMMU_PLATFORM) when the
> > device "can only access certain memory addresses with said access
> > specified and/or granted by the platform". This is the case for a
> > protected VMs, as the device can access only memory addresses that are
> > in pages that are currently shared (only the guest can share/unsare its
> > pages).
> > 
> > No VM, however, starts out as a protected VM, but some VMs may be
> > converted to protected VMs if the guest decides so.
> > 
> > Making the end user explicitly manage the VIRTIO_F_ACCESS_PLATFORM via
> > the property iommu_on is a minor disaster. Since the correctness of the
> > paravirtualized virtio devices depends (and thus in a sense the
> > correctness of the hypervisor) it, then the hypervisor should have the
> > last word about whether VIRTIO_F_ACCESS_PLATFORM is to be presented or
> > not.
> > 
> > Currently presenting a PV guest with a (paravirtualized) virtio-ccw
> > device has catastrophic consequences for the VM (after the hypervisors
> > access to protected memory). This is especially grave in case of device
> > hotplug (because in this case the guest is more likely to be in the
> > middle of something important).
> 
> You mean for virtio-ccw devices that don't have iommu_on, right? 
> 
> 

Right, I'm missing the most important words.

> > 
> > Let us add the ability to manage the VIRTIO_F_ACCESS_PLATFORM virtio
> > feature automatically. This is accomplished  by turning the property
> > into an 'on/off/auto' property, and for virtio-ccw devices if auto
> > was specified forcing its value before  we start the protected VM. If
> > the VM should cease to be protected, the original value is restored.
> > 
> > Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
> > ---
> >  hw/s390x/s390-virtio-ccw.c |  2 ++
> >  hw/s390x/virtio-ccw.c      | 14 ++++++++++++++
> >  hw/virtio/virtio.c         | 19 +++++++++++++++++++
> >  include/hw/virtio/virtio.h |  4 ++--
> >  4 files changed, 37 insertions(+), 2 deletions(-)
> > 
> > diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> > index f660070d22..705e6b153a 100644
> > --- a/hw/s390x/s390-virtio-ccw.c
> > +++ b/hw/s390x/s390-virtio-ccw.c
> > @@ -330,6 +330,7 @@ static void s390_machine_unprotect(S390CcwMachineState *ms)
> >      migrate_del_blocker(pv_mig_blocker);
> >      error_free_or_abort(&pv_mig_blocker);
> >      qemu_balloon_inhibit(false);
> > +    subsystem_reset();
> >  }
> >  
> >  static int s390_machine_protect(S390CcwMachineState *ms)
> > @@ -382,6 +383,7 @@ static int s390_machine_protect(S390CcwMachineState *ms)
> >      if (rc) {
> >          goto out_err;
> >      }
> > +    subsystem_reset();
> >      return rc;
> >  
> >  out_err:
> > diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
> > index 64f928fc7d..2bb29b57aa 100644
> > --- a/hw/s390x/virtio-ccw.c
> > +++ b/hw/s390x/virtio-ccw.c
> > @@ -874,6 +874,20 @@ static void virtio_ccw_reset(DeviceState *d)
> >      VirtioCcwDevice *dev = VIRTIO_CCW_DEVICE(d);
> >      VirtIODevice *vdev = virtio_bus_get_device(&dev->bus);
> >      VirtIOCCWDeviceClass *vdc = VIRTIO_CCW_DEVICE_GET_CLASS(dev);
> > +    S390CcwMachineState *ms = S390_CCW_MACHINE(qdev_get_machine());
> > +
> > +    /*
> > +     * An attempt to use a paravirt device without VIRTIO_F_IOMMU_PLATFORM
> > +     * in PV, has catastrophic consequences for the VM. Let's force
> > +     * VIRTIO_F_IOMMU_PLATFORM not already specified.
> > +     */
> > +    if (vdev->access_platform_auto && ms->pv) {
> > +        virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
> > +        vdev->access_platform = ON_OFF_AUTO_ON;
> > +    } else if (vdev->access_platform_auto) {
> > +        virtio_clear_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
> > +        vdev->access_platform = ON_OFF_AUTO_OFF;
> > +    }
> 
> If the consequences are so dire, we really should disallow adding a
> device of IOMMU_PLATFORM off if pv is on.

I totally agree. My previous patch didn't have the problem because there
we just forced what we need.

> 
> (Can we disallow transition to pv if it is off? Maybe with the machine
> property approach from the cross-arch patchset?)

No we can't disallow transition because for hotplug that already
happened.

I can't say anything about the cross-arch patchset. Will come back to you
once I'm smarter.

> 
> >  
> >      virtio_ccw_reset_virtio(dev, vdev);
> >      if (vdc->parent_reset) {
> > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > index b6c8ef5bc0..f6bd271f14 100644
> > --- a/hw/virtio/virtio.c
> > +++ b/hw/virtio/virtio.c
> > @@ -3232,7 +3232,11 @@ void virtio_instance_init_common(Object *proxy_obj, void *data,
> >  
> >      object_initialize_child(proxy_obj, "virtio-backend", vdev, vdev_size,
> >                              vdev_name, &error_abort, NULL);
> > +    object_property_add_alias(OBJECT(vdev), "iommu_platform",
> > +                              OBJECT(vdev), "access_platform", &error_abort);
> >      qdev_alias_all_properties(vdev, proxy_obj);
> > +    object_property_add_alias(proxy_obj, "iommu_platform",
> > +                              OBJECT(vdev), "access_platform", &error_abort);
> >  }
> >  
> >  void virtio_init(VirtIODevice *vdev, const char *name,
> > @@ -3626,6 +3630,19 @@ static void virtio_device_realize(DeviceState *dev, Error **errp)
> >          return;
> >      }
> >  
> > +    switch (vdev->access_platform) {
> > +    case ON_OFF_AUTO_ON:
> > +        virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
> > +        break;
> > +    case ON_OFF_AUTO_AUTO:
> > +        /* transport code can mange access_platform */
> > +        vdev->access_platform_auto = true;
> 
> Can we really make that transport-specific? While ccw implies s390, pci
> might be a variety of architectures.
> 

Right, this is more about the machine than the transport. I was thinking
of a machine hook, but decided to discuss the more basic stuff first
(i.e. is it OK to change the property after stuff is realized).

This would already fix the most pressing issue which is virtio-ccw. I
didn't even bother checking if virtio-pci works with PV out of the box,
or if something needs to be done there. My bet is that it does not work.

> > +        break;
> > +    case ON_OFF_AUTO_OFF: /*fall through*/
> > +    default:
> > +        vdev->access_platform_auto = false;
> > +    }
> > +
> >      vdev->listener.commit = virtio_memory_listener_commit;
> >      memory_listener_register(&vdev->listener, vdev->dma_as);
> >  }
> > @@ -3681,6 +3698,8 @@ static Property virtio_properties[] = {
> >      DEFINE_VIRTIO_COMMON_FEATURES(VirtIODevice, host_features),
> >      DEFINE_PROP_BOOL("use-started", VirtIODevice, use_started, true),
> >      DEFINE_PROP_BOOL("use-disabled-flag", VirtIODevice, use_disabled_flag, true),
> > +    DEFINE_PROP_ON_OFF_AUTO("access_platform", VirtIODevice, access_platform,
> > +                            ON_OFF_AUTO_AUTO),
> >      DEFINE_PROP_END_OF_LIST(),
> >  };
> >  
> > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> > index b69d517496..b77e1545b4 100644
> > --- a/include/hw/virtio/virtio.h
> > +++ b/include/hw/virtio/virtio.h
> > @@ -110,6 +110,8 @@ struct VirtIODevice
> >      uint8_t device_endian;
> >      bool use_guest_notifier_mask;
> >      AddressSpace *dma_as;
> > +    OnOffAuto access_platform;
> > +    bool access_platform_auto;
> >      QLIST_HEAD(, VirtQueue) *vector_queues;
> >  };
> >  
> > @@ -289,8 +291,6 @@ typedef struct VirtIORNGConf VirtIORNGConf;
> >                        VIRTIO_F_NOTIFY_ON_EMPTY, true), \
> >      DEFINE_PROP_BIT64("any_layout", _state, _field, \
> >                        VIRTIO_F_ANY_LAYOUT, true), \
> > -    DEFINE_PROP_BIT64("iommu_platform", _state, _field, \
> > -                      VIRTIO_F_IOMMU_PLATFORM, false), \
> 
> I'm wondering about migration compat.

Should be fine, I have the alias for that. But if this is the
path we are taking I will definitely test it.

Thanks for having a look and for all the good questions!

Regards,
Halil 


> 
> >      DEFINE_PROP_BIT64("packed", _state, _field, \
> >                        VIRTIO_F_RING_PACKED, false)
> >  
> > 
> > base-commit: 0ffd3d64bd1bb8b84950e52159a0062fdab34628
>
Cornelia Huck June 9, 2020, 6:44 a.m. UTC | #12
On Mon, 8 Jun 2020 19:00:45 +0200
Halil Pasic <pasic@linux.ibm.com> wrote:


> > I'm also not 100% sure about migration... would it make sense to
> > discuss all of this in the context of the cross-arch patchset? It seems
> > power has similar issues.
> >   
> 
> I'm going to definitely have a good look at that. What I think special
> about s390 is that F_ACCESS_PLATFORM is hurting us because all IO needs
> to go through ZONE_DMA (this is a problem of the implementation that
> stemming form a limitation of the DMA API, upstream didn't let me
> fix it). 

My understanding is that power runs into similar issues, but I don't
know much about power, so I might be entirely wrong :)

> 
> > > 
> > > Further improvements are possible and probably necessary if we want
> > > to go down this path. But I would like to verify that first.
> > > 
> > > ----------------------------8<---------------------------------
> > > From: Halil Pasic <pasic@linux.ibm.com>
> > > Date: Wed, 26 Feb 2020 16:48:21 +0100
> > > Subject: [PATCH v2.5 1/1] virtio-ccw: auto-manage VIRTIO_F_IOMMU_PLATFORM if PV
> > > 
> > > The virtio specification tells that the device is to present
> > > VIRTIO_F_ACCESS_PLATFORM (a.k.a. VIRTIO_F_IOMMU_PLATFORM) when the
> > > device "can only access certain memory addresses with said access
> > > specified and/or granted by the platform". This is the case for a
> > > protected VMs, as the device can access only memory addresses that are
> > > in pages that are currently shared (only the guest can share/unsare its
> > > pages).
> > > 
> > > No VM, however, starts out as a protected VM, but some VMs may be
> > > converted to protected VMs if the guest decides so.
> > > 
> > > Making the end user explicitly manage the VIRTIO_F_ACCESS_PLATFORM via
> > > the property iommu_on is a minor disaster. Since the correctness of the
> > > paravirtualized virtio devices depends (and thus in a sense the
> > > correctness of the hypervisor) it, then the hypervisor should have the
> > > last word about whether VIRTIO_F_ACCESS_PLATFORM is to be presented or
> > > not.
> > > 
> > > Currently presenting a PV guest with a (paravirtualized) virtio-ccw
> > > device has catastrophic consequences for the VM (after the hypervisors
> > > access to protected memory). This is especially grave in case of device
> > > hotplug (because in this case the guest is more likely to be in the
> > > middle of something important).  
> > 
> > You mean for virtio-ccw devices that don't have iommu_on, right? 
> > 
> >   
> 
> Right, I'm missing the most important words.
> 
> > > 
> > > Let us add the ability to manage the VIRTIO_F_ACCESS_PLATFORM virtio
> > > feature automatically. This is accomplished  by turning the property
> > > into an 'on/off/auto' property, and for virtio-ccw devices if auto
> > > was specified forcing its value before  we start the protected VM. If
> > > the VM should cease to be protected, the original value is restored.
> > > 
> > > Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
> > > ---
> > >  hw/s390x/s390-virtio-ccw.c |  2 ++
> > >  hw/s390x/virtio-ccw.c      | 14 ++++++++++++++
> > >  hw/virtio/virtio.c         | 19 +++++++++++++++++++
> > >  include/hw/virtio/virtio.h |  4 ++--
> > >  4 files changed, 37 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> > > index f660070d22..705e6b153a 100644
> > > --- a/hw/s390x/s390-virtio-ccw.c
> > > +++ b/hw/s390x/s390-virtio-ccw.c
> > > @@ -330,6 +330,7 @@ static void s390_machine_unprotect(S390CcwMachineState *ms)
> > >      migrate_del_blocker(pv_mig_blocker);
> > >      error_free_or_abort(&pv_mig_blocker);
> > >      qemu_balloon_inhibit(false);
> > > +    subsystem_reset();
> > >  }
> > >  
> > >  static int s390_machine_protect(S390CcwMachineState *ms)
> > > @@ -382,6 +383,7 @@ static int s390_machine_protect(S390CcwMachineState *ms)
> > >      if (rc) {
> > >          goto out_err;
> > >      }
> > > +    subsystem_reset();
> > >      return rc;
> > >  
> > >  out_err:
> > > diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
> > > index 64f928fc7d..2bb29b57aa 100644
> > > --- a/hw/s390x/virtio-ccw.c
> > > +++ b/hw/s390x/virtio-ccw.c
> > > @@ -874,6 +874,20 @@ static void virtio_ccw_reset(DeviceState *d)
> > >      VirtioCcwDevice *dev = VIRTIO_CCW_DEVICE(d);
> > >      VirtIODevice *vdev = virtio_bus_get_device(&dev->bus);
> > >      VirtIOCCWDeviceClass *vdc = VIRTIO_CCW_DEVICE_GET_CLASS(dev);
> > > +    S390CcwMachineState *ms = S390_CCW_MACHINE(qdev_get_machine());
> > > +
> > > +    /*
> > > +     * An attempt to use a paravirt device without VIRTIO_F_IOMMU_PLATFORM
> > > +     * in PV, has catastrophic consequences for the VM. Let's force
> > > +     * VIRTIO_F_IOMMU_PLATFORM not already specified.
> > > +     */
> > > +    if (vdev->access_platform_auto && ms->pv) {
> > > +        virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
> > > +        vdev->access_platform = ON_OFF_AUTO_ON;
> > > +    } else if (vdev->access_platform_auto) {
> > > +        virtio_clear_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
> > > +        vdev->access_platform = ON_OFF_AUTO_OFF;
> > > +    }  
> > 
> > If the consequences are so dire, we really should disallow adding a
> > device of IOMMU_PLATFORM off if pv is on.  
> 
> I totally agree. My previous patch didn't have the problem because there
> we just forced what we need.
> 
> > 
> > (Can we disallow transition to pv if it is off? Maybe with the machine
> > property approach from the cross-arch patchset?)  
> 
> No we can't disallow transition because for hotplug that already
> happened.

I mean, can a virtio devices without IOMMU_PLATFORM act as a transition
blocker (i.e. an attempt by a guest to move to pv would fail?)

> 
> I can't say anything about the cross-arch patchset. Will come back to you
> once I'm smarter.
> 
> >   
> > >  
> > >      virtio_ccw_reset_virtio(dev, vdev);
> > >      if (vdc->parent_reset) {
> > > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > > index b6c8ef5bc0..f6bd271f14 100644
> > > --- a/hw/virtio/virtio.c
> > > +++ b/hw/virtio/virtio.c
> > > @@ -3232,7 +3232,11 @@ void virtio_instance_init_common(Object *proxy_obj, void *data,
> > >  
> > >      object_initialize_child(proxy_obj, "virtio-backend", vdev, vdev_size,
> > >                              vdev_name, &error_abort, NULL);
> > > +    object_property_add_alias(OBJECT(vdev), "iommu_platform",
> > > +                              OBJECT(vdev), "access_platform", &error_abort);
> > >      qdev_alias_all_properties(vdev, proxy_obj);
> > > +    object_property_add_alias(proxy_obj, "iommu_platform",
> > > +                              OBJECT(vdev), "access_platform", &error_abort);
> > >  }
> > >  
> > >  void virtio_init(VirtIODevice *vdev, const char *name,
> > > @@ -3626,6 +3630,19 @@ static void virtio_device_realize(DeviceState *dev, Error **errp)
> > >          return;
> > >      }
> > >  
> > > +    switch (vdev->access_platform) {
> > > +    case ON_OFF_AUTO_ON:
> > > +        virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
> > > +        break;
> > > +    case ON_OFF_AUTO_AUTO:
> > > +        /* transport code can mange access_platform */
> > > +        vdev->access_platform_auto = true;  
> > 
> > Can we really make that transport-specific? While ccw implies s390, pci
> > might be a variety of architectures.
> >   
> 
> Right, this is more about the machine than the transport. I was thinking
> of a machine hook, but decided to discuss the more basic stuff first
> (i.e. is it OK to change the property after stuff is realized).
> 
> This would already fix the most pressing issue which is virtio-ccw. I
> didn't even bother checking if virtio-pci works with PV out of the box,
> or if something needs to be done there. My bet is that it does not work.

I agree, virtio-pci + pv is unlikely to work. But if at all possible,
I'd prefer a general solution anyway, as other architectures care about
virtio-pci.
Halil Pasic June 9, 2020, 9:41 a.m. UTC | #13
On Tue, 9 Jun 2020 08:44:02 +0200
Cornelia Huck <cohuck@redhat.com> wrote:

> On Mon, 8 Jun 2020 19:00:45 +0200
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> 
> > > I'm also not 100% sure about migration... would it make sense to
> > > discuss all of this in the context of the cross-arch patchset? It seems
> > > power has similar issues.
> > >   
> > 
> > I'm going to definitely have a good look at that. What I think special
> > about s390 is that F_ACCESS_PLATFORM is hurting us because all IO needs
> > to go through ZONE_DMA (this is a problem of the implementation that
> > stemming form a limitation of the DMA API, upstream didn't let me
> > fix it). 
> 
> My understanding is that power runs into similar issues, but I don't
> know much about power, so I might be entirely wrong :)
> 

I will come back to you once I've figured that patch-set out.

[..]

> > > > @@ -874,6 +874,20 @@ static void virtio_ccw_reset(DeviceState *d)
> > > >      VirtioCcwDevice *dev = VIRTIO_CCW_DEVICE(d);
> > > >      VirtIODevice *vdev = virtio_bus_get_device(&dev->bus);
> > > >      VirtIOCCWDeviceClass *vdc = VIRTIO_CCW_DEVICE_GET_CLASS(dev);
> > > > +    S390CcwMachineState *ms = S390_CCW_MACHINE(qdev_get_machine());
> > > > +
> > > > +    /*
> > > > +     * An attempt to use a paravirt device without VIRTIO_F_IOMMU_PLATFORM
> > > > +     * in PV, has catastrophic consequences for the VM. Let's force
> > > > +     * VIRTIO_F_IOMMU_PLATFORM not already specified.
> > > > +     */
> > > > +    if (vdev->access_platform_auto && ms->pv) {
> > > > +        virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
> > > > +        vdev->access_platform = ON_OFF_AUTO_ON;
> > > > +    } else if (vdev->access_platform_auto) {
> > > > +        virtio_clear_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
> > > > +        vdev->access_platform = ON_OFF_AUTO_OFF;
> > > > +    }  
> > > 
> > > If the consequences are so dire, we really should disallow adding a
> > > device of IOMMU_PLATFORM off if pv is on.  
> > 
> > I totally agree. My previous patch didn't have the problem because there
> > we just forced what we need.
> > 
> > > 
> > > (Can we disallow transition to pv if it is off? Maybe with the machine
> > > property approach from the cross-arch patchset?)  
> > 
> > No we can't disallow transition because for hotplug that already
> > happened.
> 
> I mean, can a virtio devices without IOMMU_PLATFORM act as a transition
> blocker (i.e. an attempt by a guest to move to pv would fail?)
>

I don't know. Janosch could answer that, but he is on vacation. Adding
Claudio maybe he can answer. My understanding is, that while it might be
possible, it is ugly at best. The ability to do a transition is
indicated by a CPU model feature. Indicating the feature to the guest and
then failing the transition sounds wrong to me.
 
[..]

> > 
> > Right, this is more about the machine than the transport. I was thinking
> > of a machine hook, but decided to discuss the more basic stuff first
> > (i.e. is it OK to change the property after stuff is realized).
> > 
> > This would already fix the most pressing issue which is virtio-ccw. I
> > didn't even bother checking if virtio-pci works with PV out of the box,
> > or if something needs to be done there. My bet is that it does not work.
> 
> I agree, virtio-pci + pv is unlikely to work. But if at all possible,
> I'd prefer a general solution anyway, as other architectures care about
> virtio-pci.
> 
> 

I'm with you. I hoped to get changing features on the fly approved first
before setting out to solve this. But I will look into it.

Regards,
Halil
Pierre Morel June 9, 2020, 2:02 p.m. UTC | #14
On 2020-06-09 11:41, Halil Pasic wrote:
> On Tue, 9 Jun 2020 08:44:02 +0200
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
>> On Mon, 8 Jun 2020 19:00:45 +0200
>> Halil Pasic <pasic@linux.ibm.com> wrote:
>>
>>

...snip...

>>
>> I mean, can a virtio devices without IOMMU_PLATFORM act as a transition
>> blocker (i.e. an attempt by a guest to move to pv would fail?)
>>
> 
> I don't know. Janosch could answer that, but he is on vacation. Adding
> Claudio maybe he can answer. My understanding is, that while it might be
> possible, it is ugly at best. The ability to do a transition is
> indicated by a CPU model feature. Indicating the feature to the guest and
> then failing the transition sounds wrong to me.


The guest image is encrypted and the transition to PV is done by the 
stage 3 boot loader, part of the loaded image, before Linux is started.
At this moment the guest is not aware of the virtio devices.

Regards,
Pierre
Claudio Imbrenda June 9, 2020, 3:47 p.m. UTC | #15
On Tue, 9 Jun 2020 11:41:30 +0200
Halil Pasic <pasic@linux.ibm.com> wrote:

[...]

> I don't know. Janosch could answer that, but he is on vacation. Adding
> Claudio maybe he can answer. My understanding is, that while it might
> be possible, it is ugly at best. The ability to do a transition is
> indicated by a CPU model feature. Indicating the feature to the guest
> and then failing the transition sounds wrong to me.

I agree. If the feature is advertised, then it has to work. I don't
think we even have an architected way to fail the transition for that
reason.

What __could__ be done is to prevent qemu from even starting if an
incompatible device is specified together with PV.

Another option is to disable PV at the qemu level if an incompatible
device is present. This will have the effect that trying to boot a
secure guest will fail mysteriously, which is IMHO also not too great.

do we really have that many incompatible devices?
Cornelia Huck June 9, 2020, 4:05 p.m. UTC | #16
On Tue, 9 Jun 2020 17:47:47 +0200
Claudio Imbrenda <imbrenda@linux.ibm.com> wrote:

> On Tue, 9 Jun 2020 11:41:30 +0200
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> [...]
> 
> > I don't know. Janosch could answer that, but he is on vacation. Adding
> > Claudio maybe he can answer. My understanding is, that while it might
> > be possible, it is ugly at best. The ability to do a transition is
> > indicated by a CPU model feature. Indicating the feature to the guest
> > and then failing the transition sounds wrong to me.  
> 
> I agree. If the feature is advertised, then it has to work. I don't
> think we even have an architected way to fail the transition for that
> reason.
> 
> What __could__ be done is to prevent qemu from even starting if an
> incompatible device is specified together with PV.

Seems reasonable, if an incompatible device can crash the whole guest.
Better not even let it start. (And prevent hotplugging it into a
running guest.)

> 
> Another option is to disable PV at the qemu level if an incompatible
> device is present. This will have the effect that trying to boot a
> secure guest will fail mysteriously, which is IMHO also not too great.

Yes, if that is not architected, and no other possible failure code can
map to that case.

> 
> do we really have that many incompatible devices?

Which devices are compatible in the end? It seems the only ones that
are known to be working are virtio-ccw devices with IOMMU_PLATFORM on.
virtio-pci devices and non-virtio ccw (vfio-ccw, 3270) seem to be out,
as far as I understand it. What about non-ccw? PCI in general, vfio-ap?
Halil Pasic June 9, 2020, 4:28 p.m. UTC | #17
On Tue, 9 Jun 2020 17:47:47 +0200
Claudio Imbrenda <imbrenda@linux.ibm.com> wrote:

> On Tue, 9 Jun 2020 11:41:30 +0200
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> [...]
> 
> > I don't know. Janosch could answer that, but he is on vacation. Adding
> > Claudio maybe he can answer. My understanding is, that while it might
> > be possible, it is ugly at best. The ability to do a transition is
> > indicated by a CPU model feature. Indicating the feature to the guest
> > and then failing the transition sounds wrong to me.
> 
> I agree. If the feature is advertised, then it has to work. I don't
> think we even have an architected way to fail the transition for that
> reason.
> 
> What __could__ be done is to prevent qemu from even starting if an
> incompatible device is specified together with PV.

AFAIU, the "specified together with PV" is the problem here. Currently
we don't "specify PV" but PV is just a capability that is managed by the
CPU model (like so many other). I.e. the fact that the
visualization environment is capable providing PV (unpack facility
available), and the fact, that the end user didn't fence the unpack
facility, does not mean, the user is dead set to use PV.

My understanding is, that we want PV to just work, without having to
put together a peculiar VM definition that says: this is going to be
used as a PV VM.

> 
> Another option is to disable PV at the qemu level if an incompatible
> device is present. This will have the effect that trying to boot a
> secure guest will fail mysteriously, which is IMHO also not too great.
> 

This would contradict with if feature is advertised, then it has to work
or?

> do we really have that many incompatible devices?
>
Michael S. Tsirkin June 9, 2020, 4:41 p.m. UTC | #18
On Tue, Jun 09, 2020 at 05:47:47PM +0200, Claudio Imbrenda wrote:
> On Tue, 9 Jun 2020 11:41:30 +0200
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> [...]
> 
> > I don't know. Janosch could answer that, but he is on vacation. Adding
> > Claudio maybe he can answer. My understanding is, that while it might
> > be possible, it is ugly at best. The ability to do a transition is
> > indicated by a CPU model feature. Indicating the feature to the guest
> > and then failing the transition sounds wrong to me.
> 
> I agree. If the feature is advertised, then it has to work. I don't
> think we even have an architected way to fail the transition for that
> reason.

So my suggestion was basically a flag that sets both the
CPU model feature and the virtio feature.



> What __could__ be done is to prevent qemu from even starting if an
> incompatible device is specified together with PV.
>
> Another option is to disable PV at the qemu level if an incompatible
> device is present. This will have the effect that trying to boot a
> secure guest will fail mysteriously, which is IMHO also not too great.
> 
> do we really have that many incompatible devices?
>
Halil Pasic June 9, 2020, 4:41 p.m. UTC | #19
On Tue, 9 Jun 2020 18:05:59 +0200
Cornelia Huck <cohuck@redhat.com> wrote:

> > 
> > do we really have that many incompatible devices?  
> 
> Which devices are compatible in the end? It seems the only ones that
> are known to be working are virtio-ccw devices with IOMMU_PLATFORM on.
> virtio-pci devices and non-virtio ccw (vfio-ccw, 3270) seem to be out,
> as far as I understand it. What about non-ccw? PCI in general, vfio-ap?

AFAIU the situation is somewhat complicated. Only virtio devices have
the notion of indicating and the duty to indicate access restrictions.
We as hypervisor need to prevent the inconsistency where:
* the VM is prot virt
* the guest can detect that it is running with prot virt using the UV
  call interface
* and the virtio devices emulated by us (QEMU) lie that memory access
  by the device is not restricted (F_ACCESS_PLATFORM not offered).

It is unfortunate that the consequences are this severe, but it is the
responsibility to drive the devices accordingly if prot virt is
detected. If the guest does this, then s270 and vfio-ccw should work. The
problem is, that currently Linux is verifiedly doing the thing only for
virtio-ccw.

Regards,
Halil
Michael S. Tsirkin June 9, 2020, 4:44 p.m. UTC | #20
On Tue, Jun 09, 2020 at 06:28:39PM +0200, Halil Pasic wrote:
> On Tue, 9 Jun 2020 17:47:47 +0200
> Claudio Imbrenda <imbrenda@linux.ibm.com> wrote:
> 
> > On Tue, 9 Jun 2020 11:41:30 +0200
> > Halil Pasic <pasic@linux.ibm.com> wrote:
> > 
> > [...]
> > 
> > > I don't know. Janosch could answer that, but he is on vacation. Adding
> > > Claudio maybe he can answer. My understanding is, that while it might
> > > be possible, it is ugly at best. The ability to do a transition is
> > > indicated by a CPU model feature. Indicating the feature to the guest
> > > and then failing the transition sounds wrong to me.
> > 
> > I agree. If the feature is advertised, then it has to work. I don't
> > think we even have an architected way to fail the transition for that
> > reason.
> > 
> > What __could__ be done is to prevent qemu from even starting if an
> > incompatible device is specified together with PV.
> 
> AFAIU, the "specified together with PV" is the problem here. Currently
> we don't "specify PV" but PV is just a capability that is managed by the
> CPU model (like so many other).

So if we want to keep it user friendly, there could be
protection property with values on/off/auto, and auto
would poke at host capability to figure out whether
it's supported.

Both virtio and CPU would inherit from that.

This will allow other useful features such as ability
to hide PV from guest, which could in turn be handy e.g.
to allow migration to hosts without PV support,
or if host wants to force ability to read guest memory
e.g. for security.
David Gibson June 10, 2020, 4:25 a.m. UTC | #21
On Tue, Jun 09, 2020 at 08:44:02AM +0200, Cornelia Huck wrote:
> On Mon, 8 Jun 2020 19:00:45 +0200
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> 
> > > I'm also not 100% sure about migration... would it make sense to
> > > discuss all of this in the context of the cross-arch patchset? It seems
> > > power has similar issues.
> > >   
> > 
> > I'm going to definitely have a good look at that. What I think special
> > about s390 is that F_ACCESS_PLATFORM is hurting us because all IO needs
> > to go through ZONE_DMA (this is a problem of the implementation that
> > stemming form a limitation of the DMA API, upstream didn't let me
> > fix it). 
> 
> My understanding is that power runs into similar issues, but I don't
> know much about power, so I might be entirely wrong :)

Sort of, but not to the same extent, I think.

[snip]
> > > (Can we disallow transition to pv if it is off? Maybe with the machine
> > > property approach from the cross-arch patchset?)  
> > 
> > No we can't disallow transition because for hotplug that already
> > happened.
> 
> I mean, can a virtio devices without IOMMU_PLATFORM act as a transition
> blocker (i.e. an attempt by a guest to move to pv would fail?)

This might be possible, but I suspect having to explicitly say you
want pv support, then having it validate virtio (and anything else it
needs) will give a better UX.
David Gibson June 10, 2020, 4:29 a.m. UTC | #22
On Tue, Jun 09, 2020 at 06:28:39PM +0200, Halil Pasic wrote:
> On Tue, 9 Jun 2020 17:47:47 +0200
> Claudio Imbrenda <imbrenda@linux.ibm.com> wrote:
> 
> > On Tue, 9 Jun 2020 11:41:30 +0200
> > Halil Pasic <pasic@linux.ibm.com> wrote:
> > 
> > [...]
> > 
> > > I don't know. Janosch could answer that, but he is on vacation. Adding
> > > Claudio maybe he can answer. My understanding is, that while it might
> > > be possible, it is ugly at best. The ability to do a transition is
> > > indicated by a CPU model feature. Indicating the feature to the guest
> > > and then failing the transition sounds wrong to me.
> > 
> > I agree. If the feature is advertised, then it has to work. I don't
> > think we even have an architected way to fail the transition for that
> > reason.
> > 
> > What __could__ be done is to prevent qemu from even starting if an
> > incompatible device is specified together with PV.
> 
> AFAIU, the "specified together with PV" is the problem here. Currently
> we don't "specify PV" but PV is just a capability that is managed by the
> CPU model (like so many other). I.e. the fact that the
> visualization environment is capable providing PV (unpack facility
> available), and the fact, that the end user didn't fence the unpack
> facility, does not mean, the user is dead set to use PV.
> 
> My understanding is, that we want PV to just work, without having to
> put together a peculiar VM definition that says: this is going to be
> used as a PV VM.

Having had a similar discussion for POWER, I no longer think this is a
wise model.  I think we want to have an explicit "allow PV" option -
but we do want it to be a *single* option, rather than having to
change configuration of a whole bunch of places.

My intention is for my 'host-trust-limitation' series to accomplish
that.
David Gibson June 10, 2020, 4:31 a.m. UTC | #23
On Tue, Jun 09, 2020 at 12:44:39PM -0400, Michael S. Tsirkin wrote:
> On Tue, Jun 09, 2020 at 06:28:39PM +0200, Halil Pasic wrote:
> > On Tue, 9 Jun 2020 17:47:47 +0200
> > Claudio Imbrenda <imbrenda@linux.ibm.com> wrote:
> > 
> > > On Tue, 9 Jun 2020 11:41:30 +0200
> > > Halil Pasic <pasic@linux.ibm.com> wrote:
> > > 
> > > [...]
> > > 
> > > > I don't know. Janosch could answer that, but he is on vacation. Adding
> > > > Claudio maybe he can answer. My understanding is, that while it might
> > > > be possible, it is ugly at best. The ability to do a transition is
> > > > indicated by a CPU model feature. Indicating the feature to the guest
> > > > and then failing the transition sounds wrong to me.
> > > 
> > > I agree. If the feature is advertised, then it has to work. I don't
> > > think we even have an architected way to fail the transition for that
> > > reason.
> > > 
> > > What __could__ be done is to prevent qemu from even starting if an
> > > incompatible device is specified together with PV.
> > 
> > AFAIU, the "specified together with PV" is the problem here. Currently
> > we don't "specify PV" but PV is just a capability that is managed by the
> > CPU model (like so many other).
> 
> So if we want to keep it user friendly, there could be
> protection property with values on/off/auto, and auto
> would poke at host capability to figure out whether
> it's supported.
> 
> Both virtio and CPU would inherit from that.

Right, that's what I have in mind for my 'host-trust-limitation'
property (a generalized version of the existing 'memory-encryption'
machine option).  My draft patches already set virtio properties
accordingly, it should be possible to set (default) cpu properties as
well.

> This will allow other useful features such as ability
> to hide PV from guest, which could in turn be handy e.g.
> to allow migration to hosts without PV support,
> or if host wants to force ability to read guest memory
> e.g. for security.
> 
>
David Hildenbrand June 10, 2020, 7:22 a.m. UTC | #24
On 10.06.20 06:31, David Gibson wrote:
> On Tue, Jun 09, 2020 at 12:44:39PM -0400, Michael S. Tsirkin wrote:
>> On Tue, Jun 09, 2020 at 06:28:39PM +0200, Halil Pasic wrote:
>>> On Tue, 9 Jun 2020 17:47:47 +0200
>>> Claudio Imbrenda <imbrenda@linux.ibm.com> wrote:
>>>
>>>> On Tue, 9 Jun 2020 11:41:30 +0200
>>>> Halil Pasic <pasic@linux.ibm.com> wrote:
>>>>
>>>> [...]
>>>>
>>>>> I don't know. Janosch could answer that, but he is on vacation. Adding
>>>>> Claudio maybe he can answer. My understanding is, that while it might
>>>>> be possible, it is ugly at best. The ability to do a transition is
>>>>> indicated by a CPU model feature. Indicating the feature to the guest
>>>>> and then failing the transition sounds wrong to me.
>>>>
>>>> I agree. If the feature is advertised, then it has to work. I don't
>>>> think we even have an architected way to fail the transition for that
>>>> reason.
>>>>
>>>> What __could__ be done is to prevent qemu from even starting if an
>>>> incompatible device is specified together with PV.
>>>
>>> AFAIU, the "specified together with PV" is the problem here. Currently
>>> we don't "specify PV" but PV is just a capability that is managed by the
>>> CPU model (like so many other).
>>
>> So if we want to keep it user friendly, there could be
>> protection property with values on/off/auto, and auto
>> would poke at host capability to figure out whether
>> it's supported.
>>
>> Both virtio and CPU would inherit from that.
> 
> Right, that's what I have in mind for my 'host-trust-limitation'
> property (a generalized version of the existing 'memory-encryption'
> machine option).  My draft patches already set virtio properties
> accordingly, it should be possible to set (default) cpu properties as
> well.

No crazy CPU model hacks please (at least speaking for the s390x).
David Gibson June 10, 2020, 10:07 a.m. UTC | #25
On Wed, Jun 10, 2020 at 09:22:45AM +0200, David Hildenbrand wrote:
> On 10.06.20 06:31, David Gibson wrote:
> > On Tue, Jun 09, 2020 at 12:44:39PM -0400, Michael S. Tsirkin wrote:
> >> On Tue, Jun 09, 2020 at 06:28:39PM +0200, Halil Pasic wrote:
> >>> On Tue, 9 Jun 2020 17:47:47 +0200
> >>> Claudio Imbrenda <imbrenda@linux.ibm.com> wrote:
> >>>
> >>>> On Tue, 9 Jun 2020 11:41:30 +0200
> >>>> Halil Pasic <pasic@linux.ibm.com> wrote:
> >>>>
> >>>> [...]
> >>>>
> >>>>> I don't know. Janosch could answer that, but he is on vacation. Adding
> >>>>> Claudio maybe he can answer. My understanding is, that while it might
> >>>>> be possible, it is ugly at best. The ability to do a transition is
> >>>>> indicated by a CPU model feature. Indicating the feature to the guest
> >>>>> and then failing the transition sounds wrong to me.
> >>>>
> >>>> I agree. If the feature is advertised, then it has to work. I don't
> >>>> think we even have an architected way to fail the transition for that
> >>>> reason.
> >>>>
> >>>> What __could__ be done is to prevent qemu from even starting if an
> >>>> incompatible device is specified together with PV.
> >>>
> >>> AFAIU, the "specified together with PV" is the problem here. Currently
> >>> we don't "specify PV" but PV is just a capability that is managed by the
> >>> CPU model (like so many other).
> >>
> >> So if we want to keep it user friendly, there could be
> >> protection property with values on/off/auto, and auto
> >> would poke at host capability to figure out whether
> >> it's supported.
> >>
> >> Both virtio and CPU would inherit from that.
> > 
> > Right, that's what I have in mind for my 'host-trust-limitation'
> > property (a generalized version of the existing 'memory-encryption'
> > machine option).  My draft patches already set virtio properties
> > accordingly, it should be possible to set (default) cpu properties as
> > well.
> 
> No crazy CPU model hacks please (at least speaking for the s390x).

Uh... I'm not really sure what you have in mind here.
David Hildenbrand June 10, 2020, 10:24 a.m. UTC | #26
On 10.06.20 12:07, David Gibson wrote:
> On Wed, Jun 10, 2020 at 09:22:45AM +0200, David Hildenbrand wrote:
>> On 10.06.20 06:31, David Gibson wrote:
>>> On Tue, Jun 09, 2020 at 12:44:39PM -0400, Michael S. Tsirkin wrote:
>>>> On Tue, Jun 09, 2020 at 06:28:39PM +0200, Halil Pasic wrote:
>>>>> On Tue, 9 Jun 2020 17:47:47 +0200
>>>>> Claudio Imbrenda <imbrenda@linux.ibm.com> wrote:
>>>>>
>>>>>> On Tue, 9 Jun 2020 11:41:30 +0200
>>>>>> Halil Pasic <pasic@linux.ibm.com> wrote:
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>>> I don't know. Janosch could answer that, but he is on vacation. Adding
>>>>>>> Claudio maybe he can answer. My understanding is, that while it might
>>>>>>> be possible, it is ugly at best. The ability to do a transition is
>>>>>>> indicated by a CPU model feature. Indicating the feature to the guest
>>>>>>> and then failing the transition sounds wrong to me.
>>>>>>
>>>>>> I agree. If the feature is advertised, then it has to work. I don't
>>>>>> think we even have an architected way to fail the transition for that
>>>>>> reason.
>>>>>>
>>>>>> What __could__ be done is to prevent qemu from even starting if an
>>>>>> incompatible device is specified together with PV.
>>>>>
>>>>> AFAIU, the "specified together with PV" is the problem here. Currently
>>>>> we don't "specify PV" but PV is just a capability that is managed by the
>>>>> CPU model (like so many other).
>>>>
>>>> So if we want to keep it user friendly, there could be
>>>> protection property with values on/off/auto, and auto
>>>> would poke at host capability to figure out whether
>>>> it's supported.
>>>>
>>>> Both virtio and CPU would inherit from that.
>>>
>>> Right, that's what I have in mind for my 'host-trust-limitation'
>>> property (a generalized version of the existing 'memory-encryption'
>>> machine option).  My draft patches already set virtio properties
>>> accordingly, it should be possible to set (default) cpu properties as
>>> well.
>>
>> No crazy CPU model hacks please (at least speaking for the s390x).
> 
> Uh... I'm not really sure what you have in mind here.
> 

Reading along I got the impression that we want to glue the availability
of CPU features to other QEMU cmdline parameters (besides the
accelerator). ("to set (default) cpu properties as well"). If we are
talking about other CPU properties not expressed as CPU features (e.g.,
-cpu X,Y=on ...), then there is no issue.
Halil Pasic June 10, 2020, 1 p.m. UTC | #27
On Wed, 10 Jun 2020 12:24:14 +0200
David Hildenbrand <david@redhat.com> wrote:

> On 10.06.20 12:07, David Gibson wrote:
> > On Wed, Jun 10, 2020 at 09:22:45AM +0200, David Hildenbrand wrote:
> >> On 10.06.20 06:31, David Gibson wrote:
> >>> On Tue, Jun 09, 2020 at 12:44:39PM -0400, Michael S. Tsirkin wrote:
> >>>> On Tue, Jun 09, 2020 at 06:28:39PM +0200, Halil Pasic wrote:
> >>>>> On Tue, 9 Jun 2020 17:47:47 +0200
> >>>>> Claudio Imbrenda <imbrenda@linux.ibm.com> wrote:
> >>>>>
> >>>>>> On Tue, 9 Jun 2020 11:41:30 +0200
> >>>>>> Halil Pasic <pasic@linux.ibm.com> wrote:
> >>>>>>
> >>>>>> [...]
> >>>>>>
> >>>>>>> I don't know. Janosch could answer that, but he is on vacation. Adding
> >>>>>>> Claudio maybe he can answer. My understanding is, that while it might
> >>>>>>> be possible, it is ugly at best. The ability to do a transition is
> >>>>>>> indicated by a CPU model feature. Indicating the feature to the guest
> >>>>>>> and then failing the transition sounds wrong to me.
> >>>>>>
> >>>>>> I agree. If the feature is advertised, then it has to work. I don't
> >>>>>> think we even have an architected way to fail the transition for that
> >>>>>> reason.
> >>>>>>
> >>>>>> What __could__ be done is to prevent qemu from even starting if an
> >>>>>> incompatible device is specified together with PV.
> >>>>>
> >>>>> AFAIU, the "specified together with PV" is the problem here. Currently
> >>>>> we don't "specify PV" but PV is just a capability that is managed by the
> >>>>> CPU model (like so many other).
> >>>>
> >>>> So if we want to keep it user friendly, there could be
> >>>> protection property with values on/off/auto, and auto
> >>>> would poke at host capability to figure out whether
> >>>> it's supported.
> >>>>
> >>>> Both virtio and CPU would inherit from that.
> >>>
> >>> Right, that's what I have in mind for my 'host-trust-limitation'
> >>> property (a generalized version of the existing 'memory-encryption'
> >>> machine option).  My draft patches already set virtio properties
> >>> accordingly, it should be possible to set (default) cpu properties as
> >>> well.
> >>
> >> No crazy CPU model hacks please (at least speaking for the s390x).
> > 
> > Uh... I'm not really sure what you have in mind here.
> > 
> 
> Reading along I got the impression that we want to glue the availability
> of CPU features to other QEMU cmdline parameters (besides the
> accelerator). ("to set (default) cpu properties as well"). If we are
> talking about other CPU properties not expressed as CPU features (e.g.,
> -cpu X,Y=on ...), then there is no issue.
> 

I share the concerns broght forward by David.
Halil Pasic June 10, 2020, 1:15 p.m. UTC | #28
On Tue, 9 Jun 2020 12:44:39 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Tue, Jun 09, 2020 at 06:28:39PM +0200, Halil Pasic wrote:
> > On Tue, 9 Jun 2020 17:47:47 +0200
> > Claudio Imbrenda <imbrenda@linux.ibm.com> wrote:
> > 
> > > On Tue, 9 Jun 2020 11:41:30 +0200
> > > Halil Pasic <pasic@linux.ibm.com> wrote:
> > > 
> > > [...]
> > > 
> > > > I don't know. Janosch could answer that, but he is on vacation. Adding
> > > > Claudio maybe he can answer. My understanding is, that while it might
> > > > be possible, it is ugly at best. The ability to do a transition is
> > > > indicated by a CPU model feature. Indicating the feature to the guest
> > > > and then failing the transition sounds wrong to me.
> > > 
> > > I agree. If the feature is advertised, then it has to work. I don't
> > > think we even have an architected way to fail the transition for that
> > > reason.
> > > 
> > > What __could__ be done is to prevent qemu from even starting if an
> > > incompatible device is specified together with PV.
> > 
> > AFAIU, the "specified together with PV" is the problem here. Currently
> > we don't "specify PV" but PV is just a capability that is managed by the
> > CPU model (like so many other).
> 
> So if we want to keep it user friendly, there could be
> protection property with values on/off/auto, and auto
> would poke at host capability to figure out whether
> it's supported.
> 
> Both virtio and CPU would inherit from that.
> 
> This will allow other useful features such as ability
> to hide PV from guest, which could in turn be handy e.g.
> to allow migration to hosts without PV support,
> or if host wants to force ability to read guest memory
> e.g. for security.
> 
> 

We already have the ability to "hide PV from guest". One 
just needs to specify that the unpack facility is not ought to be
included in the CPU model. E.g. something like -cpu host,unpack=off.

Regards,
Halil
Viktor Mihajlovski June 10, 2020, 1:19 p.m. UTC | #29
On 6/10/20 12:24 PM, David Hildenbrand wrote:
> On 10.06.20 12:07, David Gibson wrote:
>> On Wed, Jun 10, 2020 at 09:22:45AM +0200, David Hildenbrand wrote:
>>> On 10.06.20 06:31, David Gibson wrote:
>>>> On Tue, Jun 09, 2020 at 12:44:39PM -0400, Michael S. Tsirkin wrote:
>>>>> On Tue, Jun 09, 2020 at 06:28:39PM +0200, Halil Pasic wrote:
>>>>>> On Tue, 9 Jun 2020 17:47:47 +0200
>>>>>> Claudio Imbrenda <imbrenda@linux.ibm.com> wrote:
>>>>>>
>>>>>>> On Tue, 9 Jun 2020 11:41:30 +0200
>>>>>>> Halil Pasic <pasic@linux.ibm.com> wrote:
>>>>>>>
>>>>>>> [...]
>>>>>>>
>>>>>>>> I don't know. Janosch could answer that, but he is on vacation. Adding
>>>>>>>> Claudio maybe he can answer. My understanding is, that while it might
>>>>>>>> be possible, it is ugly at best. The ability to do a transition is
>>>>>>>> indicated by a CPU model feature. Indicating the feature to the guest
>>>>>>>> and then failing the transition sounds wrong to me.
>>>>>>>
>>>>>>> I agree. If the feature is advertised, then it has to work. I don't
>>>>>>> think we even have an architected way to fail the transition for that
>>>>>>> reason.
>>>>>>>
>>>>>>> What __could__ be done is to prevent qemu from even starting if an
>>>>>>> incompatible device is specified together with PV.
>>>>>>
>>>>>> AFAIU, the "specified together with PV" is the problem here. Currently
>>>>>> we don't "specify PV" but PV is just a capability that is managed by the
>>>>>> CPU model (like so many other).
>>>>>
>>>>> So if we want to keep it user friendly, there could be
>>>>> protection property with values on/off/auto, and auto
>>>>> would poke at host capability to figure out whether
>>>>> it's supported.
>>>>>
>>>>> Both virtio and CPU would inherit from that.
>>>>
>>>> Right, that's what I have in mind for my 'host-trust-limitation'
>>>> property (a generalized version of the existing 'memory-encryption'
>>>> machine option).  My draft patches already set virtio properties
>>>> accordingly, it should be possible to set (default) cpu properties as
>>>> well.
>>>
>>> No crazy CPU model hacks please (at least speaking for the s390x).
>>
>> Uh... I'm not really sure what you have in mind here.
>>
> 
> Reading along I got the impression that we want to glue the availability
> of CPU features to other QEMU cmdline parameters (besides the
> accelerator). ("to set (default) cpu properties as well"). If we are
> talking about other CPU properties not expressed as CPU features (e.g.,
> -cpu X,Y=on ...), then there is no issue.
> 

The reason that the capability to run in PV mode is expressed in the CPU
model is that this capability *is* provided by the CPU in terms of
available instructions. I wouldn't see a benefit in providing
a meta-property that needs to be synced with the CPU model.

So, if something has to be concluded from the fact that a VM
could run in PV mode, that decision should be derived from the
CPU model.
Halil Pasic June 10, 2020, 1:34 p.m. UTC | #30
On Tue, 9 Jun 2020 18:05:59 +0200
Cornelia Huck <cohuck@redhat.com> wrote:

> Which devices are compatible in the end? It seems the only ones that
> are known to be working are virtio-ccw devices with IOMMU_PLATFORM on.
> virtio-pci devices and non-virtio ccw (vfio-ccw, 3270) seem to be out,
> as far as I understand it. What about non-ccw? PCI in general, vfio-ap?

I've already answered how virtio is special. Regarding PCI, it is
currently not supported in prot virt mode, and this is properly handled
by the ultravisor: facility bits fenced and operation exceptions
indicated if the guest were to try and do PCI instructions nevertheless.
That also means we don't have to worry about virtio-pci for the
moment. The status current of AP is analogous to that of PCI.

So AFAIU all we have to worry about at the moment is ccw and virtio-ccw.

Regards,
Halil
Halil Pasic June 10, 2020, 1:57 p.m. UTC | #31
On Wed, 10 Jun 2020 14:29:29 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Tue, Jun 09, 2020 at 06:28:39PM +0200, Halil Pasic wrote:
> > On Tue, 9 Jun 2020 17:47:47 +0200
> > Claudio Imbrenda <imbrenda@linux.ibm.com> wrote:
> > 
> > > On Tue, 9 Jun 2020 11:41:30 +0200
> > > Halil Pasic <pasic@linux.ibm.com> wrote:
> > > 
> > > [...]
> > > 
> > > > I don't know. Janosch could answer that, but he is on vacation. Adding
> > > > Claudio maybe he can answer. My understanding is, that while it might
> > > > be possible, it is ugly at best. The ability to do a transition is
> > > > indicated by a CPU model feature. Indicating the feature to the guest
> > > > and then failing the transition sounds wrong to me.
> > > 
> > > I agree. If the feature is advertised, then it has to work. I don't
> > > think we even have an architected way to fail the transition for that
> > > reason.
> > > 
> > > What __could__ be done is to prevent qemu from even starting if an
> > > incompatible device is specified together with PV.
> > 
> > AFAIU, the "specified together with PV" is the problem here. Currently
> > we don't "specify PV" but PV is just a capability that is managed by the
> > CPU model (like so many other). I.e. the fact that the
> > visualization environment is capable providing PV (unpack facility
> > available), and the fact, that the end user didn't fence the unpack
> > facility, does not mean, the user is dead set to use PV.
> > 
> > My understanding is, that we want PV to just work, without having to
> > put together a peculiar VM definition that says: this is going to be
> > used as a PV VM.
> 
> Having had a similar discussion for POWER, I no longer think this is a
> wise model.  I think we want to have an explicit "allow PV" option -
> but we do want it to be a *single* option, rather than having to
> change configuration of a whole bunch of places.
> 
> My intention is for my 'host-trust-limitation' series to accomplish
> that.
> 

Dave, many thanks for your input. I would be interested to read up that
discussion you hand for POWER to try to catch the train of thought. Can
you give me a pointer?

My current understanding is that s390x already has the "allow PV" option,
which is the CPU model feature. But its dynamics is just like the
dynamics of other CPU model features, in a sense that you may have to
disable it explicitly.

Our problem is, that iommu_platform=on comes at a price point for us,
and we don't want to enforce it when it is not needed. And if the guest
does not decide to do the transition to protected, it is not needed.

Thus any scheme were we pessimise based on the sheer possibility of
protected virtualization seems wrong to me.

The sad thing is that QEMU has every information it needs to do what is
best: for paravirtualized devices
* use F_ACCESS_PLATFORM when needed, to make the guest work harder and
work around the access restrictions imposed by memory protection, and 
* don't use F_ACCESS_PLATFORM when when not needed, and allow for
  optimization based on the fact that no such access restrictions exist.

Sure we can burden up the user, to tell us if the VM is intended to be
used with memory protection or not. But what does it buy us? The
opportunity to create dodgy configurations?

Regards,
Halil
David Hildenbrand June 10, 2020, 2 p.m. UTC | #32
On 10.06.20 15:19, Viktor Mihajlovski wrote:
> 
> 
> On 6/10/20 12:24 PM, David Hildenbrand wrote:
>> On 10.06.20 12:07, David Gibson wrote:
>>> On Wed, Jun 10, 2020 at 09:22:45AM +0200, David Hildenbrand wrote:
>>>> On 10.06.20 06:31, David Gibson wrote:
>>>>> On Tue, Jun 09, 2020 at 12:44:39PM -0400, Michael S. Tsirkin wrote:
>>>>>> On Tue, Jun 09, 2020 at 06:28:39PM +0200, Halil Pasic wrote:
>>>>>>> On Tue, 9 Jun 2020 17:47:47 +0200
>>>>>>> Claudio Imbrenda <imbrenda@linux.ibm.com> wrote:
>>>>>>>
>>>>>>>> On Tue, 9 Jun 2020 11:41:30 +0200
>>>>>>>> Halil Pasic <pasic@linux.ibm.com> wrote:
>>>>>>>>
>>>>>>>> [...]
>>>>>>>>
>>>>>>>>> I don't know. Janosch could answer that, but he is on vacation. Adding
>>>>>>>>> Claudio maybe he can answer. My understanding is, that while it might
>>>>>>>>> be possible, it is ugly at best. The ability to do a transition is
>>>>>>>>> indicated by a CPU model feature. Indicating the feature to the guest
>>>>>>>>> and then failing the transition sounds wrong to me.
>>>>>>>>
>>>>>>>> I agree. If the feature is advertised, then it has to work. I don't
>>>>>>>> think we even have an architected way to fail the transition for that
>>>>>>>> reason.
>>>>>>>>
>>>>>>>> What __could__ be done is to prevent qemu from even starting if an
>>>>>>>> incompatible device is specified together with PV.
>>>>>>>
>>>>>>> AFAIU, the "specified together with PV" is the problem here. Currently
>>>>>>> we don't "specify PV" but PV is just a capability that is managed by the
>>>>>>> CPU model (like so many other).
>>>>>>
>>>>>> So if we want to keep it user friendly, there could be
>>>>>> protection property with values on/off/auto, and auto
>>>>>> would poke at host capability to figure out whether
>>>>>> it's supported.
>>>>>>
>>>>>> Both virtio and CPU would inherit from that.
>>>>>
>>>>> Right, that's what I have in mind for my 'host-trust-limitation'
>>>>> property (a generalized version of the existing 'memory-encryption'
>>>>> machine option).  My draft patches already set virtio properties
>>>>> accordingly, it should be possible to set (default) cpu properties as
>>>>> well.
>>>>
>>>> No crazy CPU model hacks please (at least speaking for the s390x).
>>>
>>> Uh... I'm not really sure what you have in mind here.
>>>
>>
>> Reading along I got the impression that we want to glue the availability
>> of CPU features to other QEMU cmdline parameters (besides the
>> accelerator). ("to set (default) cpu properties as well"). If we are
>> talking about other CPU properties not expressed as CPU features (e.g.,
>> -cpu X,Y=on ...), then there is no issue.
>>
> 
> The reason that the capability to run in PV mode is expressed in the CPU
> model is that this capability *is* provided by the CPU in terms of
> available instructions. I wouldn't see a benefit in providing
> a meta-property that needs to be synced with the CPU model.
> 
> So, if something has to be concluded from the fact that a VM
> could run in PV mode, that decision should be derived from the
> CPU model.
> 

Exactly.
Halil Pasic June 10, 2020, 9:37 p.m. UTC | #33
On Wed, 10 Jun 2020 14:25:54 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> > > I'm going to definitely have a good look at that. What I think special
> > > about s390 is that F_ACCESS_PLATFORM is hurting us because all IO needs
> > > to go through ZONE_DMA (this is a problem of the implementation that
> > > stemming form a limitation of the DMA API, upstream didn't let me
> > > fix it).   
> > 
> > My understanding is that power runs into similar issues, but I don't
> > know much about power, so I might be entirely wrong :)  
> 
> Sort of, but not to the same extent, I think.

I'm curious what are the ramifications of a misguided hotplug on POWER?
Does using F_ACCESS_PLATFORM when it isn't required have any
significant drawbacks, or are you fine to just go with the safe option?

Regards,
Halil
David Gibson June 19, 2020, 12:33 a.m. UTC | #34
On Wed, Jun 10, 2020 at 12:24:14PM +0200, David Hildenbrand wrote:
> On 10.06.20 12:07, David Gibson wrote:
> > On Wed, Jun 10, 2020 at 09:22:45AM +0200, David Hildenbrand wrote:
> >> On 10.06.20 06:31, David Gibson wrote:
> >>> On Tue, Jun 09, 2020 at 12:44:39PM -0400, Michael S. Tsirkin wrote:
> >>>> On Tue, Jun 09, 2020 at 06:28:39PM +0200, Halil Pasic wrote:
> >>>>> On Tue, 9 Jun 2020 17:47:47 +0200
> >>>>> Claudio Imbrenda <imbrenda@linux.ibm.com> wrote:
> >>>>>
> >>>>>> On Tue, 9 Jun 2020 11:41:30 +0200
> >>>>>> Halil Pasic <pasic@linux.ibm.com> wrote:
> >>>>>>
> >>>>>> [...]
> >>>>>>
> >>>>>>> I don't know. Janosch could answer that, but he is on vacation. Adding
> >>>>>>> Claudio maybe he can answer. My understanding is, that while it might
> >>>>>>> be possible, it is ugly at best. The ability to do a transition is
> >>>>>>> indicated by a CPU model feature. Indicating the feature to the guest
> >>>>>>> and then failing the transition sounds wrong to me.
> >>>>>>
> >>>>>> I agree. If the feature is advertised, then it has to work. I don't
> >>>>>> think we even have an architected way to fail the transition for that
> >>>>>> reason.
> >>>>>>
> >>>>>> What __could__ be done is to prevent qemu from even starting if an
> >>>>>> incompatible device is specified together with PV.
> >>>>>
> >>>>> AFAIU, the "specified together with PV" is the problem here. Currently
> >>>>> we don't "specify PV" but PV is just a capability that is managed by the
> >>>>> CPU model (like so many other).
> >>>>
> >>>> So if we want to keep it user friendly, there could be
> >>>> protection property with values on/off/auto, and auto
> >>>> would poke at host capability to figure out whether
> >>>> it's supported.
> >>>>
> >>>> Both virtio and CPU would inherit from that.
> >>>
> >>> Right, that's what I have in mind for my 'host-trust-limitation'
> >>> property (a generalized version of the existing 'memory-encryption'
> >>> machine option).  My draft patches already set virtio properties
> >>> accordingly, it should be possible to set (default) cpu properties as
> >>> well.
> >>
> >> No crazy CPU model hacks please (at least speaking for the s390x).
> > 
> > Uh... I'm not really sure what you have in mind here.
> > 
> 
> Reading along I got the impression that we want to glue the availability
> of CPU features to other QEMU cmdline parameters (besides the
> accelerator). ("to set (default) cpu properties as well"). If we are
> talking about other CPU properties not expressed as CPU features (e.g.,
> -cpu X,Y=on ...), then there is no issue.

Well, depends what you mean by "glue".  What I have in mind is that
setting the host-trust-limitation machine property will change the
defaults for cpu features in include the necessary feature for s390,
just as the draft code already changes the defaults for the relevant
virtio properties.  My intention is that if you explicitly put feature
properties on the cpu, that will override those defaults.

Is that acceptable?  I'm aware that this property affecting things in
distant devices is kinda weird and ugly, but I don't see how else we
can make configuring this not horribly complicated and differently so
for each platform.
David Gibson June 19, 2020, 12:36 a.m. UTC | #35
On Wed, Jun 10, 2020 at 03:19:22PM +0200, Viktor Mihajlovski wrote:
> 
> 
> On 6/10/20 12:24 PM, David Hildenbrand wrote:
> > On 10.06.20 12:07, David Gibson wrote:
> > > On Wed, Jun 10, 2020 at 09:22:45AM +0200, David Hildenbrand wrote:
> > > > On 10.06.20 06:31, David Gibson wrote:
> > > > > On Tue, Jun 09, 2020 at 12:44:39PM -0400, Michael S. Tsirkin wrote:
> > > > > > On Tue, Jun 09, 2020 at 06:28:39PM +0200, Halil Pasic wrote:
> > > > > > > On Tue, 9 Jun 2020 17:47:47 +0200
> > > > > > > Claudio Imbrenda <imbrenda@linux.ibm.com> wrote:
> > > > > > > 
> > > > > > > > On Tue, 9 Jun 2020 11:41:30 +0200
> > > > > > > > Halil Pasic <pasic@linux.ibm.com> wrote:
> > > > > > > > 
> > > > > > > > [...]
> > > > > > > > 
> > > > > > > > > I don't know. Janosch could answer that, but he is on vacation. Adding
> > > > > > > > > Claudio maybe he can answer. My understanding is, that while it might
> > > > > > > > > be possible, it is ugly at best. The ability to do a transition is
> > > > > > > > > indicated by a CPU model feature. Indicating the feature to the guest
> > > > > > > > > and then failing the transition sounds wrong to me.
> > > > > > > > 
> > > > > > > > I agree. If the feature is advertised, then it has to work. I don't
> > > > > > > > think we even have an architected way to fail the transition for that
> > > > > > > > reason.
> > > > > > > > 
> > > > > > > > What __could__ be done is to prevent qemu from even starting if an
> > > > > > > > incompatible device is specified together with PV.
> > > > > > > 
> > > > > > > AFAIU, the "specified together with PV" is the problem here. Currently
> > > > > > > we don't "specify PV" but PV is just a capability that is managed by the
> > > > > > > CPU model (like so many other).
> > > > > > 
> > > > > > So if we want to keep it user friendly, there could be
> > > > > > protection property with values on/off/auto, and auto
> > > > > > would poke at host capability to figure out whether
> > > > > > it's supported.
> > > > > > 
> > > > > > Both virtio and CPU would inherit from that.
> > > > > 
> > > > > Right, that's what I have in mind for my 'host-trust-limitation'
> > > > > property (a generalized version of the existing 'memory-encryption'
> > > > > machine option).  My draft patches already set virtio properties
> > > > > accordingly, it should be possible to set (default) cpu properties as
> > > > > well.
> > > > 
> > > > No crazy CPU model hacks please (at least speaking for the s390x).
> > > 
> > > Uh... I'm not really sure what you have in mind here.
> > > 
> > 
> > Reading along I got the impression that we want to glue the availability
> > of CPU features to other QEMU cmdline parameters (besides the
> > accelerator). ("to set (default) cpu properties as well"). If we are
> > talking about other CPU properties not expressed as CPU features (e.g.,
> > -cpu X,Y=on ...), then there is no issue.
> > 
> 
> The reason that the capability to run in PV mode is expressed in the CPU
> model is that this capability *is* provided by the CPU in terms of
> available instructions. I wouldn't see a benefit in providing
> a meta-property that needs to be synced with the CPU model.
> 
> So, if something has to be concluded from the fact that a VM
> could run in PV mode, that decision should be derived from the
> CPU model.

The trouble is that that approach is inherently s390 specific, and I'm
hoping we can make the configuration at least somewhat common between
platforms.

It also seems a very nasty layering violation to me for changing cpu
properties to affect apparently unrelated devices (like virtio, for
example).  It's still a bit nasty doing it from a machine property,
but it seems more reasonable to me that a machine property could
affect things elsewhere in the.. well.. machine.
David Gibson June 19, 2020, 12:59 a.m. UTC | #36
On Wed, Jun 10, 2020 at 03:57:03PM +0200, Halil Pasic wrote:
> On Wed, 10 Jun 2020 14:29:29 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Tue, Jun 09, 2020 at 06:28:39PM +0200, Halil Pasic wrote:
> > > On Tue, 9 Jun 2020 17:47:47 +0200
> > > Claudio Imbrenda <imbrenda@linux.ibm.com> wrote:
> > > 
> > > > On Tue, 9 Jun 2020 11:41:30 +0200
> > > > Halil Pasic <pasic@linux.ibm.com> wrote:
> > > > 
> > > > [...]
> > > > 
> > > > > I don't know. Janosch could answer that, but he is on vacation. Adding
> > > > > Claudio maybe he can answer. My understanding is, that while it might
> > > > > be possible, it is ugly at best. The ability to do a transition is
> > > > > indicated by a CPU model feature. Indicating the feature to the guest
> > > > > and then failing the transition sounds wrong to me.
> > > > 
> > > > I agree. If the feature is advertised, then it has to work. I don't
> > > > think we even have an architected way to fail the transition for that
> > > > reason.
> > > > 
> > > > What __could__ be done is to prevent qemu from even starting if an
> > > > incompatible device is specified together with PV.
> > > 
> > > AFAIU, the "specified together with PV" is the problem here. Currently
> > > we don't "specify PV" but PV is just a capability that is managed by the
> > > CPU model (like so many other). I.e. the fact that the
> > > visualization environment is capable providing PV (unpack facility
> > > available), and the fact, that the end user didn't fence the unpack
> > > facility, does not mean, the user is dead set to use PV.
> > > 
> > > My understanding is, that we want PV to just work, without having to
> > > put together a peculiar VM definition that says: this is going to be
> > > used as a PV VM.
> > 
> > Having had a similar discussion for POWER, I no longer think this is a
> > wise model.  I think we want to have an explicit "allow PV" option -
> > but we do want it to be a *single* option, rather than having to
> > change configuration of a whole bunch of places.
> > 
> > My intention is for my 'host-trust-limitation' series to accomplish
> > that.
> 
> Dave, many thanks for your input. I would be interested to read up that
> discussion you hand for POWER to try to catch the train of thought. Can
> you give me a pointer?

Urgh.. not really.. it was spread out over several discussions, some
of which were on IRC or Slack, rather than email.

> My current understanding is that s390x already has the "allow PV" option,
> which is the CPU model feature. But its dynamics is just like the
> dynamics of other CPU model features, in a sense that you may have to
> disable it explicitly.
> 
> Our problem is, that iommu_platform=on comes at a price point for us,
> and we don't want to enforce it when it is not needed. And if the guest
> does not decide to do the transition to protected, it is not needed.
> 
> Thus any scheme were we pessimise based on the sheer possibility of
> protected virtualization seems wrong to me.

Hrm, I see your point.  So... I guess my thinking is that although the
strict meaning of the proposed host-trust-limitation option is just
that "protection _can_ be used, at the guest/platform's option", it is
a strong hint that we're expecting protection to be used.

So would this work for s390:
  * The cpu feature remains, as now, enabled by default
  * The host-trust-limitation option would apply the protection
    necessary virtio options (and any other changes to defaults we
    discover we need), just as it does for SEV and POWER PEF
  * Optionally, the s390 machine type code could error out if
    host-trust-limitation is specified, but the cpu option is
    explicitly disabled

> The sad thing is that QEMU has every information it needs to do what is
> best: for paravirtualized devices
> * use F_ACCESS_PLATFORM when needed, to make the guest work harder and
> work around the access restrictions imposed by memory protection, and 
> * don't use F_ACCESS_PLATFORM when when not needed, and allow for
>   optimization based on the fact that no such access restrictions exist.

Right.. IIUC you're suggesting delaying finalization of the device's
featureset until the guest driver actually starts probing it

> Sure we can burden up the user, to tell us if the VM is intended to be
> used with memory protection or not. But what does it buy us? The
> opportunity to create dodgy configurations?

So, I don't know what the situation is with z, but for POWER machines
with the ultravisor running are rare (read, not actually available
outside IBM yet), and not directly tied to a cpu version (obviously
you need a cpu with support, but you also need to actually be running
under an ultravisor, which is optional).  So what are our options:

1) Require explicitly enabling PEF support - this is burdening the
   user, as you say, but..

2) Allow by default - but fail if the host doesn't have support.  That
   means explicitly *disabling* on non-ultravisor machines, a much
   bigger imposition on the user

3) Enable conditionally depending on host support.  Seems nice, but
   it's badly broken, as we've found with previous times we've tried
   to automatically do things based on host capabilities.  The problem
   is that once you have this, it's not obvious without knowing a
   bunch about the hosts which ones it will be safe to migrate
   between.  That horribly breaks things like RHV that want to do load
   balancing migrations within a cluster

Basically having different guest-visible features depending on host
properties is just unworkable, which brings us back to (1) being the
least bad option.
David Gibson June 19, 2020, 1:01 a.m. UTC | #37
On Wed, Jun 10, 2020 at 11:37:14PM +0200, Halil Pasic wrote:
> On Wed, 10 Jun 2020 14:25:54 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > > > I'm going to definitely have a good look at that. What I think special
> > > > about s390 is that F_ACCESS_PLATFORM is hurting us because all IO needs
> > > > to go through ZONE_DMA (this is a problem of the implementation that
> > > > stemming form a limitation of the DMA API, upstream didn't let me
> > > > fix it).   
> > > 
> > > My understanding is that power runs into similar issues, but I don't
> > > know much about power, so I might be entirely wrong :)  
> > 
> > Sort of, but not to the same extent, I think.
> 
> I'm curious what are the ramifications of a misguided hotplug on POWER?
> Does using F_ACCESS_PLATFORM when it isn't required have any
> significant drawbacks, or are you fine to just go with the safe option?

I expect it will have some performance impact, though it shouldn't be
*that* bad, at least if your guest kernel is ddw / large IOMMU window
capable.

Changing the default would require a machine type version bump, of
course.
diff mbox series

Patch

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index f660070d22..705e6b153a 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -330,6 +330,7 @@  static void s390_machine_unprotect(S390CcwMachineState *ms)
     migrate_del_blocker(pv_mig_blocker);
     error_free_or_abort(&pv_mig_blocker);
     qemu_balloon_inhibit(false);
+    subsystem_reset();
 }
 
 static int s390_machine_protect(S390CcwMachineState *ms)
@@ -382,6 +383,7 @@  static int s390_machine_protect(S390CcwMachineState *ms)
     if (rc) {
         goto out_err;
     }
+    subsystem_reset();
     return rc;
 
 out_err:
diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index 64f928fc7d..67d5bc68ba 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -874,6 +874,20 @@  static void virtio_ccw_reset(DeviceState *d)
     VirtioCcwDevice *dev = VIRTIO_CCW_DEVICE(d);
     VirtIODevice *vdev = virtio_bus_get_device(&dev->bus);
     VirtIOCCWDeviceClass *vdc = VIRTIO_CCW_DEVICE_GET_CLASS(dev);
+    S390CcwMachineState *ms = S390_CCW_MACHINE(qdev_get_machine());
+
+    /*
+     * An attempt to use a paravirt device without VIRTIO_F_IOMMU_PLATFORM
+     * in PV, has catastrophic consequences for the VM. Let's force
+     * VIRTIO_F_IOMMU_PLATFORM not already specified.
+     */
+    if (ms->pv && !virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
+        virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
+        dev->forced_iommu_platform = true;
+    } else if (!ms->pv && dev->forced_iommu_platform) {
+        virtio_clear_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
+        dev->forced_iommu_platform = false;
+    }
 
     virtio_ccw_reset_virtio(dev, vdev);
     if (vdc->parent_reset) {
diff --git a/hw/s390x/virtio-ccw.h b/hw/s390x/virtio-ccw.h
index 3453aa1f98..34ff7b0b4e 100644
--- a/hw/s390x/virtio-ccw.h
+++ b/hw/s390x/virtio-ccw.h
@@ -99,6 +99,7 @@  struct VirtioCcwDevice {
     IndAddr *summary_indicator;
     uint64_t ind_bit;
     bool force_revision_1;
+    bool forced_iommu_platform;
 };
 
 /* The maximum virtio revision we support. */