[Qemu-devel,05/11] qemu: MSI-X support functions

Message ID 20090610142508.GA28409@redhat.com (mailing list archive)
State New, archived

Commit Message

Michael S. Tsirkin June 10, 2009, 2:25 p.m. UTC
On Wed, Jun 10, 2009 at 03:07:34PM +0100, Paul Brook wrote:
> > > > Note that platform must set a flag to declare MSI supported.
> > > > For PC this will be set by APIC.
> > >
> > > This sounds wrong. The device shouldn't know or care whether the system
> > > has a MSI capable interrupt controller. That's for the guest OS to figure
> > > out.
> >
> > You are right of course. In theory there's nothing that breaks if I
> > set this flag to on, on all platforms. OTOH if qemu emulates some
> > controller incorrectly, guest might misdetect MSI support in the
> > controller, and things will break horribly.
> >
> > It seems safer to have a flag that can be enabled by people
> > that know about a specific platform.
> 
> No. The solution is to fix whatever is broken.

That's easy enough then. Patch below.

> 
> If we really need to avoid MSI-X capable devices then that should be done 
> explicitly per-device. i.e. you have a different virtio-net device that does 
> not use MSI-X.
> 
> Paul

Why should it be done per-device?

--->

Don't add an option for platforms to disable MSI-X in all devices.
Paul Brook will find and fix all platforms that have broken MSI-X
emulation.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

---

Comments

Paul Brook June 10, 2009, 2:39 p.m. UTC | #1
> > If we really need to avoid MSI-X capable devices then that should be done
> > explicitly per-device. i.e. you have a different virtio-net device that
> > does not use MSI-X.
> >
> > Paul
>
> Why should it be done per-device?


Because otherwise you end up with the horrible hacks that you're currently 
tripping over: devices have to magically morph into a different device when 
you load a VM. That seems just plain wrong to me. Loading a VM shouldn't
do anything that can't happen during normal operation.

Paul
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael S. Tsirkin June 10, 2009, 2:47 p.m. UTC | #2
On Wed, Jun 10, 2009 at 03:39:05PM +0100, Paul Brook wrote:
> > > If we really need to avoid MSI-X capable devices then that should be done
> > > explicitly per-device. i.e. you have a different virtio-net device that
> > > does not use MSI-X.
> > >
> > > Paul
> >
> > Why should it be done per-device?
> 
> 
> Because otherwise you end up with the horrible hacks that you're currently 
> tripping over: devices have to magically morph into a different device when 
> you load a VM.

No, the hacks are there so that I can support loading and saving from
non-MSI setups in a backward-compatible way.

The flag we are discussing is set at qemu startup and can't change
across load/store.

> That seems just plain wrong to me.
> Loading a VM shouldn't
> do anything that can't happen during normal operation.

At least wrt pci, we are very far from this state: load just overwrites
all registers, readonly or not, which can never happen during normal
operation. And if we "fix" it, and only edit BAR registers, the happy
result will be that we can't add functionality to PCI devices without
breaking guests and/or breaking loading from old images each time.

> Paul
Paul Brook June 10, 2009, 3:15 p.m. UTC | #3
> > That seems just plain wrong to me.
> > Loading a VM shouldn't
> > do anything that can't happen during normal operation.
>
> At least wrt pci, we are very far from this state: load just overwrites
> all registers, readonly or not, which can never happen during normal
> operation.

IMO that code is wrong. We should only be loading things that the guest can 
change (directly or indirectly).

Paul
Michael S. Tsirkin June 10, 2009, 3:52 p.m. UTC | #4
On Wed, Jun 10, 2009 at 04:15:04PM +0100, Paul Brook wrote:
> > > That seems just plain wrong to me.
> > > Loading a VM shouldn't
> > > do anything that can't happen during normal operation.
> >
> > At least wrt pci, we are very far from this state: load just overwrites
> > all registers, readonly or not, which can never happen during normal
> > operation.
> 
> IMO that code is wrong. We should only be loading things that the guest can 
> change (directly or indirectly).
> 
> Paul

Making it work this way will mean that minor changes to a device can
break backwards compatibility with old images, often in surprising ways.
What are the advantages?
Paul Brook June 10, 2009, 4:08 p.m. UTC | #5
On Wednesday 10 June 2009, Michael S. Tsirkin wrote:
> On Wed, Jun 10, 2009 at 04:15:04PM +0100, Paul Brook wrote:
> > > > That seems just plain wrong to me.
> > > > Loading a VM shouldn't
> > > > do anything that can't happen during normal operation.
> > >
> > > At least wrt pci, we are very far from this state: load just overwrites
> > > all registers, readonly or not, which can never happen during normal
> > > operation.
> >
> > IMO that code is wrong. We should only be loading things that the guest
> > can change (directly or indirectly).
>
> Making it work this way will mean that minor changes to a device can
> break backwards compatibility with old images, often in surprising ways.
> What are the advantages?

If you can't create an identical machine from scratch then I don't consider 
snapshot/migration to be a useful feature. i.e. as soon as you shutdown and 
restart the guest it is liable to break anyway.

It may be that the snapshot/migration code wants to include a machine config, 
and create a new machine from that. However this is a separate issue, and 
arguably something your VM manager should be handling for you.

Paul
Michael S. Tsirkin June 10, 2009, 4:26 p.m. UTC | #6
On Wed, Jun 10, 2009 at 05:08:15PM +0100, Paul Brook wrote:
> On Wednesday 10 June 2009, Michael S. Tsirkin wrote:
> > On Wed, Jun 10, 2009 at 04:15:04PM +0100, Paul Brook wrote:
> > > > > That seems just plain wrong to me.
> > > > > Loading a VM shouldn't
> > > > > do anything that can't happen during normal operation.
> > > >
> > > > At least wrt pci, we are very far from this state: load just overwrites
> > > > all registers, readonly or not, which can never happen during normal
> > > > operation.
> > >
> > > IMO that code is wrong. We should only be loading things that the guest
> > > can change (directly or indirectly).
> >
> > Making it work this way will mean that minor changes to a device can
> > break backwards compatibility with old images, often in surprising ways.
> > What are the advantages?
> 
> If you can't create an identical machine from scratch then I don't consider 
> snapshot/migration to be a useful feature. i.e. as soon as you shutdown and 
> restart the guest it is liable to break anyway.

Why is it liable to break?
Configuration does not change until you load another image.
Look here:

void msix_reset(PCIDevice *dev)
{
    if (!(dev->cap_present & QEMU_PCI_CAP_MSIX))
        return;

...
}

So once you load an image with MSIX capability off,
it will stay off across guest restarts.

> It may be that the snapshot/migration code wants to include a machine config, 
> and create a new machine from that. However this is a separate issue, and 
> arguably something your VM manager should be handling for you.
> 
> Paul

Since the image already has the necessary information, duplicating
it in a separate machine config will just lead to errors.
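
The msix_reset() excerpt above can be expanded into a small standalone sketch of the argument. It is not the QEMU implementation: struct sketch_dev, sketch_load(), sketch_reset() and SKETCH_CAP_MSIX are hypothetical names, and the sketch assumes the capability set is taken from the saved image while reset only clears guest-programmed state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CAP_MSIX 0x1u /* hypothetical stand-in for QEMU_PCI_CAP_MSIX */

struct sketch_dev {
    uint32_t cap_present;  /* which capabilities the device exposes */
    uint16_t msix_control; /* guest-programmed MSI-X state */
};

/* Loading an image takes the capability set from the saved state. */
void sketch_load(struct sketch_dev *d, uint32_t saved_caps, uint16_t saved_ctl)
{
    assert(d != NULL);
    d->cap_present = saved_caps;
    /* MSI-X state is only meaningful if the capability is present. */
    d->msix_control = (saved_caps & SKETCH_CAP_MSIX) ? saved_ctl : 0;
}

/* Reset clears guest-programmed state but never touches cap_present. */
void sketch_reset(struct sketch_dev *d)
{
    assert(d != NULL);
    if (!(d->cap_present & SKETCH_CAP_MSIX))
        return;
    d->msix_control = 0;
}
```

In this model a device loaded from an image without MSI-X keeps cap_present clear across any number of resets, for as long as the same process keeps running. Nothing in it survives a restart of the process itself.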
Paul Brook June 10, 2009, 4:46 p.m. UTC | #7
> > If you can't create an identical machine from scratch then I don't
> > consider snapshot/migration to be a useful feature. i.e. as soon as you
> > shutdown and restart the guest it is liable to break anyway.
>
> Why is it liable to break?

A VM booted on an old version of qemu and migrated to a new version will 
behave differently to the same VM booted on a new version of qemu.
I hope I don't need to explain why this is bad.

As previously discussed, any guest visible changes are liable to break a guest 
OS, particularly guests like Windows which deliberately break when run on 
"different" hardware. Personally I don't particularly care, but if we support 
live migration we also need to support "cold" migration - i.e. shutdown and 
restart.

>So once you load an image with MSIX capability off,
>it will stay off across guest restarts.

I'm assuming guest restart includes restarting qemu.

Paul
Michael S. Tsirkin June 10, 2009, 5:03 p.m. UTC | #8
On Wed, Jun 10, 2009 at 05:46:03PM +0100, Paul Brook wrote:
> > > If you can't create an identical machine from scratch then I don't
> > > consider snapshot/migration to be a useful feature. i.e. as soon as you
> > > shutdown and restart the guest it is liable to break anyway.
> >
> > Why is it liable to break?
> 
> A VM booted on an old version of qemu and migrated to a new version will 
> behave differently to the same VM booted on a new version of qemu.

It will behave identically. That's what the patch does: discover
how the device behaved on old qemu, and make it behave the same way
on new qemu.

> I hope I don't need to explain why this is bad.
> 
> As previously discussed, any guest visible changes are liable to break a guest 
> OS, particularly guests like Windows which deliberately break when run on 
> "different" hardware. Personally I don't particularly care, but if we support 
> live migration we also need to support "cold" migration - i.e. shutdown and 
> restart.
> 
> >So once you load an image with MSIX capability off,
> >it will stay off across guest restarts.
> 
> I'm assuming guest restart includes restarting qemu.
> 
> Paul

If you restart qemu, and load an image, what we should do is look at the
image and behave in a way consistent with how the qemu that created the
image behaved. If you load an image, you switch to a VM, and should be
consistent with the VM you just loaded. And we do not need a flag or a
machine description file to tell us this.
Paul Brook June 10, 2009, 5:30 p.m. UTC | #9
On Wednesday 10 June 2009, Michael S. Tsirkin wrote:
> On Wed, Jun 10, 2009 at 05:46:03PM +0100, Paul Brook wrote:
> > > > If you can't create an identical machine from scratch then I don't
> > > > consider snapshot/migration to be a useful feature. i.e. as soon as
> > > > you shutdown and restart the guest it is liable to break anyway.
> > >
> > > Why is it liable to break?
> >
> > A VM booted on an old version of qemu and migrated to a new version will
> > behave differently to the same VM booted on a new version of qemu.
>
> It will behave identically. That's what the patch does: discover
> how the device behaved on old qemu, and make it behave the same way
> on new qemu.

You're missing the point.  After doing a live migration from old-qemu to
new-qemu, there is no snapshot to load.  We need to be able to shutdown the guest, 
kill qemu (without saving a snapshot), then start qemu with the exact same 
hardware.

If we can't start a new qemu with the same hardware configuration then we 
should not be allowing migration or loading of snapshots.

Paul
Michael S. Tsirkin June 10, 2009, 6:07 p.m. UTC | #10
On Wed, Jun 10, 2009 at 06:30:16PM +0100, Paul Brook wrote:
> On Wednesday 10 June 2009, Michael S. Tsirkin wrote:
> > On Wed, Jun 10, 2009 at 05:46:03PM +0100, Paul Brook wrote:
> > > > > If you can't create an identical machine from scratch then I don't
> > > > > consider snapshot/migration to be a useful feature. i.e. as soon as
> > > > > you shutdown and restart the guest it is liable to break anyway.
> > > >
> > > > Why is it liable to break?
> > >
> > > A VM booted on an old version of qemu and migrated to a new version will
> > > behave differently to the same VM booted on a new version of qemu.
> >
> > It will behave identically. That's what the patch does: discover
> > how the device behaved on old qemu, and make it behave the same way
> > on new qemu.
> 
> You're missing the point.  After doing a live migration from old-qemu to
> new-qemu, there is no snapshot to load.  We need to be able to shutdown the guest, 
> kill qemu (without saving a snapshot), then start qemu with the exact same 
> hardware.

Yes, I see how this would sometimes be useful. So this feature request
would mean that we need a flag to disable msix in virtio net.

I think we can agree that hardware can change across reboots. It can
with real hardware and it happens without guest doing anything.

> If we can't start a new qemu with the same hardware configuration then we 
> should not be allowing migration or loading of snapshots.
> 
> Paul

OK, so I'll add an option in virtio-net to disable msi-x, and such
an option will be added in any device with msi-x support.
Will that address your concern?
Paul Brook June 10, 2009, 7:04 p.m. UTC | #11
> > If we can't start a new qemu with the same hardware configuration then we
> > should not be allowing migration or loading of snapshots.
>
> OK, so I'll add an option in virtio-net to disable msi-x, and such
> an option will be added in any device with msi-x support.
> Will that address your concern?

Yes, as long as migration fails when you try to migrate to the wrong kind of 
device.

Paul
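
The agreement reached here (a per-device option to disable MSI-X, with migration failing on a mismatch) could look roughly like the sketch below. It is an illustration only, not code from this series: struct sketch_nic, sketch_nic_init(), sketch_nic_load() and SKETCH_NIC_CAP_MSIX are hypothetical names, and the choice of -EINVAL as the error code is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NIC_CAP_MSIX 0x1u /* hypothetical capability bit */

struct sketch_nic {
    unsigned cap_present; /* capabilities chosen when the device is created */
};

/* The per-device option decides, once, whether MSI-X is exposed. */
void sketch_nic_init(struct sketch_nic *n, bool msix_option)
{
    assert(n != NULL);
    n->cap_present = msix_option ? SKETCH_NIC_CAP_MSIX : 0;
}

/*
 * Loading never changes what kind of device this is. If the saved
 * image was produced by a device with a different capability set,
 * the load (and therefore the migration) fails.
 */
int sketch_nic_load(const struct sketch_nic *n, unsigned saved_caps)
{
    assert(n != NULL);
    if (saved_caps != n->cap_present)
        return -EINVAL;
    return 0;
}
```

With this shape, a guest that was started with MSI-X disabled can be restarted from scratch with the same option and see the same hardware, which is the property asked for earlier in the thread.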
Michael S. Tsirkin June 11, 2009, 8:29 a.m. UTC | #12
On Wed, Jun 10, 2009 at 08:04:13PM +0100, Paul Brook wrote:
> > > If we can't start a new qemu with the same hardware configuration then we
> > > should not be allowing migration or loading of snapshots.
> >
> > OK, so I'll add an option in virtio-net to disable msi-x, and such
> > an option will be added in any device with msi-x support.
> > Will that address your concern?
> 
> Yes, as long as migration fails when you try to migrate to the wrong kind of 
> device.
> 
> Paul

I think the right way to do this is to make sure that standard
read-only registers in PCI config space are not modified in migration
(device-specific registers could have changed as a result of guest
actions, so we can't make assumptions).
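
The rule proposed here can be sketched in C. This is a minimal illustration and not QEMU code: struct cfg_space, wmask and cfg_load_checked() are hypothetical names, and the sketch assumes each device carries a per-byte mask of guest-writable bits. Device-specific registers would simply be marked fully writable in the mask, since no assumptions can be made about them.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define CFG_SIZE 256

/* Hypothetical device model: config bytes plus a guest-writable bit mask. */
struct cfg_space {
    uint8_t data[CFG_SIZE];
    uint8_t wmask[CFG_SIZE]; /* set bits are guest-writable */
};

/*
 * Load saved config space into dev. Bits outside wmask must match what
 * the device already exposes; otherwise return -EINVAL and leave dev
 * untouched, so that migrating to the wrong kind of device fails.
 */
int cfg_load_checked(struct cfg_space *dev, const uint8_t *saved, size_t len)
{
    size_t i;

    assert(dev != NULL && saved != NULL);
    if (len != CFG_SIZE)
        return -EINVAL;

    /* First pass: reject any difference in read-only bits. */
    for (i = 0; i < CFG_SIZE; i++) {
        uint8_t ro = (uint8_t)~dev->wmask[i];
        if ((dev->data[i] ^ saved[i]) & ro)
            return -EINVAL;
    }

    /* Second pass: copy only the guest-writable bits. */
    for (i = 0; i < CFG_SIZE; i++) {
        dev->data[i] = (uint8_t)((dev->data[i] & ~dev->wmask[i]) |
                                 (saved[i] & dev->wmask[i]));
    }
    return 0;
}
```

Checking before copying keeps a rejected load from leaving the device half-updated. The cost, as raised earlier in the thread, is that any change to a read-only register in a newer device model makes older images fail to load unless the device can be configured to match them.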

Patch

diff --git a/hw/apic.c b/hw/apic.c
index ed03a36..9d44061 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -945,7 +945,6 @@  int apic_init(CPUState *env)
     s->cpu_env = env;
 
     apic_reset(s);
-    msix_supported = 1;
 
     /* XXX: mapping more APICs at the same memory location */
     if (apic_io_memory == 0) {
diff --git a/hw/msix.c b/hw/msix.c
index ce4e6ba..16efb27 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -62,9 +62,6 @@ 
 /* Flag to globally disable MSI-X support */
 int msix_disable;
 
-/* Flag for interrupt controller to declare MSI-X support */
-int msix_supported;
-
 /* Add MSI-X capability to the config space for the device. */
 /* Given a bar and its size, add MSI-X table on top of it
  * and fill MSI-X capability in the config space.
@@ -232,10 +229,7 @@  void msix_mmio_map(PCIDevice *d, int region_num,
 int msix_init(struct PCIDevice *dev, unsigned short nentries,
               unsigned bar_nr, unsigned bar_size)
 {
-    int ret = -ENOMEM;
-    /* Nothing to do if MSI is not supported by interrupt controller */
-    if (!msix_supported)
-        return -ENOTTY;
+    int ret;
 
     if (nentries > MSIX_MAX_ENTRIES)
         return -EINVAL;
diff --git a/hw/msix.h b/hw/msix.h
index 79e84a3..2fcadd3 100644
--- a/hw/msix.h
+++ b/hw/msix.h
@@ -30,6 +30,5 @@  void msix_notify(PCIDevice *dev, unsigned vector);
 void msix_reset(PCIDevice *dev);
 
 extern int msix_disable;
-extern int msix_supported;
 
 #endif