
qemu: msi irq allocation api

Message ID 20090520162130.GA22109@redhat.com (mailing list archive)
State New, archived

Commit Message

Michael S. Tsirkin May 20, 2009, 4:21 p.m. UTC
define api for allocating/setting up msi-x irqs, and for updating them
with msi-x vector information, supply implementation in ioapic. Please
comment on this API: I intend to port my msi-x patch to work on top of
it.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/apic.c     |    1 -
 hw/ioapic.c   |   65 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/irq.c      |   10 ++++++++
 hw/irq.h      |    5 ++++
 hw/pc.c       |    1 +
 hw/pc.h       |    2 +
 hw/pci.c      |    2 +
 hw/pci.h      |   10 ++++++++
 qemu-common.h |    1 +
 9 files changed, 96 insertions(+), 1 deletions(-)

Comments

Blue Swirl May 20, 2009, 5:21 p.m. UTC | #1
On 5/20/09, Michael S. Tsirkin <mst@redhat.com> wrote:
> define api for allocating/setting up msi-x irqs, and for updating them
>  with msi-x vector information, supply implementation in ioapic. Please
>  comment on this API: I intend to port my msi-x patch to work on top of
>  it.
>
>  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Sparc64 also uses packets ("mondos", not implemented yet) for
interrupt vector data; there the packet size is 8 * 64 bits. I think
we should aim for a more generic API that covers this case also.

For example, irq.c could support an opaque packet payload of
unspecified/predefined size. MSI packet structure should be defined in
ioapic.c.

The pci_msi_ops structure could be 'const', or do you expect it to
change during execution?
Avi Kivity May 20, 2009, 5:32 p.m. UTC | #2
Blue Swirl wrote:
> Sparc64 also uses packets ("mondos", not implemented yet) for
> interrupt vector data; there the packet size is 8 * 64 bits. I think
> we should aim for a more generic API that covers this case also.
>
>   

Is the packet structure visible to software?
Michael S. Tsirkin May 20, 2009, 5:35 p.m. UTC | #3
On Wed, May 20, 2009 at 08:21:01PM +0300, Blue Swirl wrote:
> Sparc64 also uses packets ("mondos", not implemented yet) for
>  interrupt vector data; there the packet size is 8 * 64 bits.
> I think we should aim for a more generic API that covers this case also.

Are you sure this is a good idea? MSI is tied to PCI, and PCI only has
MSI, not "mondos". What code would benefit from this abstraction?

> For example, irq.c could support an opaque packet payload of
> unspecified/predefined size.  MSI packet structure should be defined
> in ioapic.c.

Note that MSI does not have packets and MSI interrupts do not pass any payload.

> The pci_msi_ops structure could be 'const', or do you expect it to
> change during execution?

Right. I'll fix that.
Blue Swirl May 20, 2009, 5:40 p.m. UTC | #4
On 5/20/09, Avi Kivity <avi@redhat.com> wrote:
> Blue Swirl wrote:
>
> > Sparc64 also uses packets ("mondos", not implemented yet) for
> > interrupt vector data, there the packet size is 8 * 64 bits. I think
> > we should aim for a more generic API that covers this case also.
> >
>
>  Is the packet structure visible to software?

Yes, and the content of the packets is defined by software. If there
is framing outside the user data (I don't know), this is not visible.
Blue Swirl May 20, 2009, 5:44 p.m. UTC | #5
On 5/20/09, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Wed, May 20, 2009 at 08:21:01PM +0300, Blue Swirl wrote:
>  > Sparc64 also uses packets ("mondos", not implemented yet) for
>  > interrupt vector data; there the packet size is 8 * 64 bits.
>  > I think we should aim for a more generic API that covers this case also.
>
>
> Are you sure this is a good idea? MSI is tied to PCI, and PCI only has
>  MSI, not "mondos". What code would benefit from this abstraction?

Sparc64 emulation, of course. I think the API would also be neater.

>  > For example, irq.c could support opaque packet payload of
>  > unspecified/predefined size.  MSI packet structure should be defined
>  > in ioapic.c.
>
>
> Note that MSI does not have packets and MSI interrupts do not pass any payload.

I don't know too much about MSI; what's the 'data' field in msi_state then?
Michael S. Tsirkin May 20, 2009, 6:24 p.m. UTC | #6
On Wed, May 20, 2009 at 08:44:31PM +0300, Blue Swirl wrote:
> > Note that MSI does not have packets and MSI interrupts do not pass any payload.
> 
> I don't know too much about MSI; what's the 'data' field in msi_state then?

Opaque stuff that the APIC uses to select the irq and cpu.
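
For context, a rough sketch of what that data encodes on x86, together with
the address word (field positions are from the Intel specification, not from
this patch):

/* x86 MSI address: a write into the 0xfee00000 page */
#define MSI_ADDR_BASE          0xfee00000
#define MSI_ADDR_DEST_ID(x)    (((x) & 0xff) << 12) /* target APIC ID */
#define MSI_ADDR_REDIR_HINT    (1 << 3)
#define MSI_ADDR_DEST_MODE     (1 << 2)             /* physical/logical */

/* MSI data word */
#define MSI_DATA_VECTOR(x)     ((x) & 0xff)         /* interrupt vector */
#define MSI_DATA_DELIVERY(x)   (((x) & 0x7) << 8)   /* fixed, lowest-prio, ... */
#define MSI_DATA_TRIGGER_LEVEL (1 << 15)            /* 0 = edge */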
Michael S. Tsirkin May 20, 2009, 6:28 p.m. UTC | #7
On Wed, May 20, 2009 at 08:44:31PM +0300, Blue Swirl wrote:
> > Are you sure this is a good idea? MSI is tied to PCI, and PCI only has
> >  MSI, not "mondos". What code would benefit from this abstraction?
> 
> Sparc64 emulation, of course. I think the API would also be neater.

Since "mondos" are not interrupts, why use irqs for them?
Blue Swirl May 20, 2009, 6:38 p.m. UTC | #8
On 5/20/09, Michael S. Tsirkin <mst@redhat.com> wrote:
> Since "mondos" are not interrupts, why use irqs for them?

I just said above that they are used for interrupt vector data. What
makes you think they are not interrupts?
Michael S. Tsirkin May 20, 2009, 8:02 p.m. UTC | #9
On Wed, May 20, 2009 at 09:38:58PM +0300, Blue Swirl wrote:
> > Since "mondos" are not interrupts, why use irqs for them?
> 
> I just said above that they are used for interrupt vector data. What
> makes you think they are not interrupts?

I'm sorry, I don't really know anything about sparc.
All I am saying is that in PCI, interrupts never pass data,
so qemu_set_irq, as it is now, is a good API to send them.

For the sparc feature you describe, you probably want to add
a message data parameter to qemu_set_irq, but it's not
really useful for MSI.
Michael S. Tsirkin May 20, 2009, 8:17 p.m. UTC | #10
On Wed, May 20, 2009 at 11:02:24PM +0300, Michael S. Tsirkin wrote:
> I'm sorry, I don't really know anything about sparc.
> All I am saying is that in PCI, interrupts never pass data,
> so qemu_set_irq, as it is now, is a good API to send them.
> 
> For the sparc feature you describe, you probably want to add
> a message data parameter to qemu_set_irq, but it's not
> really useful for MSI.

Just to clarify, the main difference is that with MSI/MSI-X
both data and address fields are mostly static; modifying them
involves ioapic and device updates, which might be an expensive
operation (e.g. with kvm, it needs an extra system call).

So I don't think it makes sense to pass MSI-X data field
with each call to qemu_set_irq.
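
The split being argued for here might look roughly as follows (both calls are
illustrative sketches, not code from the patch):

/* Slow path, rarely called: push a new address/data pair to the
 * interrupt controller; with kvm this is where the extra system
 * call would happen. */
int msi_update_vector(qemu_irq irq, uint64_t addr, uint32_t data);

/* Hot path: no payload, just "fire this vector". */
qemu_set_irq(irq, 1);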
Blue Swirl May 20, 2009, 8:18 p.m. UTC | #11
On 5/20/09, Michael S. Tsirkin <mst@redhat.com> wrote:
> I'm sorry, I don't really know anything about sparc.
>  All I am saying is that in PCI, interrupts never pass data,
>  so qemu_set_irq, as it is now, is a good API to send them.
>
>  For the sparc feature you describe, you probably want to add
>  a message data parameter to qemu_set_irq, but it's not
>  really useful for MSI.

But maybe this API would be useful for both MSI and mondo packets:
/* Get vector data stored in the irq */
void *qemu_irq_get_data(qemu_irq irq);

/* Set vector data stored in the irq */
void qemu_irq_set_data(qemu_irq irq, void *data);

You'd use a struct msi_state as the opaque (not using the array
indexed by irq number), I'd use some kind of
struct mondo {
 uint64_t packet[8];
};

Instead of ioapic_update_msi(), you'd call qemu_irq_set_data().
Blue Swirl May 20, 2009, 8:26 p.m. UTC | #12
On 5/20/09, Michael S. Tsirkin <mst@redhat.com> wrote:
> Just to clarify, the main difference is that with MSI/MSI-X
>  both data and address fields are mostly static; modifying them
>  involves ioapic and device updates, which might be an expensive
>  operation (e.g. with kvm, it needs an extra system call).
>
>  So I don't think it makes sense to pass MSI-X data field
>  with each call to qemu_set_irq.

No, but I think the Sparc situation is the same: the packet data is
static for the interrupt source in question.
Michael S. Tsirkin May 20, 2009, 8:29 p.m. UTC | #13
On Wed, May 20, 2009 at 11:18:56PM +0300, Blue Swirl wrote:
> On 5/20/09, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Wed, May 20, 2009 at 09:38:58PM +0300, Blue Swirl wrote:
> >  > On 5/20/09, Michael S. Tsirkin <mst@redhat.com> wrote:
> >  > > On Wed, May 20, 2009 at 08:44:31PM +0300, Blue Swirl wrote:
> >  > >  > On 5/20/09, Michael S. Tsirkin <mst@redhat.com> wrote:
> >  > >  > > On Wed, May 20, 2009 at 08:21:01PM +0300, Blue Swirl wrote:
> >  > >  > >  > On 5/20/09, Michael S. Tsirkin <mst@redhat.com> wrote:
> >  > >  > >  > > define api for allocating/setting up msi-x irqs, and for updating them
> >  > >  > >  > >  with msi-x vector information, supply implementation in ioapic. Please
> >  > >  > >  > >  comment on this API: I intend to port my msi-x patch to work on top of
> >  > >  > >  > >  it.
> >  > >  > >  > >
> >  > >  > >  > >  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >  > >  > >  >
> >  > >  > >  > Sparc64 also uses packets ("mondos", not implemented yet) for
> >  > >  > >  > interrupt vector data, there the packet size is 8 * 64 bits.
> >  > >  > >  > I think we should aim for a more generic API that covers this case also.
> >  > >  > >
> >  > >  > >
> >  > >  > > Are you sure this is a good idea? MSI is tied to PCI, and PCI only has
> >  > >  > >  MSI, not "mondos". What code would benefit from this abstraction?
> >  > >  >
> >  > >  > Sparc64 emulation, of course. I think also the API would be neater.
> >  > >
> >  > >
> >  > > Since "mondos" are not interrupts, why use irqs for them?
> >  >
> >  > I just said above that they are used for interrupt vector data. What
> >  > makes you think they are not interrupts?
> >
> >
> > I'm sorry, I don't really know anything about sparc.
> >  All I am saying is that in PCI, interrupts never pass data,
> >  so qemu_set_irq as it is now, is a good API to send them.
> >
> >  For the sparc feature you describe, you probably want to add
> >  a message data parameter to qemu_set_irq, but it's not
> >  really useful for MSI.
> 
> But maybe this API would be useful for both MSI and mondo packets:
> /* Get vector data stored in the irq */
> void *qemu_irq_get_data(qemu_irq irq);

Hmm. Yes, it would work for MSI.
However, I am thinking ahead to kvm support, which needs to
update the irqchip (with a system call) on each state update.
So:
- I really need a callback on data update
- data update might fail

Which, if added to the generic API, would make it too complex, IMO.

> /* Set vector data stored in the irq */
> void qemu_irq_set_data(qemu_irq irq, void *data);
> You'd use a struct msi_state as the opaque (not using the array
> indexed by irq number), I'd use some kind of
> struct mondo {
>  uint64_t packet[8];
> };
> 
> Instead of ioapic_update_msi(), you'd call qemu_irq_set_data().

I guess it's a matter of taste; personally I prefer type-safe functions
to hiding everything in typeless void* buffers.
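
The contrast being drawn, as a sketch (both prototypes are illustrative;
ioapic_update_msi is referred to earlier in the thread, but its actual
prototype is not shown here):

/* Generic but untyped: caller and ioapic must agree out-of-band
 * on what 'data' points to. */
void qemu_irq_set_data(qemu_irq irq, void *data);

/* MSI-specific but type-safe: mismatches show up at compile time. */
void ioapic_update_msi(qemu_irq irq, uint64_t addr, uint32_t data);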
Michael S. Tsirkin May 20, 2009, 8:33 p.m. UTC | #14
On Wed, May 20, 2009 at 11:26:42PM +0300, Blue Swirl wrote:
> No, but I think the Sparc situation is the same: the packet data is
> static for the interrupt source in question.

So, ok, we could add a data update callback and then MSI and sparc
would do their thing there. I'm not convinced I like all this
play with untyped buffers; do you think it's helpful?

If yes, maybe I'll try to code it up and see how it looks.
Blue Swirl May 20, 2009, 8:44 p.m. UTC | #15
On 5/20/09, Michael S. Tsirkin <mst@redhat.com> wrote:
> So, ok, we could add a data update callback and then MSI and sparc
>  would do their thing there. I'm not convinced I like all this
>  play with untyped buffers; do you think it's helpful?
>
>  If yes, maybe I'll try to code it up and see how it looks.

Well, get/set_data could be limited to irq.c; ioapic.c could export
something like get/set_msi_data. MSI callers should only use
get/set_msi_data. Ditto for get/set_mondo_data.
Michael S. Tsirkin May 21, 2009, 9:24 a.m. UTC | #16
On Wed, May 20, 2009 at 11:44:57PM +0300, Blue Swirl wrote:
> Well, get/set_data could be limited to irq.c; ioapic.c could export
> something like get/set_msi_data. MSI callers should only use
> get/set_msi_data. Ditto for get/set_mondo_data.

I started coding it up and got lost in a maze of callbacks and void pointers :)

And I also remembered that besides data, there are mask and pending bits in MSI
(they're ignored in the patch I sent, but I'm fixing that), so there needs
to be an interface for this as well.

I am guessing that, using the qemu_irq_get_opaque and qemu_irq_get_vector helpers I
added, you'll find it easy to implement mondos in very few lines of code.
We can always add another level of indirection later when sparc code
surfaces, if we find it helps.
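
A sketch of the two helpers mentioned above (the names are from this message;
the prototypes are guesses for illustration):

/* Return the opaque pointer the irq was allocated with. */
void *qemu_irq_get_opaque(qemu_irq irq);

/* Return this irq's vector number within its allocated block. */
int qemu_irq_get_vector(qemu_irq irq);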
Paul Brook May 21, 2009, 10:09 a.m. UTC | #17
On Wednesday 20 May 2009, Michael S. Tsirkin wrote:
> define api for allocating/setting up msi-x irqs, and for updating them
> with msi-x vector information, supply implementation in ioapic. Please
> comment on this API: I intend to port my msi-x patch to work on top of
> it.

I thought the point of MSI is that they are just regular memory writes, and 
don't require any special bus support.

Paul
Avi Kivity May 21, 2009, 10:23 a.m. UTC | #18
Paul Brook wrote:
> I thought the point of MSI is that they are just regular memory writes, and 
> don't require any special bus support.
>   

The PCI bus doesn't need any special support (I think) but something on 
the other end needs to interpret those writes.

In any case we need some internal API for this, and qemu_irq looks like 
a good choice.
Paul Brook May 21, 2009, 10:34 a.m. UTC | #19
> The PCI bus doesn't need any special support (I think) but something on
> the other end needs to interpret those writes.

Sure. But there's definitely nothing PCI specific about it. I assumed this 
would all be contained within the APIC.

> In any case we need some internal API for this, and qemu_irq looks like
> a good choice.

What do you expect to be using this API?

Paul
Michael S. Tsirkin May 21, 2009, 10:50 a.m. UTC | #20
On Thu, May 21, 2009 at 11:34:11AM +0100, Paul Brook wrote:
> > The PCI bus doesn't need any special support (I think) but something on
> > the other end needs to interpret those writes.
> 
> Sure. But there's definitely nothing PCI specific about it. I assumed this 
> would all be contained within the APIC.

Exactly. The APIC supplies callbacks to set up/free/mask/unmask MSI interrupts.
For kvm, we'll have another implementation that passes these requests
on to the kernel.

> > In any case we need some internal API for this, and qemu_irq looks like
> > a good choice.
> 
> What do you expect to be using this API?
> 
> Paul

Emulated PCI devices such as virtio.
I hope to send a patch shortly.
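
Given the callbacks listed above, the ops table might look something like this
(field names and prototypes are illustrative; the real pci_msi_ops definition
is in the patch, not reproduced here):

typedef struct pci_msi_ops {
    int  (*setup)(PCIDevice *dev, int vector);
    void (*free)(PCIDevice *dev, int vector);
    int  (*update)(PCIDevice *dev, int vector,
                   uint64_t addr, uint32_t data);
    void (*mask)(PCIDevice *dev, int vector, int masked);
} pci_msi_ops;

An instance would then naturally be declared static const, as Blue Swirl
suggested at the top of the thread.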
Avi Kivity May 21, 2009, 11:01 a.m. UTC | #21
Paul Brook wrote:
>> The PCI bus doesn't need any special support (I think) but something on
>> the other end needs to interpret those writes.
>>     
>
> Sure. But there's definitely nothing PCI specific about it. I assumed this 
> would all be contained within the APIC.
>   

MSIs are defined by PCI and their configuration is done using the PCI 
configuration space.

>> In any case we need some internal API for this, and qemu_irq looks like
>> a good choice.
>>     
>
> What do you expect to be using this API?
>   

virtio, emulated devices capable of supporting MSI (e1000?), device 
assignment (not yet in qemu.git).
Paul Brook May 21, 2009, 12:01 p.m. UTC | #22
> >> The PCI bus doesn't need any special support (I think) but something on
> >> the other end needs to interpret those writes.
> >
> > Sure. But there's definitely nothing PCI specific about it. I assumed
> > this would all be contained within the APIC.
>
> MSIs are defined by PCI and their configuration is done using the PCI
> configuration space.

An MSI is just a regular memory write, and the PCI spec explicitly states that 
a target (e.g. the APIC) is unable to distinguish between an MSI and any other 
write. The PCI config bits just provide a way of telling the device where/what 
to write.

> >> In any case we need some internal API for this, and qemu_irq looks like
> >> a good choice.
> >
> > What do you expect to be using this API?
>
> virtio, emulated devices capable of supporting MSI (e1000?), device
> assignment (not yet in qemu.git).

It probably makes sense to have common infrastructure in pci.c to 
expose/implement device side MSI functionality. However I see no need for a 
direct API between the device and the APIC. We already have an API for memory 
accesses and MMIO regions. I'm pretty sure a system could implement MSI by 
pointing the device at system ram, and having the CPU periodically poll that.

Paul
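
For reference, the device-side state in question lives in the standard MSI
capability in config space; in the register names used by Linux's pci_regs.h
(offsets relative to the capability base):

#define PCI_CAP_ID_MSI        0x05
#define PCI_MSI_FLAGS         0x02   /* Message Control */
#define  PCI_MSI_FLAGS_ENABLE 0x0001 /* bit 0: MSI enable */
#define  PCI_MSI_FLAGS_64BIT  0x0080 /* bit 7: 64-bit address capable */
#define PCI_MSI_ADDRESS_LO    0x04   /* where the device writes */
#define PCI_MSI_ADDRESS_HI    0x08   /* upper address, 64-bit devices only */
#define PCI_MSI_DATA_32       0x08   /* what the device writes (32-bit dev) */
#define PCI_MSI_DATA_64       0x0c   /* what the device writes (64-bit dev) */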
Avi Kivity May 21, 2009, 12:08 p.m. UTC | #23
Paul Brook wrote:
> It probably makes sense to have common infrastructure in pci.c to 
> expose/implement device side MSI functionality. However I see no need for a 
> direct API between the device and the APIC. We already have an API for memory 
> accesses and MMIO regions. I'm pretty sure a system could implement MSI by 
> pointing the device at system ram, and having the CPU periodically poll that.
>   

Instead of writing directly, let's abstract it behind a qemu_set_irq().  
This is easier for device authors.  The default implementation of the 
irq callback could write to apic memory, while for kvm we can directly 
trigger the interrupt via the kvm APIs.
Michael S. Tsirkin May 21, 2009, 12:14 p.m. UTC | #24
On Thu, May 21, 2009 at 03:08:18PM +0300, Avi Kivity wrote:
> Instead of writing directly, let's abstract it behind a qemu_set_irq().   
> This is easier for device authors.  The default implementation of the  
> irq callback could write to apic memory, while for kvm we can directly  
> trigger the interrupt via the kvm APIs.

Right.
Paul Brook May 21, 2009, 12:29 p.m. UTC | #25
On Thursday 21 May 2009, Avi Kivity wrote:
> Instead of writing directly, let's abstract it behind a qemu_set_irq().
> This is easier for device authors.  The default implementation of the
> irq callback could write to apic memory, while for kvm we can directly
> trigger the interrupt via the kvm APIs.

I'm still not convinced.

A tight coupling between PCI devices and the APIC is just going to cause us 
problems later on. I'm going to come back to the fact that these are memory 
writes, so once we get IOMMU support they will presumably be subject to 
remapping by that, just like any other memory access.

Even ignoring that, qemu_irq isn't really the right interface. An MSI is a
one-off event, not a level state. OTOH stl_phys is exactly the right interface.

The KVM interface should be contained within the APIC implementation.

Paul

Avi Kivity May 21, 2009, 12:38 p.m. UTC | #26
Paul Brook wrote:
>> Instead of writing directly, let's abstract it behind a qemu_set_irq().
>> This is easier for device authors.  The default implementation of the
>> irq callback could write to apic memory, while for kvm we can directly
>> trigger the interrupt via the kvm APIs.
>>     
>
> I'm still not convinced.
>
> A tight coupling between PCI devices and the APIC is just going to cause us 
> problems later on. I'm going to come back to the fact that these are memory 
> writes, so once we get IOMMU support they will presumably be subject to 
> remapping by that, just like any other memory access.
>   

I'm not suggesting the qemu_irq will extend all the way to the apic.  
Think of it as connecting the device core with its interrupt unit.

> Even ignoring that, qemu_irq isn't really the right interface. An MSI is a
> one-off event, not a level state. OTOH stl_phys is exactly the right interface.
>   

The qemu_irq callback should do an stl_phys().  The device is happy 
since it's using the same API it uses for non-MSI.  The APIC is happy 
since it isn't connected directly to the device.  stl_phys() is happy 
since it sees more traffic and can serve more ads.  kvm is happy since 
it can hijack the callback to throw the interrupt directly into the kernel.

> The KVM interface should be contained within the APIC implementation.
>   

Tricky, but doable.
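
Avi's suggestion as a minimal sketch (MSIState is an assumed structure;
stl_phys and the qemu_irq handler signature are the existing qemu interfaces):

typedef struct MSIState {
    target_phys_addr_t addr; /* programmed via the MSI capability */
    uint32_t data;
} MSIState;

static void msi_irq_handler(void *opaque, int n, int level)
{
    MSIState *s = opaque;

    /* Edge semantics: act only on assert, ignore deassert. */
    if (level) {
        stl_phys(s->addr, s->data); /* kvm could hijack this callback */
    }
}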
Michael S. Tsirkin May 21, 2009, 1:06 p.m. UTC | #27
On Thu, May 21, 2009 at 01:29:37PM +0100, Paul Brook wrote:
> I'm still not convinced.
> 
> A tight coupling between PCI devices and the APIC is just going to cause us 
> problems later on. I'm going to come back to the fact that these are memory 
> writes, so once we get IOMMU support they will presumably be subject to 
> remapping by that, just like any other memory access.

Actually, MSI messages are handled by the IOMMU as interrupts, not as
regular memory accesses. The IOMMU book has comments such as:
• Interrupt addresses are never translated to memory addresses, but
other special address ranges may be reclaimed to be backed with memory.

> Even ignoring that, qemu_irq isn't really the right interface. An MSI is a
> one-off event, not a level state.

Yes, I just chose to ignore the level value. It does not look like such
a big issue ... Do you advocate a new qemu_msi structure then?

> OTOH stl_phys is exactly the right interface.

Not really. These are writes, but not into physical memory.
For example, on 32-bit Intel, stl_phys gets a 32-bit address;
MSI writes encode the interrupt vector in the high 32 bits of
the address - way outside actual physical memory.

> The KVM interface should be contained within the APIC implementation.
> 
> Paul

Unfortunately, the kvm capabilities present in current kernels
do not map well to this interface. You need to perform
expensive setup for each interrupt vector you are going to use,
be it MSI or a regular interrupt.
Michael S. Tsirkin May 21, 2009, 1:08 p.m. UTC | #28
On Thu, May 21, 2009 at 03:38:56PM +0300, Avi Kivity wrote:
> The qemu_irq callback should do an stl_phys().  The device is happy  
> since it's using the same API it uses for non-MSI.  The APIC is happy  
> since it isn't connected directly to the device.  stl_phys() is happy  
> since it sees more traffic and can serve more ads.  kvm is happy since  
> it can hijack the callback to throw the interrupt directly into the 
> kernel.

I like the monetization angle here.

>> The KVM interface should be contained within the APIC implementation.
>>   
>
> Tricky, but doable.

I'd rather keep it simple.
Paul Brook May 21, 2009, 1:09 p.m. UTC | #29
> The qemu_irq callback should do an stl_phys().  The device is happy
> since it's using the same API it uses for non-MSI. 

MSI provides multiple edge triggered interrupts, whereas traditional mode 
provides a single level triggered interrupt. My guess is most devices will 
want to treat these differently anyway.

Either way, this is an implementation detail between pci.c and individual 
devices. It has nothing to do with the APIC.

Paul
Michael S. Tsirkin May 21, 2009, 1:11 p.m. UTC | #30
On Thu, May 21, 2009 at 03:38:56PM +0300, Avi Kivity wrote:
> The qemu_irq callback should do an stl_phys().

Actually, it seems we can't do it this way now, as stl_phys
only gets a 32-bit address. So I'll use apic_deliver for now,
but yes, it will be easy to rewrite the MSI implementation this way
later if that limitation is lifted.
Michael S. Tsirkin May 21, 2009, 1:12 p.m. UTC | #31
On Thu, May 21, 2009 at 02:09:32PM +0100, Paul Brook wrote:
> MSI provides multiple edge triggered interrupts, whereas traditional mode 
> provides a single level triggered interrupt. My guess is most devices will 
> want to treat these differently anyway.

So, is qemu_send_msi better than qemu_set_irq?
Paul Brook May 21, 2009, 1:23 p.m. UTC | #32
> > MSI provides multiple edge triggered interrupts, whereas traditional mode
> > provides a single level triggered interrupt. My guess is most devices
> > will want to treat these differently anyway.
>
> So, is qemu_send_msi better than qemu_set_irq?

Neither. pci_send_msi, which is a trivial wrapper around stl_phys.

Paul
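
A sketch of such a wrapper, reading back whatever the guest programmed into
the MSI capability (msi_cap is an assumed field holding the capability offset;
the pci_regs.h offsets are shown earlier; the 64-bit address case is elided):

static void pci_send_msi(PCIDevice *d)
{
    uint8_t *cap = d->config + d->msi_cap; /* assumed field */
    uint32_t addr = le32_to_cpu(*(uint32_t *)(cap + PCI_MSI_ADDRESS_LO));
    uint16_t data = le16_to_cpu(*(uint16_t *)(cap + PCI_MSI_DATA_32));

    stl_phys(addr, data); /* the MSI is literally just this write */
}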
Paul Brook May 21, 2009, 1:31 p.m. UTC | #33
On Thursday 21 May 2009, Paul Brook wrote:
> > > MSI provides multiple edge triggered interrupts, whereas traditional
> > > mode provides a single level triggered interrupt. My guess is most
> > > devices will want to treat these differently anyway.
> >
> > So, is qemu_send_msi better than qemu_set_irq?
>
> Neither. pci_send_msi, which is a trivial wrapper around stl_phys.

To clarify, you seem to be trying to fuse two largely separate features 
together.

MSI is a standard PCI device capability[1] that involves the device performing 
a 32-bit memory write when something interesting occurs. These writes may or 
may not be directed at an APIC.

The x86 APIC has a memory-mapped interface that allows generation of CPU 
interrupts in response to memory writes. These may or may not come 
from an MSI-capable PCI device.

Paul

[1] Note a *device* capability, not a bus capability.

Michael S. Tsirkin May 21, 2009, 1:46 p.m. UTC | #34
On Thu, May 21, 2009 at 02:23:20PM +0100, Paul Brook wrote:
> > > MSI provides multiple edge triggered interrupts, whereas traditional mode
> > > provides a single level triggered interrupt. My guess is most devices
> > > will want to treat these differently anyway.
> >
> > So, is qemu_send_msi better than qemu_set_irq?
> 
> Neither. pci_send_msi,

Works for me.

> which is a trivial wrapper around stl_phys.

OK, but I'm adding another level of indirection in the middle,
to allow us to tie in a kvm backend.
Paul Brook May 21, 2009, 1:53 p.m. UTC | #35
> > which is a trivial wrapper around stl_phys.
>
> OK, but I'm adding another level of indirection in the middle,
> to allow us to tie in a kvm backend.

kvm has no business messing with the PCI device code.

Paul

Avi Kivity May 21, 2009, 2:01 p.m. UTC | #36
Paul Brook wrote:
>>> which is a trivial wrapper around stl_phys.
>>>       
>> OK, but I'm adding another level of indirection in the middle,
>> to allow us to tie in a kvm backend.
>>     
>
> kvm has no business messing with the PCI device code.
>   

kvm has a fast path for irq injection.  If qemu wants to support it we 
need some abstraction here.
Michael S. Tsirkin May 21, 2009, 2:07 p.m. UTC | #37
On Thu, May 21, 2009 at 02:53:14PM +0100, Paul Brook wrote:
> > > which is a trivial wrapper around stl_phys.
> >
> > OK, but I'm adding another level of indirection in the middle,
> > to allow us to tie in a kvm backend.
> 
> kvm has no business messing with the PCI device code.

Yes it has :)

kvm needs data on MSI entries: that's the interface
the current kernel exposes for injecting these interrupts.

I think we also need to support in-kernel devices which
would inject MSI interrupts directly from the kernel.
For these, kvm would need to know when the mask bit changes
and give us info on the pending bit.

That's a fair amount of PCI-specific code in kvm.
Paul Brook May 21, 2009, 2:14 p.m. UTC | #38
On Thursday 21 May 2009, Avi Kivity wrote:
> Paul Brook wrote:
> >>> which is a trivial wrapper around stl_phys.
> >>
> >> OK, but I'm adding another level of indirection in the middle,
> >> to allow us to tie in a kvm backend.
> >
> > kvm has no business messing with the PCI device code.
>
> kvm has a fast path for irq injection.  If qemu wants to support it we
> need some abstraction here.

Fast path from where to where? Having the PCI layer bypass/re-implement the 
APIC and inject the interrupt directly into the cpu core sounds like a 
particularly bad idea.

Paul

Avi Kivity May 21, 2009, 2:37 p.m. UTC | #39
Paul Brook wrote:
> Fast path from where to where? Having the PCI layer bypass/re-implement the 
> APIC and inject the interrupt directly into the cpu core sounds like a 
> particularly bad idea.
>   

kvm implements the APIC in the host kernel (qemu upstream doesn't 
support this yet).  The fast path is wired to the in-kernel APIC, not 
the cpu core directly.

The idea is to wire it to UIO for device assignment, to a virtio-device 
implemented in the kernel, and to qemu.
Michael S. Tsirkin May 21, 2009, 2:46 p.m. UTC | #40
On Thu, May 21, 2009 at 02:23:20PM +0100, Paul Brook wrote:
> > > MSI provides multiple edge triggered interrupts, whereas traditional mode
> > > provides a single level triggered interrupt. My guess is most devices
> > > will want to treat these differently anyway.
> >
> > So, is qemu_send_msi better than qemu_set_irq.
> 
> Neither. pci_send_msi, which is a trivial wrapper around stl_phys.

I guess I'll start with that. This only works if target_phys_addr_t is
a 64-bit field (because MSI addresses may be outside the 32-bit
memory space); the simplest solution is probably to disable MSI when
it is not.
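
A minimal sketch of that starting point, assuming QEMU's stl_phys()
helper and a TARGET_PHYS_ADDR_BITS build-time constant (both taken on
faith here; pci_send_msi is the name suggested above, not code from
the posted patch):

#if TARGET_PHYS_ADDR_BITS == 64
/* An MSI message is nothing more than a 32-bit write of 'data' to
   'addr', so the wrapper really is trivial. */
static inline void pci_send_msi(target_phys_addr_t addr, uint32_t data)
{
    stl_phys(addr, data);
}
#else
/* target_phys_addr_t cannot hold a 64-bit MSI address: do not
   advertise the MSI capability at all. */
#endif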
Paul Brook May 21, 2009, 2:50 p.m. UTC | #41
> >>> kvm has no business messing with the PCI device code.
> >>
> >> kvm has a fast path for irq injection.  If qemu wants to support it we
> >> need some abstraction here.
> >
> > Fast path from where to where? Having the PCI layer bypass/re-implement
> > the APIC and inject the interrupt directly into the cpu core sounds like a
> > particularly bad idea.
>
> kvm implements the APIC in the host kernel (qemu upstream doesn't
> support this yet).  The fast path is wired to the in-kernel APIC, not
> the cpu core directly.
>
> The idea is to wire it to UIO for device assignment, to a virtio-device
> implemented in the kernel, and to qemu.

I still don't see why you're trying to bypass straight from the pci layer to 
the apic. Why can't you just pass the apic MMIO writes to the kernel? You've 
presumably got to update the apic state anyway.

Paul
Avi Kivity May 21, 2009, 2:54 p.m. UTC | #42
Paul Brook wrote:
>> kvm implements the APIC in the host kernel (qemu upstream doesn't
>> support this yet).  The fast path is wired to the in-kernel APIC, not
>> the cpu core directly.
>>
>> The idea is to wire it to UIO for device assignment, to a virtio-device
>> implemented in the kernel, and to qemu.
>>     
>
> I still don't see why you're trying to bypass straight from the pci layer to 
> the apic. Why can't you just pass the apic MMIO writes to the kernel? You've 
> presumably got to update the apic state anyway.
>   

The fast path is an eventfd so that we don't have to teach all the 
clients about the details of MSI.  Userspace programs the MSI details 
into kvm and hands the client an eventfd.  All the client has to do is 
bang on the eventfd for the interrupt to be queued.  The eventfd 
provides event coalescing and is equally useful from the kernel and 
userspace, and can be used with targets other than kvm.
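
For illustration, the client side of that contract could be as small as
the sketch below (it assumes the eventfd was already bound to a
programmed MSI route by userspace; the binding interface itself is not
shown):

#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

/* Create the notifier handed to the client; the client never sees
   MSI address/data details, only this fd. */
static int msi_notifier_create(void)
{
    return eventfd(0, 0);
}

/* "Bang on the eventfd": writes that arrive before the consumer
   reads coalesce into one counter increment, which matches
   edge-triggered MSI semantics. */
static void msi_notify(int msi_fd)
{
    uint64_t one = 1;
    (void)write(msi_fd, &one, sizeof(one));
}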
Paul Brook May 21, 2009, 3:01 p.m. UTC | #43
On Thursday 21 May 2009, Avi Kivity wrote:
> Paul Brook wrote:
> >> kvm implements the APIC in the host kernel (qemu upstream doesn't
> >> support this yet).  The fast path is wired to the in-kernel APIC, not
> >> the cpu core directly.
> >>
> >> The idea is to wire it to UIO for device assignment, to a virtio-device
> >> implemented in the kernel, and to qemu.
> >
> > I still don't see why you're trying to bypass straight from the pci layer
> > to the apic. Why can't you just pass the apic MMIO writes to the kernel?
> > You've presumably got to update the apic state anyway.
>
> The fast path is an eventfd so that we don't have to teach all the
> clients about the details of MSI.  Userspace programs the MSI details
> into kvm and hands the client an eventfd.  All the client has to do is
> bang on the eventfd for the interrupt to be queued.  The eventfd
> provides event coalescing and is equally useful from the kernel and
> userspace, and can be used with targets other than kvm.

So presumably if a device triggers an APIC interrupt using a write that isn't
from one of the currently configured PCI devices, it all explodes horribly?

Paul
Avi Kivity May 21, 2009, 3:11 p.m. UTC | #44
Paul Brook wrote:
>> The fast path is an eventfd so that we don't have to teach all the
>> clients about the details of MSI.  Userspace programs the MSI details
>> into kvm and hands the client an eventfd.  All the client has to do is
>> bang on the eventfd for the interrupt to be queued.  The eventfd
>> provides event coalescing and is equally useful from the kernel and
>> userspace, and can be used with targets other than kvm.
>>     
>
> So presumably if a device triggers an APIC interrupt using a write that isn't
> from one of the currently configured PCI devices, it all explodes horribly?
>   

I don't follow.  Can you elaborate on your scenario?
Michael S. Tsirkin May 21, 2009, 4:45 p.m. UTC | #45
On Thu, May 21, 2009 at 02:31:26PM +0100, Paul Brook wrote:
> On Thursday 21 May 2009, Paul Brook wrote:
> > > > MSI provides multiple edge triggered interrupts, whereas traditional
> > > > mode provides a single level triggered interrupt. My guess is most
> > > > devices will want to treat these differently anyway.
> > >
> > > So, is qemu_send_msi better than qemu_set_irq.
> >
> > Neither. pci_send_msi, which is a trivial wrapper around stl_phys.
> 
> To clarify, you seem to be trying to fuse two largely separate features 
> together.
> 
> MSI is a standard PCI device capability[1] that involves the device performing 
> a 32-bit memory write when something interesting occurs. These writes may or 
may not be directed at an APIC.
> 
> The x86 APIC has a memory mapped interface that allows generation of CPU 
interrupts in response to memory writes. These may or may not come
> from an MSI capable PCI device.
> 
> Paul
> 
> [1] Note a *device* capability, not a bus capability.

Paul, so I went over the specs, and what you say about the APIC here does
not seem to be what Intel actually implemented.  Specifically, Intel
implemented *MSI support in the APIC*. This lets PCI devices, but not the
CPU, signal interrupts by memory writes.

For example, after reset, when the CPU writes to address 0xfee00000 this
is an access to a reserved register in the APIC, but when a PCI device
writes to 0xfee00000, this triggers an interrupt to destination 0.

See section 9.12 in Intel® 64 and IA-32 Architectures Software
Developer’s Manual Volume 3A: System Programming Guide, Part 1
http://www.intel.com/Assets/PDF/manual/253668.pdf

So it seems that what we need to do in pci is:

if (!msi_ops || msi_ops->send_msi(address, data))
	stl_phys(address, data);

where send_msi is wired to apic_send_msi and
where apic_send_msi returns an error for an address
outside of the MSI range 0xfee00000 - 0xfeefffff
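
For example, the claiming logic alone could look like the sketch below
(names assumed; decoding and delivery are elided):

#define APIC_MSI_BASE 0xfee00000ull
#define APIC_MSI_SIZE 0x00100000ull

static int apic_send_msi(uint64_t address, uint32_t data)
{
    if (address < APIC_MSI_BASE ||
        address >= APIC_MSI_BASE + APIC_MSI_SIZE)
        return -1; /* not ours: caller falls back to stl_phys */
    /* ... decode destination/vector from address/data and deliver ... */
    return 0;
}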

Makes sense?
Michael S. Tsirkin May 21, 2009, 4:49 p.m. UTC | #46
On Thu, May 21, 2009 at 03:50:18PM +0100, Paul Brook wrote:
> > >>> kvm has no business messing with the PCI device code.
> > >>
> > >> kvm has a fast path for irq injection.  If qemu wants to support it we
> > >> need some abstraction here.
> > >
> > > Fast path from where to where? Having the PCI layer bypass/re-implement
> > > the APIC and inject the interrupt directly into the cpu core sounds like
> > > a particularly bad idea.
> >
> > kvm implements the APIC in the host kernel (qemu upstream doesn't
> > support this yet).  The fast path is wired to the in-kernel APIC, not
> > the cpu core directly.
> >
> > The idea is to wire it to UIO for device assignment, to a virtio-device
> > implemented in the kernel, and to qemu.
> 
> I still don't see why you're trying to bypass straight from the pci layer to 
> the apic. Why can't you just pass the apic MMIO writes to the kernel? You've 
> presumably got to update the apic state anyway.
> 
> Paul

As far as I can tell, at least on Intel, MSI interrupts are not MMIO writes.
They are PCI memory writes to a hard-coded address range that
are passed to the APIC. I don't think MMIO writes can trigger MSI,
or at least this does not seem to be documented.
Michael S. Tsirkin May 21, 2009, 5:33 p.m. UTC | #47
On Thu, May 21, 2009 at 07:45:20PM +0300, Michael S. Tsirkin wrote:
> On Thu, May 21, 2009 at 02:31:26PM +0100, Paul Brook wrote:
> > On Thursday 21 May 2009, Paul Brook wrote:
> > > > > MSI provides multiple edge triggered interrupts, whereas traditional
> > > > > mode provides a single level triggered interrupt. My guess is most
> > > > > devices will want to treat these differently anyway.
> > > >
> > > > So, is qemu_send_msi better than qemu_set_irq.
> > >
> > > Neither. pci_send_msi, which is a trivial wrapper around stl_phys.
> > 
> > To clarify, you seem to be trying to fuse two largely separate features 
> > together.
> > 
> > MSI is a standard PCI device capability[1] that involves the device performing 
> > a 32-bit memory write when something interesting occurs. These writes may or 
> > may not be directed at an APIC.
> > 
> > The x86 APIC has a memory mapped interface that allows generation of CPU 
> > interrupts in response to memory writes. These may or may not come
> > from an MSI capable PCI device.
> > 
> > Paul
> > 
> > [1] Note a *device* capability, not a bus capability.
> 
> Paul, so I went over the specs, and what you say about the APIC here does
> not seem to be what Intel actually implemented.  Specifically, Intel
> implemented *MSI support in the APIC*. This lets PCI devices, but not the
> CPU, signal interrupts by memory writes.
> 
> For example, after reset, when the CPU writes to address 0xfee00000 this
> is an access to a reserved register in the APIC, but when a PCI device
> writes to 0xfee00000, this triggers an interrupt to destination 0.
> 
> See section 9.12 in Intel® 64 and IA-32 Architectures Software
> Developer’s Manual Volume 3A: System Programming Guide, Part 1
> http://www.intel.com/Assets/PDF/manual/253668.pdf
> 
> So it seems that what we need to do in pci is:
> 
> if (!msi_ops || msi_ops->send_msi(address, data))
> 	stl_phys(address, data);
> 
> where send_msi is wired to apic_send_msi and
> where apic_send_msi returns an error for an address
> outside of the MSI range 0xfee00000 - 0xfeefffff
> 
> Makes sense?


So I ended up with these ops:
	allocate
	free
	update
	send
which the APIC will define and the MSI emulation
will use. Here, send will return an error for addresses
outside the 0xfeexxxxx range, and the device will do a plain
stl_phys().
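
Sketched as a struct, under the same assumptions (allocate/update/free
match the patch below; send is the extra hook discussed here):

struct pci_msi_ops {
    qemu_irq *(*allocate)(int nentries);
    int (*update)(qemu_irq irq, uint64_t addr, uint32_t data, int masked);
    void (*free)(qemu_irq *irqs);
    /* returns non-zero for an address outside the 0xfeexxxxx range,
       so the caller falls back to a plain stl_phys(addr, data) */
    int (*send)(uint64_t addr, uint32_t data);
};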
Avi Kivity May 24, 2009, 11:14 a.m. UTC | #48
Michael S. Tsirkin wrote:
> kvm needs data on MSI entries: that's the interface
> the current kernel exposes for injecting these interrupts.
>
> I think we also need to support in-kernel devices which
> would inject MSI interrupts directly from the kernel.
> For these, kvm would need to know when the mask bit changes
> and give us info on the pending bit.
>
> That's a fair amount of PCI specific code in kvm.

My plan is to get rid of it.  The mask stuff need not be PCI specific.
diff mbox

Patch

diff --git a/hw/apic.c b/hw/apic.c
index d63d74b..2d2de69 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -929,4 +929,3 @@  int apic_init(CPUState *env)
     local_apics[s->id] = s;
     return 0;
 }
-
diff --git a/hw/ioapic.c b/hw/ioapic.c
index 317c2c2..5a99c46 100644
--- a/hw/ioapic.c
+++ b/hw/ioapic.c
@@ -23,6 +23,7 @@ 
 
 #include "hw.h"
 #include "pc.h"
+#include "pci.h"
 #include "qemu-timer.h"
 #include "host-utils.h"
 
@@ -43,6 +44,16 @@ 
 #define IOAPIC_DM_SIPI			0x5
 #define IOAPIC_DM_EXTINT		0x7
 
+/* Intel APIC constants: from include/asm/msidef.h */
+#define MSI_DATA_VECTOR_SHIFT		0
+#define MSI_DATA_VECTOR_MASK		0x000000ff
+#define MSI_DATA_DELIVERY_MODE_SHIFT	8
+#define MSI_ADDR_DEST_MODE_SHIFT	2
+#define MSI_DATA_TRIGGER_SHIFT		15
+#define MSI_ADDR_DEST_ID_SHIFT		12
+#define	MSI_ADDR_DEST_ID_MASK		0x00ffff0
+#define MSI_DATA_LEVEL_SHIFT		14
+
 struct IOAPICState {
     uint8_t id;
     uint8_t ioregsel;
@@ -51,6 +62,11 @@  struct IOAPICState {
     uint64_t ioredtbl[IOAPIC_NUM_PINS];
 };
 
+struct msi_state {
+    uint64_t addr;
+    uint32_t data;
+};
+
 static void ioapic_service(IOAPICState *s)
 {
     uint8_t i;
@@ -259,3 +275,52 @@  IOAPICState *ioapic_init(void)
 
     return s;
 }
+
+/* MSI/MSI-X support */
+static void ioapic_send_msi(void *opaque, int irq, int level)
+{
+    struct msi_state *state = opaque;
+    uint8_t dest = (state[irq].addr & MSI_ADDR_DEST_ID_MASK)
+        >> MSI_ADDR_DEST_ID_SHIFT;
+    uint8_t vector = (state[irq].data & MSI_DATA_VECTOR_MASK)
+        >> MSI_DATA_VECTOR_SHIFT;
+    uint8_t dest_mode = (state[irq].addr >> MSI_ADDR_DEST_MODE_SHIFT) & 0x1;
+    uint8_t trigger_mode = (state[irq].data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
+    uint8_t delivery = (state[irq].data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x7;
+    apic_deliver_irq(dest, dest_mode, delivery, vector, 0, trigger_mode);
+}
+
+static qemu_irq *ioapic_allocate_msi(int nentries)
+{
+    struct msi_state *state = qemu_mallocz(nentries * sizeof *state);
+    qemu_irq *irqs;
+    if (!state)
+        return NULL;
+    irqs = qemu_allocate_irqs(ioapic_send_msi, state, nentries);
+    if (!irqs)
+        qemu_free(state);
+    return irqs;
+}
+
+static void ioapic_free_msi(qemu_irq *irq)
+{
+    qemu_free(qemu_irq_get_opaque(irq[0]));
+    qemu_free_irqs(irq);
+}
+
+static int ioapic_update_msi(qemu_irq irq, uint64_t addr, uint32_t data,
+                             int masked)
+{
+    struct msi_state *state = qemu_irq_get_opaque(irq);
+    int vector = qemu_irq_get_vector(irq);
+    state[vector].addr = addr;
+    state[vector].data = data;
+    return 0;
+}
+
+struct pci_msi_ops ioapic_msi_ops = {
+    .allocate = ioapic_allocate_msi,
+    .update = ioapic_update_msi,
+    .free = ioapic_free_msi,
+};
+
diff --git a/hw/irq.c b/hw/irq.c
index 7703f62..9180381 100644
--- a/hw/irq.c
+++ b/hw/irq.c
@@ -75,3 +75,13 @@  qemu_irq qemu_irq_invert(qemu_irq irq)
     qemu_irq_raise(irq);
     return qemu_allocate_irqs(qemu_notirq, irq, 1)[0];
 }
+
+void *qemu_irq_get_opaque(qemu_irq irq)
+{
+    return irq->opaque;
+}
+
+int qemu_irq_get_vector(qemu_irq irq)
+{
+    return irq->n;
+}
diff --git a/hw/irq.h b/hw/irq.h
index 5daae44..0e3144d 100644
--- a/hw/irq.h
+++ b/hw/irq.h
@@ -32,4 +32,9 @@  void qemu_free_irqs(qemu_irq *s);
 /* Returns a new IRQ with opposite polarity.  */
 qemu_irq qemu_irq_invert(qemu_irq irq);
 
+/* Get the pointer stored in the irq. */
+void *qemu_irq_get_opaque(qemu_irq irq);
+
+/* Get vector stored in the irq */
+int qemu_irq_get_vector(qemu_irq irq);
 #endif
diff --git a/hw/pc.c b/hw/pc.c
index 61f6e7b..1b287c3 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -962,6 +962,7 @@  static void pc_init1(ram_addr_t ram_size, int vga_ram_size,
     if (pci_enabled) {
         pci_bus = i440fx_init(&i440fx_state, i8259);
         piix3_devfn = piix3_init(pci_bus, -1);
+        pci_msi_ops = &ioapic_msi_ops;
     } else {
         pci_bus = NULL;
     }
diff --git a/hw/pc.h b/hw/pc.h
index 50e6c39..2013aa9 100644
--- a/hw/pc.h
+++ b/hw/pc.h
@@ -55,6 +55,8 @@  void ioapic_set_irq(void *opaque, int vector, int level);
 void apic_reset_irq_delivered(void);
 int apic_get_irq_delivered(void);
 
+extern struct pci_msi_ops ioapic_msi_ops;
+
 /* i8254.c */
 
 #define PIT_FREQ 1193182
diff --git a/hw/pci.c b/hw/pci.c
index b8186f6..cd453c9 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -57,6 +57,8 @@  static uint16_t pci_default_sub_device_id = PCI_SUBDEVICE_ID_QEMU;
 static int pci_irq_index;
 static PCIBus *first_bus;
 
+struct pci_msi_ops *pci_msi_ops;
+
 static void pcibus_save(QEMUFile *f, void *opaque)
 {
     PCIBus *bus = (PCIBus *)opaque;
diff --git a/hw/pci.h b/hw/pci.h
index a629e60..8883f08 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -280,4 +280,14 @@  PCIBus *pci_apb_init(target_phys_addr_t special_base,
 PCIBus *sh_pci_register_bus(pci_set_irq_fn set_irq, pci_map_irq_fn map_irq,
                             qemu_irq *pic, int devfn_min, int nirq);
 
+/* MSI/MSI-X */
+
+struct pci_msi_ops {
+    qemu_irq *(*allocate)(int nentries);
+    int (*update)(qemu_irq, uint64_t addr, uint32_t data, int masked);
+    void (*free)(qemu_irq *);
+};
+
+extern struct pci_msi_ops *pci_msi_ops;
+
 #endif
diff --git a/qemu-common.h b/qemu-common.h
index c90c3e3..d5a1112 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -178,6 +178,7 @@  typedef struct PCIDevice PCIDevice;
 typedef struct SerialState SerialState;
 typedef struct IRQState *qemu_irq;
 struct pcmcia_card_s;
+struct pci_msi_ops;
 
 /* CPU save/load.  */
 void cpu_save(QEMUFile *f, void *opaque);