Message ID | 20170427143546.14662-9-roger.pau@citrix.com (mailing list archive) |
---|---|
State | New, archived |
>>> On 27.04.17 at 16:35, <roger.pau@citrix.com> wrote: > Add handlers for the MSI control, address, data and mask fields in order to > detect accesses to them and setup the interrupts as requested by the guest. > > Note that the pending register is not trapped, and the guest can freely > read/write to it. > > Whether Xen is going to provide this functionality to Dom0 (MSI emulation) is > controlled by the "msi" option in the dom0 field. When disabling this option > Xen will hide the MSI capability structure from Dom0. Unless there's an actual reason behind this, I'd view this as a development only thing, which shouldn't be in a non-RFC patch. > --- a/xen/arch/x86/hvm/vmsi.c > +++ b/xen/arch/x86/hvm/vmsi.c > @@ -622,3 +622,144 @@ void msix_write_completion(struct vcpu *v) > if ( msixtbl_write(v, ctrl_address, 4, 0) != X86EMUL_OKAY ) > gdprintk(XENLOG_WARNING, "MSI-X write completion failure\n"); > } > + > +static unsigned int msi_vector(uint16_t data) > +{ > + return (data & MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT; I know code elsewhere does it this way, but I'm intending to switch that to use MASK_EXTR(), and I'd like to ask to use that construct right away in new code. > +static unsigned int msi_flags(uint16_t data, uint64_t addr) > +{ > + unsigned int rh, dm, dest_id, deliv_mode, trig_mode; > + > + rh = (addr >> MSI_ADDR_REDIRECTION_SHIFT) & 0x1; > + dm = (addr >> MSI_ADDR_DESTMODE_SHIFT) & 0x1; > + dest_id = (addr & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT; > + deliv_mode = (data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x7; > + trig_mode = (data >> MSI_DATA_TRIGGER_SHIFT) & 0x1; I'm sure you've simply copied code from elsewhere, which I agree should generally be fine. However, on top of what I did say above I'd also like to request to drop such stray 0x prefixes, plus I think the 7 wants a #define. 
> + return dest_id | (rh << GFLAGS_SHIFT_RH) | (dm << GFLAGS_SHIFT_DM) | > + (deliv_mode << GFLAGS_SHIFT_DELIV_MODE) | > + (trig_mode << GFLAGS_SHIFT_TRG_MODE); How come dest_id has no shift? Please let's not assume the shift is zero now and forever. > +void vpci_msi_mask(struct vpci_arch_msi *arch, unsigned int entry, bool mask) > +{ > + struct pirq *pinfo; > + struct irq_desc *desc; > + unsigned long flags; > + int irq; > + > + ASSERT(arch->pirq != -1); Perhaps better ">= 0"? > + pinfo = pirq_info(current->domain, arch->pirq + entry); > + ASSERT(pinfo); > + > + irq = pinfo->arch.irq; > + ASSERT(irq < nr_irqs); > + > + desc = irq_to_desc(irq); > + ASSERT(desc); It's not entirely clear to me where all the checks are which allow the checks here to be ASSERT()s. > +int vpci_msi_enable(struct vpci_arch_msi *arch, struct pci_dev *pdev, > + uint64_t address, uint32_t data, unsigned int vectors) > +{ > + struct msi_info msi_info = { > + .seg = pdev->seg, > + .bus = pdev->bus, > + .devfn = pdev->devfn, > + .entry_nr = vectors, > + }; > + int index = -1, rc; > + unsigned int i; > + > + ASSERT(arch->pirq == -1); > + > + /* Get a PIRQ. */ > + rc = allocate_and_map_msi_pirq(pdev->domain, &index, &arch->pirq, > + &msi_info); > + if ( rc ) > + { > + dprintk(XENLOG_ERR, "%04x:%02x:%02x.%u: failed to map PIRQ: %d\n", > + pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn), > + PCI_FUNC(pdev->devfn), rc); > + return rc; > + } > + > + for ( i = 0; i < vectors; i++ ) > + { > + xen_domctl_bind_pt_irq_t bind = { > + .hvm_domid = DOMID_SELF, > + .machine_irq = arch->pirq + i, > + .irq_type = PT_IRQ_TYPE_MSI, > + .u.msi.gvec = msi_vector(data) + i, > + .u.msi.gflags = msi_flags(data, address), > + }; > + > + pcidevs_lock(); > + rc = pt_irq_create_bind(pdev->domain, &bind); > + if ( rc ) I don't think you need to hold the lock for the if() and its body. While I see unmap_domain_pirq(), I don't really see why it does, so perhaps there's some cleanup potential up front. 
> --- a/xen/drivers/vpci/capabilities.c
> +++ b/xen/drivers/vpci/capabilities.c
> @@ -109,7 +109,7 @@ static int vpci_index_capabilities(struct pci_dev *pdev)
>      return 0;
>  }
>
> -static void vpci_mask_capability(struct pci_dev *pdev, uint8_t cap_id)
> +void xen_vpci_mask_capability(struct pci_dev *pdev, uint8_t cap_id)

What's the xen_ prefix good for?

> +static int vpci_msi_control_read(struct pci_dev *pdev, unsigned int reg,
> +                                 union vpci_val *val, void *data)
> +{
> +    struct vpci_msi *msi = data;

const

> +    if ( msi->enabled )
> +        val->word |= PCI_MSI_FLAGS_ENABLE;
> +    if ( msi->masking )
> +        val->word |= PCI_MSI_FLAGS_MASKBIT;
> +    if ( msi->address64 )
> +        val->word |= PCI_MSI_FLAGS_64BIT;
> +
> +    /* Set multiple message capable. */
> +    val->word |= ((fls(msi->max_vectors) - 1) << 1) & PCI_MSI_FLAGS_QMASK;
> +
> +    /* Set current number of configured vectors. */
> +    val->word |= ((fls(msi->guest_vectors) - 1) << 4) & PCI_MSI_FLAGS_QSIZE;

The 1 and 4 here clearly need #define-s or the use of MASK_INSR().

> +static int vpci_msi_control_write(struct pci_dev *pdev, unsigned int reg,
> +                                  union vpci_val val, void *data)
> +{
> +    struct vpci_msi *msi = data;
> +    unsigned int vectors = 1 << ((val.word & PCI_MSI_FLAGS_QSIZE) >> 4);
> +    int rc;
> +
> +    if ( vectors > msi->max_vectors )
> +        return -EINVAL;
> +
> +    msi->guest_vectors = vectors;
> +
> +    if ( !!(val.word & PCI_MSI_FLAGS_ENABLE) == msi->enabled )
> +        return 0;
> +
> +    if ( val.word & PCI_MSI_FLAGS_ENABLE )
> +    {
> +        ASSERT(!msi->enabled && !msi->vectors);

I can see the value of the right side, but the left (with the immediately
prior if())?

> +        rc = vpci_msi_enable(&msi->arch, pdev, msi->address, msi->data,
> +                             vectors);
> +        if ( rc )
> +            return rc;
> +
> +        /* Apply the mask bits. */
> +        if ( msi->masking )
> +        {
> +            uint32_t mask = msi->mask;
> +
> +            while ( mask )
> +            {
> +                unsigned int i = ffs(mask);

ffs(), just like fls(), returns 1-based values, so this looks to be off by
one.
> + vpci_msi_mask(&msi->arch, i, true); > + __clear_bit(i, &mask); > + } > + } > + > + __msi_set_enable(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn), > + PCI_FUNC(pdev->devfn), reg - PCI_MSI_FLAGS, 1); Seems like you'll never come through msi_capability_init(); I can't see how it can be a good idea to bypass lots of stuff. > +static int vpci_msi_address_upper_write(struct pci_dev *pdev, unsigned int reg, > + union vpci_val val, void *data) > +{ > + struct vpci_msi *msi = data; > + > + /* Clear high part. */ > + msi->address &= ~GENMASK(63, 32); > + msi->address |= (uint64_t)val.double_word << 32; Is the value written here actually being used for anything other than for reading back (also applicable to the high bits of the low half of the address)? > +static int vpci_msi_mask_read(struct pci_dev *pdev, unsigned int reg, > + union vpci_val *val, void *data) > +{ > + struct vpci_msi *msi = data; > + > + val->double_word = msi->mask; > + > + return 0; > +} > + > +static int vpci_msi_mask_write(struct pci_dev *pdev, unsigned int reg, > + union vpci_val val, void *data) > +{ > + struct vpci_msi *msi = data; > + uint32_t dmask; > + > + dmask = msi->mask ^ val.double_word; > + > + if ( !dmask ) > + return 0; > + > + while ( dmask && msi->enabled ) > + { > + unsigned int i = ffs(dmask); > + > + vpci_msi_mask(&msi->arch, i, !test_bit(i, &msi->mask)); > + __clear_bit(i, &dmask); > + } I think this loop should be limited to the number of enabled vectors (and the same likely applies then to vpci_msi_control_write()). > +static int vpci_init_msi(struct pci_dev *pdev) > +{ > + uint8_t seg = pdev->seg, bus = pdev->bus; > + uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn); > + struct vpci_msi *msi = NULL; Pointless initializer. 
> + unsigned int msi_offset; > + uint16_t control; > + int rc; > + > + msi_offset = pci_find_cap_offset(seg, bus, slot, func, PCI_CAP_ID_MSI); > + if ( !msi_offset ) > + return 0; > + > + if ( !vpci_msi_enabled(pdev->domain) ) > + { > + xen_vpci_mask_capability(pdev, PCI_CAP_ID_MSI); > + return 0; > + } > + > + msi = xzalloc(struct vpci_msi); > + if ( !msi ) > + return -ENOMEM; > + > + control = pci_conf_read16(seg, bus, slot, func, > + msi_control_reg(msi_offset)); > + > + rc = xen_vpci_add_register(pdev, vpci_msi_control_read, > + vpci_msi_control_write, > + msi_control_reg(msi_offset), 2, msi); > + if ( rc ) > + { > + dprintk(XENLOG_ERR, > + "%04x:%02x:%02x.%u: failed to add handler for MSI control: %d\n", > + seg, bus, slot, func, rc); > + goto error; > + } > + > + /* Get the maximum number of vectors the device supports. */ > + msi->max_vectors = multi_msi_capable(control); > + ASSERT(msi->max_vectors <= 32); > + > + /* Initial value after reset. */ > + msi->guest_vectors = 1; > + > + /* No PIRQ bind yet. */ > + vpci_msi_arch_init(&msi->arch); > + > + if ( is_64bit_address(control) ) > + msi->address64 = true; > + if ( is_mask_bit_support(control) ) > + msi->masking = true; > + > + rc = xen_vpci_add_register(pdev, vpci_msi_address_read, > + vpci_msi_address_write, > + msi_lower_address_reg(msi_offset), 4, msi); > + if ( rc ) > + { > + dprintk(XENLOG_ERR, > + "%04x:%02x:%02x.%u: failed to add handler for MSI address: %d\n", > + seg, bus, slot, func, rc); > + goto error; > + } > + > + rc = xen_vpci_add_register(pdev, vpci_msi_data_read, vpci_msi_data_write, > + msi_data_reg(msi_offset, msi->address64), 2, > + msi); > + if ( rc ) > + { > + dprintk(XENLOG_ERR, > + "%04x:%02x:%02x.%u: failed to add handler for MSI address: %d\n", > + seg, bus, slot, func, rc); Twice the same message text is unhelpful (and actually there's a third one below). But iirc I did indicate anyway that I think most of them should go away. 
Note also how much they contribute to the function's size.

> +static int __init vpci_msi_setup_keyhandler(void)
> +{
> +    register_keyhandler('Z', vpci_dump_msi, "dump guest MSI state", 1);

Please let's avoid using new (and even non-intuitive) keys if at all
possible. This is Dom0 only, so can easily be chained onto e.g. the
'M' handler.

> --- a/xen/include/xen/vpci.h
> +++ b/xen/include/xen/vpci.h
> @@ -89,9 +89,35 @@ struct vpci {
>
>      /* List of capabilities supported by the device. */
>      struct list_head cap_list;
> +
> +    /* MSI data. */
> +    struct vpci_msi {
> +        /* Maximum number of vectors supported by the device. */
> +        unsigned int max_vectors;
> +        /* Current guest-written number of vectors. */
> +        unsigned int guest_vectors;
> +        /* Number of vectors configured. */
> +        unsigned int vectors;

So coming here I still don't really see what the difference between
these last two fields is (and hence why you need two).

Jan
On Fri, May 26, 2017 at 09:26:03AM -0600, Jan Beulich wrote: > >>> On 27.04.17 at 16:35, <roger.pau@citrix.com> wrote: > > Add handlers for the MSI control, address, data and mask fields in order to > > detect accesses to them and setup the interrupts as requested by the guest. > > > > Note that the pending register is not trapped, and the guest can freely > > read/write to it. > > > > Whether Xen is going to provide this functionality to Dom0 (MSI emulation) is > > controlled by the "msi" option in the dom0 field. When disabling this option > > Xen will hide the MSI capability structure from Dom0. > > Unless there's an actual reason behind this, I'd view this as a > development only thing, which shouldn't be in a non-RFC patch. Yes, I've removed the patch to hide capabilities ATM. > > --- a/xen/arch/x86/hvm/vmsi.c > > +++ b/xen/arch/x86/hvm/vmsi.c > > @@ -622,3 +622,144 @@ void msix_write_completion(struct vcpu *v) > > if ( msixtbl_write(v, ctrl_address, 4, 0) != X86EMUL_OKAY ) > > gdprintk(XENLOG_WARNING, "MSI-X write completion failure\n"); > > } > > + > > +static unsigned int msi_vector(uint16_t data) > > +{ > > + return (data & MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT; > > I know code elsewhere does it this way, but I'm intending to switch > that to use MASK_EXTR(), and I'd like to ask to use that construct > right away in new code. > > > +static unsigned int msi_flags(uint16_t data, uint64_t addr) > > +{ > > + unsigned int rh, dm, dest_id, deliv_mode, trig_mode; > > + > > + rh = (addr >> MSI_ADDR_REDIRECTION_SHIFT) & 0x1; > > + dm = (addr >> MSI_ADDR_DESTMODE_SHIFT) & 0x1; > > + dest_id = (addr & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT; > > + deliv_mode = (data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x7; > > + trig_mode = (data >> MSI_DATA_TRIGGER_SHIFT) & 0x1; > > I'm sure you've simply copied code from elsewhere, which I agree > should generally be fine. 
However, on top of what I did say above I'd also like to request to drop
such stray 0x prefixes, plus I think the 7 wants a #define.

I've switched this to using the MASK_EXTR macro, so there's no longer the
need to add the 0x1.

> > +    return dest_id | (rh << GFLAGS_SHIFT_RH) | (dm << GFLAGS_SHIFT_DM) |
> > +           (deliv_mode << GFLAGS_SHIFT_DELIV_MODE) |
> > +           (trig_mode << GFLAGS_SHIFT_TRG_MODE);
>
> How come dest_id has no shift? Please let's not assume the shift
> is zero now and forever.

I've added a define for GFLAGS_SHIFT_DEST_ID that sets it to 0.

> > +void vpci_msi_mask(struct vpci_arch_msi *arch, unsigned int entry, bool mask)
> > +{
> > +    struct pirq *pinfo;
> > +    struct irq_desc *desc;
> > +    unsigned long flags;
> > +    int irq;
> > +
> > +    ASSERT(arch->pirq != -1);
>
> Perhaps better ">= 0"?

OK.

> > +    pinfo = pirq_info(current->domain, arch->pirq + entry);
> > +    ASSERT(pinfo);
> > +
> > +    irq = pinfo->arch.irq;
> > +    ASSERT(irq < nr_irqs);
> > +
> > +    desc = irq_to_desc(irq);
> > +    ASSERT(desc);
>
> It's not entirely clear to me where all the checks are which allow
> the checks here to be ASSERT()s.

Hm, this function is only called if the pirq is set (ie: if the
interrupt is bound to the domain). AFAICT if Xen cannot get the irq or
the desc related to this pirq it means that something/someone has
unbound or changed the pirq under Xen's feet, and thus the expected
state is no longer valid.

I could add something like:

if ( irq >= nr_irqs || irq < 0 )
{
    ASSERT_UNREACHABLE();
    return;
}

And the same for the desc check if that seems more sensible.

> > +int vpci_msi_enable(struct vpci_arch_msi *arch, struct pci_dev *pdev,
> > +                    uint64_t address, uint32_t data, unsigned int vectors)
> > +{
> > +    struct msi_info msi_info = {
> > +        .seg = pdev->seg,
> > +        .bus = pdev->bus,
> > +        .devfn = pdev->devfn,
> > +        .entry_nr = vectors,
> > +    };
> > +    int index = -1, rc;
> > +    unsigned int i;
> > +
> > +    ASSERT(arch->pirq == -1);
> > +
> > +    /* Get a PIRQ. */
> > +    rc = allocate_and_map_msi_pirq(pdev->domain, &index, &arch->pirq,
> > +                                   &msi_info);
> > +    if ( rc )
> > +    {
> > +        dprintk(XENLOG_ERR, "%04x:%02x:%02x.%u: failed to map PIRQ: %d\n",
> > +                pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> > +                PCI_FUNC(pdev->devfn), rc);
> > +        return rc;
> > +    }
> > +
> > +    for ( i = 0; i < vectors; i++ )
> > +    {
> > +        xen_domctl_bind_pt_irq_t bind = {
> > +            .hvm_domid = DOMID_SELF,
> > +            .machine_irq = arch->pirq + i,
> > +            .irq_type = PT_IRQ_TYPE_MSI,
> > +            .u.msi.gvec = msi_vector(data) + i,
> > +            .u.msi.gflags = msi_flags(data, address),
> > +        };
> > +
> > +        pcidevs_lock();
> > +        rc = pt_irq_create_bind(pdev->domain, &bind);
> > +        if ( rc )
>
> I don't think you need to hold the lock for the if() and its body. While
> I see unmap_domain_pirq(), I don't really see why it does, so perhaps
> there's some cleanup potential up front.

unmap_domain_pirq might call into pci_disable_msi which I assume
requires the pci lock to be held (although it has no assert to that
effect). I can send a pre-patch to remove the pci lock assert from
unmap_domain_pirq but I'm not that familiar with this code (TBH I
thought that anything dealing with PCI devices needed to hold the pci
lock).

> > --- a/xen/drivers/vpci/capabilities.c
> > +++ b/xen/drivers/vpci/capabilities.c
> > @@ -109,7 +109,7 @@ static int vpci_index_capabilities(struct pci_dev *pdev)
> >      return 0;
> >  }
> >
> > -static void vpci_mask_capability(struct pci_dev *pdev, uint8_t cap_id)
> > +void xen_vpci_mask_capability(struct pci_dev *pdev, uint8_t cap_id)
>
> What's the xen_ prefix good for?

Nah, nothing, in any case this code is now gone.
> > +static int vpci_msi_control_read(struct pci_dev *pdev, unsigned int reg,
> > +                                 union vpci_val *val, void *data)
> > +{
> > +    struct vpci_msi *msi = data;
>
> const
>
> > +    if ( msi->enabled )
> > +        val->word |= PCI_MSI_FLAGS_ENABLE;
> > +    if ( msi->masking )
> > +        val->word |= PCI_MSI_FLAGS_MASKBIT;
> > +    if ( msi->address64 )
> > +        val->word |= PCI_MSI_FLAGS_64BIT;
> > +
> > +    /* Set multiple message capable. */
> > +    val->word |= ((fls(msi->max_vectors) - 1) << 1) & PCI_MSI_FLAGS_QMASK;
> > +
> > +    /* Set current number of configured vectors. */
> > +    val->word |= ((fls(msi->guest_vectors) - 1) << 4) & PCI_MSI_FLAGS_QSIZE;
>
> The 1 and 4 here clearly need #define-s or the use of MASK_INSR().

I think using MASK_INSR is better and more in line with the previous
changes that you requested.

> > +static int vpci_msi_control_write(struct pci_dev *pdev, unsigned int reg,
> > +                                  union vpci_val val, void *data)
> > +{
> > +    struct vpci_msi *msi = data;
> > +    unsigned int vectors = 1 << ((val.word & PCI_MSI_FLAGS_QSIZE) >> 4);
> > +    int rc;
> > +
> > +    if ( vectors > msi->max_vectors )
> > +        return -EINVAL;
> > +
> > +    msi->guest_vectors = vectors;
> > +
> > +    if ( !!(val.word & PCI_MSI_FLAGS_ENABLE) == msi->enabled )
> > +        return 0;
> > +
> > +    if ( val.word & PCI_MSI_FLAGS_ENABLE )
> > +    {
> > +        ASSERT(!msi->enabled && !msi->vectors);
>
> I can see the value of the right side, but the left (with the immediately
> prior if())?

Right, this is redundant given the checks above.

> > +        rc = vpci_msi_enable(&msi->arch, pdev, msi->address, msi->data,
> > +                             vectors);
> > +        if ( rc )
> > +            return rc;
> > +
> > +        /* Apply the mask bits. */
> > +        if ( msi->masking )
> > +        {
> > +            uint32_t mask = msi->mask;
> > +
> > +            while ( mask )
> > +            {
> > +                unsigned int i = ffs(mask);
>
> ffs(), just like fls(), returns 1-based values, so this looks to be off by
> one.

Thanks, good catch.
> > + vpci_msi_mask(&msi->arch, i, true); > > + __clear_bit(i, &mask); > > + } > > + } > > + > > + __msi_set_enable(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn), > > + PCI_FUNC(pdev->devfn), reg - PCI_MSI_FLAGS, 1); > > Seems like you'll never come through msi_capability_init(); I can't > see how it can be a good idea to bypass lots of stuff. AFAICT this is done as part of the vpci_msi_enable call just above: vpci_msi_enable -> allocate_and_map_msi_pirq -> map_domain_pirq -> pci_enable_msi -> __pci_enable_msi -> msi_capability_init. > > +static int vpci_msi_address_upper_write(struct pci_dev *pdev, unsigned int reg, > > + union vpci_val val, void *data) > > +{ > > + struct vpci_msi *msi = data; > > + > > + /* Clear high part. */ > > + msi->address &= ~GENMASK(63, 32); > > + msi->address |= (uint64_t)val.double_word << 32; > > Is the value written here actually being used for anything other than > for reading back (also applicable to the high bits of the low half of the > address)? It's used in a arch-specific way. But Xen needs to store it anyway, so the guest can read back whatever it writes. I have no idea what ARM might store here. > > +static int vpci_msi_mask_read(struct pci_dev *pdev, unsigned int reg, > > + union vpci_val *val, void *data) > > +{ > > + struct vpci_msi *msi = data; > > + > > + val->double_word = msi->mask; > > + > > + return 0; > > +} > > + > > +static int vpci_msi_mask_write(struct pci_dev *pdev, unsigned int reg, > > + union vpci_val val, void *data) > > +{ > > + struct vpci_msi *msi = data; > > + uint32_t dmask; > > + > > + dmask = msi->mask ^ val.double_word; > > + > > + if ( !dmask ) > > + return 0; > > + > > + while ( dmask && msi->enabled ) > > + { > > + unsigned int i = ffs(dmask); > > + > > + vpci_msi_mask(&msi->arch, i, !test_bit(i, &msi->mask)); > > + __clear_bit(i, &dmask); > > + } > > I think this loop should be limited to the number of enabled vectors > (and the same likely applies then to vpci_msi_control_write()). 
Done, I've changed it to: for ( i = ffs(mask) - 1; mask && i < msi->vectors; i = ffs(mask) - 1 ) { vpci_msi_mask(&msi->arch, i, MASK_EXTR(val.u32, 1 << i)); __clear_bit(i, &mask); } > > +static int vpci_init_msi(struct pci_dev *pdev) > > +{ > > + uint8_t seg = pdev->seg, bus = pdev->bus; > > + uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn); > > + struct vpci_msi *msi = NULL; > > Pointless initializer. Removed. > > + unsigned int msi_offset; > > + uint16_t control; > > + int rc; > > + > > + msi_offset = pci_find_cap_offset(seg, bus, slot, func, PCI_CAP_ID_MSI); > > + if ( !msi_offset ) > > + return 0; > > + > > + if ( !vpci_msi_enabled(pdev->domain) ) > > + { > > + xen_vpci_mask_capability(pdev, PCI_CAP_ID_MSI); > > + return 0; > > + } > > + > > + msi = xzalloc(struct vpci_msi); > > + if ( !msi ) > > + return -ENOMEM; > > + > > + control = pci_conf_read16(seg, bus, slot, func, > > + msi_control_reg(msi_offset)); > > + > > + rc = xen_vpci_add_register(pdev, vpci_msi_control_read, > > + vpci_msi_control_write, > > + msi_control_reg(msi_offset), 2, msi); > > + if ( rc ) > > + { > > + dprintk(XENLOG_ERR, > > + "%04x:%02x:%02x.%u: failed to add handler for MSI control: %d\n", > > + seg, bus, slot, func, rc); > > + goto error; > > + } > > + > > + /* Get the maximum number of vectors the device supports. */ > > + msi->max_vectors = multi_msi_capable(control); > > + ASSERT(msi->max_vectors <= 32); > > + > > + /* Initial value after reset. */ > > + msi->guest_vectors = 1; > > + > > + /* No PIRQ bind yet. 
*/
> > +    vpci_msi_arch_init(&msi->arch);
> > +
> > +    if ( is_64bit_address(control) )
> > +        msi->address64 = true;
> > +    if ( is_mask_bit_support(control) )
> > +        msi->masking = true;
> > +
> > +    rc = xen_vpci_add_register(pdev, vpci_msi_address_read,
> > +                               vpci_msi_address_write,
> > +                               msi_lower_address_reg(msi_offset), 4, msi);
> > +    if ( rc )
> > +    {
> > +        dprintk(XENLOG_ERR,
> > +                "%04x:%02x:%02x.%u: failed to add handler for MSI address: %d\n",
> > +                seg, bus, slot, func, rc);
> > +        goto error;
> > +    }
> > +
> > +    rc = xen_vpci_add_register(pdev, vpci_msi_data_read, vpci_msi_data_write,
> > +                               msi_data_reg(msi_offset, msi->address64), 2,
> > +                               msi);
> > +    if ( rc )
> > +    {
> > +        dprintk(XENLOG_ERR,
> > +                "%04x:%02x:%02x.%u: failed to add handler for MSI address: %d\n",
> > +                seg, bus, slot, func, rc);
>
> Twice the same message text is unhelpful (and actually there's a third
> one below). But iirc I did indicate anyway that I think most of them
> should go away. Note also how much they contribute to the function's
> size.

OK, I will remove those then.

> > +static int __init vpci_msi_setup_keyhandler(void)
> > +{
> > +    register_keyhandler('Z', vpci_dump_msi, "dump guest MSI state", 1);
>
> Please let's avoid using new (and even non-intuitive) keys if at all
> possible. This is Dom0 only, so can easily be chained onto e.g. the
> 'M' handler.

I assumed none of the debug keys were actually intuitive. I've wired it
to the 'M' handler, we can always add its own key if the need arises.

> > --- a/xen/include/xen/vpci.h
> > +++ b/xen/include/xen/vpci.h
> > @@ -89,9 +89,35 @@ struct vpci {
> >
> >      /* List of capabilities supported by the device. */
> >      struct list_head cap_list;
> > +
> > +    /* MSI data. */
> > +    struct vpci_msi {
> > +        /* Maximum number of vectors supported by the device. */
> > +        unsigned int max_vectors;
> > +        /* Current guest-written number of vectors. */
> > +        unsigned int guest_vectors;
> > +        /* Number of vectors configured. */
> > +        unsigned int vectors;
>
> So coming here I still don't really see what the difference between
> these last two fields is (and hence why you need two).

Right, there's no need for having both of them, so I just removed
guest_vectors.

Thanks for the detailed review, Roger.
>>> Roger Pau Monne <roger.pau@citrix.com> 06/27/17 12:23 PM >>> >On Fri, May 26, 2017 at 09:26:03AM -0600, Jan Beulich wrote: >> >>> On 27.04.17 at 16:35, <roger.pau@citrix.com> wrote: >> > + pinfo = pirq_info(current->domain, arch->pirq + entry); >> > + ASSERT(pinfo); >> > + >> > + irq = pinfo->arch.irq; >> > + ASSERT(irq < nr_irqs); >> > + >> > + desc = irq_to_desc(irq); >> > + ASSERT(desc); >> >> It's not entirely clear to me where all the checks are which allow >> the checks here to be ASSERT()s. > >Hm, this function is only called if the pirq is set (ie: if the >interrupt is bound to the domain). AFAICT if Xen cannot get the irq or >the desc related to this pirq it means that something/someone has >unbound or changed the pirq under Xen's feet, and thus the expected >state is no longer valid. > >I could add something like: > >if ( irq >= nr_irqs || irq < 0 ) >{ >ASSERET_UNREACABLE(); >return; >} > >And the same for the desc check if that seems more sensible. Well, if the function indeed is being called only when everything has already been set up (and can't be torn down in a racy way), then I'm not overly concerned which of the two forms you use. >> > + pcidevs_lock(); >> > + rc = pt_irq_create_bind(pdev->domain, &bind); >> > + if ( rc ) >> >> I don't think you need to hold the lock for the if() and its body. While >> I see unmap_domain_pirq(), I don't really see why it does, so perhaps >> there's some cleanup potential up front. > >unmap_domain_pirq might call into pci_disable_msi which I assume >requires the pci lock to be hold (although has no assert to that >effect). Yeah, maybe it's indeed better to keep it. List access strictly requires holding the lock, while I think we don't consistently hold the lock for mere use of individual devices. If there was any real issue with that, I think we'd rather need to refcount the devices. 
>> > +static int vpci_msi_address_upper_write(struct pci_dev *pdev, unsigned int reg, >> > + union vpci_val val, void *data) >> > +{ >> > + struct vpci_msi *msi = data; >> > + >> > + /* Clear high part. */ >> > + msi->address &= ~GENMASK(63, 32); >> > + msi->address |= (uint64_t)val.double_word << 32; >> >> Is the value written here actually being used for anything other than >> for reading back (also applicable to the high bits of the low half of the >> address)? > >It's used in a arch-specific way. But Xen needs to store it anyway, so >the guest can read back whatever it writes. I have no idea what ARM >might store here. Hmm, I'm concerned you may introduce incorrect behavior here if the value written is other than expected. But perhaps not much of a problem as long as all this is Dom0 only. Jan
On Tue, Jun 27, 2017 at 05:44:23AM -0600, Jan Beulich wrote: > >>> Roger Pau Monne <roger.pau@citrix.com> 06/27/17 12:23 PM >>> > >On Fri, May 26, 2017 at 09:26:03AM -0600, Jan Beulich wrote: > >> > +static int vpci_msi_address_upper_write(struct pci_dev *pdev, unsigned int reg, > >> > + union vpci_val val, void *data) > >> > +{ > >> > + struct vpci_msi *msi = data; > >> > + > >> > + /* Clear high part. */ > >> > + msi->address &= ~GENMASK(63, 32); > >> > + msi->address |= (uint64_t)val.double_word << 32; > >> > >> Is the value written here actually being used for anything other than > >> for reading back (also applicable to the high bits of the low half of the > >> address)? > > > >It's used in a arch-specific way. But Xen needs to store it anyway, so > >the guest can read back whatever it writes. I have no idea what ARM > >might store here. > > Hmm, I'm concerned you may introduce incorrect behavior here if the value > written is other than expected. But perhaps not much of a problem as long > as all this is Dom0 only. IMHO if Xen needs to check this value for sanity it should be done in the arch specific handler (vpci_msi_arch_enable), when the domain tries to enable MSI. In any case, we can cross this bridge when we get to enabling all this for DomUs. Thanks, Roger.
diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown index 44d99852aa..73013aea14 100644 --- a/docs/misc/xen-command-line.markdown +++ b/docs/misc/xen-command-line.markdown @@ -660,7 +660,7 @@ affinities to prefer but be not limited to the specified node(s). Pin dom0 vcpus to their respective pcpus ### dom0 -> `= List of [ pvh | shadow ]` +> `= List of [ pvh | shadow | msi ]` > Sub-options: @@ -677,6 +677,13 @@ Flag that makes a dom0 boot in PVHv2 mode. Flag that makes a dom0 use shadow paging. Only works when "pvh" is enabled. +> `msi` + +> Default: `true` + +Enable or disable (using the `no-` prefix) the MSI emulation inside of +Xen for a PVH Dom0. Note that this option has no effect on a PV Dom0. + ### dtuart (ARM) > `= path [:options]` diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c index cc8acad688..01afcf6215 100644 --- a/xen/arch/x86/dom0_build.c +++ b/xen/arch/x86/dom0_build.c @@ -176,29 +176,37 @@ struct vcpu *__init alloc_dom0_vcpu0(struct domain *dom0) bool __initdata opt_dom0_shadow; #endif bool __initdata dom0_pvh; +bool __initdata dom0_msi = true; /* * List of parameters that affect Dom0 creation: * * - pvh Create a PVHv2 Dom0. * - shadow Use shadow paging for Dom0. + * - msi MSI functionality. 
  */
 static void __init parse_dom0_param(char *s)
 {
     char *ss;
+    bool enabled;
 
     do {
+        enabled = !!strncmp(s, "no-", 3);
+        if ( !enabled )
+            s += 3;
+
         ss = strchr(s, ',');
         if ( ss )
             *ss = '\0';
 
         if ( !strcmp(s, "pvh") )
-            dom0_pvh = true;
+            dom0_pvh = enabled;
 #ifdef CONFIG_SHADOW_PAGING
         else if ( !strcmp(s, "shadow") )
-            opt_dom0_shadow = true;
+            opt_dom0_shadow = enabled;
 #endif
+        else if ( !strcmp(s, "msi") )
+            dom0_msi = enabled;
 
         s = ss + 1;
     } while ( ss );
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index a36692c313..f23b2f43d6 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -622,3 +622,144 @@ void msix_write_completion(struct vcpu *v)
     if ( msixtbl_write(v, ctrl_address, 4, 0) != X86EMUL_OKAY )
         gdprintk(XENLOG_WARNING, "MSI-X write completion failure\n");
 }
+
+static unsigned int msi_vector(uint16_t data)
+{
+    return (data & MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT;
+}
+
+static unsigned int msi_flags(uint16_t data, uint64_t addr)
+{
+    unsigned int rh, dm, dest_id, deliv_mode, trig_mode;
+
+    rh = (addr >> MSI_ADDR_REDIRECTION_SHIFT) & 0x1;
+    dm = (addr >> MSI_ADDR_DESTMODE_SHIFT) & 0x1;
+    dest_id = (addr & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
+    deliv_mode = (data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x7;
+    trig_mode = (data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
+
+    return dest_id | (rh << GFLAGS_SHIFT_RH) | (dm << GFLAGS_SHIFT_DM) |
+           (deliv_mode << GFLAGS_SHIFT_DELIV_MODE) |
+           (trig_mode << GFLAGS_SHIFT_TRG_MODE);
+}
+
+void vpci_msi_mask(struct vpci_arch_msi *arch, unsigned int entry, bool mask)
+{
+    struct pirq *pinfo;
+    struct irq_desc *desc;
+    unsigned long flags;
+    int irq;
+
+    ASSERT(arch->pirq != -1);
+    pinfo = pirq_info(current->domain, arch->pirq + entry);
+    ASSERT(pinfo);
+
+    irq = pinfo->arch.irq;
+    ASSERT(irq < nr_irqs);
+
+    desc = irq_to_desc(irq);
+    ASSERT(desc);
+
+    spin_lock_irqsave(&desc->lock, flags);
+    guest_mask_msi_irq(desc, mask);
+    spin_unlock_irqrestore(&desc->lock, flags);
+}
+
+int vpci_msi_enable(struct vpci_arch_msi *arch, struct pci_dev *pdev,
+                    uint64_t address, uint32_t data, unsigned int vectors)
+{
+    struct msi_info msi_info = {
+        .seg = pdev->seg,
+        .bus = pdev->bus,
+        .devfn = pdev->devfn,
+        .entry_nr = vectors,
+    };
+    int index = -1, rc;
+    unsigned int i;
+
+    ASSERT(arch->pirq == -1);
+
+    /* Get a PIRQ. */
+    rc = allocate_and_map_msi_pirq(pdev->domain, &index, &arch->pirq,
+                                   &msi_info);
+    if ( rc )
+    {
+        dprintk(XENLOG_ERR, "%04x:%02x:%02x.%u: failed to map PIRQ: %d\n",
+                pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                PCI_FUNC(pdev->devfn), rc);
+        return rc;
+    }
+
+    for ( i = 0; i < vectors; i++ )
+    {
+        xen_domctl_bind_pt_irq_t bind = {
+            .hvm_domid = DOMID_SELF,
+            .machine_irq = arch->pirq + i,
+            .irq_type = PT_IRQ_TYPE_MSI,
+            .u.msi.gvec = msi_vector(data) + i,
+            .u.msi.gflags = msi_flags(data, address),
+        };
+
+        pcidevs_lock();
+        rc = pt_irq_create_bind(pdev->domain, &bind);
+        if ( rc )
+        {
+            dprintk(XENLOG_ERR,
+                    "%04x:%02x:%02x.%u: failed to bind PIRQ %u: %d\n",
+                    pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                    PCI_FUNC(pdev->devfn), arch->pirq + i, rc);
+            spin_lock(&pdev->domain->event_lock);
+            unmap_domain_pirq(pdev->domain, arch->pirq);
+            spin_unlock(&pdev->domain->event_lock);
+            pcidevs_unlock();
+            arch->pirq = -1;
+            return rc;
+        }
+        pcidevs_unlock();
+    }
+
+    return 0;
+}
+
+int vpci_msi_disable(struct vpci_arch_msi *arch, struct pci_dev *pdev,
+                     unsigned int vectors)
+{
+    unsigned int i;
+
+    ASSERT(arch->pirq != -1);
+
+    for ( i = 0; i < vectors; i++ )
+    {
+        xen_domctl_bind_pt_irq_t bind = {
+            .hvm_domid = DOMID_SELF,
+            .machine_irq = arch->pirq + i,
+            .irq_type = PT_IRQ_TYPE_MSI,
+        };
+
+        pcidevs_lock();
+        pt_irq_destroy_bind(pdev->domain, &bind);
+        pcidevs_unlock();
+    }
+
+    pcidevs_lock();
+    spin_lock(&pdev->domain->event_lock);
+    unmap_domain_pirq(pdev->domain, arch->pirq);
+    spin_unlock(&pdev->domain->event_lock);
+    pcidevs_unlock();
+
+    arch->pirq = -1;
+
+    return 0;
+}
+
+int vpci_msi_arch_init(struct vpci_arch_msi *arch)
+{
+    arch->pirq = -1;
+    return 0;
+}
+
+void vpci_msi_arch_print(struct vpci_arch_msi *arch)
+{
+    printk("PIRQ: %d\n", arch->pirq);
+}
+
diff --git a/xen/drivers/vpci/Makefile b/xen/drivers/vpci/Makefile
index c3f3085c93..ef4fc6caf3 100644
--- a/xen/drivers/vpci/Makefile
+++ b/xen/drivers/vpci/Makefile
@@ -1 +1 @@
-obj-y += vpci.o header.o capabilities.o
+obj-y += vpci.o header.o capabilities.o msi.o
diff --git a/xen/drivers/vpci/capabilities.c b/xen/drivers/vpci/capabilities.c
index 204355e673..ad9f45c2e1 100644
--- a/xen/drivers/vpci/capabilities.c
+++ b/xen/drivers/vpci/capabilities.c
@@ -109,7 +109,7 @@ static int vpci_index_capabilities(struct pci_dev *pdev)
     return 0;
 }
 
-static void vpci_mask_capability(struct pci_dev *pdev, uint8_t cap_id)
+void xen_vpci_mask_capability(struct pci_dev *pdev, uint8_t cap_id)
 {
     struct vpci_capability *cap;
     uint8_t cap_offset;
@@ -138,9 +138,8 @@ static int vpci_capabilities_init(struct pci_dev *pdev)
     if ( rc )
         return rc;
 
-    /* Mask MSI and MSI-X capabilities until Xen handles them. */
-    vpci_mask_capability(pdev, PCI_CAP_ID_MSI);
-    vpci_mask_capability(pdev, PCI_CAP_ID_MSIX);
+    /* Mask MSI-X capability until Xen handles it. */
+    xen_vpci_mask_capability(pdev, PCI_CAP_ID_MSIX);
 
     return 0;
 }
diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
new file mode 100644
index 0000000000..318dc0e626
--- /dev/null
+++ b/xen/drivers/vpci/msi.c
@@ -0,0 +1,392 @@
+/*
+ * Handlers for accesses to the MSI capability structure.
+ *
+ * Copyright (C) 2017 Citrix Systems R&D
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/sched.h>
+#include <xen/vpci.h>
+#include <asm/msi.h>
+#include <xen/keyhandler.h>
+
+/* Handlers for the MSI control field (PCI_MSI_FLAGS). */
+static int vpci_msi_control_read(struct pci_dev *pdev, unsigned int reg,
+                                 union vpci_val *val, void *data)
+{
+    struct vpci_msi *msi = data;
+
+    if ( msi->enabled )
+        val->word |= PCI_MSI_FLAGS_ENABLE;
+    if ( msi->masking )
+        val->word |= PCI_MSI_FLAGS_MASKBIT;
+    if ( msi->address64 )
+        val->word |= PCI_MSI_FLAGS_64BIT;
+
+    /* Set multiple message capable. */
+    val->word |= ((fls(msi->max_vectors) - 1) << 1) & PCI_MSI_FLAGS_QMASK;
+
+    /* Set current number of configured vectors. */
+    val->word |= ((fls(msi->guest_vectors) - 1) << 4) & PCI_MSI_FLAGS_QSIZE;
+
+    return 0;
+}
+
+static int vpci_msi_control_write(struct pci_dev *pdev, unsigned int reg,
+                                  union vpci_val val, void *data)
+{
+    struct vpci_msi *msi = data;
+    unsigned int vectors = 1 << ((val.word & PCI_MSI_FLAGS_QSIZE) >> 4);
+    int rc;
+
+    if ( vectors > msi->max_vectors )
+        return -EINVAL;
+
+    msi->guest_vectors = vectors;
+
+    if ( !!(val.word & PCI_MSI_FLAGS_ENABLE) == msi->enabled )
+        return 0;
+
+    if ( val.word & PCI_MSI_FLAGS_ENABLE )
+    {
+        ASSERT(!msi->enabled && !msi->vectors);
+
+        rc = vpci_msi_enable(&msi->arch, pdev, msi->address, msi->data,
+                             vectors);
+        if ( rc )
+            return rc;
+
+        /* Apply the mask bits. */
+        if ( msi->masking )
+        {
+            uint32_t mask = msi->mask;
+
+            while ( mask )
+            {
+                unsigned int i = ffs(mask);
+
+                vpci_msi_mask(&msi->arch, i, true);
+                __clear_bit(i, &mask);
+            }
+        }
+
+        __msi_set_enable(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                         PCI_FUNC(pdev->devfn), reg - PCI_MSI_FLAGS, 1);
+
+        msi->vectors = vectors;
+        msi->enabled = true;
+    }
+    else
+    {
+        ASSERT(msi->enabled && msi->vectors);
+
+        __msi_set_enable(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                         PCI_FUNC(pdev->devfn), reg - PCI_MSI_FLAGS, 0);
+
+
+        rc = vpci_msi_disable(&msi->arch, pdev, msi->vectors);
+        if ( rc )
+            return rc;
+
+        msi->vectors = 0;
+        msi->enabled = false;
+    }
+
+    return rc;
+}
+
+/* Handlers for the address field (32bit or low part of a 64bit address). */
+static int vpci_msi_address_read(struct pci_dev *pdev, unsigned int reg,
+                                 union vpci_val *val, void *data)
+{
+    struct vpci_msi *msi = data;
+
+    val->double_word = msi->address;
+
+    return 0;
+}
+
+static int vpci_msi_address_write(struct pci_dev *pdev, unsigned int reg,
+                                  union vpci_val val, void *data)
+{
+    struct vpci_msi *msi = data;
+
+    /* Clear low part. */
+    msi->address &= ~GENMASK(31, 0);
+    msi->address |= val.double_word;
+
+    return 0;
+}
+
+/* Handlers for the high part of a 64bit address field. */
+static int vpci_msi_address_upper_read(struct pci_dev *pdev, unsigned int reg,
+                                       union vpci_val *val, void *data)
+{
+    struct vpci_msi *msi = data;
+
+    val->double_word = msi->address >> 32;
+
+    return 0;
+}
+
+static int vpci_msi_address_upper_write(struct pci_dev *pdev, unsigned int reg,
+                                        union vpci_val val, void *data)
+{
+    struct vpci_msi *msi = data;
+
+    /* Clear high part. */
+    msi->address &= ~GENMASK(63, 32);
+    msi->address |= (uint64_t)val.double_word << 32;
+
+    return 0;
+}
+
+/* Handlers for the data field. */
+static int vpci_msi_data_read(struct pci_dev *pdev, unsigned int reg,
+                              union vpci_val *val, void *data)
+{
+    struct vpci_msi *msi = data;
+
+    val->word = msi->data;
+
+    return 0;
+}
+
+static int vpci_msi_data_write(struct pci_dev *pdev, unsigned int reg,
+                               union vpci_val val, void *data)
+{
+    struct vpci_msi *msi = data;
+
+    msi->data = val.word;
+
+    return 0;
+}
+
+static int vpci_msi_mask_read(struct pci_dev *pdev, unsigned int reg,
+                              union vpci_val *val, void *data)
+{
+    struct vpci_msi *msi = data;
+
+    val->double_word = msi->mask;
+
+    return 0;
+}
+
+static int vpci_msi_mask_write(struct pci_dev *pdev, unsigned int reg,
+                               union vpci_val val, void *data)
+{
+    struct vpci_msi *msi = data;
+    uint32_t dmask;
+
+    dmask = msi->mask ^ val.double_word;
+
+    if ( !dmask )
+        return 0;
+
+    while ( dmask && msi->enabled )
+    {
+        unsigned int i = ffs(dmask);
+
+        vpci_msi_mask(&msi->arch, i, !test_bit(i, &msi->mask));
+        __clear_bit(i, &dmask);
+    }
+
+    msi->mask = val.double_word;
+    return 0;
+}
+
+static int vpci_init_msi(struct pci_dev *pdev)
+{
+    uint8_t seg = pdev->seg, bus = pdev->bus;
+    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
+    struct vpci_msi *msi = NULL;
+    unsigned int msi_offset;
+    uint16_t control;
+    int rc;
+
+    msi_offset = pci_find_cap_offset(seg, bus, slot, func, PCI_CAP_ID_MSI);
+    if ( !msi_offset )
+        return 0;
+
+    if ( !vpci_msi_enabled(pdev->domain) )
+    {
+        xen_vpci_mask_capability(pdev, PCI_CAP_ID_MSI);
+        return 0;
+    }
+
+    msi = xzalloc(struct vpci_msi);
+    if ( !msi )
+        return -ENOMEM;
+
+    control = pci_conf_read16(seg, bus, slot, func,
+                              msi_control_reg(msi_offset));
+
+    rc = xen_vpci_add_register(pdev, vpci_msi_control_read,
+                               vpci_msi_control_write,
+                               msi_control_reg(msi_offset), 2, msi);
+    if ( rc )
+    {
+        dprintk(XENLOG_ERR,
+                "%04x:%02x:%02x.%u: failed to add handler for MSI control: %d\n",
+                seg, bus, slot, func, rc);
+        goto error;
+    }
+
+    /* Get the maximum number of vectors the device supports. */
+    msi->max_vectors = multi_msi_capable(control);
+    ASSERT(msi->max_vectors <= 32);
+
+    /* Initial value after reset. */
+    msi->guest_vectors = 1;
+
+    /* No PIRQ bind yet. */
+    vpci_msi_arch_init(&msi->arch);
+
+    if ( is_64bit_address(control) )
+        msi->address64 = true;
+    if ( is_mask_bit_support(control) )
+        msi->masking = true;
+
+    rc = xen_vpci_add_register(pdev, vpci_msi_address_read,
+                               vpci_msi_address_write,
+                               msi_lower_address_reg(msi_offset), 4, msi);
+    if ( rc )
+    {
+        dprintk(XENLOG_ERR,
+                "%04x:%02x:%02x.%u: failed to add handler for MSI address: %d\n",
+                seg, bus, slot, func, rc);
+        goto error;
+    }
+
+    rc = xen_vpci_add_register(pdev, vpci_msi_data_read, vpci_msi_data_write,
+                               msi_data_reg(msi_offset, msi->address64), 2,
+                               msi);
+    if ( rc )
+    {
+        dprintk(XENLOG_ERR,
+                "%04x:%02x:%02x.%u: failed to add handler for MSI address: %d\n",
+                seg, bus, slot, func, rc);
+        goto error;
+    }
+
+    if ( msi->address64 )
+    {
+        rc = xen_vpci_add_register(pdev, vpci_msi_address_upper_read,
+                                   vpci_msi_address_upper_write,
+                                   msi_upper_address_reg(msi_offset), 4, msi);
+        if ( rc )
+        {
+            dprintk(XENLOG_ERR,
+                    "%04x:%02x:%02x.%u: failed to add handler for MSI address: %d\n",
+                    seg, bus, slot, func, rc);
+            goto error;
+        }
+    }
+
+    if ( msi->masking )
+    {
+        rc = xen_vpci_add_register(pdev, vpci_msi_mask_read,
+                                   vpci_msi_mask_write,
+                                   msi_mask_bits_reg(msi_offset,
+                                                     msi->address64), 4, msi);
+        if ( rc )
+        {
+            dprintk(XENLOG_ERR,
+                    "%04x:%02x:%02x.%u: failed to add handler for MSI mask: %d\n",
+                    seg, bus, slot, func, rc);
+            goto error;
+        }
+    }
+
+    pdev->vpci->msi = msi;
+
+    return 0;
+
+ error:
+    ASSERT(rc);
+    xfree(msi);
+    return rc;
+}
+
+REGISTER_VPCI_INIT(vpci_init_msi, false);
+
+static void vpci_dump_msi(unsigned char key)
+{
+    struct domain *d;
+    struct pci_dev *pdev;
+
+    printk("Guest MSI information:\n");
+
+    for_each_domain ( d )
+    {
+        if ( !has_vpci(d) )
+            continue;
+
+        vpci_lock(d);
+        list_for_each_entry ( pdev, &d->arch.pdev_list, domain_list)
+        {
+            uint8_t seg = pdev->seg, bus = pdev->bus;
+            uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
+            struct vpci_msi *msi = pdev->vpci->msi;
+            uint16_t data;
+            uint64_t addr;
+
+            if ( !msi )
+                continue;
+
+            printk("Device %04x:%02x:%02x.%u\n", seg, bus, slot, func);
+
+            printk("Enabled: %u Supports masking: %u 64-bit addresses: %u\n",
+                   msi->enabled, msi->masking, msi->address64);
+            printk("Max vectors: %u guest vectors: %u enabled vectors: %u\n",
+                   msi->max_vectors, msi->guest_vectors, msi->vectors);
+
+            vpci_msi_arch_print(&msi->arch);
+
+            data = msi->data;
+            addr = msi->address;
+            printk("vec=%#02x%7s%6s%3sassert%5s%7s dest_id=%lu\n",
+                   (data & MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT,
+                   data & MSI_DATA_DELIVERY_LOWPRI ? "lowest" : "fixed",
+                   data & MSI_DATA_TRIGGER_LEVEL ? "level" : "edge",
+                   data & MSI_DATA_LEVEL_ASSERT ? "" : "de",
+                   addr & MSI_ADDR_DESTMODE_LOGIC ? "log" : "phys",
+                   addr & MSI_ADDR_REDIRECTION_LOWPRI ? "lowest" : "cpu",
+                   (addr & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT);
+
+            if ( msi->masking )
+                printk("mask=%#032x\n", msi->mask);
+            printk("\n");
+        }
+        vpci_unlock(d);
+    }
+}
+
+static int __init vpci_msi_setup_keyhandler(void)
+{
+    register_keyhandler('Z', vpci_dump_msi, "dump guest MSI state", 1);
+    return 0;
+}
+__initcall(vpci_msi_setup_keyhandler);
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
+
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index f9b9829eaf..ae3af43749 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -20,6 +20,7 @@
 #define __ASM_X86_HVM_IO_H__
 
 #include <xen/mm.h>
+#include <xen/pci.h>
 #include <asm/hvm/vpic.h>
 #include <asm/hvm/vioapic.h>
 #include <public/hvm/ioreq.h>
@@ -126,6 +127,24 @@ void hvm_dpci_eoi(struct domain *d, unsigned int guest_irq,
 void msix_write_completion(struct vcpu *);
 void msixtbl_init(struct domain *d);
 
+/* Is emulated MSI enabled? */
+extern bool dom0_msi;
+#define vpci_msi_enabled(d) (!is_hardware_domain((d)) || dom0_msi)
+
+/* Arch-specific MSI data for vPCI. */
+struct vpci_arch_msi {
+    int pirq;
+};
+
+/* Arch-specific vPCI MSI helpers. */
+void vpci_msi_mask(struct vpci_arch_msi *arch, unsigned int entry, bool mask);
+int vpci_msi_enable(struct vpci_arch_msi *arch, struct pci_dev *pdev,
+                    uint64_t address, uint32_t data, unsigned int vectors);
+int vpci_msi_disable(struct vpci_arch_msi *arch, struct pci_dev *pdev,
+                     unsigned int vectors);
+int vpci_msi_arch_init(struct vpci_arch_msi *arch);
+void vpci_msi_arch_print(struct vpci_arch_msi *arch);
+
 enum stdvga_cache_state {
     STDVGA_CACHE_UNINITIALIZED,
     STDVGA_CACHE_ENABLED,
diff --git a/xen/include/xen/hvm/irq.h b/xen/include/xen/hvm/irq.h
index 0d2c72c109..37dfb3b6c5 100644
--- a/xen/include/xen/hvm/irq.h
+++ b/xen/include/xen/hvm/irq.h
@@ -58,6 +58,7 @@ struct dev_intx_gsi_link {
 #define VMSI_TRIG_MODE    0x8000
 
 #define GFLAGS_SHIFT_RH             8
+#define GFLAGS_SHIFT_DM             9
 #define GFLAGS_SHIFT_DELIV_MODE     12
 #define GFLAGS_SHIFT_TRG_MODE       15
 
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 2bf61d6c15..373b8d6505 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -89,9 +89,35 @@ struct vpci {
 
     /* List of capabilities supported by the device. */
     struct list_head cap_list;
+
+    /* MSI data. */
+    struct vpci_msi {
+        /* Maximum number of vectors supported by the device. */
+        unsigned int max_vectors;
+        /* Current guest-written number of vectors. */
+        unsigned int guest_vectors;
+        /* Number of vectors configured. */
+        unsigned int vectors;
+        /* Address and data fields. */
+        uint64_t address;
+        uint16_t data;
+        /* Mask bitfield. */
+        uint32_t mask;
+        /* Enabled? */
+        bool enabled;
+        /* Supports per-vector masking? */
+        bool masking;
+        /* 64-bit address capable? */
+        bool address64;
+        /* Arch-specific data. */
+        struct vpci_arch_msi arch;
+    } *msi;
 #endif
 };
 
+/* Mask a PCI capability. */
+void xen_vpci_mask_capability(struct pci_dev *pdev, uint8_t cap_id);
+
 #endif /*
Add handlers for the MSI control, address, data and mask fields in order to
detect accesses to them and setup the interrupts as requested by the guest.

Note that the pending register is not trapped, and the guest can freely
read/write to it.

Whether Xen is going to provide this functionality to Dom0 (MSI emulation) is
controlled by the "msi" option in the dom0 field. When disabling this option
Xen will hide the MSI capability structure from Dom0.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Paul Durrant <paul.durrant@citrix.com>
---
Changes since v2:
 - Add an arch-specific abstraction layer. Note that this is only implemented
   for x86 currently.
 - Add a wrapper to detect MSI enabling for vPCI.

NB: I've only been able to test this with devices using a single MSI interrupt
and no mask register. I will try to find hardware that supports the mask
register and more than one vector, but I cannot make any promises.

If there are doubts about the untested parts we could always force Xen to
report no per-vector masking support and only 1 available vector, but I would
rather avoid doing it.
---
 docs/misc/xen-command-line.markdown |   9 +-
 xen/arch/x86/dom0_build.c           |  12 +-
 xen/arch/x86/hvm/vmsi.c             | 141 +++++++++++++
 xen/drivers/vpci/Makefile           |   2 +-
 xen/drivers/vpci/capabilities.c     |   7 +-
 xen/drivers/vpci/msi.c              | 392 ++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/io.h        |  19 ++
 xen/include/xen/hvm/irq.h           |   1 +
 xen/include/xen/vpci.h              |  26 +++
 9 files changed, 601 insertions(+), 8 deletions(-)
 create mode 100644 xen/drivers/vpci/msi.c