diff mbox

[v4,7/9] vpci/msi: add MSI handlers

Message ID 20170630150117.88489-8-roger.pau@citrix.com (mailing list archive)
State New, archived
Headers show

Commit Message

Roger Pau Monné June 30, 2017, 3:01 p.m. UTC
Add handlers for the MSI control, address, data and mask fields in
order to detect accesses to them and set up the interrupts as requested
by the guest.

Note that the pending register is not trapped, and the guest can
freely read/write to it.

Whether Xen is going to provide this functionality to Dom0 (MSI
emulation) is controlled by the "msi" option in the dom0 field. When
disabling this option Xen will hide the MSI capability structure from
Dom0.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Paul Durrant <paul.durrant@citrix.com>
---
Changes since v3:
 - Propagate changes from previous versions: drop xen_ prefix, drop
   return value from handlers, use the new vpci_val fields.
 - Use MASK_EXTR.
 - Remove the usage of GENMASK.
 - Add GFLAGS_SHIFT_DEST_ID and use it in msi_flags.
 - Add "arch" to the MSI arch specific functions.
 - Move the dumping of vPCI MSI information to dump_msi (key 'M').
 - Remove the guest_vectors field.
 - Allow the guest to change the number of active vectors without
   having to disable and enable MSI.
 - Check the number of active vectors when parsing the disable
   mask.
 - Remove the debug messages from vpci_init_msi.
 - Move the arch-specific part of the dump handler to x86/hvm/vmsi.c.
 - Use trylock in the dump handler to get the vpci lock.

Changes since v2:
 - Add an arch-specific abstraction layer. Note that this is only implemented
   for x86 currently.
 - Add a wrapper to detect MSI enabling for vPCI.

NB: I've only been able to test this with devices using a single MSI interrupt
and no mask register. I will try to find hardware that supports the mask
register and more than one vector, but I cannot make any promises.

If there are doubts about the untested parts we could always force Xen to
report no per-vector masking support and only 1 available vector, but I would
rather avoid doing it.
---
 xen/arch/x86/hvm/vmsi.c      | 149 ++++++++++++++++++
 xen/arch/x86/msi.c           |   3 +
 xen/drivers/vpci/Makefile    |   2 +-
 xen/drivers/vpci/msi.c       | 348 +++++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/io.h |  18 +++
 xen/include/asm-x86/msi.h    |   1 +
 xen/include/xen/hvm/irq.h    |   2 +
 xen/include/xen/vpci.h       |  26 ++++
 8 files changed, 548 insertions(+), 1 deletion(-)
 create mode 100644 xen/drivers/vpci/msi.c

Comments

Paul Durrant July 18, 2017, 8:56 a.m. UTC | #1
> -----Original Message-----
> From: Roger Pau Monne [mailto:roger.pau@citrix.com]
> Sent: 30 June 2017 16:01
> To: xen-devel@lists.xenproject.org
> Cc: boris.ostrovsky@oracle.com; julien.grall@arm.com;
> konrad.wilk@oracle.com; Roger Pau Monne <roger.pau@citrix.com>; Jan
> Beulich <jbeulich@suse.com>; Andrew Cooper
> <Andrew.Cooper3@citrix.com>; Paul Durrant <Paul.Durrant@citrix.com>
> Subject: [PATCH v4 7/9] vpci/msi: add MSI handlers
>
> Add handlers for the MSI control, address, data and mask fields in
> order to detect accesses to them and set up the interrupts as requested
> by the guest.
[snip]
> +static void vpci_msi_control_write(struct pci_dev *pdev, unsigned int reg,
> +                                   union vpci_val val, void *data)
> +{
> +    struct vpci_msi *msi = data;
> +    unsigned int vectors = 1 << MASK_EXTR(val.u16, PCI_MSI_FLAGS_QSIZE);
> +    int ret;
> +
> +    if ( vectors > msi->max_vectors )
> +        vectors = msi->max_vectors;
> +
> +    if ( !!(val.u16 & PCI_MSI_FLAGS_ENABLE) == msi->enabled &&
> +         (vectors == msi->vectors || !msi->enabled) )
> +        return;

Personally I find the above logic a little tricky to follow. Would it be clearer to fold it into the logic below? (I only understood the reason for the logic above after reading the logic below).
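For illustration, folding it in might look something like this (untested
sketch only):

    if ( val.u16 & PCI_MSI_FLAGS_ENABLE )
    {
        if ( msi->enabled )
        {
            /* Nothing changed? */
            if ( vectors == msi->vectors )
                return;
            /* Disable, then re-enable below with the new vector count. */
            if ( vpci_msi_disable(pdev, msi) )
                return;
        }
        vpci_msi_enable(pdev, msi, vectors);
    }
    else if ( msi->enabled )
        vpci_msi_disable(pdev, msi);

That spells out each case where it is handled, at the cost of losing the
single early return.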

> +
> +    if ( val.u16 & PCI_MSI_FLAGS_ENABLE )
> +    {
> +        if ( msi->enabled )
> +        {
> +            /*
> +             * Change to the number of enabled vectors, disable and
> +             * enable MSI in order to apply it.
> +             */
> +            ret = vpci_msi_disable(pdev, msi);
> +            if ( ret )
> +                return;
> +        }
> +        vpci_msi_enable(pdev, msi, vectors);
> +    }
> +    else
> +        vpci_msi_disable(pdev, msi);
> +}
[snip]
> +static int vpci_init_msi(struct pci_dev *pdev)
> +{
[snip]
> +    msi = xzalloc(struct vpci_msi);
> +    if ( !msi )
> +        return -ENOMEM;
[snip]
> +    pdev->vpci->msi = msi;
> +
> +    return 0;
> +
> + error:
> +    ASSERT(ret);
> +    xfree(msi);
> +    return ret;
> +}

Do you not need to clean up any added register handlers here? They have been given a context value which you're about to xfree().

  Paul

Jan Beulich Aug. 2, 2017, 1:34 p.m. UTC | #2
>>> Roger Pau Monne <roger.pau@citrix.com> 06/30/17 5:01 PM >>>
>Add handlers for the MSI control, address, data and mask fields in
>order to detect accesses to them and set up the interrupts as requested
>by the guest.
>
>Note that the pending register is not trapped, and the guest can
>freely read/write to it.
>
>Whether Xen is going to provide this functionality to Dom0 (MSI
>emulation) is controlled by the "msi" option in the dom0 field. When
>disabling this option Xen will hide the MSI capability structure from
>Dom0.

Isn't this last paragraph stale now?

>+void vpci_msi_arch_mask(struct vpci_arch_msi *arch, struct pci_dev *pdev,
>+                        unsigned int entry, bool mask)
>+{
>+    struct domain *d = pdev->domain;
>+    const struct pirq *pinfo;
>+    struct irq_desc *desc;
>+    unsigned long flags;
>+    int irq;
>+
>+    ASSERT(arch->pirq >= 0);
>+    pinfo = pirq_info(d, arch->pirq + entry);
>+    ASSERT(pinfo);
>+
>+    irq = pinfo->arch.irq;
>+    ASSERT(irq < nr_irqs && irq >= 0);
>+
>+    desc = irq_to_desc(irq);
>+    ASSERT(desc);

I know the goal is Dom0 support only at this point, but nevertheless I think
we shouldn't have ASSERT()s in place which could trigger if Dom0
misbehaves (and which would all need to be audited if we were to extend
support to DomU): I'm not convinced all of the ones above could really only
trigger depending on Xen (mis)behavior.

>+int vpci_msi_arch_enable(struct vpci_arch_msi *arch, struct pci_dev *pdev,
>+                         uint64_t address, uint32_t data, unsigned int vectors)
>+{
>+    struct msi_info msi_info = {
>+        .seg = pdev->seg,
>+        .bus = pdev->bus,
>+        .devfn = pdev->devfn,
>+        .entry_nr = vectors,
>+    };
>+    unsigned int i;
>+    int rc;
>+
>+    ASSERT(arch->pirq == -1);

Please introduce a #define for the -1 here, to allow easily matching up
producer and consumer side(s).

>+    /* Get a PIRQ. */
>+    rc = allocate_and_map_msi_pirq(pdev->domain, -1, &arch->pirq,
>+                                   MAP_PIRQ_TYPE_MULTI_MSI, &msi_info);
>+    if ( rc )
>+    {
>+        dprintk(XENLOG_ERR, "%04x:%02x:%02x.%u: failed to map PIRQ: %d\n",
>+                pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
>+                PCI_FUNC(pdev->devfn), rc);
>+        return rc;
>+    }
>+
>+    for ( i = 0; i < vectors; i++ )
>+    {
>+        xen_domctl_bind_pt_irq_t bind = {
>+            .machine_irq = arch->pirq + i,
>+            .irq_type = PT_IRQ_TYPE_MSI,
>+            .u.msi.gvec = msi_vector(data) + i,
>+            .u.msi.gflags = msi_flags(data, address),
>+        };
>+
>+        pcidevs_lock();
>+        rc = pt_irq_create_bind(pdev->domain, &bind);
>+        if ( rc )
>+        {
>+            dprintk(XENLOG_ERR,
>+                    "%04x:%02x:%02x.%u: failed to bind PIRQ %u: %d\n",
>+                    pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
>+                    PCI_FUNC(pdev->devfn), arch->pirq + i, rc);
>+            spin_lock(&pdev->domain->event_lock);
>+            unmap_domain_pirq(pdev->domain, arch->pirq);

Don't you also need to undo the pt_irq_create_bind() calls here for all prior
successful iterations?

>+int vpci_msi_arch_disable(struct vpci_arch_msi *arch, struct pci_dev *pdev,
>+                          unsigned int vectors)
>+{
>+    unsigned int i;
>+
>+    ASSERT(arch->pirq != -1);
>+
>+    for ( i = 0; i < vectors; i++ )
>+    {
>+        xen_domctl_bind_pt_irq_t bind = {
>+            .machine_irq = arch->pirq + i,
>+            .irq_type = PT_IRQ_TYPE_MSI,
>+        };
>+
>+        pcidevs_lock();
>+        pt_irq_destroy_bind(pdev->domain, &bind);

While I agree that the loop should continue if this fails, I'm not convinced
you should entirely ignore the return value here.

>+        pcidevs_unlock();
>+    }
>+
>+    pcidevs_lock();

What good does it do to acquire the lock for most of the loop body as well
as for most of the epilogue, instead of just acquiring it once ahead of the
loop?
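
I.e. something along these lines (illustrative only):

    pcidevs_lock();

    for ( i = 0; i < vectors; i++ )
    {
        xen_domctl_bind_pt_irq_t bind = {
            .machine_irq = arch->pirq + i,
            .irq_type = PT_IRQ_TYPE_MSI,
        };

        pt_irq_destroy_bind(pdev->domain, &bind);
    }

    spin_lock(&pdev->domain->event_lock);
    unmap_domain_pirq(pdev->domain, arch->pirq);
    spin_unlock(&pdev->domain->event_lock);
    pcidevs_unlock();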

>+int vpci_msi_arch_init(struct vpci_arch_msi *arch)
>+{
>+    arch->pirq = -1;
>+    return 0;
>+}

At this point I think the function would better return void.

>+void vpci_msi_arch_print(struct vpci_arch_msi *arch, uint16_t data,

const

>+                         uint64_t addr)
>+{
>+    printk("vec=%#02x%7s%6s%3sassert%5s%7s dest_id=%lu pirq: %d\n",
>+           MASK_EXTR(data, MSI_DATA_VECTOR_MASK),
>+           data & MSI_DATA_DELIVERY_LOWPRI ? "lowest" : "fixed",
>+           data & MSI_DATA_TRIGGER_LEVEL ? "level" : "edge",
>+           data & MSI_DATA_LEVEL_ASSERT ? "" : "de",
>+           addr & MSI_ADDR_DESTMODE_LOGIC ? "log" : "phys",
>+           addr & MSI_ADDR_REDIRECTION_LOWPRI ? "lowest" : "cpu",

Why "cpu"? Elsewhere we call this mode "fixed".

>--- /dev/null
>+++ b/xen/drivers/vpci/msi.c
>@@ -0,0 +1,348 @@
>+/*
>+ * Handlers for accesses to the MSI capability structure.
>+ *
>+ * Copyright (C) 2017 Citrix Systems R&D
>+ *
>+ * This program is free software; you can redistribute it and/or
>+ * modify it under the terms and conditions of the GNU General Public
>+ * License, version 2, as published by the Free Software Foundation.
>+ *
>+ * This program is distributed in the hope that it will be useful,
>+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
>+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>+ * General Public License for more details.
>+ *
>+ * You should have received a copy of the GNU General Public
>+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
>+ */
>+
>+#include <xen/sched.h>
>+#include <xen/vpci.h>
>+#include <asm/msi.h>
>+#include <xen/keyhandler.h>

Please don't mix xen/ and asm/ includes, and where possible please also
alphabetically sort each group.
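E.g. (assuming there are no hidden ordering dependencies):

    #include <xen/keyhandler.h>
    #include <xen/sched.h>
    #include <xen/vpci.h>

    #include <asm/msi.h>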

>+/* Handlers for the MSI control field (PCI_MSI_FLAGS). */
>+static void vpci_msi_control_read(struct pci_dev *pdev, unsigned int reg,
>+                                  union vpci_val *val, void *data)
>+{
>+    const struct vpci_msi *msi = data;
>+
>+    /* Set multiple message capable. */
>+    val->u16 = MASK_INSR(fls(msi->max_vectors) - 1, PCI_MSI_FLAGS_QMASK);

The comment is somewhat misleading - whether the device is multi-message
capable depends on msi->max_vectors.

>+    if ( msi->enabled ) {

Style.

>+        val->u16 |= PCI_MSI_FLAGS_ENABLE;
>+        val->u16 |= MASK_INSR(fls(msi->vectors) - 1, PCI_MSI_FLAGS_QSIZE);

Why is reading back the proper value here dependent upon MSI being
enabled?

>+static void vpci_msi_control_write(struct pci_dev *pdev, unsigned int reg,
>+                                   union vpci_val val, void *data)
>+{
>+    struct vpci_msi *msi = data;
>+    unsigned int vectors = 1 << MASK_EXTR(val.u16, PCI_MSI_FLAGS_QSIZE);
>+    int ret;
>+
>+    if ( vectors > msi->max_vectors )
>+        vectors = msi->max_vectors;
>+
>+    if ( !!(val.u16 & PCI_MSI_FLAGS_ENABLE) == msi->enabled &&
>+         (vectors == msi->vectors || !msi->enabled) )
>+        return;
>+
>+    if ( val.u16 & PCI_MSI_FLAGS_ENABLE )
>+    {
>+        if ( msi->enabled )
>+        {
>+            /*
>+             * Change to the number of enabled vectors, disable and
>+             * enable MSI in order to apply it.
>+             */

At least the first part of the comment would appear to belong outside the
inner if().

>+            ret = vpci_msi_disable(pdev, msi);
>+            if ( ret )
>+                return;

Returning here without doing anything is at least strange, and hence
would call for a comment to be attached to explain the intentions.

>+static void vpci_msi_address_write(struct pci_dev *pdev, unsigned int reg,
>+                                   union vpci_val val, void *data)
>+{
>+    struct vpci_msi *msi = data;
>+
>+    /* Clear low part. */
>+    msi->address &= ~(uint64_t)0xffffffff;

~0xffffffffull?

>+static void vpci_msi_address_upper_write(struct pci_dev *pdev, unsigned int reg,
>+                                         union vpci_val val, void *data)
>+{
>+    struct vpci_msi *msi = data;
>+
>+    /* Clear high part. */
>+    msi->address &= ~((uint64_t)0xffffffff << 32);

Simply 0xffffffff?

>+static void vpci_msi_mask_read(struct pci_dev *pdev, unsigned int reg,

Earlier read/write pairs had comments ahead of them - for consistency
one would then belong here too.

>+static void vpci_msi_mask_write(struct pci_dev *pdev, unsigned int reg,
>+                                union vpci_val val, void *data)
>+{
>+    struct vpci_msi *msi = data;
>+    uint32_t dmask;
>+
>+    dmask = msi->mask ^ val.u32;
>+
>+    if ( !dmask )
>+        return;
>+
>+    if ( msi->enabled )
>+    {
>+        unsigned int i;
>+
>+        for ( i = ffs(dmask) - 1; dmask && i < msi->vectors;
>+              i = ffs(dmask) - 1 )
>+        {
>+            vpci_msi_arch_mask(&msi->arch, pdev, i, MASK_EXTR(val.u32, 1 << i));

I don't think using MASK_EXTR() here is really advisable? Could be as simple
as (val.u32 >> i) & 1.

>+static int vpci_init_msi(struct pci_dev *pdev)
>+{
>+    uint8_t seg = pdev->seg, bus = pdev->bus;
>+    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
>+    struct vpci_msi *msi;
>+    unsigned int msi_offset;

Elsewhere we call such variables just "pos".

>+    uint16_t control;
>+    int ret;
>+
>+    msi_offset = pci_find_cap_offset(seg, bus, slot, func, PCI_CAP_ID_MSI);
>+    if ( !msi_offset )
>+        return 0;
>+
>+    msi = xzalloc(struct vpci_msi);
>+    if ( !msi )
>+        return -ENOMEM;
>+
>+    msi->pos = msi_offset;
>+
>+    control = pci_conf_read16(seg, bus, slot, func,
>+                              msi_control_reg(msi_offset));

You don't use the value until ...

>+    ret = vpci_add_register(pdev, vpci_msi_control_read,
>+                            vpci_msi_control_write,
>+                            msi_control_reg(msi_offset), 2, msi);
>+    if ( ret )
>+        goto error;
>+
>+    /* Get the maximum number of vectors the device supports. */
>+    msi->max_vectors = multi_msi_capable(control);

... here. Please move the read down.

>+    ASSERT(msi->max_vectors <= 32);
>+
>+    /* No PIRQ bind yet. */
>+    vpci_msi_arch_init(&msi->arch);

s/bind/bound/ in the comment?

>...
>+ error:
>+    ASSERT(ret);
>+    xfree(msi);
>+    return ret;
>+}

Don't you also need to unregister address handlers you've registered?

>+void vpci_dump_msi(void)
>+{
>+    struct domain *d;
>+
>+    for_each_domain ( d )
>+    {
>+        const struct pci_dev *pdev;
>+
>+        if ( !has_vpci(d) )
>+            continue;
>+
>+        printk("vPCI MSI information for guest %u\n", d->domain_id);

"... for Dom%d" or "... for d%d" please.

>...
>+            if ( msi->masking )
>+                printk("mask=%#032x\n", msi->mask);

Why 30 hex digits? And generally # should be used only when not blank or
zero padding the value (as field width includes the 0x prefix).

>--- a/xen/include/asm-x86/msi.h
>+++ b/xen/include/asm-x86/msi.h
>@@ -48,6 +48,7 @@
>#define MSI_ADDR_REDIRECTION_SHIFT  3
>#define MSI_ADDR_REDIRECTION_CPU    (0 << MSI_ADDR_REDIRECTION_SHIFT)
>#define MSI_ADDR_REDIRECTION_LOWPRI (1 << MSI_ADDR_REDIRECTION_SHIFT)
>+#define MSI_ADDR_REDIRECTION_MASK   0x8

(1 << MSI_ADDR_REDIRECTION_SHIFT) please.

>+    struct vpci_msi {
>+        /* Offset of the capability in the config space. */
>+        unsigned int pos;
>+        /* Maximum number of vectors supported by the device. */
>+        unsigned int max_vectors;
>+        /* Number of vectors configured. */
>+        unsigned int vectors;
>+        /* Address and data fields. */
>+        uint64_t address;
>+        uint16_t data;
>+        /* Mask bitfield. */
>+        uint32_t mask;
>+        /* Enabled? */
>+        bool enabled;
>+        /* Supports per-vector masking? */
>+        bool masking;
>+        /* 64-bit address capable? */
>+        bool address64;
>+        /* Arch-specific data. */
>+        struct vpci_arch_msi arch;
>+    } *msi;

Please try to reduce the number/size of holes.
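
E.g. by ordering the fields by decreasing size, roughly as below (whether
this actually shrinks the structure also depends on struct vpci_arch_msi):

    struct vpci_msi {
        /* Address and data fields. */
        uint64_t address;
        /* Offset of the capability in the config space. */
        unsigned int pos;
        /* Maximum number of vectors supported by the device. */
        unsigned int max_vectors;
        /* Number of vectors configured. */
        unsigned int vectors;
        /* Mask bitfield. */
        uint32_t mask;
        uint16_t data;
        /* Enabled? */
        bool enabled;
        /* Supports per-vector masking? */
        bool masking;
        /* 64-bit address capable? */
        bool address64;
        /* Arch-specific data. */
        struct vpci_arch_msi arch;
    } *msi;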

Jan
Roger Pau Monné Aug. 8, 2017, 3:44 p.m. UTC | #3
On Wed, Aug 02, 2017 at 07:34:28AM -0600, Jan Beulich wrote:
> >>> Roger Pau Monne <roger.pau@citrix.com> 06/30/17 5:01 PM >>>
> >+int vpci_msi_arch_enable(struct vpci_arch_msi *arch, struct pci_dev *pdev,
> >+                         uint64_t address, uint32_t data, unsigned int vectors)
> >+{
> >+    struct msi_info msi_info = {
> >+        .seg = pdev->seg,
> >+        .bus = pdev->bus,
> >+        .devfn = pdev->devfn,
> >+        .entry_nr = vectors,
> >+    };
> >+    unsigned int i;
> >+    int rc;
> >+
> >+    ASSERT(arch->pirq == -1);
> 
> Please introduce a #define for the -1 here, to allow easily matching up
> producer and consumer side(s).

I've added a define for INVALID_PIRQ to xen/irq.h.
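Roughly (sketch of the new define; the consumers then use it in place of
the open-coded -1):

    /* xen/include/xen/irq.h */
    #define INVALID_PIRQ (-1)

    ...
    ASSERT(arch->pirq == INVALID_PIRQ);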

> >+    /* Get a PIRQ. */
> >+    rc = allocate_and_map_msi_pirq(pdev->domain, -1, &arch->pirq,
> >+                                   MAP_PIRQ_TYPE_MULTI_MSI, &msi_info);
> >+    if ( rc )
> >+    {
> >+        dprintk(XENLOG_ERR, "%04x:%02x:%02x.%u: failed to map PIRQ: %d\n",
> >+                pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> >+                PCI_FUNC(pdev->devfn), rc);
> >+        return rc;
> >+    }
> >+
> >+    for ( i = 0; i < vectors; i++ )
> >+    {
> >+        xen_domctl_bind_pt_irq_t bind = {
> >+            .machine_irq = arch->pirq + i,
> >+            .irq_type = PT_IRQ_TYPE_MSI,
> >+            .u.msi.gvec = msi_vector(data) + i,
> >+            .u.msi.gflags = msi_flags(data, address),
> >+        };
> >+
> >+        pcidevs_lock();
> >+        rc = pt_irq_create_bind(pdev->domain, &bind);
> >+        if ( rc )
> >+        {
> >+            dprintk(XENLOG_ERR,
> >+                    "%04x:%02x:%02x.%u: failed to bind PIRQ %u: %d\n",
> >+                    pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> >+                    PCI_FUNC(pdev->devfn), arch->pirq + i, rc);
> >+            spin_lock(&pdev->domain->event_lock);
> >+            unmap_domain_pirq(pdev->domain, arch->pirq);
> 
> Don't you also need to undo the pt_irq_create_bind() calls here for all prior
> successful iterations?

Yes. unmap_domain_pirq does call pirq_guest_force_unbind, but it's
better not to resort to that.

> >+int vpci_msi_arch_disable(struct vpci_arch_msi *arch, struct pci_dev *pdev,
> >+                          unsigned int vectors)
> >+{
> >+    unsigned int i;
> >+
> >+    ASSERT(arch->pirq != -1);
> >+
> >+    for ( i = 0; i < vectors; i++ )
> >+    {
> >+        xen_domctl_bind_pt_irq_t bind = {
> >+            .machine_irq = arch->pirq + i,
> >+            .irq_type = PT_IRQ_TYPE_MSI,
> >+        };
> >+
> >+        pcidevs_lock();
> >+        pt_irq_destroy_bind(pdev->domain, &bind);
> 
> While I agree that the loop should continue of this fails, I'm not convinced
> you should entirely ignore the return value here.

I've added a printk in order to aid debugging.

> >+/* Handlers for the MSI control field (PCI_MSI_FLAGS). */
> >+static void vpci_msi_control_read(struct pci_dev *pdev, unsigned int reg,
> >+                                  union vpci_val *val, void *data)
> >+{
> >+    const struct vpci_msi *msi = data;
> >+
> >+    /* Set multiple message capable. */
> >+    val->u16 = MASK_INSR(fls(msi->max_vectors) - 1, PCI_MSI_FLAGS_QMASK);
> 
> The comment is somewhat misleading - whether the device is multi-message
> capable depends on msi->max_vectors.

Better "Set the number of supported messages"?

> >+    if ( msi->enabled ) {
> 
> Style.
> 
> >+        val->u16 |= PCI_MSI_FLAGS_ENABLE;
> >+        val->u16 |= MASK_INSR(fls(msi->vectors) - 1, PCI_MSI_FLAGS_QSIZE);
> 
> Why is reading back the proper value here dependent upon MSI being
> enabled?

Right, I've now slightly changed this to always store the number of
enabled vectors, regardless of whether the MSI enable bit is set or
not.
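Roughly like this (sketch; it assumes msi->vectors is now kept current on
every control write rather than only when enabling):

    val->u16 = MASK_INSR(fls(msi->max_vectors) - 1, PCI_MSI_FLAGS_QMASK) |
               MASK_INSR(fls(msi->vectors) - 1, PCI_MSI_FLAGS_QSIZE) |
               (msi->enabled ? PCI_MSI_FLAGS_ENABLE : 0) |
               (msi->masking ? PCI_MSI_FLAGS_MASKBIT : 0) |
               (msi->address64 ? PCI_MSI_FLAGS_64BIT : 0);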

> >...
> >+ error:
> >+    ASSERT(ret);
> >+    xfree(msi);
> >+    return ret;
> >+}
> 
> Don't you also need to unregister address handlers you've registered?

vpci_add_handlers already takes care of cleaning up the register
handlers on failure.

> >+void vpci_dump_msi(void)
> >+{
> >+    struct domain *d;
> >+
> >+    for_each_domain ( d )
> >+    {
> >+        const struct pci_dev *pdev;
> >+
> >+        if ( !has_vpci(d) )
> >+            continue;
> >+
> >+        printk("vPCI MSI information for guest %u\n", d->domain_id);
> 
> "... for Dom%d" or "... for d%d" please.
> 
> >...
> >+            if ( msi->masking )
> >+                printk("mask=%#032x\n", msi->mask);
> 
> Why 30 hex digits? And generally # should be used only when not blank or
> zero padding the value (as field width includes the 0x prefix).

Ouch, that should be 8, not 32.
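I.e. presumably:

    printk("mask=%08x\n", msi->mask);

(also dropping the '#', per your remark about the prefix being counted in
the field width).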

Thanks, Roger.
Jan Beulich Aug. 9, 2017, 8:21 a.m. UTC | #4
>>> On 08.08.17 at 17:44, <roger.pau@citrix.com> wrote:
> On Wed, Aug 02, 2017 at 07:34:28AM -0600, Jan Beulich wrote:
>> >>> Roger Pau Monne <roger.pau@citrix.com> 06/30/17 5:01 PM >>>
>> >+    /* Get a PIRQ. */
>> >+    rc = allocate_and_map_msi_pirq(pdev->domain, -1, &arch->pirq,
>> >+                                   MAP_PIRQ_TYPE_MULTI_MSI, &msi_info);
>> >+    if ( rc )
>> >+    {
>> >+        dprintk(XENLOG_ERR, "%04x:%02x:%02x.%u: failed to map PIRQ: %d\n",
>> >+                pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
>> >+                PCI_FUNC(pdev->devfn), rc);
>> >+        return rc;
>> >+    }
>> >+
>> >+    for ( i = 0; i < vectors; i++ )
>> >+    {
>> >+        xen_domctl_bind_pt_irq_t bind = {
>> >+            .machine_irq = arch->pirq + i,
>> >+            .irq_type = PT_IRQ_TYPE_MSI,
>> >+            .u.msi.gvec = msi_vector(data) + i,
>> >+            .u.msi.gflags = msi_flags(data, address),
>> >+        };
>> >+
>> >+        pcidevs_lock();
>> >+        rc = pt_irq_create_bind(pdev->domain, &bind);
>> >+        if ( rc )
>> >+        {
>> >+            dprintk(XENLOG_ERR,
>> >+                    "%04x:%02x:%02x.%u: failed to bind PIRQ %u: %d\n",
>> >+                    pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
>> >+                    PCI_FUNC(pdev->devfn), arch->pirq + i, rc);
>> >+            spin_lock(&pdev->domain->event_lock);
>> >+            unmap_domain_pirq(pdev->domain, arch->pirq);
>> 
>> Don't you also need to undo the pt_irq_create_bind() calls here for all prior
>> successful iterations?
> 
> Yes, unmap_domain_pirq calls pirq_guest_force_unbind but better not
> resort to that.

I don't understand.

>> >+int vpci_msi_arch_disable(struct vpci_arch_msi *arch, struct pci_dev *pdev,
>> >+                          unsigned int vectors)
>> >+{
>> >+    unsigned int i;
>> >+
>> >+    ASSERT(arch->pirq != -1);
>> >+
>> >+    for ( i = 0; i < vectors; i++ )
>> >+    {
>> >+        xen_domctl_bind_pt_irq_t bind = {
>> >+            .machine_irq = arch->pirq + i,
>> >+            .irq_type = PT_IRQ_TYPE_MSI,
>> >+        };
>> >+
>> >+        pcidevs_lock();
>> >+        pt_irq_destroy_bind(pdev->domain, &bind);
>> 
> >> While I agree that the loop should continue if this fails, I'm not convinced
>> you should entirely ignore the return value here.
> 
> I've added a printk in order to aid debug.

I was actually trying to hint that you should run the loop to
completion while returning to the caller the first error you've
encountered.
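
I.e. something like (sketch):

    int rc = 0;

    for ( i = 0; i < vectors; i++ )
    {
        xen_domctl_bind_pt_irq_t bind = {
            .machine_irq = arch->pirq + i,
            .irq_type = PT_IRQ_TYPE_MSI,
        };
        int err = pt_irq_destroy_bind(pdev->domain, &bind);

        /* Keep going, but remember the first failure. */
        if ( err && !rc )
            rc = err;
    }
    ...
    return rc;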

>> >+/* Handlers for the MSI control field (PCI_MSI_FLAGS). */
>> >+static void vpci_msi_control_read(struct pci_dev *pdev, unsigned int reg,
>> >+                                  union vpci_val *val, void *data)
>> >+{
>> >+    const struct vpci_msi *msi = data;
>> >+
>> >+    /* Set multiple message capable. */
>> >+    val->u16 = MASK_INSR(fls(msi->max_vectors) - 1, PCI_MSI_FLAGS_QMASK);
>> 
>> The comment is somewhat misleading - whether the device is multi-message
>> capable depends on msi->max_vectors.
> 
> Better "Set the number of supported messages"?

Yes.

Jan
Roger Pau Monné Aug. 9, 2017, 8:39 a.m. UTC | #5
On Wed, Aug 09, 2017 at 02:21:33AM -0600, Jan Beulich wrote:
> >>> On 08.08.17 at 17:44, <roger.pau@citrix.com> wrote:
> > On Wed, Aug 02, 2017 at 07:34:28AM -0600, Jan Beulich wrote:
> >> >>> Roger Pau Monne <roger.pau@citrix.com> 06/30/17 5:01 PM >>>
> >> >+    /* Get a PIRQ. */
> >> >+    rc = allocate_and_map_msi_pirq(pdev->domain, -1, &arch->pirq,
> >> >+                                   MAP_PIRQ_TYPE_MULTI_MSI, &msi_info);
> >> >+    if ( rc )
> >> >+    {
> >> >+        dprintk(XENLOG_ERR, "%04x:%02x:%02x.%u: failed to map PIRQ: %d\n",
> >> >+                pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> >> >+                PCI_FUNC(pdev->devfn), rc);
> >> >+        return rc;
> >> >+    }
> >> >+
> >> >+    for ( i = 0; i < vectors; i++ )
> >> >+    {
> >> >+        xen_domctl_bind_pt_irq_t bind = {
> >> >+            .machine_irq = arch->pirq + i,
> >> >+            .irq_type = PT_IRQ_TYPE_MSI,
> >> >+            .u.msi.gvec = msi_vector(data) + i,
> >> >+            .u.msi.gflags = msi_flags(data, address),
> >> >+        };
> >> >+
> >> >+        pcidevs_lock();
> >> >+        rc = pt_irq_create_bind(pdev->domain, &bind);
> >> >+        if ( rc )
> >> >+        {
> >> >+            dprintk(XENLOG_ERR,
> >> >+                    "%04x:%02x:%02x.%u: failed to bind PIRQ %u: %d\n",
> >> >+                    pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> >> >+                    PCI_FUNC(pdev->devfn), arch->pirq + i, rc);
> >> >+            spin_lock(&pdev->domain->event_lock);
> >> >+            unmap_domain_pirq(pdev->domain, arch->pirq);
> >> 
> >> Don't you also need to undo the pt_irq_create_bind() calls here for all prior
> >> successful iterations?
> > 
> > Yes, unmap_domain_pirq calls pirq_guest_force_unbind but better not
> > resort to that.
> 
> I don't understand.

I've added calls to pt_irq_destroy_bind before calling
unmap_domain_pirq.
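For the record, the unwind in the failure path then looks roughly like
this (sketch, not necessarily the exact code in v5):

    fail:
        while ( i-- )
        {
            xen_domctl_bind_pt_irq_t unbind = {
                .machine_irq = arch->pirq + i,
                .irq_type = PT_IRQ_TYPE_MSI,
            };

            pt_irq_destroy_bind(pdev->domain, &unbind);
        }

        spin_lock(&pdev->domain->event_lock);
        unmap_domain_pirq(pdev->domain, arch->pirq);
        spin_unlock(&pdev->domain->event_lock);
        pcidevs_unlock();
        arch->pirq = INVALID_PIRQ;
        return rc;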

> >> >+int vpci_msi_arch_disable(struct vpci_arch_msi *arch, struct pci_dev *pdev,
> >> >+                          unsigned int vectors)
> >> >+{
> >> >+    unsigned int i;
> >> >+
> >> >+    ASSERT(arch->pirq != -1);
> >> >+
> >> >+    for ( i = 0; i < vectors; i++ )
> >> >+    {
> >> >+        xen_domctl_bind_pt_irq_t bind = {
> >> >+            .machine_irq = arch->pirq + i,
> >> >+            .irq_type = PT_IRQ_TYPE_MSI,
> >> >+        };
> >> >+
> >> >+        pcidevs_lock();
> >> >+        pt_irq_destroy_bind(pdev->domain, &bind);
> >> 
> >> While I agree that the loop should continue if this fails, I'm not convinced
> >> you should entirely ignore the return value here.
> > 
> > I've added a printk in order to aid debug.
> 
> I've actually tried to hint at you wanting to run the loop to
> completion while returning to the caller the first error you've
> encountered.

Hm, I'm not sure of the best way to proceed here.

If vpci_msi_arch_disable returns once one of the pt_irq_destroy_bind
calls fails, further calls to vpci_msi_arch_disable are also likely to
fail if the previous call managed to destroy some of the bindings but
not all of them.

But then trying to call unmap_domain_pirq without having destroyed all
of the bindings seems likely to fail anyway...

Thanks, Roger.
diff mbox

Patch

diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index a36692c313..5732c70b5c 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -622,3 +622,152 @@  void msix_write_completion(struct vcpu *v)
     if ( msixtbl_write(v, ctrl_address, 4, 0) != X86EMUL_OKAY )
         gdprintk(XENLOG_WARNING, "MSI-X write completion failure\n");
 }
+
+static unsigned int msi_vector(uint16_t data)
+{
+    return MASK_EXTR(data, MSI_DATA_VECTOR_MASK);
+}
+
+static unsigned int msi_flags(uint16_t data, uint64_t addr)
+{
+    unsigned int rh, dm, dest_id, deliv_mode, trig_mode;
+
+    rh = MASK_EXTR(addr, MSI_ADDR_REDIRECTION_MASK);
+    dm = MASK_EXTR(addr, MSI_ADDR_DESTMODE_MASK);
+    dest_id = MASK_EXTR(addr, MSI_ADDR_DEST_ID_MASK);
+    deliv_mode = MASK_EXTR(data, MSI_DATA_DELIVERY_MODE_MASK);
+    trig_mode = MASK_EXTR(data, MSI_DATA_TRIGGER_MASK);
+
+    return (dest_id << GFLAGS_SHIFT_DEST_ID) | (rh << GFLAGS_SHIFT_RH) |
+           (dm << GFLAGS_SHIFT_DM) | (deliv_mode << GFLAGS_SHIFT_DELIV_MODE) |
+           (trig_mode << GFLAGS_SHIFT_TRG_MODE);
+}
+
+void vpci_msi_arch_mask(struct vpci_arch_msi *arch, struct pci_dev *pdev,
+                        unsigned int entry, bool mask)
+{
+    struct domain *d = pdev->domain;
+    const struct pirq *pinfo;
+    struct irq_desc *desc;
+    unsigned long flags;
+    int irq;
+
+    ASSERT(arch->pirq >= 0);
+    pinfo = pirq_info(d, arch->pirq + entry);
+    ASSERT(pinfo);
+
+    irq = pinfo->arch.irq;
+    ASSERT(irq < nr_irqs && irq >= 0);
+
+    desc = irq_to_desc(irq);
+    ASSERT(desc);
+
+    spin_lock_irqsave(&desc->lock, flags);
+    guest_mask_msi_irq(desc, mask);
+    spin_unlock_irqrestore(&desc->lock, flags);
+}
+
+int vpci_msi_arch_enable(struct vpci_arch_msi *arch, struct pci_dev *pdev,
+                         uint64_t address, uint32_t data, unsigned int vectors)
+{
+    struct msi_info msi_info = {
+        .seg = pdev->seg,
+        .bus = pdev->bus,
+        .devfn = pdev->devfn,
+        .entry_nr = vectors,
+    };
+    unsigned int i;
+    int rc;
+
+    ASSERT(arch->pirq == -1);
+
+    /* Get a PIRQ. */
+    rc = allocate_and_map_msi_pirq(pdev->domain, -1, &arch->pirq,
+                                   MAP_PIRQ_TYPE_MULTI_MSI, &msi_info);
+    if ( rc )
+    {
+        dprintk(XENLOG_ERR, "%04x:%02x:%02x.%u: failed to map PIRQ: %d\n",
+                pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                PCI_FUNC(pdev->devfn), rc);
+        return rc;
+    }
+
+    for ( i = 0; i < vectors; i++ )
+    {
+        xen_domctl_bind_pt_irq_t bind = {
+            .machine_irq = arch->pirq + i,
+            .irq_type = PT_IRQ_TYPE_MSI,
+            .u.msi.gvec = msi_vector(data) + i,
+            .u.msi.gflags = msi_flags(data, address),
+        };
+
+        pcidevs_lock();
+        rc = pt_irq_create_bind(pdev->domain, &bind);
+        if ( rc )
+        {
+            dprintk(XENLOG_ERR,
+                    "%04x:%02x:%02x.%u: failed to bind PIRQ %u: %d\n",
+                    pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                    PCI_FUNC(pdev->devfn), arch->pirq + i, rc);
+            spin_lock(&pdev->domain->event_lock);
+            unmap_domain_pirq(pdev->domain, arch->pirq);
+            spin_unlock(&pdev->domain->event_lock);
+            pcidevs_unlock();
+            arch->pirq = -1;
+            return rc;
+        }
+        pcidevs_unlock();
+    }
+
+    return 0;
+}
+
+int vpci_msi_arch_disable(struct vpci_arch_msi *arch, struct pci_dev *pdev,
+                          unsigned int vectors)
+{
+    unsigned int i;
+
+    ASSERT(arch->pirq != -1);
+
+    for ( i = 0; i < vectors; i++ )
+    {
+        xen_domctl_bind_pt_irq_t bind = {
+            .machine_irq = arch->pirq + i,
+            .irq_type = PT_IRQ_TYPE_MSI,
+        };
+
+        pcidevs_lock();
+        pt_irq_destroy_bind(pdev->domain, &bind);
+        pcidevs_unlock();
+    }
+
+    pcidevs_lock();
+    spin_lock(&pdev->domain->event_lock);
+    unmap_domain_pirq(pdev->domain, arch->pirq);
+    spin_unlock(&pdev->domain->event_lock);
+    pcidevs_unlock();
+
+    arch->pirq = -1;
+
+    return 0;
+}
+
+int vpci_msi_arch_init(struct vpci_arch_msi *arch)
+{
+    arch->pirq = -1;
+    return 0;
+}
+
+void vpci_msi_arch_print(struct vpci_arch_msi *arch, uint16_t data,
+                         uint64_t addr)
+{
+    printk("vec=%#02x%7s%6s%3sassert%5s%7s dest_id=%lu pirq: %d\n",
+           MASK_EXTR(data, MSI_DATA_VECTOR_MASK),
+           data & MSI_DATA_DELIVERY_LOWPRI ? "lowest" : "fixed",
+           data & MSI_DATA_TRIGGER_LEVEL ? "level" : "edge",
+           data & MSI_DATA_LEVEL_ASSERT ? "" : "de",
+           addr & MSI_ADDR_DESTMODE_LOGIC ? "log" : "phys",
+           addr & MSI_ADDR_REDIRECTION_LOWPRI ? "lowest" : "cpu",
+           MASK_EXTR(addr, MSI_ADDR_DEST_ID_MASK),
+           arch->pirq);
+}
diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index d98f400699..573378d6c3 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -30,6 +30,7 @@ 
 #include <public/physdev.h>
 #include <xen/iommu.h>
 #include <xsm/xsm.h>
+#include <xen/vpci.h>
 
 static s8 __read_mostly use_msi = -1;
 boolean_param("msi", use_msi);
@@ -1536,6 +1537,8 @@  static void dump_msi(unsigned char key)
                attr.guest_masked ? 'G' : ' ',
                mask);
     }
+
+    vpci_dump_msi();
 }
 
 static int __init msi_setup_keyhandler(void)
diff --git a/xen/drivers/vpci/Makefile b/xen/drivers/vpci/Makefile
index 241467212f..62cec9e82b 100644
--- a/xen/drivers/vpci/Makefile
+++ b/xen/drivers/vpci/Makefile
@@ -1 +1 @@ 
-obj-y += vpci.o header.o
+obj-y += vpci.o header.o msi.o
diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
new file mode 100644
index 0000000000..d8f3418616
--- /dev/null
+++ b/xen/drivers/vpci/msi.c
@@ -0,0 +1,348 @@ 
+/*
+ * Handlers for accesses to the MSI capability structure.
+ *
+ * Copyright (C) 2017 Citrix Systems R&D
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/sched.h>
+#include <xen/vpci.h>
+#include <asm/msi.h>
+#include <xen/keyhandler.h>
+
+/* Handlers for the MSI control field (PCI_MSI_FLAGS). */
+static void vpci_msi_control_read(struct pci_dev *pdev, unsigned int reg,
+                                  union vpci_val *val, void *data)
+{
+    const struct vpci_msi *msi = data;
+
+    /* Set the multiple message capable field (log2 of max_vectors). */
+    val->u16 = MASK_INSR(fls(msi->max_vectors) - 1, PCI_MSI_FLAGS_QMASK);
+
+    if ( msi->enabled )
+    {
+        val->u16 |= PCI_MSI_FLAGS_ENABLE;
+        val->u16 |= MASK_INSR(fls(msi->vectors) - 1, PCI_MSI_FLAGS_QSIZE);
+    }
+    val->u16 |= msi->masking ? PCI_MSI_FLAGS_MASKBIT : 0;
+    val->u16 |= msi->address64 ? PCI_MSI_FLAGS_64BIT : 0;
+}
+
+static void vpci_msi_enable(struct pci_dev *pdev, struct vpci_msi *msi,
+                            unsigned int vectors)
+{
+    int ret;
+
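+    /* The device must have MSI disabled (no vectors set up) at this point. */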
+    ASSERT(!msi->vectors);
+
+    ret = vpci_msi_arch_enable(&msi->arch, pdev, msi->address, msi->data,
+                               vectors);
+    if ( ret )
+        return;
+
+    /* Apply the mask bits. */
+    if ( msi->masking )
+    {
+        unsigned int i;
+        uint32_t mask = msi->mask;
+
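+        /* Iterate over the set bits only; ffs() is 1-based. */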
+        for ( i = ffs(mask) - 1; mask && i < vectors; i = ffs(mask) - 1 )
+        {
+            vpci_msi_arch_mask(&msi->arch, pdev, i, true);
+            __clear_bit(i, &mask);
+        }
+    }
+
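+    /* Bindings and masks are in place; now set the enable bit in hardware. */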
+    __msi_set_enable(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                     PCI_FUNC(pdev->devfn), msi->pos, 1);
+
+    msi->vectors = vectors;
+    msi->enabled = true;
+}
+
+static int vpci_msi_disable(struct pci_dev *pdev, struct vpci_msi *msi)
+{
+    int ret;
+
+    ASSERT(msi->vectors);
+
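+    /* Clear the hardware enable bit before tearing down the bindings. */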
+    __msi_set_enable(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                     PCI_FUNC(pdev->devfn), msi->pos, 0);
+
+    ret = vpci_msi_arch_disable(&msi->arch, pdev, msi->vectors);
+    if ( ret )
+        return ret;
+
+    msi->vectors = 0;
+    msi->enabled = false;
+
+    return 0;
+}
+
+static void vpci_msi_control_write(struct pci_dev *pdev, unsigned int reg,
+                                   union vpci_val val, void *data)
+{
+    struct vpci_msi *msi = data;
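+    /* PCI_MSI_FLAGS_QSIZE encodes log2 of the vectors the guest wants. */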
+    unsigned int vectors = 1 << MASK_EXTR(val.u16, PCI_MSI_FLAGS_QSIZE);
+    int ret;
+
+    if ( vectors > msi->max_vectors )
+        vectors = msi->max_vectors;
+
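+    /* Ignore the write if neither the enable bit nor the vector count changes. */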
+    if ( !!(val.u16 & PCI_MSI_FLAGS_ENABLE) == msi->enabled &&
+         (vectors == msi->vectors || !msi->enabled) )
+        return;
+
+    if ( val.u16 & PCI_MSI_FLAGS_ENABLE )
+    {
+        if ( msi->enabled )
+        {
+            /*
+             * Change to the number of enabled vectors, disable and
+             * enable MSI in order to apply it.
+             */
+            ret = vpci_msi_disable(pdev, msi);
+            if ( ret )
+                return;
+        }
+        vpci_msi_enable(pdev, msi, vectors);
+    }
+    else
+        vpci_msi_disable(pdev, msi);
+}
+
+/* Handlers for the address field (32bit or low part of a 64bit address). */
+static void vpci_msi_address_read(struct pci_dev *pdev, unsigned int reg,
+                                  union vpci_val *val, void *data)
+{
+    const struct vpci_msi *msi = data;
+
+    val->u32 = msi->address;
+}
+
+static void vpci_msi_address_write(struct pci_dev *pdev, unsigned int reg,
+                                   union vpci_val val, void *data)
+{
+    struct vpci_msi *msi = data;
+
+    /* Clear low part. */
+    msi->address &= ~(uint64_t)0xffffffff;
+    msi->address |= val.u32;
+}
+
+/* Handlers for the high part of a 64bit address field. */
+static void vpci_msi_address_upper_read(struct pci_dev *pdev, unsigned int reg,
+                                        union vpci_val *val, void *data)
+{
+    const struct vpci_msi *msi = data;
+
+    val->u32 = msi->address >> 32;
+}
+
+static void vpci_msi_address_upper_write(struct pci_dev *pdev, unsigned int reg,
+                                         union vpci_val val, void *data)
+{
+    struct vpci_msi *msi = data;
+
+    /* Clear high part. */
+    msi->address &= ~((uint64_t)0xffffffff << 32);
+    msi->address |= (uint64_t)val.u32 << 32;
+}
+
+/* Handlers for the data field. */
+static void vpci_msi_data_read(struct pci_dev *pdev, unsigned int reg,
+                               union vpci_val *val, void *data)
+{
+    const struct vpci_msi *msi = data;
+
+    val->u16 = msi->data;
+}
+
+static void vpci_msi_data_write(struct pci_dev *pdev, unsigned int reg,
+                                union vpci_val val, void *data)
+{
+    struct vpci_msi *msi = data;
+
+    msi->data = val.u16;
+}
+
+static void vpci_msi_mask_read(struct pci_dev *pdev, unsigned int reg,
+                               union vpci_val *val, void *data)
+{
+    const struct vpci_msi *msi = data;
+
+    val->u32 = msi->mask;
+}
+
+static void vpci_msi_mask_write(struct pci_dev *pdev, unsigned int reg,
+                                union vpci_val val, void *data)
+{
+    struct vpci_msi *msi = data;
+    uint32_t dmask;
+
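+    /* XOR the old and new values to get the set of mask bits that changed. */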
+    dmask = msi->mask ^ val.u32;
+
+    if ( !dmask )
+        return;
+
+    if ( msi->enabled )
+    {
+        unsigned int i;
+
+        for ( i = ffs(dmask) - 1; dmask && i < msi->vectors;
+              i = ffs(dmask) - 1 )
+        {
+            vpci_msi_arch_mask(&msi->arch, pdev, i, MASK_EXTR(val.u32, 1u << i));
+            __clear_bit(i, &dmask);
+        }
+    }
+
+    msi->mask = val.u32;
+}
+
+static int vpci_init_msi(struct pci_dev *pdev)
+{
+    uint8_t seg = pdev->seg, bus = pdev->bus;
+    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
+    struct vpci_msi *msi;
+    unsigned int msi_offset;
+    uint16_t control;
+    int ret;
+
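+    /* Nothing to do if the device lacks an MSI capability. */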
+    msi_offset = pci_find_cap_offset(seg, bus, slot, func, PCI_CAP_ID_MSI);
+    if ( !msi_offset )
+        return 0;
+
+    msi = xzalloc(struct vpci_msi);
+    if ( !msi )
+        return -ENOMEM;
+
+    msi->pos = msi_offset;
+
+    control = pci_conf_read16(seg, bus, slot, func,
+                              msi_control_reg(msi_offset));
+
+    ret = vpci_add_register(pdev, vpci_msi_control_read,
+                            vpci_msi_control_write,
+                            msi_control_reg(msi_offset), 2, msi);
+    if ( ret )
+        goto error;
+
+    /* Get the maximum number of vectors the device supports. */
+    msi->max_vectors = multi_msi_capable(control);
+    ASSERT(msi->max_vectors <= 32);
+
+    /* No PIRQ bound yet. */
+    vpci_msi_arch_init(&msi->arch);
+
+    if ( is_64bit_address(control) )
+        msi->address64 = true;
+    if ( is_mask_bit_support(control) )
+        msi->masking = true;
+
+    ret = vpci_add_register(pdev, vpci_msi_address_read,
+                            vpci_msi_address_write,
+                            msi_lower_address_reg(msi_offset), 4, msi);
+    if ( ret )
+        goto error;
+
+    ret = vpci_add_register(pdev, vpci_msi_data_read, vpci_msi_data_write,
+                            msi_data_reg(msi_offset, msi->address64), 2,
+                            msi);
+    if ( ret )
+        goto error;
+
+    if ( msi->address64 )
+    {
+        ret = vpci_add_register(pdev, vpci_msi_address_upper_read,
+                                vpci_msi_address_upper_write,
+                                msi_upper_address_reg(msi_offset), 4, msi);
+        if ( ret )
+            goto error;
+    }
+
+    if ( msi->masking )
+    {
+        ret = vpci_add_register(pdev, vpci_msi_mask_read, vpci_msi_mask_write,
+                                msi_mask_bits_reg(msi_offset,
+                                                  msi->address64), 4, msi);
+        if ( ret )
+            goto error;
+    }
+
+    pdev->vpci->msi = msi;
+
+    return 0;
+
+ error:
+    ASSERT(ret);
+    xfree(msi);
+    return ret;
+}
+
+REGISTER_VPCI_INIT(vpci_init_msi);
+
+void vpci_dump_msi(void)
+{
+    struct domain *d;
+
+    for_each_domain ( d )
+    {
+        const struct pci_dev *pdev;
+
+        if ( !has_vpci(d) )
+            continue;
+
+        printk("vPCI MSI information for guest %u\n", d->domain_id);
+
+        if ( !vpci_trylock(d) )
+        {
+            printk("Unable to get vPCI lock, skipping\n");
+            continue;
+        }
+
+        list_for_each_entry ( pdev, &d->arch.pdev_list, domain_list )
+        {
+            uint8_t seg = pdev->seg, bus = pdev->bus;
+            uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
+            struct vpci_msi *msi = pdev->vpci->msi;
+
+            if ( !msi )
+                continue;
+
+            printk("Device %04x:%02x:%02x.%u\n", seg, bus, slot, func);
+
+            printk("Enabled: %u Supports masking: %u 64-bit addresses: %u\n",
+                   msi->enabled, msi->masking, msi->address64);
+            printk("Max vectors: %u enabled vectors: %u\n",
+                   msi->max_vectors, msi->vectors);
+
+            vpci_msi_arch_print(&msi->arch, msi->data, msi->address);
+
+            if ( msi->masking )
+                printk("mask=%#032x\n", msi->mask);
+        }
+        vpci_unlock(d);
+    }
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index 4fe996fe49..55ed094734 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -20,6 +20,7 @@ 
 #define __ASM_X86_HVM_IO_H__
 
 #include <xen/mm.h>
+#include <xen/pci.h>
 #include <asm/hvm/vpic.h>
 #include <asm/hvm/vioapic.h>
 #include <public/hvm/ioreq.h>
@@ -126,6 +127,23 @@  void hvm_dpci_eoi(struct domain *d, unsigned int guest_irq,
 void msix_write_completion(struct vcpu *);
 void msixtbl_init(struct domain *d);
 
+/* Arch-specific MSI data for vPCI. */
+struct vpci_arch_msi {
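+    /* PIRQ of the first vector, or -1 if no PIRQs are currently mapped. */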
+    int pirq;
+};
+
+/* Arch-specific vPCI MSI helpers. */
+void vpci_msi_arch_mask(struct vpci_arch_msi *arch, struct pci_dev *pdev,
+                        unsigned int entry, bool mask);
+int vpci_msi_arch_enable(struct vpci_arch_msi *arch, struct pci_dev *pdev,
+                         uint64_t address, uint32_t data,
+                         unsigned int vectors);
+int vpci_msi_arch_disable(struct vpci_arch_msi *arch, struct pci_dev *pdev,
+                          unsigned int vectors);
+int vpci_msi_arch_init(struct vpci_arch_msi *arch);
+void vpci_msi_arch_print(struct vpci_arch_msi *arch, uint16_t data,
+                         uint64_t addr);
+
 enum stdvga_cache_state {
     STDVGA_CACHE_UNINITIALIZED,
     STDVGA_CACHE_ENABLED,
diff --git a/xen/include/asm-x86/msi.h b/xen/include/asm-x86/msi.h
index 213ee53f72..9c36c34372 100644
--- a/xen/include/asm-x86/msi.h
+++ b/xen/include/asm-x86/msi.h
@@ -48,6 +48,7 @@ 
 #define MSI_ADDR_REDIRECTION_SHIFT  3
 #define MSI_ADDR_REDIRECTION_CPU    (0 << MSI_ADDR_REDIRECTION_SHIFT)
 #define MSI_ADDR_REDIRECTION_LOWPRI (1 << MSI_ADDR_REDIRECTION_SHIFT)
+#define MSI_ADDR_REDIRECTION_MASK   0x8
 
 #define MSI_ADDR_DEST_ID_SHIFT		12
 #define	 MSI_ADDR_DEST_ID_MASK		0x00ff000
diff --git a/xen/include/xen/hvm/irq.h b/xen/include/xen/hvm/irq.h
index 0d2c72c109..d07185a479 100644
--- a/xen/include/xen/hvm/irq.h
+++ b/xen/include/xen/hvm/irq.h
@@ -57,7 +57,9 @@  struct dev_intx_gsi_link {
 #define VMSI_DELIV_MASK   0x7000
 #define VMSI_TRIG_MODE    0x8000
 
+#define GFLAGS_SHIFT_DEST_ID        0
 #define GFLAGS_SHIFT_RH             8
+#define GFLAGS_SHIFT_DM             9
 #define GFLAGS_SHIFT_DELIV_MODE     12
 #define GFLAGS_SHIFT_TRG_MODE       15
 
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 452ee482e8..2a7d7557b3 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -13,6 +13,7 @@ 
 * of just returning whether the lock is held by any CPU).
  */
 #define vpci_lock(d) spin_lock_recursive(&(d)->arch.hvm_domain.vpci_lock)
+#define vpci_trylock(d) spin_trylock_recursive(&(d)->arch.hvm_domain.vpci_lock)
 #define vpci_unlock(d) spin_unlock_recursive(&(d)->arch.hvm_domain.vpci_lock)
 #define vpci_locked(d) spin_is_locked(&(d)->arch.hvm_domain.vpci_lock)
 
@@ -85,9 +86,34 @@  struct vpci {
         } bars[7]; /* At most 6 BARS + 1 expansion ROM BAR. */
         /* FIXME: currently there's no support for SR-IOV. */
     } header;
+
+    /* MSI data. */
+    struct vpci_msi {
+        /* Offset of the capability in the config space. */
+        unsigned int pos;
+        /* Maximum number of vectors supported by the device. */
+        unsigned int max_vectors;
+        /* Number of vectors configured. */
+        unsigned int vectors;
+        /* Address and data fields. */
+        uint64_t address;
+        uint16_t data;
+        /* Mask bitfield. */
+        uint32_t mask;
+        /* Enabled? */
+        bool enabled;
+        /* Supports per-vector masking? */
+        bool masking;
+        /* 64-bit address capable? */
+        bool address64;
+        /* Arch-specific data. */
+        struct vpci_arch_msi arch;
+    } *msi;
 #endif
 };
 
+void vpci_dump_msi(void);
+
 #endif
 
 /*