
[V2,4/25] Xen/doc: Add Xen virtual IOMMU doc

Message ID 1502310866-10450-5-git-send-email-tianyu.lan@intel.com (mailing list archive)
State New, archived

Commit Message

Lan Tianyu Aug. 9, 2017, 8:34 p.m. UTC
This patch is to add Xen virtual IOMMU doc to introduce motivation,
framework, vIOMMU hypercall and xl configuration.

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
---
 docs/misc/viommu.txt | 139 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 139 insertions(+)
 create mode 100644 docs/misc/viommu.txt

Comments

Wei Liu Aug. 17, 2017, 11:19 a.m. UTC | #1
On Wed, Aug 09, 2017 at 04:34:05PM -0400, Lan Tianyu wrote:
> +Now just suppport single vIOMMU for one VM and introduced domtcls are compatible
> +with multi-vIOMMU support.

Is this still true? There is an ID field in the struct which can
distinguish multiple viommus, right?

> +
> +xl vIOMMU configuration
> +=======================
> +viommu="type=intel_vtd,intremap=1,x2apic=1"

If there is provision to support multiple viommu please make this an
array.
Lan Tianyu Aug. 18, 2017, 7:17 a.m. UTC | #2
On 2017-08-17 19:19, Wei Liu wrote:
> On Wed, Aug 09, 2017 at 04:34:05PM -0400, Lan Tianyu wrote:
>> +Now just suppport single vIOMMU for one VM and introduced domctls are compatible
>> +with multi-vIOMMU support.
> 
> Is this still true? 

Yes, the patchset just supports single vIOMMU for one VM.

> There is an ID field in the struct which can
> distinguish multiple viommus, right?

Yes, this is reserved for multi-vIOMMU support.

> 
>> +
>> +xl vIOMMU configuration
>> +=======================
>> +viommu="type=intel_vtd,intremap=1,x2apic=1"
> 
> If there is provision to support multiple viommu please make this an
> array.

Ok. Will update.
Wei Liu Aug. 18, 2017, 10:15 a.m. UTC | #3
On Fri, Aug 18, 2017 at 03:17:37PM +0800, Lan Tianyu wrote:
> On 2017-08-17 19:19, Wei Liu wrote:
> > On Wed, Aug 09, 2017 at 04:34:05PM -0400, Lan Tianyu wrote:
> >> +Now just suppport single vIOMMU for one VM and introduced domctls are compatible
> >> +with multi-vIOMMU support.
> > 
> > Is this still true? 
> 
> Yes, the patchset just supports single vIOMMU for one VM.
> 

The first part of the sentence is true, but the latter is probably not.
It seems to me domctl is able to cope with multiple viommu. Please
correct me if I'm wrong.
Lan Tianyu Aug. 22, 2017, 8:07 a.m. UTC | #4
On 2017-08-18 18:15, Wei Liu wrote:
> On Fri, Aug 18, 2017 at 03:17:37PM +0800, Lan Tianyu wrote:
>> On 2017-08-17 19:19, Wei Liu wrote:
>>> On Wed, Aug 09, 2017 at 04:34:05PM -0400, Lan Tianyu wrote:
>>>> +Now just suppport single vIOMMU for one VM and introduced domctls are compatible
>>>> +with multi-vIOMMU support.
>>>
>>> Is this still true? 
>>
>> Yes, the patchset just supports single vIOMMU for one VM.
>>
> 
> The first part of the sentence is true, but the latter is probably not.
> It seems to me domctl is able to cope with multiple viommu. Please
> correct me if I'm wrong.

These domctls are able to support multiple vIOMMUs, but the vIOMMU device
model in the Xen hypervisor only supports a single vIOMMU per VM.
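
As a rough sketch of the split described above -- a domctl interface that
carries a vIOMMU ID, paired with a device model that accepts only one vIOMMU
per domain -- consider the following illustrative C fragment. All names
except viommu_id are hypothetical and are not taken from the patch.

#include <errno.h>
#include <stdint.h>

#define MAX_VIOMMUS 1                     /* current per-VM device model limit */

struct viommu_domain {
    uint32_t nr_viommus;                  /* vIOMMUs created so far */
};

static int viommu_create(struct viommu_domain *d, uint32_t *viommu_id)
{
    if (d->nr_viommus >= MAX_VIOMMUS)
        return -E2BIG;                    /* only one vIOMMU per VM for now */

    *viommu_id = d->nr_viommus++;         /* ID 0 identifies the single vIOMMU */
    return 0;
}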
Wei Liu Aug. 22, 2017, 11:03 a.m. UTC | #5
On Tue, Aug 22, 2017 at 04:07:32PM +0800, Lan Tianyu wrote:
> On 2017-08-18 18:15, Wei Liu wrote:
> > On Fri, Aug 18, 2017 at 03:17:37PM +0800, Lan Tianyu wrote:
> >> On 2017-08-17 19:19, Wei Liu wrote:
> >>> On Wed, Aug 09, 2017 at 04:34:05PM -0400, Lan Tianyu wrote:
> >>>> +Now just suppport single vIOMMU for one VM and introduced domctls are compatible
> >>>> +with multi-vIOMMU support.
> >>>
> >>> Is this still true? 
> >>
> >> Yes, the patchset just supports single vIOMMU for one VM.
> >>
> > 
> > The first part of the sentence is true, but the latter is probably not.
> > It seems to me domctl is able to cope with multiple viommu. Please
> > correct me if I'm wrong.
> 
> These domctls are able to support multiple vIOMMUs, but the vIOMMU device
> model in the Xen hypervisor only supports a single vIOMMU per VM.
> 

In that case please update the document.
Roger Pau Monné Aug. 22, 2017, 3:55 p.m. UTC | #6
On Wed, Aug 09, 2017 at 04:34:05PM -0400, Lan Tianyu wrote:
> This patch is to add Xen virtual IOMMU doc to introduce motivation,
> framework, vIOMMU hypercall and xl configuration.
> 
> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> ---
>  docs/misc/viommu.txt | 139 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 139 insertions(+)
>  create mode 100644 docs/misc/viommu.txt
> 
> diff --git a/docs/misc/viommu.txt b/docs/misc/viommu.txt
> new file mode 100644
> index 0000000..39455bb
> --- /dev/null
> +++ b/docs/misc/viommu.txt

IMHO, this should be the first patch in the series.

> @@ -0,0 +1,139 @@
> +Xen virtual IOMMU
> +
> +Motivation
> +==========
> +*) Enable more than 255 vcpu support

Seems like the "*)" is some kind of leftover?

> +HPC cloud service requires VM provides high performance parallel
> +computing and we hope to create a huge VM with >255 vcpu on one machine
> +to meet such requirement. Pin each vcpu to separate pcpus.

I would re-write this as:

The current requirements of HPC cloud services require VMs with a high
number of CPUs in order to achieve high performance in parallel
computing.

Also, this is needed in order to create VMs with > 128 vCPUs, not 255
vCPUs. That's because the APIC ID used by Xen is CPU ID * 2 (ie: CPU
127 has APIC ID 254, which is the last one available in xAPIC mode).
You should reword the paragraphs below in order to fix the mention of
255 vCPUs.
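
A small illustrative sketch of that arithmetic, assuming the APIC ID =
vCPU ID * 2 mapping described above and that 0xFF is reserved as the xAPIC
broadcast ID (this is only an example, not code from the series):

#include <stdio.h>

int main(void)
{
    unsigned int vcpu;

    for (vcpu = 0; vcpu < 256; vcpu++) {
        unsigned int apic_id = vcpu * 2;   /* assumed Xen xAPIC ID assignment */

        if (apic_id >= 0xFF) {             /* 0xFF is the xAPIC broadcast ID */
            printf("vCPU %u would need APIC ID %u -> x2APIC required\n",
                   vcpu, apic_id);
            break;                         /* first to overflow is vCPU 128 */
        }
    }
    return 0;
}

Under these assumptions vCPUs 0-127 fit into xAPIC (APIC IDs 0-254), and
vCPU 128 is the first one that needs x2APIC.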

> +
> +To support >255 vcpus, X2APIC mode in guest is necessary because legacy
> +APIC(XAPIC) just supports 8-bit APIC ID and it only can support 255
> +vcpus at most. X2APIC mode supports 32-bit APIC ID and it requires
> +interrupt mapping function of vIOMMU.

Correct me if I'm wrong, but I don't think x2APIC requires a vIOMMU. The
IOMMU is required so that you can route interrupts to all the possible
CPUs. One could imagine a setup where only CPUs with APIC IDs < 255 are
used as targets of external interrupts, and that doesn't require an
IOMMU.

> +The reason for this is that there is no modification to existing PCI MSI
> +and IOAPIC with the introduction of X2APIC. PCI MSI/IOAPIC can only send
> +interrupt message containing 8-bit APIC ID, which cannot address >255
> +cpus. Interrupt remapping supports 32-bit APIC ID and so it's necessary
> +to enable >255 cpus with x2apic mode.
> +
> +
> +vIOMMU Architecture
> +===================
> +vIOMMU device model is inside Xen hypervisor for following factors
> +    1) Avoid round trips between Qemu and Xen hypervisor
> +    2) Ease of integration with the rest of hypervisor
> +    3) HVMlite/PVH doesn't use Qemu
> +
> +* Interrupt remapping overview.
> +Interrupts from virtual devices and physical devices are delivered
> +to vLAPIC from vIOAPIC and vMSI. vIOMMU needs to remap interrupt during
> +this procedure.
> +
> ++---------------------------------------------------+
> +|Qemu                       |VM                     |
> +|                           | +----------------+    |
> +|                           | |  Device driver |    |
> +|                           | +--------+-------+    |
> +|                           |          ^            |
> +|       +----------------+  | +--------+-------+    |
> +|       | Virtual device |  | |  IRQ subsystem |    |
> +|       +-------+--------+  | +--------+-------+    |
> +|               |           |          ^            |
> +|               |           |          |            |
> ++---------------------------+-----------------------+
> +|hypervisor     |                      | VIRQ       |
> +|               |            +---------+--------+   |
> +|               |            |      vLAPIC      |   |
> +|               |VIRQ        +---------+--------+   |
> +|               |                      ^            |
> +|               |                      |            |
> +|               |            +---------+--------+   |
> +|               |            |      vIOMMU      |   |
> +|               |            +---------+--------+   |
> +|               |                      ^            |
> +|               |                      |            |
> +|               |            +---------+--------+   |
> +|               |            |   vIOAPIC/vMSI   |   |
> +|               |            +----+----+--------+   |
> +|               |                 ^    ^            |
> +|               +-----------------+    |            |
> +|                                      |            |
> ++---------------------------------------------------+
> +HW                                     |IRQ
> +                                +-------------------+
> +                                |   PCI Device      |
> +                                +-------------------+
> +
> +
> +vIOMMU hypercall
> +================
> +Introduce new domctl hypercall "xen_domctl_viommu_op" to create/destroy
            ^ a
> +vIOMMU and query vIOMMU capabilities that device model can support.
         ^ s                                ^ the
> +
> +* vIOMMU hypercall parameter structure
> +
> +/* vIOMMU type - specify vendor vIOMMU device model */
> +#define VIOMMU_TYPE_INTEL_VTD     (1u << 0)
> +
> +/* vIOMMU capabilities */
> +#define VIOMMU_CAP_IRQ_REMAPPING  (1u << 0)
> +
> +struct xen_domctl_viommu_op {
> +    uint32_t cmd;
> +#define XEN_DOMCTL_create_viommu          0
> +#define XEN_DOMCTL_destroy_viommu         1
> +#define XEN_DOMCTL_query_viommu_caps      2
> +    union {
> +        struct {
> +            /* IN - vIOMMU type  */
> +            uint64_t viommu_type;
> +            /* IN - MMIO base address of vIOMMU. */
> +            uint64_t base_address;
> +            /* IN - Length of MMIO region */
> +            uint64_t length;
> +            /* IN - Capabilities with which we want to create */
> +            uint64_t capabilities;
> +            /* OUT - vIOMMU identity */
> +            uint32_t viommu_id;
> +        } create_viommu;
> +
> +        struct {
> +            /* IN - vIOMMU identity */
> +            uint32_t viommu_id;
> +        } destroy_viommu;
> +
> +        struct {
> +            /* IN - vIOMMU type */
> +            uint64_t viommu_type;
> +            /* OUT - vIOMMU Capabilities */
> +            uint64_t capabilities;
> +        } query_caps;
> +    } u;
> +};
> +
> +- XEN_DOMCTL_query_viommu_caps
> +    Query capabilities of vIOMMU device model. vIOMMU_type specifies
> +which vendor vIOMMU device model(E,G Intel VTD) is targeted and hypervisor
> +returns capability bits(E,G interrupt remapping bit).
> +
> +- XEN_DOMCTL_create_viommu
> +    Create vIOMMU device with vIOMMU_type, capabilities, MMIO
> +base address and length. Hypervisor returns viommu_id. Capabilities should
> +be in range of value returned by query_viommu_caps hypercall.
> +
> +- XEN_DOMCTL_destroy_viommu
> +    Destroy vIOMMU in Xen hypervisor with viommu_id as parameters.
> +
> +Now just suppport single vIOMMU for one VM and introduced domtcls are compatible
> +with multi-vIOMMU support.
> +
> +xl vIOMMU configuration

This should be "xl x86 vIOMMU configuration", since it's clearly x86
specific.

> +=======================
> +viommu="type=intel_vtd,intremap=1,x2apic=1"

Shouldn't this have some kind of array form? From the code I saw it
seems like you are adding support for domains having multiple IOMMUs,
in which case this should at least look like:

viommu = [
    'type=intel_vtd,intremap=1,x2apic=1',
    'type=intel_vtd,intremap=1,x2apic=1'
]

But then it's missing to which PCI bus each IOMMU is attached.

Also, why do you need the x2apic parameter? Is there any value in
providing a vIOMMU if it doesn't support x2APIC mode?

Roger.
Lan Tianyu Aug. 23, 2017, 2:06 a.m. UTC | #7
On 2017-08-22 19:03, Wei Liu wrote:
> On Tue, Aug 22, 2017 at 04:07:32PM +0800, Lan Tianyu wrote:
>> On 2017-08-18 18:15, Wei Liu wrote:
>>> On Fri, Aug 18, 2017 at 03:17:37PM +0800, Lan Tianyu wrote:
>>>> On 2017-08-17 19:19, Wei Liu wrote:
>>>>> On Wed, Aug 09, 2017 at 04:34:05PM -0400, Lan Tianyu wrote:
>>>>>> +Now just suppport single vIOMMU for one VM and introduced domctls are compatible
>>>>>> +with multi-vIOMMU support.
>>>>>
>>>>> Is this still true? 
>>>>
>>>> Yes, the patchset just supports single vIOMMU for one VM.
>>>>
>>>
>>> The first part of the sentence is true, but the latter is probably not.
>>> It seems to me domctl is able to cope with multiple viommu. Please
>>> correct me if I'm wrong.
>>
>> These domctls are able to support multiple vIOMMUs, but the vIOMMU device
>> model in the Xen hypervisor only supports a single vIOMMU per VM.
>>
> 
> In that case please update the document.
> 

OK. Will update.
Lan Tianyu Aug. 23, 2017, 7:36 a.m. UTC | #8
On 2017-08-22 23:55, Roger Pau Monné wrote:
> On Wed, Aug 09, 2017 at 04:34:05PM -0400, Lan Tianyu wrote:
>> This patch is to add Xen virtual IOMMU doc to introduce motivation,
>> framework, vIOMMU hypercall and xl configuration.
>>
>> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
>> ---
>>  docs/misc/viommu.txt | 139 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 139 insertions(+)
>>  create mode 100644 docs/misc/viommu.txt
>>
>> diff --git a/docs/misc/viommu.txt b/docs/misc/viommu.txt
>> new file mode 100644
>> index 0000000..39455bb
>> --- /dev/null
>> +++ b/docs/misc/viommu.txt
> 
> IMHO, this should be the first patch in the series.

OK. Will update.

> 
>> @@ -0,0 +1,139 @@
>> +Xen virtual IOMMU
>> +
>> +Motivation
>> +==========
>> +*) Enable more than 255 vcpu support
> 
> Seems like the "*)" is some kind of leftover?
> 
>> +HPC cloud service requires VM provides high performance parallel
>> +computing and we hope to create a huge VM with >255 vcpu on one machine
>> +to meet such requirement. Pin each vcpu to separate pcpus.
> 
> I would re-write this as:
> 
> The current requirements of HPC cloud services require VMs with a high
> number of CPUs in order to achieve high performance in parallel
> computing.
> 
> Also, this is needed in order to create VMs with > 128 vCPUs, not 255
> vCPUs. That's because the APIC ID used by Xen is CPU ID * 2 (ie: CPU
> 127 has APIC ID 254, which is the last one available in xAPIC mode).
> You should reword the paragraphs below in order to fix the mention of
> 255 vCPUs.

Thanks for your rewrite.

> 
>> +
>> +To support >255 vcpus, X2APIC mode in guest is necessary because legacy
>> +APIC(XAPIC) just supports 8-bit APIC ID and it only can support 255
>> +vcpus at most. X2APIC mode supports 32-bit APIC ID and it requires
>> +interrupt mapping function of vIOMMU.
> 
> Correct me if I'm wrong, but I don't think x2APIC requires a vIOMMU. The
> IOMMU is required so that you can route interrupts to all the possible
> CPUs. One could imagine a setup where only CPUs with APIC IDs < 255 are
> used as targets of external interrupts, and that doesn't require an
> IOMMU.

This is OS behavior. IIRC, Windows strictly requires an IOMMU when enabling
x2apic mode and the Linux kernel only has such a requirement when the CPU
count is > 255.


> 
>> +The reason for this is that there is no modification to existing PCI MSI
>> +and IOAPIC with the introduction of X2APIC. PCI MSI/IOAPIC can only send
>> +interrupt message containing 8-bit APIC ID, which cannot address >255
>> +cpus. Interrupt remapping supports 32-bit APIC ID and so it's necessary
>> +to enable >255 cpus with x2apic mode.
>> +
>> +
>> +vIOMMU Architecture
>> +===================
>> +vIOMMU device model is inside Xen hypervisor for following factors
>> +    1) Avoid round trips between Qemu and Xen hypervisor
>> +    2) Ease of integration with the rest of hypervisor
>> +    3) HVMlite/PVH doesn't use Qemu
>> +
>> +* Interrupt remapping overview.
>> +Interrupts from virtual devices and physical devices are delivered
>> +to vLAPIC from vIOAPIC and vMSI. vIOMMU needs to remap interrupt during
>> +this procedure.
>> +
>> ++---------------------------------------------------+
>> +|Qemu                       |VM                     |
>> +|                           | +----------------+    |
>> +|                           | |  Device driver |    |
>> +|                           | +--------+-------+    |
>> +|                           |          ^            |
>> +|       +----------------+  | +--------+-------+    |
>> +|       | Virtual device |  | |  IRQ subsystem |    |
>> +|       +-------+--------+  | +--------+-------+    |
>> +|               |           |          ^            |
>> +|               |           |          |            |
>> ++---------------------------+-----------------------+
>> +|hypervisor     |                      | VIRQ       |
>> +|               |            +---------+--------+   |
>> +|               |            |      vLAPIC      |   |
>> +|               |VIRQ        +---------+--------+   |
>> +|               |                      ^            |
>> +|               |                      |            |
>> +|               |            +---------+--------+   |
>> +|               |            |      vIOMMU      |   |
>> +|               |            +---------+--------+   |
>> +|               |                      ^            |
>> +|               |                      |            |
>> +|               |            +---------+--------+   |
>> +|               |            |   vIOAPIC/vMSI   |   |
>> +|               |            +----+----+--------+   |
>> +|               |                 ^    ^            |
>> +|               +-----------------+    |            |
>> +|                                      |            |
>> ++---------------------------------------------------+
>> +HW                                     |IRQ
>> +                                +-------------------+
>> +                                |   PCI Device      |
>> +                                +-------------------+
>> +
>> +
>> +vIOMMU hypercall
>> +================
>> +Introduce new domctl hypercall "xen_domctl_viommu_op" to create/destroy
>             ^ a
>> +vIOMMU and query vIOMMU capabilities that device model can support.
>          ^ s                                ^ the
>> +
>> +* vIOMMU hypercall parameter structure
>> +
>> +/* vIOMMU type - specify vendor vIOMMU device model */
>> +#define VIOMMU_TYPE_INTEL_VTD     (1u << 0)
>> +
>> +/* vIOMMU capabilities */
>> +#define VIOMMU_CAP_IRQ_REMAPPING  (1u << 0)
>> +
>> +struct xen_domctl_viommu_op {
>> +    uint32_t cmd;
>> +#define XEN_DOMCTL_create_viommu          0
>> +#define XEN_DOMCTL_destroy_viommu         1
>> +#define XEN_DOMCTL_query_viommu_caps      2
>> +    union {
>> +        struct {
>> +            /* IN - vIOMMU type  */
>> +            uint64_t viommu_type;
>> +            /* IN - MMIO base address of vIOMMU. */
>> +            uint64_t base_address;
>> +            /* IN - Length of MMIO region */
>> +            uint64_t length;
>> +            /* IN - Capabilities with which we want to create */
>> +            uint64_t capabilities;
>> +            /* OUT - vIOMMU identity */
>> +            uint32_t viommu_id;
>> +        } create_viommu;
>> +
>> +        struct {
>> +            /* IN - vIOMMU identity */
>> +            uint32_t viommu_id;
>> +        } destroy_viommu;
>> +
>> +        struct {
>> +            /* IN - vIOMMU type */
>> +            uint64_t viommu_type;
>> +            /* OUT - vIOMMU Capabilities */
>> +            uint64_t capabilities;
>> +        } query_caps;
>> +    } u;
>> +};
>> +
>> +- XEN_DOMCTL_query_viommu_caps
>> +    Query capabilities of vIOMMU device model. vIOMMU_type specifies
>> +which vendor vIOMMU device model(E,G Intel VTD) is targeted and hypervisor
>> +returns capability bits(E,G interrupt remapping bit).
>> +
>> +- XEN_DOMCTL_create_viommu
>> +    Create vIOMMU device with vIOMMU_type, capabilities, MMIO
>> +base address and length. Hypervisor returns viommu_id. Capabilities should
>> +be in range of value returned by query_viommu_caps hypercall.
>> +
>> +- XEN_DOMCTL_destroy_viommu
>> +    Destroy vIOMMU in Xen hypervisor with viommu_id as parameters.
>> +
>> +Now just suppport single vIOMMU for one VM and introduced domtcls are compatible
>> +with multi-vIOMMU support.
>> +
>> +xl vIOMMU configuration
> 
> This should be "xl x86 vIOMMU configuration", since it's clearly x86
> specific.

OK. Will update.

> 
>> +=======================
>> +viommu="type=intel_vtd,intremap=1,x2apic=1"
> 
> Shouldn't this have some kind of array form? From the code I saw it
> seems like you are adding support for domains having multiple IOMMUs,
> in which case this should at least look like:

No, we don't support multi-vIOMMU, but some vIOMMU data structures are
defined with multi-vIOMMU support in mind.

> 
> viommu = [
>     'type=intel_vtd,intremap=1,x2apic=1',
>     'type=intel_vtd,intremap=1,x2apic=1'
> ]
> 

Wei also suggested this. Will update.

> But then it's missing to which PCI bus each IOMMU is attached.

This will be added if we really need to support multiple vIOMMUs.

> 
> Also, why do you need the x2apic parameter? Is there any value in
> providing a vIOMMU if it doesn't support x2APIC mode?

The user can configure whether the vIOMMU supports x2APIC mode, and the
toolstack will use this configuration to prepare the ACPI DMAR table. There
is an X2APIC_OPT_OUT bit in the DMAR table to tell the OS not to enable
x2APIC mode for the IOMMU.
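
As an illustrative sketch of that flow (the bit positions follow the DMAR
flags field in the VT-d specification, but the struct and helper below are
hypothetical rather than the toolstack's actual code):

#include <stdbool.h>
#include <stdint.h>

#define DMAR_FLAG_INTR_REMAP      (1u << 0)   /* interrupt remapping supported */
#define DMAR_FLAG_X2APIC_OPT_OUT  (1u << 1)   /* ask the OS to avoid x2APIC */

struct acpi_dmar {                 /* hypothetical, trimmed to the flags byte */
    uint8_t host_address_width;
    uint8_t flags;
};

static void dmar_set_flags(struct acpi_dmar *dmar, bool intremap, bool x2apic)
{
    dmar->flags = 0;
    if (intremap)
        dmar->flags |= DMAR_FLAG_INTR_REMAP;
    if (!x2apic)                   /* xl "x2apic=0" -> set the opt-out bit */
        dmar->flags |= DMAR_FLAG_X2APIC_OPT_OUT;
}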

> 
> Roger.
>
Roger Pau Monné Aug. 23, 2017, 1:53 p.m. UTC | #9
On Wed, Aug 23, 2017 at 03:36:19PM +0800, Lan Tianyu wrote:
> On 2017-08-22 23:55, Roger Pau Monné wrote:
> > On Wed, Aug 09, 2017 at 04:34:05PM -0400, Lan Tianyu wrote:
> >> This patch is to add Xen virtual IOMMU doc to introduce motivation,
> >> framework, vIOMMU hypercall and xl configuration.
> >>
> >> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> >> ---
> >>  docs/misc/viommu.txt | 139 +++++++++++++++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 139 insertions(+)
> >>  create mode 100644 docs/misc/viommu.txt
> >>
> >> diff --git a/docs/misc/viommu.txt b/docs/misc/viommu.txt
> >> new file mode 100644
> >> index 0000000..39455bb
> >> --- /dev/null
> >> +++ b/docs/misc/viommu.txt
> >> +
> >> +To support >255 vcpus, X2APIC mode in guest is necessary because legacy
> >> +APIC(XAPIC) just supports 8-bit APIC ID and it only can support 255
> >> +vcpus at most. X2APIC mode supports 32-bit APIC ID and it requires
> >> +interrupt mapping function of vIOMMU.
> > 
> > Correct me if I'm wrong, but I don't think x2APIC requires a vIOMMU. The
> > IOMMU is required so that you can route interrupts to all the possible
> > CPUs. One could imagine a setup where only CPUs with APIC IDs < 255 are
> > used as targets of external interrupts, and that doesn't require an
> > IOMMU.
> 
> This is OS behavior. IIRC, Windows strictly requires an IOMMU when enabling
> x2apic mode and the Linux kernel only has such a requirement when the CPU
> count is > 255.

But this document doesn't speak about OSes, it speaks about the IOMMU
implementation. What I think is wrong is the following sentence:

"x2APIC mode supports 32-bit APIC ID and it requires interrupt mapping
function of vIOMMU."

IMHO it should be:

"x2APIC mode supports 32-bit APIC ID and it requires the interrupt
remapping functionality of a vIOMMU if the guest wishes to route
interrupts to all available vCPUs."

> > 
> > Also, why do you need the x2apic parameter? Is there any value in
> > providing a vIOMMU if it doesn't support x2APIC mode?
> 
> The user can configure whether the vIOMMU supports x2APIC mode, and the
> toolstack will use this configuration to prepare the ACPI DMAR table. There
> is an X2APIC_OPT_OUT bit in the DMAR table to tell the OS not to enable
> x2APIC mode for the IOMMU.

Let me rephrase my question: what's the value in implementing
xAPIC support for the vIOMMU?

The vIOMMU work is done so that Xen can create guests with > 128 vCPUs
(> 255 APIC IDs), at which point you _must_ use x2APIC mode. Is there
any value in providing a vIOMMU implementation that doesn't support
x2APIC?

Roger.

Patch

diff --git a/docs/misc/viommu.txt b/docs/misc/viommu.txt
new file mode 100644
index 0000000..39455bb
--- /dev/null
+++ b/docs/misc/viommu.txt
@@ -0,0 +1,139 @@ 
+Xen virtual IOMMU
+
+Motivation
+==========
+*) Enable more than 255 vcpu support
+HPC cloud service requires VM provides high performance parallel
+computing and we hope to create a huge VM with >255 vcpu on one machine
+to meet such requirement. Pin each vcpu to separate pcpus.
+
+To support >255 vcpus, X2APIC mode in guest is necessary because legacy
+APIC(XAPIC) just supports 8-bit APIC ID and it only can support 255
+vcpus at most. X2APIC mode supports 32-bit APIC ID and it requires
+interrupt mapping function of vIOMMU.
+
+The reason for this is that there is no modification to existing PCI MSI
+and IOAPIC with the introduction of X2APIC. PCI MSI/IOAPIC can only send
+interrupt message containing 8-bit APIC ID, which cannot address >255
+cpus. Interrupt remapping supports 32-bit APIC ID and so it's necessary
+to enable >255 cpus with x2apic mode.
+
+
+vIOMMU Architecture
+===================
+vIOMMU device model is inside Xen hypervisor for following factors
+    1) Avoid round trips between Qemu and Xen hypervisor
+    2) Ease of integration with the rest of hypervisor
+    3) HVMlite/PVH doesn't use Qemu
+
+* Interrupt remapping overview.
+Interrupts from virtual devices and physical devices are delivered
+to vLAPIC from vIOAPIC and vMSI. vIOMMU needs to remap interrupt during
+this procedure.
+
++---------------------------------------------------+
+|Qemu                       |VM                     |
+|                           | +----------------+    |
+|                           | |  Device driver |    |
+|                           | +--------+-------+    |
+|                           |          ^            |
+|       +----------------+  | +--------+-------+    |
+|       | Virtual device |  | |  IRQ subsystem |    |
+|       +-------+--------+  | +--------+-------+    |
+|               |           |          ^            |
+|               |           |          |            |
++---------------------------+-----------------------+
+|hypervisor     |                      | VIRQ       |
+|               |            +---------+--------+   |
+|               |            |      vLAPIC      |   |
+|               |VIRQ        +---------+--------+   |
+|               |                      ^            |
+|               |                      |            |
+|               |            +---------+--------+   |
+|               |            |      vIOMMU      |   |
+|               |            +---------+--------+   |
+|               |                      ^            |
+|               |                      |            |
+|               |            +---------+--------+   |
+|               |            |   vIOAPIC/vMSI   |   |
+|               |            +----+----+--------+   |
+|               |                 ^    ^            |
+|               +-----------------+    |            |
+|                                      |            |
++---------------------------------------------------+
+HW                                     |IRQ
+                                +-------------------+
+                                |   PCI Device      |
+                                +-------------------+
+
+
+vIOMMU hypercall
+================
+Introduce new domctl hypercall "xen_domctl_viommu_op" to create/destroy
+vIOMMU and query vIOMMU capabilities that device model can support.
+
+* vIOMMU hypercall parameter structure
+
+/* vIOMMU type - specify vendor vIOMMU device model */
+#define VIOMMU_TYPE_INTEL_VTD     (1u << 0)
+
+/* vIOMMU capabilities */
+#define VIOMMU_CAP_IRQ_REMAPPING  (1u << 0)
+
+struct xen_domctl_viommu_op {
+    uint32_t cmd;
+#define XEN_DOMCTL_create_viommu          0
+#define XEN_DOMCTL_destroy_viommu         1
+#define XEN_DOMCTL_query_viommu_caps      2
+    union {
+        struct {
+            /* IN - vIOMMU type  */
+            uint64_t viommu_type;
+            /* IN - MMIO base address of vIOMMU. */
+            uint64_t base_address;
+            /* IN - Length of MMIO region */
+            uint64_t length;
+            /* IN - Capabilities with which we want to create */
+            uint64_t capabilities;
+            /* OUT - vIOMMU identity */
+            uint32_t viommu_id;
+        } create_viommu;
+
+        struct {
+            /* IN - vIOMMU identity */
+            uint32_t viommu_id;
+        } destroy_viommu;
+
+        struct {
+            /* IN - vIOMMU type */
+            uint64_t viommu_type;
+            /* OUT - vIOMMU Capabilities */
+            uint64_t capabilities;
+        } query_caps;
+    } u;
+};
+
+- XEN_DOMCTL_query_viommu_caps
+    Query capabilities of vIOMMU device model. vIOMMU_type specifies
+which vendor vIOMMU device model(E,G Intel VTD) is targeted and hypervisor
+returns capability bits(E,G interrupt remapping bit).
+
+- XEN_DOMCTL_create_viommu
+    Create vIOMMU device with vIOMMU_type, capabilities, MMIO
+base address and length. Hypervisor returns viommu_id. Capabilities should
+be in range of value returned by query_viommu_caps hypercall.
+
+- XEN_DOMCTL_destroy_viommu
+    Destroy vIOMMU in Xen hypervisor with viommu_id as parameters.
+
+Now just suppport single vIOMMU for one VM and introduced domtcls are compatible
+with multi-vIOMMU support.
+
+xl vIOMMU configuration
+=======================
+viommu="type=intel_vtd,intremap=1,x2apic=1"
+
+"type" - Specify vIOMMU device model type. Currently only supports Intel vtd
+device model.
+"intremap" - Enable vIOMMU interrupt remapping function.
+"x2apic" - Support x2apic mode with interrupt remapping function.
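
The following is a rough, self-contained sketch of the call sequence a
toolstack might use against the domctl described in this document. The
constants and field layout are copied from the patch above;
issue_viommu_domctl() is a hypothetical stand-in for the real libxc
plumbing that would deliver the op to Xen, and the MMIO base address is
just an example value.

#include <stdint.h>
#include <stdio.h>

#define VIOMMU_TYPE_INTEL_VTD     (1u << 0)
#define VIOMMU_CAP_IRQ_REMAPPING  (1u << 0)

#define XEN_DOMCTL_create_viommu          0
#define XEN_DOMCTL_destroy_viommu         1
#define XEN_DOMCTL_query_viommu_caps      2

struct xen_domctl_viommu_op {
    uint32_t cmd;
    union {
        struct {
            uint64_t viommu_type;     /* IN  - vIOMMU type */
            uint64_t base_address;    /* IN  - MMIO base address */
            uint64_t length;          /* IN  - length of MMIO region */
            uint64_t capabilities;    /* IN  - requested capabilities */
            uint32_t viommu_id;       /* OUT - vIOMMU identity */
        } create_viommu;
        struct {
            uint32_t viommu_id;       /* IN  - vIOMMU identity */
        } destroy_viommu;
        struct {
            uint64_t viommu_type;     /* IN  - vIOMMU type */
            uint64_t capabilities;    /* OUT - supported capabilities */
        } query_caps;
    } u;
};

/* Hypothetical stand-in: a real implementation would marshal the op into a
 * domctl and issue the hypercall.  Here we just pretend the hypervisor
 * reported interrupt remapping support. */
static int issue_viommu_domctl(struct xen_domctl_viommu_op *op)
{
    if (op->cmd == XEN_DOMCTL_query_viommu_caps)
        op->u.query_caps.capabilities = VIOMMU_CAP_IRQ_REMAPPING;
    return 0;
}

int main(void)
{
    struct xen_domctl_viommu_op op = { .cmd = XEN_DOMCTL_query_viommu_caps };

    /* 1. Query what the vIOMMU device model can offer for this type. */
    op.u.query_caps.viommu_type = VIOMMU_TYPE_INTEL_VTD;
    if (issue_viommu_domctl(&op))
        return 1;

    if (!(op.u.query_caps.capabilities & VIOMMU_CAP_IRQ_REMAPPING))
        return 1;                     /* interrupt remapping not offered */

    /* 2. Create the vIOMMU with capabilities within the reported set. */
    op.cmd = XEN_DOMCTL_create_viommu;
    op.u.create_viommu.viommu_type  = VIOMMU_TYPE_INTEL_VTD;
    op.u.create_viommu.base_address = 0xfed90000;   /* example MMIO base */
    op.u.create_viommu.length       = 0x1000;       /* example MMIO length */
    op.u.create_viommu.capabilities = VIOMMU_CAP_IRQ_REMAPPING;
    if (issue_viommu_domctl(&op))
        return 1;

    printf("created vIOMMU %u\n", op.u.create_viommu.viommu_id);
    return 0;
}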