diff mbox

[V3,1/29] Xen/doc: Add Xen virtual IOMMU doc

Message ID 1506049330-11196-2-git-send-email-tianyu.lan@intel.com (mailing list archive)
State New, archived

Commit Message

lan,Tianyu Sept. 22, 2017, 3:01 a.m. UTC
This patch adds a Xen virtual IOMMU document introducing the motivation,
framework, vIOMMU hypercall and xl configuration.

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
---
 docs/misc/viommu.txt | 136 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 136 insertions(+)
 create mode 100644 docs/misc/viommu.txt

Comments

Roger Pau Monné Oct. 18, 2017, 1:26 p.m. UTC | #1
On Thu, Sep 21, 2017 at 11:01:42PM -0400, Lan Tianyu wrote:
> This patch is to add Xen virtual IOMMU doc to introduce motivation,
> framework, vIOMMU hypercall and xl configuration.
> 
> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> ---
>  docs/misc/viommu.txt | 136 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 136 insertions(+)
>  create mode 100644 docs/misc/viommu.txt
> 
> diff --git a/docs/misc/viommu.txt b/docs/misc/viommu.txt
> new file mode 100644
> index 0000000..348e8c4
> --- /dev/null
> +++ b/docs/misc/viommu.txt
> @@ -0,0 +1,136 @@
> +Xen virtual IOMMU
> +
> +Motivation
> +==========
> +Enable more than 128 vcpu support
> +
> +The current requirements of HPC cloud service requires VM with a high
> +number of CPUs in order to achieve high performance in parallel
> +computing.
> +
> +To support >128 vcpus, X2APIC mode in guest is necessary because legacy
> +APIC(XAPIC) just supports 8-bit APIC ID. The APIC ID used by Xen is
> +CPU ID * 2 (ie: CPU 127 has APIC ID 254, which is the last one available
> +in xAPIC mode) and so it only can support 128 vcpus at most. x2APIC mode
> +supports 32-bit APIC ID and it requires the interrupt remapping functionality
> +of a vIOMMU if the guest wishes to route interrupts to all available vCPUs
> +
> +The reason for this is that there is no modification for existing PCI MSI
> +and IOAPIC when introduce X2APIC.

I'm not sure the above sentence makes much sense. IMHO I would just
remove it.

> PCI MSI/IOAPIC can only send interrupt
> +message containing 8-bit APIC ID, which cannot address cpus with >254
> +APIC ID. Interrupt remapping supports 32-bit APIC ID and so it's necessary
> +for >128 vcpus support.
> +
> +
> +vIOMMU Architecture
> +===================
> +vIOMMU device model is inside Xen hypervisor for following factors
> +    1) Avoid round trips between Qemu and Xen hypervisor
> +    2) Ease of integration with the rest of hypervisor
> +    3) HVMlite/PVH doesn't use Qemu

Just use PVH here, HVMlite == PVH now.

> +
> +* Interrupt remapping overview.
> +Interrupts from virtual devices and physical devices are delivered
> +to vLAPIC from vIOAPIC and vMSI. vIOMMU needs to remap interrupt during
> +this procedure.
> +
> ++---------------------------------------------------+
> +|Qemu                       |VM                     |
> +|                           | +----------------+    |
> +|                           | |  Device driver |    |
> +|                           | +--------+-------+    |
> +|                           |          ^            |
> +|       +----------------+  | +--------+-------+    |
> +|       | Virtual device |  | |  IRQ subsystem |    |
> +|       +-------+--------+  | +--------+-------+    |
> +|               |           |          ^            |
> +|               |           |          |            |
> ++---------------------------+-----------------------+
> +|hypervisor     |                      | VIRQ       |
> +|               |            +---------+--------+   |
> +|               |            |      vLAPIC      |   |
> +|               |VIRQ        +---------+--------+   |
> +|               |                      ^            |
> +|               |                      |            |
> +|               |            +---------+--------+   |
> +|               |            |      vIOMMU      |   |
> +|               |            +---------+--------+   |
> +|               |                      ^            |
> +|               |                      |            |
> +|               |            +---------+--------+   |
> +|               |            |   vIOAPIC/vMSI   |   |
> +|               |            +----+----+--------+   |
> +|               |                 ^    ^            |
> +|               +-----------------+    |            |
> +|                                      |            |
> ++---------------------------------------------------+
> +HW                                     |IRQ
> +                                +-------------------+
> +                                |   PCI Device      |
> +                                +-------------------+
> +
> +
> +vIOMMU hypercall
> +================
> +Introduce a new domctl hypercall "xen_domctl_viommu_op" to create/destroy
> +vIOMMUs.
> +
> +* vIOMMU hypercall parameter structure
> +
> +/* vIOMMU type - specify vendor vIOMMU device model */
> +#define VIOMMU_TYPE_INTEL_VTD	       0
> +
> +/* vIOMMU capabilities */
> +#define VIOMMU_CAP_IRQ_REMAPPING  (1u << 0)
> +
> +struct xen_domctl_viommu_op {
> +    uint32_t cmd;
> +#define XEN_DOMCTL_create_viommu          0
> +#define XEN_DOMCTL_destroy_viommu         1

I would invert the order of the domctl names:

#define XEN_DOMCTL_viommu_create          0
#define XEN_DOMCTL_viommu_destroy         1

It's clearer if the operation is the last part of the name.

> +    union {
> +        struct {
> +            /* IN - vIOMMU type  */
> +            uint64_t viommu_type;

Hm, do we really need a uint64_t for the IOMMU type? A uint8_t should
be more that enough (256 different IOMMU implementations).

> +            /* IN - MMIO base address of vIOMMU. */
> +            uint64_t base_address;
> +            /* IN - Capabilities with which we want to create */
> +            uint64_t capabilities;
> +            /* OUT - vIOMMU identity */
> +            uint32_t viommu_id;
> +        } create_viommu;
> +
> +        struct {
> +            /* IN - vIOMMU identity */
> +            uint32_t viommu_id;
> +        } destroy_viommu;

Do you really need the destroy operation? Do we expect to hot-unplug
vIOMMUs? Otherwise vIOMMUs should be removed when the domain is
destroyed.

> +    } u;
> +};
> +
> +- XEN_DOMCTL_create_viommu
> +    Create vIOMMU device with vIOMMU_type, capabilities and MMIO base
> +address. Hypervisor allocates viommu_id for new vIOMMU instance and return
> +back. The vIOMMU device model in hypervisor should check whether it can
> +support the input capabilities and return error if not.
> +
> +- XEN_DOMCTL_destroy_viommu
> +    Destroy vIOMMU in Xen hypervisor with viommu_id as parameter.
> +
> +These vIOMMU domctl and vIOMMU option in configure file consider multi-vIOMMU
> +support for single VM.(e.g, parameters of create/destroy vIOMMU includes
> +vIOMMU id). But function implementation only supports one vIOMMU per VM so far.
> +
> +Xen hypervisor vIOMMU command
> +=============================
> +Introduce vIOMMU command "viommu=1" to enable vIOMMU function in hypervisor.
> +It's default disabled.

Hm, I'm not sure we really need this. At the end viommu will be
disabled by default for guests, unless explicitly enabled in the
config file.

Thanks, Roger.
lan,Tianyu Oct. 19, 2017, 2:26 a.m. UTC | #2
Hi Roger:
     Thanks for the review.

On Oct. 18, 2017 21:26, Roger Pau Monné wrote:
> On Thu, Sep 21, 2017 at 11:01:42PM -0400, Lan Tianyu wrote:
>> This patch is to add Xen virtual IOMMU doc to introduce motivation,
>> framework, vIOMMU hypercall and xl configuration.
>>
>> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
>> ---
>>  docs/misc/viommu.txt | 136 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 136 insertions(+)
>>  create mode 100644 docs/misc/viommu.txt
>>
>> diff --git a/docs/misc/viommu.txt b/docs/misc/viommu.txt
>> new file mode 100644
>> index 0000000..348e8c4
>> --- /dev/null
>> +++ b/docs/misc/viommu.txt
>> @@ -0,0 +1,136 @@
>> +Xen virtual IOMMU
>> +
>> +Motivation
>> +==========
>> +Enable more than 128 vcpu support
>> +
>> +The current requirements of HPC cloud service requires VM with a high
>> +number of CPUs in order to achieve high performance in parallel
>> +computing.
>> +
>> +To support >128 vcpus, X2APIC mode in guest is necessary because legacy
>> +APIC(XAPIC) just supports 8-bit APIC ID. The APIC ID used by Xen is
>> +CPU ID * 2 (ie: CPU 127 has APIC ID 254, which is the last one available
>> +in xAPIC mode) and so it only can support 128 vcpus at most. x2APIC mode
>> +supports 32-bit APIC ID and it requires the interrupt remapping functionality
>> +of a vIOMMU if the guest wishes to route interrupts to all available vCPUs
>> +
>> +The reason for this is that there is no modification for existing PCI MSI
>> +and IOAPIC when introduce X2APIC.
> 
> I'm not sure the above sentence makes much sense. IMHO I would just
> remove it.

OK. Will remove.

> 
>> PCI MSI/IOAPIC can only send interrupt
>> +message containing 8-bit APIC ID, which cannot address cpus with >254
>> +APIC ID. Interrupt remapping supports 32-bit APIC ID and so it's necessary
>> +for >128 vcpus support.
>> +
>> +
>> +vIOMMU Architecture
>> +===================
>> +vIOMMU device model is inside Xen hypervisor for following factors
>> +    1) Avoid round trips between Qemu and Xen hypervisor
>> +    2) Ease of integration with the rest of hypervisor
>> +    3) HVMlite/PVH doesn't use Qemu
> 
> Just use PVH here, HVMlite == PVH now.

OK.

> 
>> +
>> +* Interrupt remapping overview.
>> +Interrupts from virtual devices and physical devices are delivered
>> +to vLAPIC from vIOAPIC and vMSI. vIOMMU needs to remap interrupt during
>> +this procedure.
>> +
>> ++---------------------------------------------------+
>> +|Qemu                       |VM                     |
>> +|                           | +----------------+    |
>> +|                           | |  Device driver |    |
>> +|                           | +--------+-------+    |
>> +|                           |          ^            |
>> +|       +----------------+  | +--------+-------+    |
>> +|       | Virtual device |  | |  IRQ subsystem |    |
>> +|       +-------+--------+  | +--------+-------+    |
>> +|               |           |          ^            |
>> +|               |           |          |            |
>> ++---------------------------+-----------------------+
>> +|hypervisor     |                      | VIRQ       |
>> +|               |            +---------+--------+   |
>> +|               |            |      vLAPIC      |   |
>> +|               |VIRQ        +---------+--------+   |
>> +|               |                      ^            |
>> +|               |                      |            |
>> +|               |            +---------+--------+   |
>> +|               |            |      vIOMMU      |   |
>> +|               |            +---------+--------+   |
>> +|               |                      ^            |
>> +|               |                      |            |
>> +|               |            +---------+--------+   |
>> +|               |            |   vIOAPIC/vMSI   |   |
>> +|               |            +----+----+--------+   |
>> +|               |                 ^    ^            |
>> +|               +-----------------+    |            |
>> +|                                      |            |
>> ++---------------------------------------------------+
>> +HW                                     |IRQ
>> +                                +-------------------+
>> +                                |   PCI Device      |
>> +                                +-------------------+
>> +
>> +
>> +vIOMMU hypercall
>> +================
>> +Introduce a new domctl hypercall "xen_domctl_viommu_op" to create/destroy
>> +vIOMMUs.
>> +
>> +* vIOMMU hypercall parameter structure
>> +
>> +/* vIOMMU type - specify vendor vIOMMU device model */
>> +#define VIOMMU_TYPE_INTEL_VTD	       0
>> +
>> +/* vIOMMU capabilities */
>> +#define VIOMMU_CAP_IRQ_REMAPPING  (1u << 0)
>> +
>> +struct xen_domctl_viommu_op {
>> +    uint32_t cmd;
>> +#define XEN_DOMCTL_create_viommu          0
>> +#define XEN_DOMCTL_destroy_viommu         1
> 
> I would invert the order of the domctl names:
> 
> #define XEN_DOMCTL_viommu_create          0
> #define XEN_DOMCTL_viommu_destroy         1
> 
> It's clearer if the operation is the last part of the name.

OK. Will update.

> 
>> +    union {
>> +        struct {
>> +            /* IN - vIOMMU type  */
>> +            uint64_t viommu_type;
> 
> Hm, do we really need a uint64_t for the IOMMU type? A uint8_t should
> be more that enough (256 different IOMMU implementations).

OK. Will update.

> 
>> +            /* IN - MMIO base address of vIOMMU. */
>> +            uint64_t base_address;
>> +            /* IN - Capabilities with which we want to create */
>> +            uint64_t capabilities;
>> +            /* OUT - vIOMMU identity */
>> +            uint32_t viommu_id;
>> +        } create_viommu;
>> +
>> +        struct {
>> +            /* IN - vIOMMU identity */
>> +            uint32_t viommu_id;
>> +        } destroy_viommu;
> 
> Do you really need the destroy operation? Do we expect to hot-unplug
> vIOMMUs? Otherwise vIOMMUs should be removed when the domain is
> destroyed.

Yes, there is no such requirement so far; it was added just for the
multi-vIOMMU case. I will remove it and add it back when it's really needed.

> 
>> +    } u;
>> +};
>> +
>> +- XEN_DOMCTL_create_viommu
>> +    Create vIOMMU device with vIOMMU_type, capabilities and MMIO base
>> +address. Hypervisor allocates viommu_id for new vIOMMU instance and return
>> +back. The vIOMMU device model in hypervisor should check whether it can
>> +support the input capabilities and return error if not.
>> +
>> +- XEN_DOMCTL_destroy_viommu
>> +    Destroy vIOMMU in Xen hypervisor with viommu_id as parameter.
>> +
>> +These vIOMMU domctl and vIOMMU option in configure file consider multi-vIOMMU
>> +support for single VM.(e.g, parameters of create/destroy vIOMMU includes
>> +vIOMMU id). But function implementation only supports one vIOMMU per VM so far.
>> +
>> +Xen hypervisor vIOMMU command
>> +=============================
>> +Introduce vIOMMU command "viommu=1" to enable vIOMMU function in hypervisor.
>> +It's default disabled.
> 
> Hm, I'm not sure we really need this. At the end viommu will be
> disabled by default for guests, unless explicitly enabled in the
> config file.

This is according to Jan's earlier comments on the RFC patch
https://patchwork.kernel.org/patch/9733869/:

"It's actually a question whether in our current scheme a Kconfig
option is appropriate here in the first place. I'd rather see this be
an always built feature which needs enabling on the command line
for the time being."


> 
> Thanks, Roger.
>
Roger Pau Monné Oct. 19, 2017, 8:49 a.m. UTC | #3
On Thu, Oct 19, 2017 at 10:26:36AM +0800, Lan Tianyu wrote:
> Hi Roger:
>      Thanks for review.
> 
> On Oct. 18, 2017 21:26, Roger Pau Monné wrote:
> > On Thu, Sep 21, 2017 at 11:01:42PM -0400, Lan Tianyu wrote:
> >> +Xen hypervisor vIOMMU command
> >> +=============================
> >> +Introduce vIOMMU command "viommu=1" to enable vIOMMU function in hypervisor.
> >> +It's default disabled.
> > 
> > Hm, I'm not sure we really need this. At the end viommu will be
> > disabled by default for guests, unless explicitly enabled in the
> > config file.
> 
> This is according to Jan's early comments on RFC patch
> https://patchwork.kernel.org/patch/9733869/.
> 
> "It's actually a question whether in our current scheme a Kconfig
> option is appropriate here in the first place. I'd rather see this be
> an always built feature which needs enabling on the command line
> for the time being."

So if I read this correctly Jan wanted you to ditch the Kconfig option
and instead rely on the command line option to enable/disable it.

I don't have a strong opinion here, so it's fine for me if you want to
keep both the Kconfig option and the command line one.

Roger.
Jan Beulich Oct. 19, 2017, 11:28 a.m. UTC | #4
>>> On 19.10.17 at 10:49, <roger.pau@citrix.com> wrote:
> On Thu, Oct 19, 2017 at 10:26:36AM +0800, Lan Tianyu wrote:
>> Hi Roger:
>>      Thanks for review.
>> 
>> On Oct. 18, 2017 21:26, Roger Pau Monné wrote:
>> > On Thu, Sep 21, 2017 at 11:01:42PM -0400, Lan Tianyu wrote:
>> >> +Xen hypervisor vIOMMU command
>> >> +=============================
>> >> +Introduce vIOMMU command "viommu=1" to enable vIOMMU function in 
> hypervisor.
>> >> +It's default disabled.
>> > 
>> > Hm, I'm not sure we really need this. At the end viommu will be
>> > disabled by default for guests, unless explicitly enabled in the
>> > config file.
>> 
>> This is according to Jan's early comments on RFC patch
>> https://patchwork.kernel.org/patch/9733869/.
>> 
>> "It's actually a question whether in our current scheme a Kconfig
>> option is appropriate here in the first place. I'd rather see this be
>> an always built feature which needs enabling on the command line
>> for the time being."
> 
> So if I read this correctly Jan wanted you to ditch the Kconfig option
> and instead rely on the command line option to enable/disable it.

Yes.

Jan
lan,Tianyu Oct. 24, 2017, 7:16 a.m. UTC | #5
On Oct. 19, 2017 19:28, Jan Beulich wrote:
>>>> On 19.10.17 at 10:49, <roger.pau@citrix.com> wrote:
>> On Thu, Oct 19, 2017 at 10:26:36AM +0800, Lan Tianyu wrote:
>>> Hi Roger:
>>>      Thanks for review.
>>>
>>> On Oct. 18, 2017 21:26, Roger Pau Monné wrote:
>>>> On Thu, Sep 21, 2017 at 11:01:42PM -0400, Lan Tianyu wrote:
>>>>> +Xen hypervisor vIOMMU command
>>>>> +=============================
>>>>> +Introduce vIOMMU command "viommu=1" to enable vIOMMU function in 
>> hypervisor.
>>>>> +It's default disabled.
>>>>
>>>> Hm, I'm not sure we really need this. At the end viommu will be
>>>> disabled by default for guests, unless explicitly enabled in the
>>>> config file.
>>>
>>> This is according to Jan's early comments on RFC patch
>>> https://patchwork.kernel.org/patch/9733869/.
>>>
>>> "It's actually a question whether in our current scheme a Kconfig
>>> option is appropriate here in the first place. I'd rather see this be
>>> an always built feature which needs enabling on the command line
>>> for the time being."
>>
>> So if I read this correctly Jan wanted you to ditch the Kconfig option
>> and instead rely on the command line option to enable/disable it.
> 
> Yes.
> 
> Jan
> 

OK. I will remove the command line option in the next version. Thanks for the
clarification.

Patch

diff --git a/docs/misc/viommu.txt b/docs/misc/viommu.txt
new file mode 100644
index 0000000..348e8c4
--- /dev/null
+++ b/docs/misc/viommu.txt
@@ -0,0 +1,136 @@ 
+Xen virtual IOMMU
+
+Motivation
+==========
+Enable more than 128 vcpu support
+
+The current requirements of HPC cloud services call for VMs with a high
+number of CPUs in order to achieve high performance in parallel
+computing.
+
+To support >128 vcpus, x2APIC mode in the guest is necessary because the
+legacy APIC (xAPIC) only supports 8-bit APIC IDs. The APIC ID used by Xen is
+CPU ID * 2 (ie: CPU 127 has APIC ID 254, which is the last one available
+in xAPIC mode) and so it can only support 128 vcpus at most. x2APIC mode
+supports 32-bit APIC IDs and requires the interrupt remapping functionality
+of a vIOMMU if the guest wishes to route interrupts to all available vCPUs.
+
+The reason for this is that the existing PCI MSI and IOAPIC interfaces were
+not modified when x2APIC was introduced. PCI MSI/IOAPIC can only send
+interrupt messages containing an 8-bit APIC ID, which cannot address cpus
+with an APIC ID >254. Interrupt remapping supports 32-bit APIC IDs and so
+it's necessary for >128 vcpu support.
+
+
+vIOMMU Architecture
+===================
+The vIOMMU device model is inside the Xen hypervisor for the following reasons:
+    1) Avoid round trips between Qemu and the Xen hypervisor
+    2) Ease of integration with the rest of the hypervisor
+    3) HVMlite/PVH doesn't use Qemu
+
+* Interrupt remapping overview.
+Interrupts from virtual and physical devices are delivered to the vLAPIC
+from the vIOAPIC and vMSI. The vIOMMU needs to remap interrupts during
+this procedure.
+
++---------------------------------------------------+
+|Qemu                       |VM                     |
+|                           | +----------------+    |
+|                           | |  Device driver |    |
+|                           | +--------+-------+    |
+|                           |          ^            |
+|       +----------------+  | +--------+-------+    |
+|       | Virtual device |  | |  IRQ subsystem |    |
+|       +-------+--------+  | +--------+-------+    |
+|               |           |          ^            |
+|               |           |          |            |
++---------------------------+-----------------------+
+|hypervisor     |                      | VIRQ       |
+|               |            +---------+--------+   |
+|               |            |      vLAPIC      |   |
+|               |VIRQ        +---------+--------+   |
+|               |                      ^            |
+|               |                      |            |
+|               |            +---------+--------+   |
+|               |            |      vIOMMU      |   |
+|               |            +---------+--------+   |
+|               |                      ^            |
+|               |                      |            |
+|               |            +---------+--------+   |
+|               |            |   vIOAPIC/vMSI   |   |
+|               |            +----+----+--------+   |
+|               |                 ^    ^            |
+|               +-----------------+    |            |
+|                                      |            |
++---------------------------------------------------+
+HW                                     |IRQ
+                                +-------------------+
+                                |   PCI Device      |
+                                +-------------------+
+
+
+vIOMMU hypercall
+================
+Introduce a new domctl hypercall "xen_domctl_viommu_op" to create/destroy
+vIOMMUs.
+
+* vIOMMU hypercall parameter structure
+
+/* vIOMMU type - specify vendor vIOMMU device model */
+#define VIOMMU_TYPE_INTEL_VTD	       0
+
+/* vIOMMU capabilities */
+#define VIOMMU_CAP_IRQ_REMAPPING  (1u << 0)
+
+struct xen_domctl_viommu_op {
+    uint32_t cmd;
+#define XEN_DOMCTL_create_viommu          0
+#define XEN_DOMCTL_destroy_viommu         1
+    union {
+        struct {
+            /* IN - vIOMMU type  */
+            uint64_t viommu_type;
+            /* IN - MMIO base address of vIOMMU. */
+            uint64_t base_address;
+            /* IN - Capabilities with which we want to create */
+            uint64_t capabilities;
+            /* OUT - vIOMMU identity */
+            uint32_t viommu_id;
+        } create_viommu;
+
+        struct {
+            /* IN - vIOMMU identity */
+            uint32_t viommu_id;
+        } destroy_viommu;
+    } u;
+};
+
+- XEN_DOMCTL_create_viommu
+    Create a vIOMMU device with the given vIOMMU type, capabilities and MMIO
+base address. The hypervisor allocates a viommu_id for the new vIOMMU
+instance and returns it. The vIOMMU device model in the hypervisor should
+check whether it can support the input capabilities and return an error if not.
+
+- XEN_DOMCTL_destroy_viommu
+    Destroy the vIOMMU in the Xen hypervisor identified by viommu_id.
+
+The vIOMMU domctls and the vIOMMU option in the configuration file allow for
+multi-vIOMMU support in a single VM (e.g. the create/destroy vIOMMU parameters
+include a vIOMMU id), but the implementation only supports one vIOMMU per VM so far.
+
+Xen hypervisor vIOMMU command
+=============================
+Introduce the hypervisor command line option "viommu=1" to enable the vIOMMU
+function. It is disabled by default.
+
+xl x86 vIOMMU configuration
+===========================
+viommu = [
+    'type=intel_vtd,intremap=1',
+    ...
+]
+
+"type" - Specify the vIOMMU device model type. Currently only the Intel VT-d
+device model is supported.
+"intremap" - Enable the vIOMMU interrupt remapping function.