[RFC] docs: Enhance documentation for iommu bypass

Message ID 20240522074008.GA171222@ziqianlu-desk2 (mailing list archive)
State New
Series: [RFC] docs: Enhance documentation for iommu bypass

Commit Message

Aaron Lu May 22, 2024, 7:40 a.m. UTC
When Intel vIOMMU is used and irq remapping is enabled, using
bypass_iommu causes the following two call stacks to be dumped during
kernel boot, and all PCI devices attached to the root bridge lose their
MSI capabilities and fall back to using IOAPIC:

[    0.960262] ------------[ cut here ]------------
[    0.961245] WARNING: CPU: 3 PID: 1 at drivers/pci/msi/msi.h:121 pci_msi_setup_msi_irqs+0x27/0x40
[    0.963070] Modules linked in:
[    0.963695] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 6.9.0-rc7-00056-g45db3ab70092 #1
[    0.965225] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[    0.967382] RIP: 0010:pci_msi_setup_msi_irqs+0x27/0x40
[    0.968378] Code: 90 90 90 0f 1f 44 00 00 48 8b 87 30 03 00 00 89 f2 48 85 c0 74 14 f6 40 28 01 74 0e 48 81 c7 c0 00 00 00 31 f6 e9 29 42 9e ff <0f> 0b b8 ed ff ff ff c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00
[    0.971756] RSP: 0000:ffffc90000017988 EFLAGS: 00010246
[    0.972669] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[    0.973901] RDX: 0000000000000005 RSI: 0000000000000005 RDI: ffff888100ee1000
[    0.975391] RBP: 0000000000000005 R08: ffff888101f44d90 R09: 0000000000000228
[    0.976629] R10: 0000000000000001 R11: 0000000000008d3f R12: ffffc90000017b80
[    0.977864] R13: ffff888102312000 R14: ffff888100ee1000 R15: 0000000000000005
[    0.979092] FS:  0000000000000000(0000) GS:ffff88817bd80000(0000) knlGS:0000000000000000
[    0.980473] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.981464] CR2: 0000000000000000 CR3: 000000000302e001 CR4: 0000000000770ef0
[    0.982687] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.983919] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    0.985143] PKRU: 55555554
[    0.985625] Call Trace:
[    0.986056]  <TASK>
[    0.986440]  ? __warn+0x80/0x130
[    0.987014]  ? pci_msi_setup_msi_irqs+0x27/0x40
[    0.987810]  ? report_bug+0x18d/0x1c0
[    0.988443]  ? handle_bug+0x3a/0x70
[    0.989026]  ? exc_invalid_op+0x13/0x60
[    0.989672]  ? asm_exc_invalid_op+0x16/0x20
[    0.990374]  ? pci_msi_setup_msi_irqs+0x27/0x40
[    0.991118]  __pci_enable_msix_range+0x325/0x5b0
[    0.991883]  pci_alloc_irq_vectors_affinity+0xa9/0x110
[    0.992698]  vp_find_vqs_msix+0x1a8/0x4c0
[    0.993332]  vp_find_vqs+0x3a/0x1a0
[    0.993893]  vp_modern_find_vqs+0x17/0x70
[    0.994531]  init_vq+0x3ad/0x410
[    0.995051]  ? __pfx_default_calc_sets+0x10/0x10
[    0.995789]  virtblk_probe+0xeb/0xbc0
[    0.996362]  ? up_write+0x74/0x160
[    0.996900]  ? down_write+0x4d/0x80
[    0.997450]  virtio_dev_probe+0x1bc/0x270
[    0.998059]  really_probe+0xc1/0x390
[    0.998626]  ? __pfx___driver_attach+0x10/0x10
[    0.999288]  __driver_probe_device+0x78/0x150
[    0.999924]  driver_probe_device+0x1f/0x90
[    1.000506]  __driver_attach+0xce/0x1c0
[    1.001073]  bus_for_each_dev+0x70/0xc0
[    1.001638]  bus_add_driver+0x112/0x210
[    1.002191]  driver_register+0x55/0x100
[    1.002760]  virtio_blk_init+0x4c/0x90
[    1.003332]  ? __pfx_virtio_blk_init+0x10/0x10
[    1.003974]  do_one_initcall+0x41/0x240
[    1.004510]  ? kernel_init_freeable+0x240/0x4a0
[    1.005142]  kernel_init_freeable+0x321/0x4a0
[    1.005749]  ? __pfx_kernel_init+0x10/0x10
[    1.006311]  kernel_init+0x16/0x1c0
[    1.006798]  ret_from_fork+0x2d/0x50
[    1.007303]  ? __pfx_kernel_init+0x10/0x10
[    1.007883]  ret_from_fork_asm+0x1a/0x30
[    1.008431]  </TASK>
[    1.008748] ---[ end trace 0000000000000000 ]---

Another call stack is dumped at pci_msi_teardown_msi_irqs().

In fact, every PCI device hits these two paths; only two call stacks are
dumped because both places use WARN_ON_ONCE().
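
(A quick way to see the effect from inside an affected Linux guest is
lspci; on a device that bypassed the vIOMMU I would expect output like
the below, though the slot and capability offset here are just examples:)

$ lspci -vs 00:03.0 | grep MSI
        Capabilities: [98] MSI-X: Enable- Count=9 Masked-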

What happens is: when irq remapping is enabled, the kernel expects every
PCI device (or its parent bridge) to appear in some DMA Remapping
Hardware unit Definition (DRHD)'s device scope list. If a device does
not, its irq domain becomes NULL, which makes enabling MSI functionality
for that device fail.

Per my understanding, only a virtualized system can have such a setup:
irq remapping enabled while not all PCI/PCIe devices appear in a DRHD's
device scope.
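
(To double check which devices a DRHD covers, the guest's ACPI DMAR
table can be decoded; a sketch, assuming the iasl disassembler from
acpica-tools is installed in the guest:)

# the decoded dmar.dsl lists each DRHD and its Device Scope entries
cp /sys/firmware/acpi/tables/DMAR /tmp/dmar.dat
cd /tmp && iasl -d dmar.dat && less dmar.dsl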

Enhance the documentation by mentioning what can happen when
bypass_iommu is used.

For the detailed QEMU command line and guest kernel dmesg, please see:
https://lore.kernel.org/qemu-devel/20240510072519.GA39314@ziqianlu-desk2/
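
(For convenience, a minimal command line reproducing the setup might
look like the sketch below; the disk image name and the virtio disk are
my illustration, the relevant parts are the machine and iommu options:)

qemu-system-x86_64 \
 -machine q35,accel=kvm,kernel-irqchip=split,default_bus_bypass_iommu=true \
 -device intel-iommu,intremap=on \
 -drive file=guest.img,if=virtio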

Reported-by: Juro Bystricky <juro.bystricky@intel.com>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
 docs/bypass-iommu.txt | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Michael S. Tsirkin May 22, 2024, 9:28 a.m. UTC | #1
On Wed, May 22, 2024 at 03:40:08PM +0800, Aaron Lu wrote:
> When Intel vIOMMU is used and irq remapping is enabled, using
> bypass_iommu causes the following two call stacks to be dumped during
> kernel boot, and all PCI devices attached to the root bridge lose their
> MSI capabilities and fall back to using IOAPIC:
> 
> [ ... kernel call trace and analysis snipped; same as the message above ... ]
> 
> Per my understanding, only a virtualized system can have such a setup:
> irq remapping enabled while not all PCI/PCIe devices appear in a DRHD's
> device scope.

Is this issue specific to Linux?

> ---
>  docs/bypass-iommu.txt | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/docs/bypass-iommu.txt b/docs/bypass-iommu.txt
> index e6677bddd3..8226f79104 100644
> --- a/docs/bypass-iommu.txt
> +++ b/docs/bypass-iommu.txt
> @@ -68,6 +68,11 @@ devices might send malicious dma request to virtual machine if there is no
>  iommu isolation. So it would be necessary to only bypass iommu for trusted
>  device.
>  
> +When Intel IOMMU is virtualized, if irq remapping is enabled, PCI and PCIe
> +devices that bypassed vIOMMU will have their MSI/MSI-x functionalities disabled

functionality

> +and fall back to IOAPIC. If this is not desired, disable irq remapping:
> +qemu -device intel-iommu,intremap=off
> +
>  Implementation
>  ==============
>  The bypass iommu feature includes:
> -- 
> 2.45.0
Aaron Lu May 22, 2024, 12:34 p.m. UTC | #2
On Wed, May 22, 2024 at 05:28:50AM -0400, Michael S. Tsirkin wrote:
> On Wed, May 22, 2024 at 03:40:08PM +0800, Aaron Lu wrote:
> > [ ... original message snipped ... ]
> 
> Is this issue specific to Linux?

Ah, to be honest, I have never tried any other guest OS.

I just did a quick check using FreeBSD 13.2 and it appears FreeBSD
doesn't enable MSI for PCI devices even without vIOMMU:

root@bsdvm:~ # lspci
... ...
00:03.0 SCSI storage controller: Red Hat, Inc. Virtio block device
        Subsystem: Red Hat, Inc. Device 0002
pcilib: 0000:00:03.0 64-bit device address ignored.
        Flags: bus master, fast devsel, latency 0, IRQ 23  (<-note here)
        I/O ports at c000
        Memory at fc053000 (32-bit, non-prefetchable)
        Memory at <unassigned> (64-bit, prefetchable)
        Memory at <unassigned> (32-bit, non-prefetchable)
        Capabilities: [98] MSI-X: Enable- Count=9 Masked-  (<-and here)

and from dmesg, I saw:
root@bsdvm:~ # dmesg |grep apic
ioapic0 <Version 2.0> irqs 0-23

So it appears MSI functionality is indeed not enabled even without using
vIOMMU. Adding vIOMMU and bypass iommu doesn't change anything.

But I rarely use FreeBSD, so I may be missing something here.
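
(One more data point that might be worth collecting: FreeBSD gates
MSI/MSI-X behind loader tunables, so checking those would rule out a
deliberately disabled capability. This is just my guess at where to look:)

root@bsdvm:~ # sysctl hw.pci.enable_msix hw.pci.enable_msi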

I do not have a Windows VM right now and will report back once I finish
testing there.

> > ---
> >  docs/bypass-iommu.txt | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/docs/bypass-iommu.txt b/docs/bypass-iommu.txt
> > index e6677bddd3..8226f79104 100644
> > --- a/docs/bypass-iommu.txt
> > +++ b/docs/bypass-iommu.txt
> > @@ -68,6 +68,11 @@ devices might send malicious dma request to virtual machine if there is no
> >  iommu isolation. So it would be necessary to only bypass iommu for trusted
> >  device.
> >  
> > +When Intel IOMMU is virtualized, if irq remapping is enabled, PCI and PCIe
> > +devices that bypassed vIOMMU will have their MSI/MSI-x functionalities disabled
> 
> functionality

Will correct this, thanks.

Aaron Lu May 23, 2024, 12:52 p.m. UTC | #3
On Wed, May 22, 2024 at 08:34:13PM +0800, Aaron Lu wrote:
> 
> I do not have a Windows VM right now and will report back once I finish
> testing there.

Tested with a Windows 10 VM and it turns out virtio pci devices always
use MSI, no matter whether vIOMMU and bypass iommu are specified or not.

So according to the test results of the Windows VM and the FreeBSD VM,
yeah, it looks like this issue is Linux specific.

Maybe change the wording in the doc like below?

diff --git a/docs/bypass-iommu.txt b/docs/bypass-iommu.txt
index e6677bddd3..fa80a5ce1f 100644
--- a/docs/bypass-iommu.txt
+++ b/docs/bypass-iommu.txt
@@ -68,6 +68,12 @@ devices might send malicious dma request to virtual machine if there is no
 iommu isolation. So it would be necessary to only bypass iommu for trusted
 device.
 
+When Intel IOMMU is virtualized, if irq remapping is enabled, PCI and PCIe
+devices that bypassed vIOMMU will have their MSI/MSI-X functionality disabled
+and fall back to IOAPIC in a Linux x86_64 guest. If this is not desired,
+disable irq remapping with:
+qemu -device intel-iommu,intremap=off
+
 Implementation
 ==============
 The bypass iommu feature includes:
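
As a side note for anyone applying the workaround: whether irq remapping
is actually active can be confirmed from the guest's dmesg (my usual
check; the exact message text varies across kernel versions):

dmesg | grep -i 'remapping'

With intremap=off, no "Enabled IRQ remapping" line should show up.
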
Aaron Lu May 23, 2024, 1:15 p.m. UTC | #4
On Thu, May 23, 2024 at 08:52:35PM +0800, Aaron Lu wrote:
> On Wed, May 22, 2024 at 08:34:13PM +0800, Aaron Lu wrote:
> > 
> > I do not have Windows VM right now and will report back once I finished
> > testing there.
> 
> Tested with a Windows 10 VM and it turns out virtio pci devices always
> use MSI, no matter whether vIOMMU and bypass iommu are specified or not.

Just noticed another thing about the Windows VM.

If I install the VM without bypass iommu on:
 -machine q35,accel=kvm,kernel-irqchip=split \
 -device intel-iommu \
 -cdrom win10.iso

Then the install went well, and after installation, adding bypass iommu:
 -machine q35,accel=kvm,kernel-irqchip=split,default_bus_bypass_iommu=true \
 -device intel-iommu \
 -cdrom win10.iso

doesn't change anything regarding PCI MSI functionality.

But if I install the VM with bypass iommu on:
 -machine q35,accel=kvm,kernel-irqchip=split,default_bus_bypass_iommu=true \
 -device intel-iommu \
 -cdrom win10.iso

then the install couldn't proceed, and it appears to me Windows couldn't
read from the installation ISO.

Not sure what exactly happened, but it looks like something worth mentioning.
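
(A quick experiment that might narrow this down, in case anyone picks it
up: retry the install with irq remapping disabled, i.e.

 -machine q35,accel=kvm,kernel-irqchip=split,default_bus_bypass_iommu=true \
 -device intel-iommu,intremap=off \
 -cdrom win10.iso

to tell apart an irq remapping problem from a DMA translation one.)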

Patch

diff --git a/docs/bypass-iommu.txt b/docs/bypass-iommu.txt
index e6677bddd3..8226f79104 100644
--- a/docs/bypass-iommu.txt
+++ b/docs/bypass-iommu.txt
@@ -68,6 +68,11 @@  devices might send malicious dma request to virtual machine if there is no
 iommu isolation. So it would be necessary to only bypass iommu for trusted
 device.
 
+When Intel IOMMU is virtualized, if irq remapping is enabled, PCI and PCIe
+devices that bypassed vIOMMU will have their MSI/MSI-x functionalities disabled
+and fall back to IOAPIC. If this is not desired, disable irq remapping:
+qemu -device intel-iommu,intremap=off
+
 Implementation
 ==============
 The bypass iommu feature includes: