Message ID | 1460979793-6621-1-git-send-email-mst@redhat.com (mailing list archive) |
---|---|
State | New, archived |
On Mon, 2016-04-18 at 14:47 +0300, Michael S. Tsirkin wrote:
> This adds a flag to enable/disable bypassing the IOMMU by virtio devices.

I'm still deeply unhappy with having this kind of hack in the virtio code at all, as you know. Drivers should just use the DMA API and if the *platform* wants to make it a no-op for a specific device, then it can.

Remember, this isn't just virtio either. Don't we have *precisely* the same issue with assigned PCI devices on a system with an emulated Intel IOMMU? The assigned PCI devices aren't covered by the emulated IOMMU, and the platform needs to know to bypass *those* too.

Now, we've had this conversation, and we accepted the hack in virtio for now until the platforms (especially SPARC and Power IIRC) can get their act together and make their DMA API implementations not broken.

But now you're adding this hack to the public API where we have to support it for ever. Please, can't we avoid that?
On Mon, Apr 18, 2016 at 07:58:37AM -0400, David Woodhouse wrote:
> On Mon, 2016-04-18 at 14:47 +0300, Michael S. Tsirkin wrote:
> > This adds a flag to enable/disable bypassing the IOMMU by virtio devices.
>
> I'm still deeply unhappy with having this kind of hack in the virtio code at all, as you know. Drivers should just use the DMA API and if the *platform* wants to make it a no-op for a specific device, then it can.
>
> Remember, this isn't just virtio either. Don't we have *precisely* the same issue with assigned PCI devices on a system with an emulated Intel IOMMU? The assigned PCI devices aren't covered by the emulated IOMMU, and the platform needs to know to bypass *those* too.
>
> Now, we've had this conversation, and we accepted the hack in virtio for now until the platforms (especially SPARC and Power IIRC) can get their act together and make their DMA API implementations not broken.
>
> But now you're adding this hack to the public API where we have to support it for ever. Please, can't we avoid that?

I'm not sure I understand the issue. The public API is not about how the driver works. It doesn't say "don't use DMA API" anywhere, does it? It's about telling device whether to obey the IOMMU and about discovering whether a device is in fact under the IOMMU.

Once DMA API allows bypassing IOMMU per device we'll be able to drop the ugly hack from virtio drivers, simply keying it off the given flag.

> --
> dwmw2
On Mon, 2016-04-18 at 16:12 +0300, Michael S. Tsirkin wrote:
> I'm not sure I understand the issue. The public API is not about how the driver works. It doesn't say "don't use DMA API" anywhere, does it? It's about telling device whether to obey the IOMMU and about discovering whether a device is in fact under the IOMMU.

Apologies, I was wrongly reading this as a kernel patch.

After a brief struggle with "telling device whether to obey the IOMMU", which is obviously completely impossible from the guest kernel, I realise my mistake :)

So... on x86 how does this get reflected in the DMAR tables that the guest BIOS presents to the guest kernel, so that the guest kernel *knows* which devices are behind which IOMMU?

(And are you fixing the case of assigned PCI devices, which aren't behind any IOMMU, at the same time as you answer that? :)
On Mon, Apr 18, 2016 at 10:03:52AM -0400, David Woodhouse wrote:
> On Mon, 2016-04-18 at 16:12 +0300, Michael S. Tsirkin wrote:
> > I'm not sure I understand the issue. The public API is not about how the driver works. It doesn't say "don't use DMA API" anywhere, does it? It's about telling device whether to obey the IOMMU and about discovering whether a device is in fact under the IOMMU.
>
> Apologies, I was wrongly reading this as a kernel patch.
>
> After a brief struggle with "telling device whether to obey the IOMMU", which is obviously completely impossible from the guest kernel, I realise my mistake :)
>
> So... on x86 how does this get reflected in the DMAR tables that the guest BIOS presents to the guest kernel, so that the guest kernel *knows* which devices are behind which IOMMU?

This patch doesn't change DMAR tables, it creates a way for virtio device to tell guest "I obey what DMAR tables tell you, you can stop doing hacks".

And as PPC guys seem adamant that platform tools there are no good for that purpose, there's another bit that says "ignore what platform tells you, I'm not a real device - I'm part of hypervisor and I bypass the IOMMU".

> (And are you fixing the case of assigned PCI devices, which aren't behind any IOMMU, at the same time as you answer that? :)

No - Aviv B.D. has patches on list to fix that.
On Mon, 2016-04-18 at 17:23 +0300, Michael S. Tsirkin wrote:
> This patch doesn't change DMAR tables, it creates a way for virtio device to tell guest "I obey what DMAR tables tell you, you can stop doing hacks".
>
> And as PPC guys seem adamant that platform tools there are no good for that purpose, there's another bit that says "ignore what platform tells you, I'm not a real device - I'm part of hypervisor and I bypass the IOMMU".

...

+/* Request IOMMU passthrough (if available)
+ * Without VIRTIO_F_IOMMU_PLATFORM: bypass the IOMMU even if enabled.
+ * With VIRTIO_F_IOMMU_PLATFORM: suggest disabling IOMMU.
+ */
+#define VIRTIO_F_IOMMU_PASSTHROUGH 33
+
+/* Do not bypass the IOMMU (if configured) */
+#define VIRTIO_F_IOMMU_PLATFORM 34

OK... let's see if I can reconcile those descriptions coherently.

Setting (only) VIRTIO_F_IOMMU_PASSTHROUGH indicates to the guest that its own operating system's IOMMU code is expected to be broken, and that the virtio driver should eschew the DMA API? And that the guest OS cannot further assign the affected device to any of *its* nested guests? Not that the broken IOMMU code in said guest OS will know the latter, of course.

With VIRTIO_F_IOMMU_PLATFORM set, VIRTIO_F_IOMMU_PASSTHROUGH is just a *hint*, suggesting that the guest OS should *request* a passthrough mapping from the IOMMU? Via a driver↔IOMMU API which doesn't yet exist in Linux, since we only have 'iommu=pt' on the command line for that?

And having *neither* of those bits set is the status quo, which means that your OS code might well be broken and need you to eschew the DMA API, but maybe not.

--
dwmw2
On Mon, Apr 18, 2016 at 11:22:03AM -0400, David Woodhouse wrote:
> On Mon, 2016-04-18 at 17:23 +0300, Michael S. Tsirkin wrote:
> > This patch doesn't change DMAR tables, it creates a way for virtio device to tell guest "I obey what DMAR tables tell you, you can stop doing hacks".
> >
> > And as PPC guys seem adamant that platform tools there are no good for that purpose, there's another bit that says "ignore what platform tells you, I'm not a real device - I'm part of hypervisor and I bypass the IOMMU".
>
> ...
>
> +/* Request IOMMU passthrough (if available)
> + * Without VIRTIO_F_IOMMU_PLATFORM: bypass the IOMMU even if enabled.
> + * With VIRTIO_F_IOMMU_PLATFORM: suggest disabling IOMMU.
> + */
> +#define VIRTIO_F_IOMMU_PASSTHROUGH 33
> +
> +/* Do not bypass the IOMMU (if configured) */
> +#define VIRTIO_F_IOMMU_PLATFORM 34
>
> OK... let's see if I can reconcile those descriptions coherently.
>
> Setting (only) VIRTIO_F_IOMMU_PASSTHROUGH indicates to the guest that its own operating system's IOMMU code is expected to be broken, and that the virtio driver should eschew the DMA API?

No - it tells guest that e.g. the ACPI tables (or whatever the equivalent is) do not match reality with respect to this device, since the IOMMU is ignored by the hypervisor. The hypervisor has no idea what the guest IOMMU code does - hopefully it is not actually broken.

> And that the guest OS cannot further assign the affected device to any of *its* nested guests? Not that the broken IOMMU code in said guest OS will know the latter, of course.
>
> With VIRTIO_F_IOMMU_PLATFORM set, VIRTIO_F_IOMMU_PASSTHROUGH is just a *hint*, suggesting that the guest OS should *request* a passthrough mapping from the IOMMU?

Right. But it'll work correctly if you don't.

> Via a driver↔IOMMU API which doesn't yet exist in Linux, since we only have 'iommu=pt' on the command line for that?
>
> And having *neither* of those bits set is the status quo, which means that your OS code might well be broken and need you to eschew the DMA API, but maybe not.

The status quo is that the IOMMU might well be bypassed and then you need to program physical addresses into the device, but maybe not. If the DMA API does not give you physical addresses, you need to bypass it, but the hypervisor does not know or care.

> --
> dwmw2
On Mon, 2016-04-18 at 18:30 +0300, Michael S. Tsirkin wrote:
> > Setting (only) VIRTIO_F_IOMMU_PASSTHROUGH indicates to the guest that its own operating system's IOMMU code is expected to be broken, and that the virtio driver should eschew the DMA API?
>
> No - it tells guest that e.g. the ACPI tables (or whatever the equivalent is) do not match reality with respect to this device since IOMMU is ignored by hypervisor. Hypervisor has no idea what the guest IOMMU code does - hopefully it is not actually broken.

OK, that makes sense — thanks.

So where the platform *does* have a way to coherently tell the guest that some devices are behind an IOMMU and some aren't, we should never see VIRTIO_F_IOMMU_PASSTHROUGH && !VIRTIO_F_IOMMU_PLATFORM. (Except perhaps temporarily on x86 until we *do* fix the DMAR tables to tell the truth; qv.)

This should *only* be a crutch for platforms which cannot properly convey that information from the hypervisor to the guest. It should be clearly documented "thou shalt not use this unless you've first attempted to fix the broken platform to get it right for itself".

And if we look at it as such... does it make more sense for this to be a more *generic* qemu↔guest interface? That way the software hacks can live in the OS IOMMU code where they belong, and prevent assignment to nested guests for example. And can cover cases like assigned PCI devices in existing qemu/x86 which need the same treatment.

Put another way: if we're going to add code to the guest OS to look at this information, why can't we add that code in the guest's IOMMU support instead, to look at an out-of-band qemu-specific "ignore IOMMU for these devices" list instead?

> The status quo is that the IOMMU might well be bypassed and then you need to program physical addresses into the device, but maybe not. If DMA API does not give you physical addresses, you need to bypass it, but hypervisor does not know or care.

Right.
The status quo is that qemu doesn't provide correct information about IOMMU topology to guests, and they have to have heuristics to work out whether to eschew the IOMMU for a given device or not. This is true for virtio and assigned PCI devices alike.

Furthermore, some platforms don't *have* a standard way for qemu to 'tell the truth' to the guests, and that's where the real fun comes in. But still, I'd like to see a generic solution for that lack instead of a virtio-specific hack.
On Mon, Apr 18, 2016 at 11:51:41AM -0400, David Woodhouse wrote:
> On Mon, 2016-04-18 at 18:30 +0300, Michael S. Tsirkin wrote:
> > > Setting (only) VIRTIO_F_IOMMU_PASSTHROUGH indicates to the guest that its own operating system's IOMMU code is expected to be broken, and that the virtio driver should eschew the DMA API?
> >
> > No - it tells guest that e.g. the ACPI tables (or whatever the equivalent is) do not match reality with respect to this device since IOMMU is ignored by hypervisor. Hypervisor has no idea what the guest IOMMU code does - hopefully it is not actually broken.
>
> OK, that makes sense — thanks.
>
> So where the platform *does* have a way to coherently tell the guest that some devices are behind an IOMMU and some aren't, we should never see VIRTIO_F_IOMMU_PASSTHROUGH && !VIRTIO_F_IOMMU_PLATFORM. (Except perhaps temporarily on x86 until we *do* fix the DMAR tables to tell the truth; qv.)
>
> This should *only* be a crutch for platforms which cannot properly convey that information from the hypervisor to the guest. It should be clearly documented "thou shalt not use this unless you've first attempted to fix the broken platform to get it right for itself".
>
> And if we look at it as such... does it make more sense for this to be a more *generic* qemu↔guest interface? That way the software hacks can live in the OS IOMMU code where they belong, and prevent assignment to nested guests for example. And can cover cases like assigned PCI devices in existing qemu/x86 which need the same treatment.
>
> Put another way: if we're going to add code to the guest OS to look at this information, why can't we add that code in the guest's IOMMU support instead, to look at an out-of-band qemu-specific "ignore IOMMU for these devices" list instead?

I balk at adding more hacks to a broken system.
My goals are merely to:
- make things work correctly with an IOMMU and new guests, so people can use userspace drivers with virtio devices
- prevent security risks when guest kernel mistakenly thinks it's protected by an IOMMU, but in fact isn't
- avoid breaking any working configurations

Looking at guest code, it looks like virtio was always bypassing the IOMMU even if configured, but no other guest driver did. This makes me think the problem where guest drivers ignore the IOMMU is virtio-specific, and so a virtio-specific solution seems cleaner.

The problem for assigned devices is IMHO different: they bypass the guest IOMMU too, but no guest driver knows about this, so guests do not work. Seems cleaner to fix QEMU to make existing guests work.

> > The status quo is that the IOMMU might well be bypassed and then you need to program physical addresses into the device, but maybe not. If DMA API does not give you physical addresses, you need to bypass it, but hypervisor does not know or care.
>
> Right. The status quo is that qemu doesn't provide correct information about IOMMU topology to guests, and they have to have heuristics to work out whether to eschew the IOMMU for a given device or not. This is true for virtio and assigned PCI devices alike.

True, but I think we should fix QEMU to shadow IOMMU page tables for assigned devices. This seems rather possible with VT-D, and there are patches already on list. It looks like this will fix all legacy guests, which is much nicer than what you suggest, which will only help new guests.

> Furthermore, some platforms don't *have* a standard way for qemu to 'tell the truth' to the guests, and that's where the real fun comes in. But still, I'd like to see a generic solution for that lack instead of a virtio-specific hack.

But the issue is not just these holes. E.g. with VT-D it is only easy to emulate because there's a "caching mode" hook. It is fundamentally paravirtualization.
So a completely generic solution would be a paravirtualized IOMMU interface, replacing VT-D for VMs. It might be justified if many platforms have hard-to-emulate interfaces.

> --
> dwmw2
On Mon, 2016-04-18 at 19:27 +0300, Michael S. Tsirkin wrote:
> I balk at adding more hacks to a broken system. My goals are merely to:
> - make things work correctly with an IOMMU and new guests, so people can use userspace drivers with virtio devices
> - prevent security risks when guest kernel mistakenly thinks it's protected by an IOMMU, but in fact isn't
> - avoid breaking any working configurations

AFAICT the VIRTIO_F_IOMMU_PASSTHROUGH thing seems orthogonal to this. That's just an optimisation, for telling an OS "you don't really need to bother with the IOMMU, even though it works".

There are two main reasons why an operating system might want to use the IOMMU via the DMA API for native drivers:
- To protect against driver bugs triggering rogue DMA.
- To protect against hardware (or firmware) bugs.

With virtio, the first reason still exists. But the second is moot because the device is part of the hypervisor and if the hypervisor is untrustworthy then you're screwed anyway... but then again, in SoC devices you could replace 'hypervisor' with 'chip' and the same is true, isn't it? Is there *really* anything virtio-specific here?

Sure, I want my *external* network device on a PCIe card with software-loadable firmware to be behind an IOMMU because I don't trust it as far as I can throw it. But for on-SoC devices surely the situation is *just* the same as devices provided by a hypervisor?

And some people want that external network device to use passthrough anyway, for performance reasons.

On the whole, there are *plenty* of reasons why we might want to have a passthrough mapping on a per-device basis, and I really struggle to find justification for having this 'hint' in a virtio-specific way. And it's complicating the discussion of the *actual* fix we're looking at.

> Looking at guest code, it looks like virtio was always bypassing the IOMMU even if configured, but no other guest driver did.
> This makes me think the problem where guest drivers ignore the IOMMU is virtio specific and so a virtio specific solution seems cleaner.
>
> The problem for assigned devices is IMHO different: they bypass the guest IOMMU too but no guest driver knows about this, so guests do not work. Seems cleaner to fix QEMU to make existing guests work.

I certainly agree that it's better to fix QEMU. Whether devices are behind an IOMMU or not, the DMAR tables we expose to a guest should tell the truth.

Part of the issue here is virtio-specific; part isn't.

Basically, we have a conjunction of two separate bugs which happened to work (for virtio) — the IOMMU support in QEMU wasn't working for virtio (and assigned) devices even though it theoretically *should* have been, and the virtio drivers weren't using the DMA API as they theoretically should have been.

So there were corner cases like assigned PCI devices, and real hardware implementations of virtio stuff (and perhaps virtio devices being assigned to nested guests) which didn't work. But for the *common* use case, one bug cancelled out the other.

Now we want to fix both bugs, and of course that involves carefully coordinating both fixes.

I *like* your idea of a flag from the hypervisor which essentially says "trust me, I'm telling the truth now".

But I don't think that wants to be virtio-specific, because we actually want it to cover *all* the corner cases, not just the common case which *happened* to work before due to the alignment of the two previous bugs.

An updated guest OS can look for this flag (in its generic IOMMU code) and can apply a heuristic of its own to work out which devices *aren't* behind the IOMMU, if the flag isn't present. And it can get that right even for assigned devices, so that new kernels can run happily even on today's QEMU instances. And the virtio driver in new kernels should just use the DMA API and expect it to work. Just as the various drivers for assigned PCI devices do.
The other interesting case for compatibility is old kernels running in a new QEMU. And for that case, things are likely to break if you suddenly start putting the virtio devices behind an IOMMU. There's nothing you can do on ARM and Power to stop that breakage, since they don't *have* a way to tell legacy guests that certain devices aren't translated. So I suspect you probably can't enable virtio-behind-IOMMU in QEMU *ever* for those platforms as the default behaviour.

For x86, you *can* enable virtio-behind-IOMMU if your DMAR tables tell the truth, and even legacy kernels ought to cope with that. FSVO 'ought to' where I suspect some of them will actually crash with a NULL pointer dereference if there's no "catch-all" DMAR unit in the tables, which puts it back into the same camp as ARM and Power.

> True but I think we should fix QEMU to shadow IOMMU page tables for assigned devices. This seems rather possible with VT-D, and there are patches already on list.
>
> It looks like this will fix all legacy guests which is much nicer than what you suggest which will only help new guests.

Yes, we should do that. And in the short term we should at *least* fix the DMAR tables to tell the truth.

> > Furthermore, some platforms don't *have* a standard way for qemu to 'tell the truth' to the guests, and that's where the real fun comes in. But still, I'd like to see a generic solution for that lack instead of a virtio-specific hack.
>
> But the issue is not just these holes. E.g. with VT-D it is only easy to emulate because there's a "caching mode" hook. It is fundamentally paravirtualization. So a completely generic solution would be a paravirtualized IOMMU interface, replacing VT-D for VMs. It might be justified if many platforms have hard to emulate interfaces.

Hm, I'm not sure I understand the point here.

Either there is a way for the hypervisor to expose an IOMMU to a guest (be it full hardware virt, or paravirt). Or there isn't.
If there is, it doesn't matter *how* it's done. And if there isn't, the whole discussion is moot anyway.
On Mon, Apr 18, 2016 at 11:29 AM, David Woodhouse <dwmw2@infradead.org> wrote:
> For x86, you *can* enable virtio-behind-IOMMU if your DMAR tables tell the truth, and even legacy kernels ought to cope with that. FSVO 'ought to' where I suspect some of them will actually crash with a NULL pointer dereference if there's no "catch-all" DMAR unit in the tables, which puts it back into the same camp as ARM and Power.

I think x86 may get a bit of a free pass here. AFAIK the QEMU IOMMU implementation on x86 has always been "experimental", so it just might be okay to change it in a way that causes some older kernels to OOPS.

--Andy
On Mon, Apr 18, 2016 at 02:29:33PM -0400, David Woodhouse wrote:
> On Mon, 2016-04-18 at 19:27 +0300, Michael S. Tsirkin wrote:
> > I balk at adding more hacks to a broken system. My goals are merely to:
> > - make things work correctly with an IOMMU and new guests, so people can use userspace drivers with virtio devices
> > - prevent security risks when guest kernel mistakenly thinks it's protected by an IOMMU, but in fact isn't
> > - avoid breaking any working configurations
>
> AFAICT the VIRTIO_F_IOMMU_PASSTHROUGH thing seems orthogonal to this. That's just an optimisation, for telling an OS "you don't really need to bother with the IOMMU, even though it works".
>
> There are two main reasons why an operating system might want to use the IOMMU via the DMA API for native drivers:
> - To protect against driver bugs triggering rogue DMA.
> - To protect against hardware (or firmware) bugs.
>
> With virtio, the first reason still exists. But the second is moot because the device is part of the hypervisor and if the hypervisor is untrustworthy then you're screwed anyway... but then again, in SoC devices you could replace 'hypervisor' with 'chip' and the same is true, isn't it? Is there *really* anything virtio-specific here?
>
> Sure, I want my *external* network device on a PCIe card with software-loadable firmware to be behind an IOMMU because I don't trust it as far as I can throw it. But for on-SoC devices surely the situation is *just* the same as devices provided by a hypervisor?

Depends on how the SoC is designed, I guess. At the moment specifically QEMU runs everything in a single memory space, so an IOMMU table lookup does not offer any extra protection. That's not a must, one could come up with modular hypervisor designs - it's just what we have ATM.

> And some people want that external network device to use passthrough anyway, for performance reasons.

That's a policy decision though.
> On the whole, there are *plenty* of reasons why we might want to have a passthrough mapping on a per-device basis,

That's true. And driver security also might differ, for example maybe I trust a distro-supplied driver more than an out-of-tree one. Or maybe I trust a distro-supplied userspace driver more than a closed-source one. And maybe I trust devices from the same vendor as my chip more than a 3rd-party one. So one can generalize this even further: think about device and driver security/trust level as an integer, and platform protection as an integer.

If the platform IOMMU offers you extra protection over trusting the device (trust < protection), it improves your security to use the platform to limit the device. If trust >= protection, it just adds overhead without increasing the security.

> and I really struggle to find justification for having this 'hint' in a virtio-specific way.

It's a way. No system seems to expose this information in a more generic way at the moment, and it's portable. Would you like to push for some kind of standardization of such a hint? I would be interested to hear about that.

> And it's complicating the discussion of the *actual* fix we're looking at.

I guess you are right in that we should split this part out. What I wanted is really the combination PASSTHROUGH && !PLATFORM so that we can say "ok, we don't need to guess, this device actually bypasses the IOMMU".

And I thought it's a nice idea to use PASSTHROUGH && PLATFORM as a hint since it seemed to be unused. But maybe the best thing to do for now is to say:
- hosts should not set PASSTHROUGH && PLATFORM
- guests should ignore PASSTHROUGH if PLATFORM is set

and then we can come back to this optimization idea later if it's appropriate. So yes, I think we need the two bits, but no, we don't need to mix the hint discussion in here.

> Looking at guest code, it looks like virtio was always bypassing the IOMMU even if configured, but no other guest driver did.
> > This makes me think the problem where guest drivers ignore the IOMMU is virtio specific and so a virtio specific solution seems cleaner.
> >
> > The problem for assigned devices is IMHO different: they bypass the guest IOMMU too but no guest driver knows about this, so guests do not work. Seems cleaner to fix QEMU to make existing guests work.
>
> I certainly agree that it's better to fix QEMU. Whether devices are behind an IOMMU or not, the DMAR tables we expose to a guest should tell the truth.
>
> Part of the issue here is virtio-specific; part isn't.
>
> Basically, we have a conjunction of two separate bugs which happened to work (for virtio) — the IOMMU support in QEMU wasn't working for virtio (and assigned) devices even though it theoretically *should* have been, and the virtio drivers weren't using the DMA API as they theoretically should have been.
>
> So there were corner cases like assigned PCI devices, and real hardware implementations of virtio stuff (and perhaps virtio devices being assigned to nested guests) which didn't work. But for the *common* use case, one bug cancelled out the other.
>
> Now we want to fix both bugs, and of course that involves carefully coordinating both fixes.
>
> I *like* your idea of a flag from the hypervisor which essentially says "trust me, I'm telling the truth now".
>
> But I don't think that wants to be virtio-specific, because we actually want it to cover *all* the corner cases, not just the common case which *happened* to work before due to the alignment of the two previous bugs.

I guess we differ here. I care about fixing bugs and not breaking working setups, but I see little value in working around existing bugs if they can be fixed at their source.

Building a generic mechanism to report which devices bypass the IOMMU isn't trivial, because there's no simple generic way to address an arbitrary device from the hypervisor.
For example, DMAR tables commonly use bus numbers for that, but these are guest (BIOS) assigned. So if we used bus numbers we'd have to ask the BIOS to build a custom ACPI table and stick bus numbers there.

> An updated guest OS can look for this flag (in its generic IOMMU code) and can apply a heuristic of its own to work out which devices *aren't* behind the IOMMU, if the flag isn't present. And it can get that right even for assigned devices, so that new kernels can run happily even on today's QEMU instances.

With iommu enabled? Point is, I don't really care about that. At this point only a very small number of devices work with this IOMMU at all. I expect that we'll fix assigned devices very soon.

> And the virtio driver in new kernels should just use the DMA API and expect it to work. Just as the various drivers for assigned PCI devices do.

Absolutely, but that's a separate discussion.

> The other interesting case for compatibility is old kernels running in a new QEMU. And for that case, things are likely to break if you suddenly start putting the virtio devices behind an IOMMU. There's nothing you can do on ARM and Power to stop that breakage, since they don't *have* a way to tell legacy guests that certain devices aren't translated. So I suspect you probably can't enable virtio-behind-IOMMU in QEMU *ever* for those platforms as the default behaviour.
>
> For x86, you *can* enable virtio-behind-IOMMU if your DMAR tables tell the truth, and even legacy kernels ought to cope with that.

I don't see how - legacy kernels bypassed the DMA API. To me it looks like we either use the physical addresses that they give us or they don't work at all (at least without iommu=pt), since the VT-D spec says:

    DMA requests processed through root-entries with present field Clear result in translation-fault.

So I suspect the IOMMU_PLATFORM flag would have to stay off by default for a while.
> FSVO 'ought to' where I suspect some of them will actually crash with a NULL pointer dereference if there's no "catch-all" DMAR unit in the tables, which puts it back into the same camp as ARM and Power.

Right. That would also be an issue.

> > True but I think we should fix QEMU to shadow IOMMU page tables for assigned devices. This seems rather possible with VT-D, and there are patches already on list.
> >
> > It looks like this will fix all legacy guests which is much nicer than what you suggest which will only help new guests.
>
> Yes, we should do that. And in the short term we should at *least* fix the DMAR tables to tell the truth.

Right. However, the way the timing happens to work, we are out of time to fix it in 2.6 and we are highly likely to have the proper VFIO fix in 2.7. So I'm not sure there's space for a short-term fix.

> > > Furthermore, some platforms don't *have* a standard way for qemu to 'tell the truth' to the guests, and that's where the real fun comes in. But still, I'd like to see a generic solution for that lack instead of a virtio-specific hack.
> >
> > But the issue is not just these holes. E.g. with VT-D it is only easy to emulate because there's a "caching mode" hook. It is fundamentally paravirtualization. So a completely generic solution would be a paravirtualized IOMMU interface, replacing VT-D for VMs. It might be justified if many platforms have hard to emulate interfaces.
>
> Hm, I'm not sure I understand the point here.
>
> Either there is a way for the hypervisor to expose an IOMMU to a guest (be it full hardware virt, or paravirt). Or there isn't.
>
> If there is, it doesn't matter *how* it's done.

Well, it does matter for the people doing it :)

> And if there isn't, the whole discussion is moot anyway.

Point was that we can always build a paravirt interface if it does not exist, but it's easier to maintain if it's minimal, being as close to emulating hardware as we can.
> --
> dwmw2
On Mon, Apr 18, 2016 at 12:24:15PM -0700, Andy Lutomirski wrote:
> On Mon, Apr 18, 2016 at 11:29 AM, David Woodhouse <dwmw2@infradead.org> wrote:
> > For x86, you *can* enable virtio-behind-IOMMU if your DMAR tables tell the truth, and even legacy kernels ought to cope with that. FSVO 'ought to' where I suspect some of them will actually crash with a NULL pointer dereference if there's no "catch-all" DMAR unit in the tables, which puts it back into the same camp as ARM and Power.
>
> I think x86 may get a bit of a free pass here. AFAIK the QEMU IOMMU implementation on x86 has always been "experimental", so it just might be okay to change it in a way that causes some older kernels to OOPS.
>
> --Andy

Since it's experimental, it might be OK to change *guest kernels* such that they oops on old QEMU. But guest kernels were not experimental - so we need a QEMU mode that makes them work fine. The more functionality is available in this QEMU mode, the better, because it's going to be the default for a while. For the same reason, it is preferable to also have new kernels not crash in this mode.
On Tue, 19 Apr 2016 12:13:29 +0300 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Mon, Apr 18, 2016 at 02:29:33PM -0400, David Woodhouse wrote: > > On Mon, 2016-04-18 at 19:27 +0300, Michael S. Tsirkin wrote: > > > I balk at adding more hacks to a broken system. My goals are > > > merely to > > > - make things work correctly with an IOMMU and new guests, > > > so people can use userspace drivers with virtio devices > > > - prevent security risks when guest kernel mistakenly thinks > > > it's protected by an IOMMU, but in fact isn't > > > - avoid breaking any working configurations > > > > AFAICT the VIRTIO_F_IOMMU_PASSTHROUGH thing seems orthogonal to this. > > That's just an optimisation, for telling an OS "you don't really need > > to bother with the IOMMU, even though it works". > > > > There are two main reasons why an operating system might want to use > > the IOMMU via the DMA API for native drivers: > > - To protect against driver bugs triggering rogue DMA. > > - To protect against hardware (or firmware) bugs. > > > > With virtio, the first reason still exists. But the second is moot > > because the device is part of the hypervisor and if the hypervisor is > > untrustworthy then you're screwed anyway... but then again, in SoC > > devices you could replace 'hypervisor' with 'chip' and the same is > > true, isn't it? Is there *really* anything virtio-specific here? > > > > Sure, I want my *external* network device on a PCIe card with software- > > loadable firmware to be behind an IOMMU because I don't trust it as far > > as I can throw it. But for on-SoC devices surely the situation is > > *just* the same as devices provided by a hypervisor? > > Depends on how the SoC is designed I guess. At the moment specifically QEMU > runs everything in a single memory space so an IOMMU table lookup does > not offer any extra protection. That's not a must, one could come > up with modular hypervisor designs - it's just what we have ATM. 
> > > > And some people want that external network device to use passthrough > > anyway, for performance reasons. > > That's a policy decision though. > > > On the whole, there are *plenty* of reasons why we might want to have a > > passthrough mapping on a per-device basis, > > That's true. And driver security also might differ, for example maybe I > trust a distro-supplied driver more than an out of tree one. Or maybe I > trust a distro-supplied userspace driver more than a closed-source one. > And maybe I trust devices from the same vendor as my chip more than a 3rd > party one. So one can generalize this even further, think about device > and driver security/trust level as an integer and platform protection as an > integer. > > If the platform IOMMU offers you extra protection over trusting the device > (trust < protection) it improves your security to use the platform to limit > the device. If trust >= protection it just adds overhead without > increasing the security. > > > and I really struggle to > > find justification for having this 'hint' in a virtio-specific way. > > It's a way. No system seems to expose this information in a more generic > way at the moment, and it's portable. Would you like to push for some > kind of standardization of such a hint? I would be interested > to hear about that. > > > > And it's complicating the discussion of the *actual* fix we're looking > > at. > > I guess you are right in that we should split this part out. > What I wanted is really the combination > PASSTHROUGH && !PLATFORM so that we can say "ok we don't > need to guess, this device actually bypasses the IOMMU". > > And I thought it's a nice idea to use PASSTHROUGH && PLATFORM > as a hint since it seemed to be unused. > But maybe the best thing to do for now is to say > - hosts should not set PASSTHROUGH && PLATFORM > - guests should ignore PASSTHROUGH if PLATFORM is set > > and then we can come back to this optimization idea later > if it's appropriate. 
> > So yes I think we need the two bits but no we don't need to > mix the hint discussion in here. > > > > Looking at guest code, it looks like virtio was always > > > bypassing the IOMMU even if configured, but no other > > > guest driver did. > > > > > > This makes me think the problem where guest drivers > > > ignore the IOMMU is virtio specific > > > and so a virtio specific solution seems cleaner. > > > > > > The problem for assigned devices is IMHO different: they bypass > > > the guest IOMMU too but no guest driver knows about this, > > > so guests do not work. Seems cleaner to fix QEMU to make > > > existing guests work. > > > > I certainly agree that it's better to fix QEMU. Whether devices are > > behind an IOMMU or not, the DMAR tables we expose to a guest should > > tell the truth. > > > > Part of the issue here is virtio-specific; part isn't. > > > > Basically, we have a conjunction of two separate bugs which happened to > > work (for virtio) — the IOMMU support in QEMU wasn't working for virtio > > (and assigned) devices even though it theoretically *should* have been, > > and the virtio drivers weren't using the DMA API as they theoretically > > should have been. > > > > So there were corner cases like assigned PCI devices, and real hardware > > implementations of virtio stuff (and perhaps virtio devices being > > assigned to nested guests) which didn't work. But for the *common* use > > case, one bug cancelled out the other. > > > > Now we want to fix both bugs, and of course that involves carefully > > coordinating both fixes. > > > > I *like* your idea of a flag from the hypervisor which essentially says > > "trust me, I'm telling the truth now". > > > > But don't think that wants to be virtio-specific, because we actually > > want it to cover *all* the corner cases, not just the common case which > > *happened* to work before due to the alignment of the two previous > > bugs. > > I guess we differ here. 
I care about fixing bugs and not breaking > working setups but I see little value in working around > existing bugs if they can be fixed at their source. > > Building a generic mechanism to report which devices bypass the IOMMU > isn't trivial because there's no simple generic way to address > an arbitrary device from the hypervisor. For example, DMAR tables > commonly use bus numbers for that but these are guest (bios) assigned. > So if we used bus numbers we'd have to ask bios to build a custom > ACPI table and stick bus numbers there. This is incorrect; the DMAR table specifically uses device paths in order to avoid the issue with guest-assigned bus numbers. The only bus number used is the starting bus number, which is generally provided by the platform anyway. Excluding devices isn't necessarily easy with DMAR though; we don't get to be lazy and use the INCLUDE_PCI_ALL flag. Hotplug is also an issue, we either need to hot-add devices into slots where there's already the correct DMAR coverage (or lack of coverage) to represent the inclusion or exclusion, or enable dynamic table support. And really it seems like dynamic tables are the only possible way DMAR could support replacing a device that obeys the IOMMU with one that does not at the same address, or vice versa. For any sort of sane implementation, it probably comes down to fully enumerating root bus devices in the DMAR and creating PCI sub-hierarchy entries for certain subordinate buses, leaving others undefined. Devices making use of the IOMMU could only be attached behind those sub-hierarchies and devices not making use of the IOMMU would be downstream of bridges not covered. The management stack would need to know where to place devices. > > An updated guest OS can look for this flag (in its generic IOMMU code) > > and can apply a heuristic of its own to work out which devices *aren't* > > behind the IOMMU, if the flag isn't present. 
And it can get that right > > even for assigned devices, so that new kernels can run happily even on > > today's QEMU instances. > > With iommu enabled? Point is, I don't really care about that. > At this point only a very small number of devices work with this > IOMMU at all. I expect that we'll fix assigned devices very soon. > > > And the virtio driver in new kernels should > > just use the DMA API and expect it to work. Just as the various drivers > > for assigned PCI devices do. > > Absolutely but that's a separate discussion. > > > The other interesting case for compatibility is old kernels running in > > a new QEMU. And for that case, things are likely to break if you > > suddenly start putting the virtio devices behind an IOMMU. There's > > nothing you can do on ARM and Power to stop that breakage, since they > > don't *have* a way to tell legacy guests that certain devices aren't > > translated. So I suspect you probably can't enable virtio-behind-IOMMU > > in QEMU *ever* for those platforms as the default behaviour. > > > > For x86, you *can* enable virtio-behind-IOMMU if your DMAR tables tell > > the truth, and even legacy kernels ought to cope with that. > > I don't see how, given that legacy kernels bypassed the DMA API. It's a matter of excluding the device from being explicitly covered by the DMAR AIUI. This is theoretically possible, but I wonder if it actually works for all kernels. > To me it looks like we either use physical addresses that they give us > or they don't work at all (at least without iommu=pt), > since the VT-D spec says: > DMA requests processed through root-entries with present field > Clear result in translation-fault. > > So I suspect the IOMMU_PLATFORM flag would have to stay off > by default for a while. > > > > FSVO 'ought to' where I suspect some of them will actually crash with a > > NULL pointer dereference if there's no "catch-all" DMAR unit in the > > tables, which puts it back into the same camp as ARM and Power. > > Right. 
That would also be an issue. > > > > > > True but I think we should fix QEMU to shadow IOMMU > > > page tables for assigned devices. This seems rather > > > possible with VT-D, and there are patches already on list. > > > > > > It looks like this will fix all legacy guests which is > > > much nicer than what you suggest which will only help new guests. > > > > Yes, we should do that. And in the short term we should at *least* fix > > the DMAR tables to tell the truth. > > Right. However, the way timing happens to work, we are out of time to > fix it in 2.6 and we are highly likely to have the proper VFIO fix in > 2.7. So I'm not sure there's space for a short term fix. Note that vfio already works with IOMMUs on power, the issue I believe we're talking about for assigned devices bypassing the guest IOMMU is limited to the QEMU VT-d implementation failing to do the proper notifies. Legacy KVM device assignment of course has no idea about the IOMMU because it piggy backs on KVM memory slot mapping instead of operating within the QEMU Memory API like vfio does. The issues I believe we're going to hit with vfio assigned devices and QEMU VT-d are that 1) the vfio IOMMU interface is not designed for the frequency of mapping that a DMA API managed guest device will generate, 2) we have accounting issues for locked pages since each device will run in a separate IOMMU domain, accounted separately, and 3) we don't have a way to expose host grouping to the VM so trying to assign multiple devices from the same group is likely to fail. We'd almost need to put all of the devices within a group behind a conventional PCI bridge in the VM to get them into the same address space, but I suspect QEMU VT-d doesn't take that aliasing into account. > > > > > > > > Furthermore, some platforms don't *have* a standard way for qemu to > > > > 'tell the truth' to the guests, and that's where the real fun comes in. 
> > > > But still, I'd like to see a generic solution for that lack instead of > > > > a virtio-specific hack. > > > But the issue is not just these holes. E.g. with VT-D it is only easy > > > to emulate because there's a "caching mode" hook. It is fundamentally > > > paravirtualization. So a completely generic solution would be a > > > paravirtualized IOMMU interface, replacing VT-D for VMs. It might be > > > justified if many platforms have hard to emulate interfaces. > > > > Hm, I'm not sure I understand the point here. > > > > Either there is a way for the hypervisor to expose an IOMMU to a guest > > (be it full hardware virt, or paravirt). Or there isn't. > > > > If there is, it doesn't matter *how* it's done. > > Well it does matter for people doing it :) > > > And if there isn't, the > > whole discussion is moot anyway. > > Point was that we can always build a paravirt interface > if it does not exist, but it's easier to maintain > if it's minimal, being as close to emulating hardware > as we can. > > > -- > > dwmw2 > > > > > >
On Apr 19, 2016 2:13 AM, "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > I guess you are right in that we should split this part out. > What I wanted is really the combination > PASSTHROUGH && !PLATFORM so that we can say "ok we don't > need to guess, this device actually bypasses the IOMMU". What happens when you use a device like this on Xen or with a similar software translation layer? I think that a "please bypass IOMMU" feature would be better in the PCI, IOMMU, or platform code. For Xen, virtio would still want to use the DMA API, just without translating at the DMAR or hardware level. Doing it in virtio is awkward, because virtio is involved at the device level and the driver level, but the translation might be entirely in between. I think a nicer long-term approach would be to have a way to ask the guest to set up a full 1:1 mapping for performance, but to still handle the case where the guest refuses to do so or where there's more than one translation layer involved. But I agree that this part shouldn't delay the other part of your series. --Andy
On Tue, Apr 19, 2016 at 3:27 AM, Michael S. Tsirkin <mst@redhat.com> wrote: > On Mon, Apr 18, 2016 at 12:24:15PM -0700, Andy Lutomirski wrote: >> On Mon, Apr 18, 2016 at 11:29 AM, David Woodhouse <dwmw2@infradead.org> wrote: >> > For x86, you *can* enable virtio-behind-IOMMU if your DMAR tables tell >> > the truth, and even legacy kernels ought to cope with that. >> > FSVO 'ought to' where I suspect some of them will actually crash with a >> > NULL pointer dereference if there's no "catch-all" DMAR unit in the >> > tables, which puts it back into the same camp as ARM and Power. >> >> I think x86 may get a bit of a free pass here. AFAIK the QEMU IOMMU >> implementation on x86 has always been "experimental", so it just might >> be okay to change it in a way that causes some older kernels to OOPS. >> >> --Andy > > Since it's experimental, it might be OK to change *guest kernels* > such that they oops on old QEMU. > But guest kernels were not experimental - so we need a QEMU mode that > makes them work fine. The more functionality is available in this QEMU > mode, the better, because it's going to be the default for a while. For > the same reason, it is preferable to also have new kernels not crash in > this mode. > People add QEMU features that need new guest kernels all the time. If you enable virtio-scsi and try to boot a guest that's too old, it won't work. So I don't see anything fundamentally wrong with saying that the non-experimental QEMU Q35 IOMMU mode won't boot if the guest kernel is too old. It might be annoying, since old kernels do work on actual Q35 hardware, but it at least seems that it might be okay. --Andy
On Tue, Apr 19, 2016 at 09:00:27AM -0700, Andy Lutomirski wrote: > On Apr 19, 2016 2:13 AM, "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > > > I guess you are right in that we should split this part out. > > What I wanted is really the combination > > PASSTHROUGH && !PLATFORM so that we can say "ok we don't > > need to guess, this device actually bypasses the IOMMU". > > What happens when you use a device like this on Xen or with a similar > software translation layer? I think you don't use it on Xen since virtio doesn't bypass an IOMMU there. If you do you have misconfigured your device.
On Tue, Apr 19, 2016 at 09:02:14AM -0700, Andy Lutomirski wrote: > On Tue, Apr 19, 2016 at 3:27 AM, Michael S. Tsirkin <mst@redhat.com> wrote: > > On Mon, Apr 18, 2016 at 12:24:15PM -0700, Andy Lutomirski wrote: > >> On Mon, Apr 18, 2016 at 11:29 AM, David Woodhouse <dwmw2@infradead.org> wrote: > >> > For x86, you *can* enable virtio-behind-IOMMU if your DMAR tables tell > >> > the truth, and even legacy kernels ought to cope with that. > >> > FSVO 'ought to' where I suspect some of them will actually crash with a > >> > NULL pointer dereference if there's no "catch-all" DMAR unit in the > >> > tables, which puts it back into the same camp as ARM and Power. > >> > >> I think x86 may get a bit of a free pass here. AFAIK the QEMU IOMMU > >> implementation on x86 has always been "experimental", so it just might > >> be okay to change it in a way that causes some older kernels to OOPS. > >> > >> --Andy > > > > Since it's experimental, it might be OK to change *guest kernels* > > such that they oops on old QEMU. > > But guest kernels were not experimental - so we need a QEMU mode that > > makes them work fine. The more functionality is available in this QEMU > > mode, the betterm because it's going to be the default for a while. For > > the same reason, it is preferable to also have new kernels not crash in > > this mode. > > > > People add QEMU features that need new guest kernels all time time. > If you enable virtio-scsi and try to boot a guest that's too old, it > won't work. So I don't see anything fundamentally wrong with saying > that the non-experimental QEMU Q35 IOMMU mode won't boot if the guest > kernel is too old. It might be annoying, since old kernels do work on > actual Q35 hardware, but it at least seems to be that it might be > okay. > > --Andy Yes but we need a mode that makes both old and new kernels work, and that should be the default for a while. 
This is what the IOMMU_PASSTHROUGH flag was about: old kernels ignore it and bypass the DMA API; new kernels go "oh, compatibility mode" and bypass the IOMMU within the DMA API.
On Tue, Apr 19, 2016 at 9:09 AM, Michael S. Tsirkin <mst@redhat.com> wrote: > On Tue, Apr 19, 2016 at 09:02:14AM -0700, Andy Lutomirski wrote: >> On Tue, Apr 19, 2016 at 3:27 AM, Michael S. Tsirkin <mst@redhat.com> wrote: >> > On Mon, Apr 18, 2016 at 12:24:15PM -0700, Andy Lutomirski wrote: >> >> On Mon, Apr 18, 2016 at 11:29 AM, David Woodhouse <dwmw2@infradead.org> wrote: >> >> > For x86, you *can* enable virtio-behind-IOMMU if your DMAR tables tell >> >> > the truth, and even legacy kernels ought to cope with that. >> >> > FSVO 'ought to' where I suspect some of them will actually crash with a >> >> > NULL pointer dereference if there's no "catch-all" DMAR unit in the >> >> > tables, which puts it back into the same camp as ARM and Power. >> >> >> >> I think x86 may get a bit of a free pass here. AFAIK the QEMU IOMMU >> >> implementation on x86 has always been "experimental", so it just might >> >> be okay to change it in a way that causes some older kernels to OOPS. >> >> >> >> --Andy >> > >> > Since it's experimental, it might be OK to change *guest kernels* >> > such that they oops on old QEMU. >> > But guest kernels were not experimental - so we need a QEMU mode that >> > makes them work fine. The more functionality is available in this QEMU >> > mode, the betterm because it's going to be the default for a while. For >> > the same reason, it is preferable to also have new kernels not crash in >> > this mode. >> > >> >> People add QEMU features that need new guest kernels all time time. >> If you enable virtio-scsi and try to boot a guest that's too old, it >> won't work. So I don't see anything fundamentally wrong with saying >> that the non-experimental QEMU Q35 IOMMU mode won't boot if the guest >> kernel is too old. It might be annoying, since old kernels do work on >> actual Q35 hardware, but it at least seems to be that it might be >> okay. 
>> >> --Andy > > Yes but we need a mode that makes both old and new kernels work, and > that should be the default for a while. this is what the > IOMMU_PASSTHROUGH flag was about: old kernels ignore it and bypass DMA > API, new kernels go "oh compatibility mode" and bypass the IOMMU > within DMA API. I thought that PLATFORM served that purpose. Wouldn't the host advertise PLATFORM support and, if the guest doesn't ack it, the host device would skip translation? Or is that problematic for vfio? > > -- > MST
On Tue, Apr 19, 2016 at 09:12:03AM -0700, Andy Lutomirski wrote: > On Tue, Apr 19, 2016 at 9:09 AM, Michael S. Tsirkin <mst@redhat.com> wrote: > > On Tue, Apr 19, 2016 at 09:02:14AM -0700, Andy Lutomirski wrote: > >> On Tue, Apr 19, 2016 at 3:27 AM, Michael S. Tsirkin <mst@redhat.com> wrote: > >> > On Mon, Apr 18, 2016 at 12:24:15PM -0700, Andy Lutomirski wrote: > >> >> On Mon, Apr 18, 2016 at 11:29 AM, David Woodhouse <dwmw2@infradead.org> wrote: > >> >> > For x86, you *can* enable virtio-behind-IOMMU if your DMAR tables tell > >> >> > the truth, and even legacy kernels ought to cope with that. > >> >> > FSVO 'ought to' where I suspect some of them will actually crash with a > >> >> > NULL pointer dereference if there's no "catch-all" DMAR unit in the > >> >> > tables, which puts it back into the same camp as ARM and Power. > >> >> > >> >> I think x86 may get a bit of a free pass here. AFAIK the QEMU IOMMU > >> >> implementation on x86 has always been "experimental", so it just might > >> >> be okay to change it in a way that causes some older kernels to OOPS. > >> >> > >> >> --Andy > >> > > >> > Since it's experimental, it might be OK to change *guest kernels* > >> > such that they oops on old QEMU. > >> > But guest kernels were not experimental - so we need a QEMU mode that > >> > makes them work fine. The more functionality is available in this QEMU > >> > mode, the betterm because it's going to be the default for a while. For > >> > the same reason, it is preferable to also have new kernels not crash in > >> > this mode. > >> > > >> > >> People add QEMU features that need new guest kernels all time time. > >> If you enable virtio-scsi and try to boot a guest that's too old, it > >> won't work. So I don't see anything fundamentally wrong with saying > >> that the non-experimental QEMU Q35 IOMMU mode won't boot if the guest > >> kernel is too old. 
It might be annoying, since old kernels do work on > >> actual Q35 hardware, but it at least seems that it might be > >> okay. > >> > >> --Andy > > > > Yes but we need a mode that makes both old and new kernels work, and > > that should be the default for a while. this is what the > > IOMMU_PASSTHROUGH flag was about: old kernels ignore it and bypass DMA > > API, new kernels go "oh compatibility mode" and bypass the IOMMU > > within DMA API. > > I thought that PLATFORM served that purpose. Wouldn't the host > advertise PLATFORM support and, if the guest doesn't ack it, the host > device would skip translation? Or is that problematic for vfio? Exactly: that's problematic for security. You can't allow the guest driver to decide whether the device skips security. > > > > -- > > MST > > > > -- > Andy Lutomirski > AMA Capital Management, LLC
On Tue, 2016-04-19 at 19:20 +0300, Michael S. Tsirkin wrote: > > > I thought that PLATFORM served that purpose. Woudn't the host > > advertise PLATFORM support and, if the guest doesn't ack it, the host > > device would skip translation? Or is that problematic for vfio? > > Exactly that's problematic for security. > You can't allow guest driver to decide whether device skips security. Right. Because fundamentally, this *isn't* a property of the endpoint device, and doesn't live in virtio itself. It's a property of the platform IOMMU, and lives there.
On Tue, Apr 19, 2016 at 12:26:44PM -0400, David Woodhouse wrote: > On Tue, 2016-04-19 at 19:20 +0300, Michael S. Tsirkin wrote: > > > > > I thought that PLATFORM served that purpose. Woudn't the host > > > advertise PLATFORM support and, if the guest doesn't ack it, the host > > > device would skip translation? Or is that problematic for vfio? > > > > Exactly that's problematic for security. > > You can't allow guest driver to decide whether device skips security. > > Right. Because fundamentally, this *isn't* a property of the endpoint > device, and doesn't live in virtio itself. > > It's a property of the platform IOMMU, and lives there. It's a property of the hypervisor virtio implementation, and lives there.
On Tue, Apr 19, 2016 at 10:49 AM, Michael S. Tsirkin <mst@redhat.com> wrote: > On Tue, Apr 19, 2016 at 12:26:44PM -0400, David Woodhouse wrote: >> On Tue, 2016-04-19 at 19:20 +0300, Michael S. Tsirkin wrote: >> > >> > > I thought that PLATFORM served that purpose. Wouldn't the host >> > > advertise PLATFORM support and, if the guest doesn't ack it, the host >> > > device would skip translation? Or is that problematic for vfio? >> > >> > Exactly: that's problematic for security. >> > You can't allow the guest driver to decide whether the device skips security. >> >> Right. Because fundamentally, this *isn't* a property of the endpoint >> device, and doesn't live in virtio itself. >> >> It's a property of the platform IOMMU, and lives there. > > It's a property of the hypervisor virtio implementation, and lives there. It is now, but QEMU could, in principle, change the way it thinks about it so that virtio devices would use the QEMU DMA API but ask QEMU to pass everything through 1:1. This would be entirely invisible to guests but would make it a property of the IOMMU implementation. At that point, maybe QEMU could find a (platform-dependent) way to tell the guest what's going on. FWIW, as far as I can tell, PPC and SPARC really could, in principle, set up 1:1 mappings in the guest so that the virtio devices would work regardless of whether QEMU is ignoring the IOMMU or not -- I think the only obstacle is that the PPC and SPARC 1:1 mappings are currently set up with an offset. I don't know too much about those platforms, but presumably the layout could be changed so that 1:1 really was 1:1. --Andy
On Tue, Apr 19, 2016 at 11:01:38AM -0700, Andy Lutomirski wrote: > On Tue, Apr 19, 2016 at 10:49 AM, Michael S. Tsirkin <mst@redhat.com> wrote: > > On Tue, Apr 19, 2016 at 12:26:44PM -0400, David Woodhouse wrote: > >> On Tue, 2016-04-19 at 19:20 +0300, Michael S. Tsirkin wrote: > >> > > >> > > I thought that PLATFORM served that purpose. Woudn't the host > >> > > advertise PLATFORM support and, if the guest doesn't ack it, the host > >> > > device would skip translation? Or is that problematic for vfio? > >> > > >> > Exactly that's problematic for security. > >> > You can't allow guest driver to decide whether device skips security. > >> > >> Right. Because fundamentally, this *isn't* a property of the endpoint > >> device, and doesn't live in virtio itself. > >> > >> It's a property of the platform IOMMU, and lives there. > > > > It's a property of the hypervisor virtio implementation, and lives there. > > It is now, but QEMU could, in principle, change the way it thinks > about it so that virtio devices would use the QEMU DMA API but ask > QEMU to pass everything through 1:1. This would be entirely invisible > to guests but would make it be a property of the IOMMU implementation. > At that point, maybe QEMU could find a (platform dependent) way to > tell the guest what's going on. > > FWIW, as far as I can tell, PPC and SPARC really could, in principle, > set up 1:1 mappings in the guest so that the virtio devices would work > regardless of whether QEMU is ignoring the IOMMU or not -- I think the > only obstacle is that the PPC and SPARC 1:1 mappings are currectly set > up with an offset. I don't know too much about those platforms, but > presumably the layout could be changed so that 1:1 really was 1:1. > > --Andy Sure. Do you see any reason why the decision to do this can't be keyed off the virtio feature bit?
On Tue, Apr 19, 2016 at 1:16 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > On Tue, Apr 19, 2016 at 11:01:38AM -0700, Andy Lutomirski wrote: >> On Tue, Apr 19, 2016 at 10:49 AM, Michael S. Tsirkin <mst@redhat.com> wrote: >> > On Tue, Apr 19, 2016 at 12:26:44PM -0400, David Woodhouse wrote: >> >> On Tue, 2016-04-19 at 19:20 +0300, Michael S. Tsirkin wrote: >> >> > >> >> > > I thought that PLATFORM served that purpose. Woudn't the host >> >> > > advertise PLATFORM support and, if the guest doesn't ack it, the host >> >> > > device would skip translation? Or is that problematic for vfio? >> >> > >> >> > Exactly that's problematic for security. >> >> > You can't allow guest driver to decide whether device skips security. >> >> >> >> Right. Because fundamentally, this *isn't* a property of the endpoint >> >> device, and doesn't live in virtio itself. >> >> >> >> It's a property of the platform IOMMU, and lives there. >> > >> > It's a property of the hypervisor virtio implementation, and lives there. >> >> It is now, but QEMU could, in principle, change the way it thinks >> about it so that virtio devices would use the QEMU DMA API but ask >> QEMU to pass everything through 1:1. This would be entirely invisible >> to guests but would make it be a property of the IOMMU implementation. >> At that point, maybe QEMU could find a (platform dependent) way to >> tell the guest what's going on. >> >> FWIW, as far as I can tell, PPC and SPARC really could, in principle, >> set up 1:1 mappings in the guest so that the virtio devices would work >> regardless of whether QEMU is ignoring the IOMMU or not -- I think the >> only obstacle is that the PPC and SPARC 1:1 mappings are currectly set >> up with an offset. I don't know too much about those platforms, but >> presumably the layout could be changed so that 1:1 really was 1:1. >> >> --Andy > > Sure. Do you see any reason why the decision to do this can't be > keyed off the virtio feature bit? 
I can think of three types of virtio host:

a) virtio always bypasses the IOMMU.

b) virtio never bypasses the IOMMU (unless DMAR tables or similar say it does) -- i.e. virtio works like any other device.

c) virtio may bypass the IOMMU depending on what the guest asks it to do.

If this is keyed off a virtio feature bit and anyone tries to implement (c), vfio is going to have a problem. And, if it's keyed off a virtio feature bit, then (a) won't work on Xen or similar setups unless the Xen hypervisor adds a giant and probably unreliable kludge to support it. Meanwhile, 4.6-rc works fine under Xen on a default x86 QEMU configuration, and I'd really like to keep it that way. What could plausibly work using a virtio feature bit is for a device to say "hey, I'm a new device and I support the platform-defined IOMMU mechanism". This bit would be *set* on default IOMMU-less QEMU configurations and on physical virtio PCI cards. The guest could operate accordingly. I'm not sure I see a good way for feature negotiation to work the other direction, though. PPC and SPARC could only set this bit on emulated devices if they know that new guest kernels are in use. --Andy
On Tue, Apr 19, 2016 at 01:27:29PM -0700, Andy Lutomirski wrote:
> On Tue, Apr 19, 2016 at 1:16 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Tue, Apr 19, 2016 at 11:01:38AM -0700, Andy Lutomirski wrote:
> >> On Tue, Apr 19, 2016 at 10:49 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> > On Tue, Apr 19, 2016 at 12:26:44PM -0400, David Woodhouse wrote:
> >> >> On Tue, 2016-04-19 at 19:20 +0300, Michael S. Tsirkin wrote:
> >> >> >
> >> >> > > I thought that PLATFORM served that purpose. Wouldn't the host
> >> >> > > advertise PLATFORM support and, if the guest doesn't ack it, the host
> >> >> > > device would skip translation? Or is that problematic for vfio?
> >> >> >
> >> >> > Exactly that's problematic for security.
> >> >> > You can't allow guest driver to decide whether device skips security.
> >> >>
> >> >> Right. Because fundamentally, this *isn't* a property of the endpoint
> >> >> device, and doesn't live in virtio itself.
> >> >>
> >> >> It's a property of the platform IOMMU, and lives there.
> >> >
> >> > It's a property of the hypervisor virtio implementation, and lives there.
> >>
> >> It is now, but QEMU could, in principle, change the way it thinks
> >> about it so that virtio devices would use the QEMU DMA API but ask
> >> QEMU to pass everything through 1:1.  This would be entirely invisible
> >> to guests but would make it be a property of the IOMMU implementation.
> >> At that point, maybe QEMU could find a (platform dependent) way to
> >> tell the guest what's going on.
> >>
> >> FWIW, as far as I can tell, PPC and SPARC really could, in principle,
> >> set up 1:1 mappings in the guest so that the virtio devices would work
> >> regardless of whether QEMU is ignoring the IOMMU or not -- I think the
> >> only obstacle is that the PPC and SPARC 1:1 mappings are currently set
> >> up with an offset.  I don't know too much about those platforms, but
> >> presumably the layout could be changed so that 1:1 really was 1:1.
> >>
> >> --Andy
> >
> > Sure.  Do you see any reason why the decision to do this can't be
> > keyed off the virtio feature bit?
>
> I can think of three types of virtio host:
>
> a) virtio always bypasses the IOMMU.
>
> b) virtio never bypasses the IOMMU (unless DMAR tables or similar say
> it does) -- i.e. virtio works like any other device.
>
> c) virtio may bypass the IOMMU depending on what the guest asks it to do.

d) some virtio devices bypass the IOMMU and some don't,
e.g. it's harder to support IOMMU with vhost.

> If this is keyed off a virtio feature bit and anyone tries to
> implement (c), then vfio is going to have a problem.  And, if it's
> keyed off a virtio feature bit, then (a) won't work on Xen or similar
> setups unless the Xen hypervisor adds a giant and probably unreliable
> kludge to support it.  Meanwhile, 4.6-rc works fine under Xen on a
> default x86 QEMU configuration, and I'd really like to keep it that
> way.
>
> What could plausibly work using a virtio feature bit is for a device
> to say "hey, I'm a new device and I support the platform-defined IOMMU
> mechanism".  This bit would be *set* on default IOMMU-less QEMU
> configurations and on physical virtio PCI cards.

And clear on Xen.

> The guest could
> operate accordingly.  I'm not sure I see a good way for feature
> negotiation to work the other direction, though.

I agree.

> PPC and SPARC could only set this bit on emulated devices if they know
> that new guest kernels are in use.
>
> --Andy
On Tue, Apr 19, 2016 at 1:54 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Tue, Apr 19, 2016 at 01:27:29PM -0700, Andy Lutomirski wrote:
>> On Tue, Apr 19, 2016 at 1:16 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> > On Tue, Apr 19, 2016 at 11:01:38AM -0700, Andy Lutomirski wrote:
>> >> On Tue, Apr 19, 2016 at 10:49 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> >> > On Tue, Apr 19, 2016 at 12:26:44PM -0400, David Woodhouse wrote:
>> >> >> On Tue, 2016-04-19 at 19:20 +0300, Michael S. Tsirkin wrote:
>> >> >> >
>> >> >> > > I thought that PLATFORM served that purpose. Wouldn't the host
>> >> >> > > advertise PLATFORM support and, if the guest doesn't ack it, the host
>> >> >> > > device would skip translation? Or is that problematic for vfio?
>> >> >> >
>> >> >> > Exactly that's problematic for security.
>> >> >> > You can't allow guest driver to decide whether device skips security.
>> >> >>
>> >> >> Right. Because fundamentally, this *isn't* a property of the endpoint
>> >> >> device, and doesn't live in virtio itself.
>> >> >>
>> >> >> It's a property of the platform IOMMU, and lives there.
>> >> >
>> >> > It's a property of the hypervisor virtio implementation, and lives there.
>> >>
>> >> It is now, but QEMU could, in principle, change the way it thinks
>> >> about it so that virtio devices would use the QEMU DMA API but ask
>> >> QEMU to pass everything through 1:1.  This would be entirely invisible
>> >> to guests but would make it be a property of the IOMMU implementation.
>> >> At that point, maybe QEMU could find a (platform dependent) way to
>> >> tell the guest what's going on.
>> >>
>> >> FWIW, as far as I can tell, PPC and SPARC really could, in principle,
>> >> set up 1:1 mappings in the guest so that the virtio devices would work
>> >> regardless of whether QEMU is ignoring the IOMMU or not -- I think the
>> >> only obstacle is that the PPC and SPARC 1:1 mappings are currently set
>> >> up with an offset.  I don't know too much about those platforms, but
>> >> presumably the layout could be changed so that 1:1 really was 1:1.
>> >>
>> >> --Andy
>> >
>> > Sure.  Do you see any reason why the decision to do this can't be
>> > keyed off the virtio feature bit?
>>
>> I can think of three types of virtio host:
>>
>> a) virtio always bypasses the IOMMU.
>>
>> b) virtio never bypasses the IOMMU (unless DMAR tables or similar say
>> it does) -- i.e. virtio works like any other device.
>>
>> c) virtio may bypass the IOMMU depending on what the guest asks it to do.
>
> d) some virtio devices bypass the IOMMU and some don't,
> e.g. it's harder to support IOMMU with vhost.
>
>
>> If this is keyed off a virtio feature bit and anyone tries to
>> implement (c), then vfio is going to have a problem.  And, if it's
>> keyed off a virtio feature bit, then (a) won't work on Xen or similar
>> setups unless the Xen hypervisor adds a giant and probably unreliable
>> kludge to support it.  Meanwhile, 4.6-rc works fine under Xen on a
>> default x86 QEMU configuration, and I'd really like to keep it that
>> way.
>>
>> What could plausibly work using a virtio feature bit is for a device
>> to say "hey, I'm a new device and I support the platform-defined IOMMU
>> mechanism".  This bit would be *set* on default IOMMU-less QEMU
>> configurations and on physical virtio PCI cards.
>
> And clear on Xen.

How? QEMU has no idea that the guest is running Xen.
On Tue, Apr 19, 2016 at 02:07:01PM -0700, Andy Lutomirski wrote:
> On Tue, Apr 19, 2016 at 1:54 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Tue, Apr 19, 2016 at 01:27:29PM -0700, Andy Lutomirski wrote:
> >> On Tue, Apr 19, 2016 at 1:16 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> > On Tue, Apr 19, 2016 at 11:01:38AM -0700, Andy Lutomirski wrote:
> >> >> On Tue, Apr 19, 2016 at 10:49 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> >> > On Tue, Apr 19, 2016 at 12:26:44PM -0400, David Woodhouse wrote:
> >> >> >> On Tue, 2016-04-19 at 19:20 +0300, Michael S. Tsirkin wrote:
> >> >> >> >
> >> >> >> > > I thought that PLATFORM served that purpose. Wouldn't the host
> >> >> >> > > advertise PLATFORM support and, if the guest doesn't ack it, the host
> >> >> >> > > device would skip translation? Or is that problematic for vfio?
> >> >> >> >
> >> >> >> > Exactly that's problematic for security.
> >> >> >> > You can't allow guest driver to decide whether device skips security.
> >> >> >>
> >> >> >> Right. Because fundamentally, this *isn't* a property of the endpoint
> >> >> >> device, and doesn't live in virtio itself.
> >> >> >>
> >> >> >> It's a property of the platform IOMMU, and lives there.
> >> >> >
> >> >> > It's a property of the hypervisor virtio implementation, and lives there.
> >> >>
> >> >> It is now, but QEMU could, in principle, change the way it thinks
> >> >> about it so that virtio devices would use the QEMU DMA API but ask
> >> >> QEMU to pass everything through 1:1.  This would be entirely invisible
> >> >> to guests but would make it be a property of the IOMMU implementation.
> >> >> At that point, maybe QEMU could find a (platform dependent) way to
> >> >> tell the guest what's going on.
> >> >>
> >> >> FWIW, as far as I can tell, PPC and SPARC really could, in principle,
> >> >> set up 1:1 mappings in the guest so that the virtio devices would work
> >> >> regardless of whether QEMU is ignoring the IOMMU or not -- I think the
> >> >> only obstacle is that the PPC and SPARC 1:1 mappings are currently set
> >> >> up with an offset.  I don't know too much about those platforms, but
> >> >> presumably the layout could be changed so that 1:1 really was 1:1.
> >> >>
> >> >> --Andy
> >> >
> >> > Sure.  Do you see any reason why the decision to do this can't be
> >> > keyed off the virtio feature bit?
> >>
> >> I can think of three types of virtio host:
> >>
> >> a) virtio always bypasses the IOMMU.
> >>
> >> b) virtio never bypasses the IOMMU (unless DMAR tables or similar say
> >> it does) -- i.e. virtio works like any other device.
> >>
> >> c) virtio may bypass the IOMMU depending on what the guest asks it to do.
> >
> > d) some virtio devices bypass the IOMMU and some don't,
> > e.g. it's harder to support IOMMU with vhost.
> >
> >
> >> If this is keyed off a virtio feature bit and anyone tries to
> >> implement (c), then vfio is going to have a problem.  And, if it's
> >> keyed off a virtio feature bit, then (a) won't work on Xen or similar
> >> setups unless the Xen hypervisor adds a giant and probably unreliable
> >> kludge to support it.  Meanwhile, 4.6-rc works fine under Xen on a
> >> default x86 QEMU configuration, and I'd really like to keep it that
> >> way.
> >>
> >> What could plausibly work using a virtio feature bit is for a device
> >> to say "hey, I'm a new device and I support the platform-defined IOMMU
> >> mechanism".  This bit would be *set* on default IOMMU-less QEMU
> >> configurations and on physical virtio PCI cards.
> >
> > And clear on Xen.
>
> How? QEMU has no idea that the guest is running Xen.

I was under the impression xen_enabled() is true in QEMU.
Am I wrong?
On Apr 20, 2016 6:14 AM, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>
> On Tue, Apr 19, 2016 at 02:07:01PM -0700, Andy Lutomirski wrote:
> > On Tue, Apr 19, 2016 at 1:54 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > On Tue, Apr 19, 2016 at 01:27:29PM -0700, Andy Lutomirski wrote:
> > >> On Tue, Apr 19, 2016 at 1:16 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > >> > On Tue, Apr 19, 2016 at 11:01:38AM -0700, Andy Lutomirski wrote:
> > >> >> On Tue, Apr 19, 2016 at 10:49 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > >> >> > On Tue, Apr 19, 2016 at 12:26:44PM -0400, David Woodhouse wrote:
> > >> >> >> On Tue, 2016-04-19 at 19:20 +0300, Michael S. Tsirkin wrote:
> > >> >> >> >
> > >> >> >> > > I thought that PLATFORM served that purpose. Wouldn't the host
> > >> >> >> > > advertise PLATFORM support and, if the guest doesn't ack it, the host
> > >> >> >> > > device would skip translation? Or is that problematic for vfio?
> > >> >> >> >
> > >> >> >> > Exactly that's problematic for security.
> > >> >> >> > You can't allow guest driver to decide whether device skips security.
> > >> >> >>
> > >> >> >> Right. Because fundamentally, this *isn't* a property of the endpoint
> > >> >> >> device, and doesn't live in virtio itself.
> > >> >> >>
> > >> >> >> It's a property of the platform IOMMU, and lives there.
> > >> >> >
> > >> >> > It's a property of the hypervisor virtio implementation, and lives there.
> > >> >>
> > >> >> It is now, but QEMU could, in principle, change the way it thinks
> > >> >> about it so that virtio devices would use the QEMU DMA API but ask
> > >> >> QEMU to pass everything through 1:1.  This would be entirely invisible
> > >> >> to guests but would make it be a property of the IOMMU implementation.
> > >> >> At that point, maybe QEMU could find a (platform dependent) way to
> > >> >> tell the guest what's going on.
> > >> >>
> > >> >> FWIW, as far as I can tell, PPC and SPARC really could, in principle,
> > >> >> set up 1:1 mappings in the guest so that the virtio devices would work
> > >> >> regardless of whether QEMU is ignoring the IOMMU or not -- I think the
> > >> >> only obstacle is that the PPC and SPARC 1:1 mappings are currently set
> > >> >> up with an offset.  I don't know too much about those platforms, but
> > >> >> presumably the layout could be changed so that 1:1 really was 1:1.
> > >> >>
> > >> >> --Andy
> > >> >
> > >> > Sure.  Do you see any reason why the decision to do this can't be
> > >> > keyed off the virtio feature bit?
> > >>
> > >> I can think of three types of virtio host:
> > >>
> > >> a) virtio always bypasses the IOMMU.
> > >>
> > >> b) virtio never bypasses the IOMMU (unless DMAR tables or similar say
> > >> it does) -- i.e. virtio works like any other device.
> > >>
> > >> c) virtio may bypass the IOMMU depending on what the guest asks it to do.
> > >
> > > d) some virtio devices bypass the IOMMU and some don't,
> > > e.g. it's harder to support IOMMU with vhost.
> > >
> > >
> > >> If this is keyed off a virtio feature bit and anyone tries to
> > >> implement (c), then vfio is going to have a problem.  And, if it's
> > >> keyed off a virtio feature bit, then (a) won't work on Xen or similar
> > >> setups unless the Xen hypervisor adds a giant and probably unreliable
> > >> kludge to support it.  Meanwhile, 4.6-rc works fine under Xen on a
> > >> default x86 QEMU configuration, and I'd really like to keep it that
> > >> way.
> > >>
> > >> What could plausibly work using a virtio feature bit is for a device
> > >> to say "hey, I'm a new device and I support the platform-defined IOMMU
> > >> mechanism".  This bit would be *set* on default IOMMU-less QEMU
> > >> configurations and on physical virtio PCI cards.
> > >
> > > And clear on Xen.
> >
> > How? QEMU has no idea that the guest is running Xen.
>
> I was under the impression xen_enabled() is true in QEMU.
> Am I wrong?

I'd be rather surprised, given that QEMU would have to inspect the
guest kernel to figure it out.

I'm talking about Xen under QEMU.  For example, if you feed QEMU a
guest disk image that contains Fedora with the xen packages installed,
you can boot it and get a grub menu.  If you ask grub to boot Xen, you
get Xen.  If you ask grub to boot Linux directly, you don't get Xen.

I assume xen_enabled is for QEMU under Xen, i.e. QEMU, running under
Xen, supplying emulated devices to a Xen domU guest.  Since QEMU is
seeing the guest address space directly, this should be much the same
as QEMU !xen_enabled -- if you boot plain Linux, everything works, but
if you do Xen -> QEMU -> HVM guest running Xen PV -> Linux, then
virtio drivers in the Xen PV Linux guest need to translate addresses.

--Andy

>
> --
> MST
diff --git a/include/hw/virtio/virtio-access.h b/include/hw/virtio/virtio-access.h
index 967cc75..bb6f34e 100644
--- a/include/hw/virtio/virtio-access.h
+++ b/include/hw/virtio/virtio-access.h
@@ -23,7 +23,8 @@ static inline AddressSpace *virtio_get_dma_as(VirtIODevice *vdev)
     BusState *qbus = qdev_get_parent_bus(DEVICE(vdev));
     VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
 
-    if (k->get_dma_as) {
+    if ((vdev->host_features & (0x1ULL << VIRTIO_F_IOMMU_PLATFORM)) &&
+        k->get_dma_as) {
         return k->get_dma_as(qbus->parent);
     }
     return &address_space_memory;
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index b12faa9..34d3041 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -228,7 +228,11 @@ typedef struct VirtIORNGConf VirtIORNGConf;
     DEFINE_PROP_BIT64("notify_on_empty", _state, _field, \
                       VIRTIO_F_NOTIFY_ON_EMPTY, true), \
     DEFINE_PROP_BIT64("any_layout", _state, _field, \
-                      VIRTIO_F_ANY_LAYOUT, true)
+                      VIRTIO_F_ANY_LAYOUT, true), \
+    DEFINE_PROP_BIT64("iommu_passthrough", _state, _field, \
+                      VIRTIO_F_IOMMU_PASSTHROUGH, false), \
+    DEFINE_PROP_BIT64("iommu_platform", _state, _field, \
+                      VIRTIO_F_IOMMU_PLATFORM, false)
 
 hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n);
 hwaddr virtio_queue_get_avail_addr(VirtIODevice *vdev, int n);
diff --git a/include/standard-headers/linux/virtio_config.h b/include/standard-headers/linux/virtio_config.h
index bcc445b..5564dab 100644
--- a/include/standard-headers/linux/virtio_config.h
+++ b/include/standard-headers/linux/virtio_config.h
@@ -61,4 +61,12 @@
 /* v1.0 compliant. */
 #define VIRTIO_F_VERSION_1 32
 
+/* Request IOMMU passthrough (if available)
+ * Without VIRTIO_F_IOMMU_PLATFORM: bypass the IOMMU even if enabled.
+ * With VIRTIO_F_IOMMU_PLATFORM: suggest disabling IOMMU.
+ */
+#define VIRTIO_F_IOMMU_PASSTHROUGH 33
+
+/* Do not bypass the IOMMU (if configured) */
+#define VIRTIO_F_IOMMU_PLATFORM 34
 #endif /* _LINUX_VIRTIO_CONFIG_H */
This adds a flag to enable/disable bypassing the IOMMU by
virtio devices.

This is on top of patch
	http://article.gmane.org/gmane.comp.emulators.qemu/403467
	virtio: convert to use DMA api

Tested with patchset
	http://article.gmane.org/gmane.linux.kernel.virtualization/27545
	virtio-pci: iommu support

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/virtio/virtio-access.h              | 3 ++-
 include/hw/virtio/virtio.h                     | 6 +++++-
 include/standard-headers/linux/virtio_config.h | 8 ++++++++
 3 files changed, 15 insertions(+), 2 deletions(-)