Message ID | 0-v1-9e466539c244+47b5-secure_msi_jgg@nvidia.com (mailing list archive) |
---|---|
Headers | show |
Series | Remove IOMMU_CAP_INTR_REMAP | expand |
On 12/8/22 3:26 PM, Jason Gunthorpe wrote: > - S390 has unconditionally claimed it has secure MSI through the iommu > driver. I'm not sure how it works, or if it even does. Perhaps > zpci_set_airq() pushes the "zdev->gias" to the hypervisor which > limits a device's MSI to only certain KVM contexts (though if true > this would be considered insecure by VFIO) > There are a few layers here. Interrupt isolation and mapping on s390 is accomplished via a mapping table used by a layer of firmware (and can be shared by a hypervisor e.g. qemu/kvm) that sits between the device and the kernel/driver (s390 linux always runs on at least this 'bare-metal hypervisor' firmware layer). Indeed the initial relationship is established via zpci_set_airq -- the "zdev->fh" identifies the device, the "zdev->gisa" (if applicable) identifies the single KVM context that is eligible to receive interrupts related to the specified device as well as the single KVM context allowed to access the device via any zPCI instructions (e.g. config space access). The aibv/noi indicate the vector mappings that are authorized for that device; firmware will typically route the interrupts to the guest without hypervisor involvement once this is established, but the table is shared by the hypervisor so that it can be tapped to complete delivery when necessary. This registration process enables a firmware intermediary that will only pass along MSI from the device that has an associated, previously-authorized vector, associated with either the 'bare-metal hypervisor' (gisa = 0) and/or a specific VM (gisa != 0), depending what was registered as zpci_set_airq.
On Thu, Dec 08, 2022 at 06:37:42PM -0500, Matthew Rosato wrote: > On 12/8/22 3:26 PM, Jason Gunthorpe wrote: > > > - S390 has unconditionally claimed it has secure MSI through the iommu > > driver. I'm not sure how it works, or if it even does. Perhaps > > zpci_set_airq() pushes the "zdev->gias" to the hypervisor which > > limits a device's MSI to only certain KVM contexts (though if true > > this would be considered insecure by VFIO) > > > > There are a few layers here. Interrupt isolation and mapping on > s390 is accomplished via a mapping table used by a layer of firmware > (and can be shared by a hypervisor e.g. qemu/kvm) that sits between > the device and the kernel/driver (s390 linux always runs on at least > this 'bare-metal hypervisor' firmware layer). Indeed the initial > relationship is established via zpci_set_airq -- the "zdev->fh" > identifies the device, the "zdev->gisa" (if applicable) identifies > the single KVM context that is eligible to receive interrupts > related to the specified device as well as the single KVM context > allowed to access the device via any zPCI instructions (e.g. config > space access). The aibv/noi indicate the vector mappings that are > authorized for that device; firmware will typically route the > interrupts to the guest without hypervisor involvement once this is > established, but the table is shared by the hypervisor so that it > can be tapped to complete delivery when necessary. This > registration process enables a firmware intermediary that will only > pass along MSI from the device that has an associated, > previously-authorized vector, associated with either the 'bare-metal > hypervisor' (gisa = 0) and/or a specific VM (gisa != 0), depending > what was registered as zpci_set_airq. I suspected something like this - it technically isn't the same "secure msi" thing since a VFIO userspace with no KVM can trigger bogus MSIs against the kernel. But it has been this way for a long time, let's just document it and leave it be. Jason
> From: Jason Gunthorpe <jgg@nvidia.com> > Sent: Friday, December 9, 2022 4:26 AM > > In real HW "secure MSI" is implemented in a few different ways: > > - x86 uses "interrupt remapping" which is a block that sits between > the device and APIC, that can "remap" the MSI MemWr using per-RID > tables. Part of the remapping is discarding, the per-RID tables > will not contain vectors that have not been enabled for the device. > per-RID tables is true for AMD. However for Intel VT-d it's per-IOMMU remapping table.
On Fri, Dec 09, 2022 at 05:54:46AM +0000, Tian, Kevin wrote: > > From: Jason Gunthorpe <jgg@nvidia.com> > > Sent: Friday, December 9, 2022 4:26 AM > > > > In real HW "secure MSI" is implemented in a few different ways: > > > > - x86 uses "interrupt remapping" which is a block that sits between > > the device and APIC, that can "remap" the MSI MemWr using per-RID > > tables. Part of the remapping is discarding, the per-RID tables > > will not contain vectors that have not been enabled for the device. > > > > per-RID tables is true for AMD. > > However for Intel VT-d it's per-IOMMU remapping table. Sorry, what exactly does that mean? Doesn't the HW inspect the RID to determine what to do with the MSI? Jason
On Fri, Dec 09, 2022 at 10:38:29AM -0400, Jason Gunthorpe wrote: > On Fri, Dec 09, 2022 at 05:54:46AM +0000, Tian, Kevin wrote: > > > From: Jason Gunthorpe <jgg@nvidia.com> > > > Sent: Friday, December 9, 2022 4:26 AM > > > > > > In real HW "secure MSI" is implemented in a few different ways: > > > > > > - x86 uses "interrupt remapping" which is a block that sits between > > > the device and APIC, that can "remap" the MSI MemWr using per-RID > > > tables. Part of the remapping is discarding, the per-RID tables > > > will not contain vectors that have not been enabled for the device. > > > > > > > per-RID tables is true for AMD. > > > > However for Intel VT-d it's per-IOMMU remapping table. > > Sorry, what exactly does that mean? > > Doesn't the HW inspect the RID to determine what to do with the MSI? Okay, I get it: - x86 uses "interrupt remapping" which is a block that sits between the device and APIC, that can "remap" the MSI MemWr. AMD uses per-RID tables to implement isolation while Intel stores the authorized RID in each IRTE entry. Part of the remapping is discarding, HW will not forward MSIs that don't positively match the tables. Jason
On Thu, Dec 08 2022 at 16:26, Jason Gunthorpe wrote: > [ This would be for v6.3, the series depends on a bunch of stuff in > linux-next. I would be happy to merge it through the iommfd tree ] I'll have a look next week.
[ This would be for v6.3, the series depends on a bunch of stuff in linux-next. I would be happy to merge it through the iommfd tree ] Currently the kernel has two ways to signal the "secure MSI" concept that IOMMU_CAP_INTR_REMAP and irq_domain_check_msi_remap() both lay claim to. Harmonize these into a single irq_domain based check under msi_device_has_secure_msi(). In real HW "secure MSI" is implemented in a few different ways: - x86 uses "interrupt remapping" which is a block that sits between the device and APIC, that can "remap" the MSI MemWr using per-RID tables. Part of the remapping is discarding, the per-RID tables will not contain vectors that have not been enabled for the device. - ARM GICv3 ITS integrates the concept of an out-of-band "device ID" directly into the interrupt controller logic. The tables the GIC checks that determine how to deliver the interrupt through the ITS device table and interrupt translation tables allow limiting which interrupts device IDs can trigger. - S390 has unconditionally claimed it has secure MSI through the iommu driver. I'm not sure how it works, or if it even does. Perhaps zpci_set_airq() pushes the "zdev->gias" to the hypervisor which limits a device's MSI to only certain KVM contexts (though if true this would be considered insecure by VFIO) After this series the "secure MSI" is tagged based only on the irq_domains that the interrupt travels through. For x86 enabling interrupt remapping causes IR irq_domains to be installed in the path, and they can carry the IRQ_DOMAIN_FLAG_SECURE_MSI. For ARM the GICv3 ITS itself already sets the flag when it is running in a secure mode, and S390 simply sets it always through an arch hook since it doesn't use irq_domains at all. This removes the intrusion of entirely IRQ subsystem information into the iommu layer. Linux's iommu_domains abstraction has no bearing at all on the security of MSI. Even if HW linked to the IOMMU may implement the security on x86 implementations, Linux models that HW through the irq_domain, not the iommu_domain. This is on github: https://github.com/jgunthorpe/linux/commits/secure_msi Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Jason Gunthorpe (9): irq: Add msi_device_has_secure_msi() vfio/type1: Check that every device supports IOMMU_CAP_INTR_REMAP vfio/type1: Convert to msi_device_has_secure_msi() iommufd: Convert to msi_device_has_secure_msi() irq: Remove unused irq_domain_check_msi_remap() code irq: Rename MSI_REMAP to SECURE_MSI iommu/x86: Replace IOMMU_CAP_INTR_REMAP with IRQ_DOMAIN_FLAG_SECURE_MSI irq/s390: Add arch_is_secure_msi() for s390 iommu: Remove IOMMU_CAP_INTR_REMAP arch/s390/include/asm/msi.h | 12 +++++++++ drivers/iommu/amd/iommu.c | 5 ++-- drivers/iommu/intel/iommu.c | 2 -- drivers/iommu/intel/irq_remapping.c | 3 ++- drivers/iommu/iommufd/device.c | 5 ++-- drivers/iommu/s390-iommu.c | 2 -- drivers/irqchip/irq-gic-v3-its.c | 4 +-- drivers/vfio/vfio_iommu_type1.c | 16 ++++++------ include/linux/iommu.h | 1 - include/linux/irqdomain.h | 27 ++------------------ include/linux/msi.h | 17 +++++++++++++ kernel/irq/irqdomain.c | 39 ----------------------------- kernel/irq/msi.c | 25 ++++++++++++++++++ 13 files changed, 73 insertions(+), 85 deletions(-) create mode 100644 arch/s390/include/asm/msi.h base-commit: 644f4ef9a6ea0e0c65f949bd6b80857d4223c476