[iommufd,0/9] Remove IOMMU_CAP_INTR_REMAP

Message ID	0-v1-9e466539c244+47b5-secure_msi_jgg@nvidia.com (mailing list archive)
Headers	show Return-Path: <kvm-owner@kernel.org> From: Jason Gunthorpe <jgg@nvidia.com> To: Alexander Gordeev <agordeev@linux.ibm.com>, Alex Williamson <alex.williamson@redhat.com>, Lu Baolu <baolu.lu@linux.intel.com>, Christian Borntraeger <borntraeger@linux.ibm.com>, Cornelia Huck <cohuck@redhat.com>, David Woodhouse <dwmw2@infradead.org>, Gerald Schaefer <gerald.schaefer@linux.ibm.com>, Vasily Gorbik <gor@linux.ibm.com>, Heiko Carstens <hca@linux.ibm.com>, iommu@lists.linux.dev, Joerg Roedel <joro@8bytes.org>, Kevin Tian <kevin.tian@intel.com>, kvm@vger.kernel.org, linux-s390@vger.kernel.org, Marc Zyngier <maz@kernel.org>, Robin Murphy <robin.murphy@arm.com>, Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>, Sven Schnelle <svens@linux.ibm.com>, Thomas Gleixner <tglx@linutronix.de>, Will Deacon <will@kernel.org> Cc: Bharat Bhushan <bharat.bhushan@nxp.com>, Christian Borntraeger <borntraeger@de.ibm.com>, Eric Auger <eric.auger@redhat.com>, Eric Farman <farman@linux.ibm.com>, Marc Zyngier <marc.zyngier@arm.com>, Matthew Rosato <mjrosato@linux.ibm.com>, Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com>, Will Deacon <will.deacon@arm.com> Subject: [PATCH iommufd 0/9] Remove IOMMU_CAP_INTR_REMAP Date: Thu, 8 Dec 2022 16:26:27 -0400 Message-Id: <0-v1-9e466539c244+47b5-secure_msi_jgg@nvidia.com> Content-Transfer-Encoding: 8bit Content-Type: text/plain MIME-Version: 1.0 Precedence: bulk
Series	Remove IOMMU_CAP_INTR_REMAP \| expand [iommufd,0/9] Remove IOMMU_CAP_INTR_REMAP [iommufd,1/9] irq: Add msi_device_has_secure_msi() [iommufd,2/9] vfio/type1: Check that every device supports IOMMU_CAP_INTR_REMAP [iommufd,3/9] vfio/type1: Convert to msi_device_has_secure_msi() [iommufd,4/9] iommufd: Convert to msi_device_has_secure_msi() [iommufd,5/9] irq: Remove unused irq_domain_check_msi_remap() code [iommufd,6/9] irq: Rename MSI_REMAP to SECURE_MSI [iommufd,7/9] iommu/x86: Replace IOMMU_CAP_INTR_REMAP with IRQ_DOMAIN_FLAG_SECURE_MSI [iommufd,8/9] irq/s390: Add arch_is_secure_msi() for s390 [iommufd,9/9] iommu: Remove IOMMU_CAP_INTR_REMAP

Jason Gunthorpe Dec. 8, 2022, 8:26 p.m. UTC

[ This would be for v6.3, the series depends on a bunch of stuff in
linux-next. I would be happy to merge it through the iommfd tree ]

Currently the kernel has two ways to signal the "secure MSI" concept that
IOMMU_CAP_INTR_REMAP and irq_domain_check_msi_remap() both lay claim to.

Harmonize these into a single irq_domain based check under
msi_device_has_secure_msi().

In real HW "secure MSI" is implemented in a few different ways:

 - x86 uses "interrupt remapping" which is a block that sits between
   the device and APIC, that can "remap" the MSI MemWr using per-RID
   tables. Part of the remapping is discarding, the per-RID tables
   will not contain vectors that have not been enabled for the device.

 - ARM GICv3 ITS integrates the concept of an out-of-band "device ID"
   directly into the interrupt controller logic. The tables the GIC checks
   that determine how to deliver the interrupt through the ITS device table
   and interrupt translation tables allow limiting which interrupts device
   IDs can trigger.

 - S390 has unconditionally claimed it has secure MSI through the iommu
   driver. I'm not sure how it works, or if it even does. Perhaps
   zpci_set_airq() pushes the "zdev->gias" to the hypervisor which
   limits a device's MSI to only certain KVM contexts (though if true
   this would be considered insecure by VFIO)

After this series the "secure MSI" is tagged based only on the irq_domains
that the interrupt travels through. For x86 enabling interrupt remapping
causes IR irq_domains to be installed in the path, and they can carry the
IRQ_DOMAIN_FLAG_SECURE_MSI. For ARM the GICv3 ITS itself already sets the
flag when it is running in a secure mode, and S390 simply sets it always
through an arch hook since it doesn't use irq_domains at all.

This removes the intrusion of entirely IRQ subsystem information into the
iommu layer. Linux's iommu_domains abstraction has no bearing at all on
the security of MSI. Even if HW linked to the IOMMU may implement the
security on x86 implementations, Linux models that HW through the
irq_domain, not the iommu_domain.

This is on github: https://github.com/jgunthorpe/linux/commits/secure_msi

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Jason Gunthorpe (9):
  irq: Add msi_device_has_secure_msi()
  vfio/type1: Check that every device supports IOMMU_CAP_INTR_REMAP
  vfio/type1: Convert to msi_device_has_secure_msi()
  iommufd: Convert to msi_device_has_secure_msi()
  irq: Remove unused irq_domain_check_msi_remap() code
  irq: Rename MSI_REMAP to SECURE_MSI
  iommu/x86: Replace IOMMU_CAP_INTR_REMAP with
    IRQ_DOMAIN_FLAG_SECURE_MSI
  irq/s390: Add arch_is_secure_msi() for s390
  iommu: Remove IOMMU_CAP_INTR_REMAP

 arch/s390/include/asm/msi.h         | 12 +++++++++
 drivers/iommu/amd/iommu.c           |  5 ++--
 drivers/iommu/intel/iommu.c         |  2 --
 drivers/iommu/intel/irq_remapping.c |  3 ++-
 drivers/iommu/iommufd/device.c      |  5 ++--
 drivers/iommu/s390-iommu.c          |  2 --
 drivers/irqchip/irq-gic-v3-its.c    |  4 +--
 drivers/vfio/vfio_iommu_type1.c     | 16 ++++++------
 include/linux/iommu.h               |  1 -
 include/linux/irqdomain.h           | 27 ++------------------
 include/linux/msi.h                 | 17 +++++++++++++
 kernel/irq/irqdomain.c              | 39 -----------------------------
 kernel/irq/msi.c                    | 25 ++++++++++++++++++
 13 files changed, 73 insertions(+), 85 deletions(-)
 create mode 100644 arch/s390/include/asm/msi.h


base-commit: 644f4ef9a6ea0e0c65f949bd6b80857d4223c476

Matthew Rosato Dec. 8, 2022, 11:37 p.m. UTC | #1

On 12/8/22 3:26 PM, Jason Gunthorpe wrote:

>  - S390 has unconditionally claimed it has secure MSI through the iommu
>    driver. I'm not sure how it works, or if it even does. Perhaps
>    zpci_set_airq() pushes the "zdev->gias" to the hypervisor which
>    limits a device's MSI to only certain KVM contexts (though if true
>    this would be considered insecure by VFIO)
> 

There are a few layers here.  Interrupt isolation and mapping on s390 is accomplished via a mapping table used by a layer of firmware (and can be shared by a hypervisor e.g. qemu/kvm) that sits between the device and the kernel/driver (s390 linux always runs on at least this 'bare-metal hypervisor' firmware layer).  Indeed the initial relationship is established via zpci_set_airq -- the "zdev->fh" identifies the device, the "zdev->gisa" (if applicable) identifies the single KVM context that is eligible to receive interrupts related to the specified device as well as the single KVM context allowed to access the device via any zPCI instructions (e.g. config space access).  The aibv/noi indicate the vector mappings that are authorized for that device; firmware will typically route the interrupts to the guest without hypervisor involvement once this is established, but the table is shared by the hypervisor so that it can be tapped to complete delivery when necessary.  This registration process enables a firmware intermediary that will only pass along MSI from the device that has an associated, previously-authorized vector, associated with either the 'bare-metal hypervisor' (gisa = 0) and/or a specific VM (gisa != 0), depending what was registered as zpci_set_airq.

Jason Gunthorpe Dec. 9, 2022, 12:42 a.m. UTC | #2

On Thu, Dec 08, 2022 at 06:37:42PM -0500, Matthew Rosato wrote:
> On 12/8/22 3:26 PM, Jason Gunthorpe wrote:
> 
> >  - S390 has unconditionally claimed it has secure MSI through the iommu
> >    driver. I'm not sure how it works, or if it even does. Perhaps
> >    zpci_set_airq() pushes the "zdev->gias" to the hypervisor which
> >    limits a device's MSI to only certain KVM contexts (though if true
> >    this would be considered insecure by VFIO)
> > 
> 
> There are a few layers here.  Interrupt isolation and mapping on
> s390 is accomplished via a mapping table used by a layer of firmware
> (and can be shared by a hypervisor e.g. qemu/kvm) that sits between
> the device and the kernel/driver (s390 linux always runs on at least
> this 'bare-metal hypervisor' firmware layer).  Indeed the initial
> relationship is established via zpci_set_airq -- the "zdev->fh"
> identifies the device, the "zdev->gisa" (if applicable) identifies
> the single KVM context that is eligible to receive interrupts
> related to the specified device as well as the single KVM context
> allowed to access the device via any zPCI instructions (e.g. config
> space access).  The aibv/noi indicate the vector mappings that are
> authorized for that device; firmware will typically route the
> interrupts to the guest without hypervisor involvement once this is
> established, but the table is shared by the hypervisor so that it
> can be tapped to complete delivery when necessary.  This
> registration process enables a firmware intermediary that will only
> pass along MSI from the device that has an associated,
> previously-authorized vector, associated with either the 'bare-metal
> hypervisor' (gisa = 0) and/or a specific VM (gisa != 0), depending
> what was registered as zpci_set_airq.

I suspected something like this - it technically isn't the same
"secure msi" thing since a VFIO userspace with no KVM can trigger
bogus MSIs against the kernel.

But it has been this way for a long time, let's just document it and
leave it be.

Jason

Tian, Kevin Dec. 9, 2022, 5:54 a.m. UTC | #3

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, December 9, 2022 4:26 AM
> 
> In real HW "secure MSI" is implemented in a few different ways:
> 
>  - x86 uses "interrupt remapping" which is a block that sits between
>    the device and APIC, that can "remap" the MSI MemWr using per-RID
>    tables. Part of the remapping is discarding, the per-RID tables
>    will not contain vectors that have not been enabled for the device.
> 

per-RID tables is true for AMD.

However for Intel VT-d it's per-IOMMU remapping table.

Jason Gunthorpe Dec. 9, 2022, 2:38 p.m. UTC | #4

On Fri, Dec 09, 2022 at 05:54:46AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Friday, December 9, 2022 4:26 AM
> > 
> > In real HW "secure MSI" is implemented in a few different ways:
> > 
> >  - x86 uses "interrupt remapping" which is a block that sits between
> >    the device and APIC, that can "remap" the MSI MemWr using per-RID
> >    tables. Part of the remapping is discarding, the per-RID tables
> >    will not contain vectors that have not been enabled for the device.
> > 
> 
> per-RID tables is true for AMD.
> 
> However for Intel VT-d it's per-IOMMU remapping table.

Sorry, what exactly does that mean?

Doesn't the HW inspect the RID to determine what to do with the MSI?

Jason

Jason Gunthorpe Dec. 9, 2022, 3:21 p.m. UTC | #5

On Fri, Dec 09, 2022 at 10:38:29AM -0400, Jason Gunthorpe wrote:
> On Fri, Dec 09, 2022 at 05:54:46AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Friday, December 9, 2022 4:26 AM
> > > 
> > > In real HW "secure MSI" is implemented in a few different ways:
> > > 
> > >  - x86 uses "interrupt remapping" which is a block that sits between
> > >    the device and APIC, that can "remap" the MSI MemWr using per-RID
> > >    tables. Part of the remapping is discarding, the per-RID tables
> > >    will not contain vectors that have not been enabled for the device.
> > > 
> > 
> > per-RID tables is true for AMD.
> > 
> > However for Intel VT-d it's per-IOMMU remapping table.
> 
> Sorry, what exactly does that mean?
> 
> Doesn't the HW inspect the RID to determine what to do with the MSI?

Okay, I get it:

 - x86 uses "interrupt remapping" which is a block that sits between
   the device and APIC, that can "remap" the MSI MemWr. AMD uses per-RID
   tables to implement isolation while Intel stores the authorized RID in
   each IRTE entry. Part of the remapping is discarding, HW will not
   forward MSIs that don't positively match the tables.

Jason

Thomas Gleixner Dec. 9, 2022, 7:57 p.m. UTC | #6

On Thu, Dec 08 2022 at 16:26, Jason Gunthorpe wrote:
> [ This would be for v6.3, the series depends on a bunch of stuff in
> linux-next. I would be happy to merge it through the iommfd tree ]

I'll have a look next week.

[iommufd,0/9] Remove IOMMU_CAP_INTR_REMAP

Message

Comments