Message ID | a580069c5e494ffffa668218b6fe3a207b01efec.1739005085.git.nicolinc@nvidia.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | iommu: Add MSI mapping support with nested SMMU | expand |
On Sat, Feb 08 2025 at 01:02, Nicolin Chen wrote: > From: Jason Gunthorpe <jgg@nvidia.com> > > All the iommu cases simply want to override the MSI page's address with > the IOVA that was mapped through the iommu. This doesn't need a cookie > pointer, we just need to store the IOVA and its page size in the > msi_desc. We need to do nothing :) > Instead provide msi_desc_set_iommu_msi_iova() which allows the IOMMU side > to specify the IOVA that the MSI page is placed during > iommu_dma_prepare_msi(). This is stored in the msi_desc and then > iommu_dma_compose_msi_msg() is a simple inline that sets address_hi/lo. > > The next patch will correct the naming. > > This is done because we cannot correctly lock access to group->domain > in Now this gets really confusing. How is the next patch which corrects the naming related the locking problem? > the atomic context that iommu_dma_compose_msi_msg() is called under. Today > the locking miss is tolerable because dma_iommu.c operates under an > assumption that the domain does not change while a driver is probed. > > However iommufd now permits the domain to change while the driver is > probed and VFIO userspace can create races with IRQ changes calling > iommu_dma_prepare_msi/compose_msi_msg() and changing/freeing the > iommu_domain. > > Removing the pointer, and critically, the call to > iommu_get_domain_for_dev() during compose resolves this race. So this change log really fails to follow the basic structure: The context, the problem and the solution Something like this: The IOMMU translation for MSI message addresses is a two stage process: 1) A cookie containing the IOVA address is stored in the MSI descriptor, when a MSI interrupt is allocated 2) The compose callback uses this cookie to apply the translation to the message address. This worked so far because ....... With the iommufd rework this becomes racy, because ... Fix this by storing ... instead of ... .... Hmm? tglx
On Thu, Feb 13, 2025 at 12:54:15PM +0100, Thomas Gleixner wrote: > So this change log really fails to follow the basic structure: > > The context, the problem and the solution The IOMMU translation for MSI message addresses is a two step process, seperated in time: 1) iommu_dma_prepare_msi(): A cookie pointer containing the IOVA address is stored in the MSI descriptor, when a MSI interrupt is allocated. 2) iommu_dma_compose_msi_msg(): The compose callback uses this cookkie pointer to compute the translated message address. This has an inherent lifetime problem for the pointer stored in the cookie. It must remain valid between the two steps. There is no locking at the irq layer that helps protect the liftime. Today this only works under the assumption that the iommu domain is not changed while MSI interrupts are being programmed. This is true for normal DMA API users within the kernel as the iommu domain is attached before the driver is probed and cannot be changed while a driver is attached. Classic VFIO type1 also prevented changing the iommu domain while VFIO was running as it does not support changing the "container" after starting up. However, iommufd has improved this and we can change the iommu domain during VFIO operation. This allows userspace to directly race VFIO_DEVICE_ATTACH_IOMMUFD_PT (which calls iommu_attach_group()) and VFIO_DEVICE_SET_IRQS (which calls into iommu_dma_compose_msi_msg()). This causes both the cookie pointer and the unlocked call to iommu_get_domain_for_dev() on the MSI translation path to become a potential UAF. Fix the MSI cookie UAF by removing the cookie pointer. The translated message address is already known during iommu_dma_prepare_msi() and can not change. It can simply be stored as an integer in the MSI descriptor. A following patch will correct the iommu_get_domain_for_dev() UAF using the IOMMU group mutex. Ok? Nicolin - lets change the patch structure a little bit can you adjust this patch to leave iommu_dma_compose_msi_msg() in dma-iommu.c and the next patch will be all about renaming and moving it to the MSI core code instead? Easier to explain Thanks, Jason
Hi Nicolin, On Sat, 8 Feb 2025 01:02:34 -0800 Nicolin Chen <nicolinc@nvidia.com> wrote: > -static inline void msi_desc_set_iommu_cookie(struct msi_desc *desc, > - const void > *iommu_cookie) +/** > + * iommu_dma_compose_msi_msg() - Apply translation to an MSI message > + * @desc: MSI descriptor prepared by iommu_dma_prepare_msi() > + * @msg: MSI message containing target physical address > + */ Is it IOVA not PA? > +static inline void iommu_dma_compose_msi_msg(struct msi_desc *desc, > + struct msi_msg *msg) > { > -} > +#ifdef CONFIG_IRQ_MSI_IOMMU > + if (desc->iommu_msi_page_shift) { > + u64 msi_iova = desc->iommu_msi_iova > + << desc->iommu_msi_page_shift; > + > + msg->address_hi = upper_32_bits(msi_iova); > + msg->address_lo = lower_32_bits(msi_iova) | > + (msg->address_lo & > + ((1 << > desc->iommu_msi_page_shift) - 1)); > + } > #endif > +}
On Thu, Feb 13, 2025 at 12:28:49PM -0800, Jacob Pan wrote: > Hi Nicolin, > > On Sat, 8 Feb 2025 01:02:34 -0800 > Nicolin Chen <nicolinc@nvidia.com> wrote: > > > -static inline void msi_desc_set_iommu_cookie(struct msi_desc *desc, > > - const void > > *iommu_cookie) +/** > > + * iommu_dma_compose_msi_msg() - Apply translation to an MSI message > > + * @desc: MSI descriptor prepared by iommu_dma_prepare_msi() > > + * @msg: MSI message containing target physical address > > + */ > Is it IOVA not PA? This is moved from dma-iommu.c so we didn't change that. And I think it's correct to say "target physical address" as the irqchip driver does pass in a PA via @msg. Then iommu_dma_compose_msi_msg() kind of reverse-translates that, overwriting the msg with the "IOVA" from @desc. Thanks Nicolin > > +static inline void iommu_dma_compose_msi_msg(struct msi_desc *desc, > > + struct msi_msg *msg) > > { > > -} > > +#ifdef CONFIG_IRQ_MSI_IOMMU > > + if (desc->iommu_msi_page_shift) { > > + u64 msi_iova = desc->iommu_msi_iova > > + << desc->iommu_msi_page_shift; > > + > > + msg->address_hi = upper_32_bits(msi_iova); > > + msg->address_lo = lower_32_bits(msi_iova) | > > + (msg->address_lo & > > + ((1 << > > desc->iommu_msi_page_shift) - 1)); > > + } > > #endif > > +}
Hi Nicolin, On Thu, 13 Feb 2025 13:02:27 -0800 Nicolin Chen <nicolinc@nvidia.com> wrote: > On Thu, Feb 13, 2025 at 12:28:49PM -0800, Jacob Pan wrote: > > Hi Nicolin, > > > > On Sat, 8 Feb 2025 01:02:34 -0800 > > Nicolin Chen <nicolinc@nvidia.com> wrote: > > > > > -static inline void msi_desc_set_iommu_cookie(struct msi_desc > > > *desc, > > > - const void > > > *iommu_cookie) +/** > > > + * iommu_dma_compose_msi_msg() - Apply translation to an MSI > > > message > > > + * @desc: MSI descriptor prepared by iommu_dma_prepare_msi() > > > + * @msg: MSI message containing target physical address > > > + */ > > Is it IOVA not PA? > > This is moved from dma-iommu.c so we didn't change that. > > And I think it's correct to say "target physical address" as the > irqchip driver does pass in a PA via @msg. > > Then iommu_dma_compose_msi_msg() kind of reverse-translates that, > overwriting the msg with the "IOVA" from @desc. > It is just confusing that the msg can be IOVA, not always PA as the comment says. I see it gets deleted anyway in the next patch :)
On Thu, Feb 13, 2025 at 09:57:52AM -0400, Jason Gunthorpe wrote: > Nicolin - lets change the patch structure a little bit can you adjust > this patch to leave iommu_dma_compose_msi_msg() in dma-iommu.c and the > next patch will be all about renaming and moving it to the MSI core > code instead? Easier to explain Ack. I moved that and updated the commit message too. Thanks Nicolin
diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 38c65e92ecd0..caee952febd4 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -1508,7 +1508,6 @@ static inline void iommu_debugfs_setup(void) {} int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base); int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr); -void iommu_dma_compose_msi_msg(struct msi_desc *desc, struct msi_msg *msg); #else /* CONFIG_IOMMU_DMA */ @@ -1524,11 +1523,6 @@ static inline int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_a { return 0; } - -static inline void iommu_dma_compose_msi_msg(struct msi_desc *desc, struct msi_msg *msg) -{ -} - #endif /* CONFIG_IOMMU_DMA */ /* diff --git a/include/linux/msi.h b/include/linux/msi.h index b10093c4d00e..74c6a823f157 100644 --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -166,6 +166,10 @@ struct msi_desc_data { * @dev: Pointer to the device which uses this descriptor * @msg: The last set MSI message cached for reuse * @affinity: Optional pointer to a cpu affinity mask for this descriptor + * @iommu_msi_iova: Optional IOVA from the IOMMU to override the msi_addr. + * Only used if iommu_msi_page_shift != 0 + * @iommu_msi_page_shift: Indicates how many bits of the original address + * should be preserved when using iommu_msi_iova. * @sysfs_attr: Pointer to sysfs device attribute * * @write_msi_msg: Callback that may be called when the MSI message @@ -184,7 +188,8 @@ struct msi_desc { struct msi_msg msg; struct irq_affinity_desc *affinity; #ifdef CONFIG_IRQ_MSI_IOMMU - const void *iommu_cookie; + u64 iommu_msi_iova : 58; + u64 iommu_msi_page_shift : 6; #endif #ifdef CONFIG_SYSFS struct device_attribute *sysfs_attrs; @@ -285,28 +290,36 @@ struct msi_desc *msi_next_desc(struct device *dev, unsigned int domid, #define msi_desc_to_dev(desc) ((desc)->dev) -#ifdef CONFIG_IRQ_MSI_IOMMU -static inline const void *msi_desc_get_iommu_cookie(struct msi_desc *desc) -{ - return desc->iommu_cookie; -} - -static inline void msi_desc_set_iommu_cookie(struct msi_desc *desc, - const void *iommu_cookie) +static inline void msi_desc_set_iommu_msi_iova(struct msi_desc *desc, + u64 msi_iova, + unsigned int page_shift) { - desc->iommu_cookie = iommu_cookie; -} -#else -static inline const void *msi_desc_get_iommu_cookie(struct msi_desc *desc) -{ - return NULL; +#ifdef CONFIG_IRQ_MSI_IOMMU + desc->iommu_msi_iova = msi_iova >> page_shift; + desc->iommu_msi_page_shift = page_shift; +#endif } -static inline void msi_desc_set_iommu_cookie(struct msi_desc *desc, - const void *iommu_cookie) +/** + * iommu_dma_compose_msi_msg() - Apply translation to an MSI message + * @desc: MSI descriptor prepared by iommu_dma_prepare_msi() + * @msg: MSI message containing target physical address + */ +static inline void iommu_dma_compose_msi_msg(struct msi_desc *desc, + struct msi_msg *msg) { -} +#ifdef CONFIG_IRQ_MSI_IOMMU + if (desc->iommu_msi_page_shift) { + u64 msi_iova = desc->iommu_msi_iova + << desc->iommu_msi_page_shift; + + msg->address_hi = upper_32_bits(msi_iova); + msg->address_lo = lower_32_bits(msi_iova) | + (msg->address_lo & + ((1 << desc->iommu_msi_page_shift) - 1)); + } #endif +} int msi_domain_insert_msi_desc(struct device *dev, unsigned int domid, struct msi_desc *init_desc); diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 2a9fa0c8cc00..bf91e014d179 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -1815,7 +1815,7 @@ int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr) static DEFINE_MUTEX(msi_prepare_lock); /* see below */ if (!domain || !domain->iova_cookie) { - desc->iommu_cookie = NULL; + msi_desc_set_iommu_msi_iova(desc, 0, 0); return 0; } @@ -1827,33 +1827,13 @@ int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr) mutex_lock(&msi_prepare_lock); msi_page = iommu_dma_get_msi_page(dev, msi_addr, domain); mutex_unlock(&msi_prepare_lock); - - msi_desc_set_iommu_cookie(desc, msi_page); - if (!msi_page) return -ENOMEM; - return 0; -} -/** - * iommu_dma_compose_msi_msg() - Apply translation to an MSI message - * @desc: MSI descriptor prepared by iommu_dma_prepare_msi() - * @msg: MSI message containing target physical address - */ -void iommu_dma_compose_msi_msg(struct msi_desc *desc, struct msi_msg *msg) -{ - struct device *dev = msi_desc_to_dev(desc); - const struct iommu_domain *domain = iommu_get_domain_for_dev(dev); - const struct iommu_dma_msi_page *msi_page; - - msi_page = msi_desc_get_iommu_cookie(desc); - - if (!domain || !domain->iova_cookie || WARN_ON(!msi_page)) - return; - - msg->address_hi = upper_32_bits(msi_page->iova); - msg->address_lo &= cookie_msi_granule(domain->iova_cookie) - 1; - msg->address_lo += lower_32_bits(msi_page->iova); + msi_desc_set_iommu_msi_iova( + desc, msi_page->iova, + ilog2(cookie_msi_granule(domain->iova_cookie))); + return 0; } static int iommu_dma_init(void)