Message ID | 20231222104108.18499-4-haifeng.zhao@linux.intel.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | fix vt-d hard lockup when hotplug ATS capable device | expand |
On 12/22/2023 6:41 PM, Ethan Zhao wrote: > Even the devTLB invalidation request is just submitted and waiting it > to be done/timeout in qi_submit_sync(), it is possible device is removed > or powered-off. try to break it out in such rare but possible case. > > This patch is sent for more comment. not tested, only passed compiling. > > Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com> > --- > drivers/iommu/intel/dmar.c | 3 ++- > drivers/iommu/intel/iommu.c | 36 ++++++++++++++++++++++++++++++++++++ > 2 files changed, 38 insertions(+), 1 deletion(-) > > diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c > index 23cb80d62a9a..d8637ab93387 100644 > --- a/drivers/iommu/intel/dmar.c > +++ b/drivers/iommu/intel/dmar.c > @@ -1422,7 +1422,8 @@ int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc, > */ > writel(qi->free_head << shift, iommu->reg + DMAR_IQT_REG); > > - while (qi->desc_status[wait_index] != QI_DONE) { > + while (qi->desc_status[wait_index] != QI_DONE && > + qi->desc_status[wait_index] != QI_ABORT) { Another way is checking pci_device_is_present() here and bail out, how about it ? > /* > * We will leave the interrupts disabled, to prevent interrupt > * context to queue another cmd while a cmd is already submitted > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c > index 897159dba47d..33075d0688bc 100644 > --- a/drivers/iommu/intel/iommu.c > +++ b/drivers/iommu/intel/iommu.c > @@ -4472,10 +4472,46 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev) > return &iommu->iommu; > } > > +static void intel_iommu_abort_devtlib_invalidate(struct device *dev) > +{ > + struct device_domain_info *info = dev_iommu_priv_get(dev); > + struct intel_iommu *iommu = info->iommu; > + struct q_inval *qi = iommu->qi; > + struct qi_desc *desc, *idesc; > + int index, offset, shift; > + u16 sid, qdep, pfsid > + unsigned long flags; > + > + if (!dev_is_pci(info->dev) || !info->ats_enabled || !qi) > + return; > + if (!pci_dev_is_disconnected(to_pci_dev(dev))) > + return; > + > + sid = info->bus << 8 | info->devfn; > + qdep = info->ats_qdep; > + pfsid = info->pfsid; > + > + raw_spin_lock_irqsave(&qi->q_lock, flags); > + for (index = 1; index < QI_LENGTH; index++) { > + offset = index << shift; > + desc = qi->desc + offset; > + if (desc->qw0 & QI_IWD_TYPE) { > + offset = (index-1) << shift; > + idesc = qi->desc + offset; > + if (idesc->qw0 & QI_DEV_EIOTLB_SID(sid)) { > + if (qi->desc_status[index] == QI_IN_USE) > + qi->desc_status[index] = QI_ABORT; > + } > + } > + } > + raw_spin_unlock_irqrestore(&qi->q_lock, flags); > + > +} > static void intel_iommu_release_device(struct device *dev) > { > struct device_domain_info *info = dev_iommu_priv_get(dev); > > + intel_iommu_abort_devtlib_invalidate(dev); Wonder if there is lock something prevent pciehp_ist() supprise_removal interrupt response re-enter to get here when another safe_removal is in process, if so , see above > dmar_remove_one_dev_info(dev); > intel_pasid_free_table(dev); > intel_iommu_debugfs_remove_dev(info);
On 12/23/2023 7:35 AM, Ethan Zhao wrote: > > On 12/22/2023 6:41 PM, Ethan Zhao wrote: >> Even the devTLB invalidation request is just submitted and waiting it >> to be done/timeout in qi_submit_sync(), it is possible device is removed >> or powered-off. try to break it out in such rare but possible case. >> >> This patch is sent for more comment. not tested, only passed compiling. >> >> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com> >> --- >> drivers/iommu/intel/dmar.c | 3 ++- >> drivers/iommu/intel/iommu.c | 36 ++++++++++++++++++++++++++++++++++++ >> 2 files changed, 38 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c >> index 23cb80d62a9a..d8637ab93387 100644 >> --- a/drivers/iommu/intel/dmar.c >> +++ b/drivers/iommu/intel/dmar.c >> @@ -1422,7 +1422,8 @@ int qi_submit_sync(struct intel_iommu *iommu, >> struct qi_desc *desc, >> */ >> writel(qi->free_head << shift, iommu->reg + DMAR_IQT_REG); >> - while (qi->desc_status[wait_index] != QI_DONE) { >> + while (qi->desc_status[wait_index] != QI_DONE && >> + qi->desc_status[wait_index] != QI_ABORT) { > > Another way is checking pci_device_is_present() here and bail out, > > how about it ? > >> /* >> * We will leave the interrupts disabled, to prevent interrupt >> * context to queue another cmd while a cmd is already >> submitted >> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c >> index 897159dba47d..33075d0688bc 100644 >> --- a/drivers/iommu/intel/iommu.c >> +++ b/drivers/iommu/intel/iommu.c >> @@ -4472,10 +4472,46 @@ static struct iommu_device >> *intel_iommu_probe_device(struct device *dev) >> return &iommu->iommu; >> } >> +static void intel_iommu_abort_devtlib_invalidate(struct device *dev) >> +{ >> + struct device_domain_info *info = dev_iommu_priv_get(dev); >> + struct intel_iommu *iommu = info->iommu; >> + struct q_inval *qi = iommu->qi; >> + struct qi_desc *desc, *idesc; >> + int index, offset, shift; >> + u16 sid, qdep, pfsid >> + unsigned long flags; >> + >> + if (!dev_is_pci(info->dev) || !info->ats_enabled || !qi) >> + return; >> + if (!pci_dev_is_disconnected(to_pci_dev(dev))) >> + return; >> + >> + sid = info->bus << 8 | info->devfn; >> + qdep = info->ats_qdep; >> + pfsid = info->pfsid; >> + >> + raw_spin_lock_irqsave(&qi->q_lock, flags); >> + for (index = 1; index < QI_LENGTH; index++) { >> + offset = index << shift; >> + desc = qi->desc + offset; >> + if (desc->qw0 & QI_IWD_TYPE) { >> + offset = (index-1) << shift; >> + idesc = qi->desc + offset; >> + if (idesc->qw0 & QI_DEV_EIOTLB_SID(sid)) { >> + if (qi->desc_status[index] == QI_IN_USE) >> + qi->desc_status[index] = QI_ABORT; >> + } >> + } >> + } >> + raw_spin_unlock_irqrestore(&qi->q_lock, flags); >> + >> +} >> static void intel_iommu_release_device(struct device *dev) >> { >> struct device_domain_info *info = dev_iommu_priv_get(dev); >> + intel_iommu_abort_devtlib_invalidate(dev); > > Wonder if there is lock something prevent pciehp_ist() supprise_removal pci_lock_rescan_remove() will block another thread to re-enter the function pci_stop_and_remove_bus_device() to get here, this patch doesn't work. > > interrupt response re-enter to get here when another safe_removal is > in process, > > if so , see above > >> dmar_remove_one_dev_info(dev); >> intel_pasid_free_table(dev); >> intel_iommu_debugfs_remove_dev(info); >
diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c index 23cb80d62a9a..d8637ab93387 100644 --- a/drivers/iommu/intel/dmar.c +++ b/drivers/iommu/intel/dmar.c @@ -1422,7 +1422,8 @@ int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc, */ writel(qi->free_head << shift, iommu->reg + DMAR_IQT_REG); - while (qi->desc_status[wait_index] != QI_DONE) { + while (qi->desc_status[wait_index] != QI_DONE && + qi->desc_status[wait_index] != QI_ABORT) { /* * We will leave the interrupts disabled, to prevent interrupt * context to queue another cmd while a cmd is already submitted diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 897159dba47d..33075d0688bc 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4472,10 +4472,46 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev) return &iommu->iommu; } +static void intel_iommu_abort_devtlib_invalidate(struct device *dev) +{ + struct device_domain_info *info = dev_iommu_priv_get(dev); + struct intel_iommu *iommu = info->iommu; + struct q_inval *qi = iommu->qi; + struct qi_desc *desc, *idesc; + int index, offset, shift; + u16 sid, qdep, pfsid + unsigned long flags; + + if (!dev_is_pci(info->dev) || !info->ats_enabled || !qi) + return; + if (!pci_dev_is_disconnected(to_pci_dev(dev))) + return; + + sid = info->bus << 8 | info->devfn; + qdep = info->ats_qdep; + pfsid = info->pfsid; + + raw_spin_lock_irqsave(&qi->q_lock, flags); + for (index = 1; index < QI_LENGTH; index++) { + offset = index << shift; + desc = qi->desc + offset; + if (desc->qw0 & QI_IWD_TYPE) { + offset = (index-1) << shift; + idesc = qi->desc + offset; + if (idesc->qw0 & QI_DEV_EIOTLB_SID(sid)) { + if (qi->desc_status[index] == QI_IN_USE) + qi->desc_status[index] = QI_ABORT; + } + } + } + raw_spin_unlock_irqrestore(&qi->q_lock, flags); + +} static void intel_iommu_release_device(struct device *dev) { struct device_domain_info *info = dev_iommu_priv_get(dev); + intel_iommu_abort_devtlib_invalidate(dev); dmar_remove_one_dev_info(dev); intel_pasid_free_table(dev); intel_iommu_debugfs_remove_dev(info);
Even the devTLB invalidation request is just submitted and waiting it to be done/timeout in qi_submit_sync(), it is possible device is removed or powered-off. try to break it out in such rare but possible case. This patch is sent for more comment. not tested, only passed compiling. Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com> --- drivers/iommu/intel/dmar.c | 3 ++- drivers/iommu/intel/iommu.c | 36 ++++++++++++++++++++++++++++++++++++ 2 files changed, 38 insertions(+), 1 deletion(-)