Message ID | 20231227025923.536148-5-haifeng.zhao@linux.intel.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | fix vt-d hard lockup when hotplug ATS capable device | expand |
I suggest using "ATS Invalidate Request" in the subject as well. Otherwise we have to figure out whether "device-TLB invalidate request" is the same as "ATS Invalidate Request". If they are the same, just use the same words. On Tue, Dec 26, 2023 at 09:59:22PM -0500, Ethan Zhao wrote: > Except those aggressive hotplug cases - surprise remove a hotplug device > while its safe removal is requested and handled in process by: > > 1. pull it out directly. > 2. turn off its power. > 3. bring the link down. > 4. just died there that moment. > > etc, in a word, 'gone' or 'disconnected'. > > Mostly are regular normal safe removal and surprise removal unplug. > these hot unplug handling process could be optimized for fix the ATS > invalidation hang issue by calling pci_dev_is_disconnected() in function > devtlb_invalidation_with_pasid() to check target device state to avoid > sending meaningless ATS invalidation request to iommu when device is gone. > (see IMPLEMENTATION NOTE in PCIe spec r6.1 section 10.3.1) Suggest "ATS Invalidate Request", capitalized exactly that way so we know it's a specific name of something defined in the PCIe spec. > For safe removal, device wouldn't be removed untill the whole software > handling process is done, it wouldn't trigger the hard lock up issue > caused by too long ATS invalidation timeout wait. in safe removal path, Ditto. Capitalize "In the safe removal ..." since it starts a new sentence. > device state isn't set to pci_channel_io_perm_failure in > pciehp_unconfigure_device() by checking 'presence' parameter, calling > pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will return > false there, wouldn't break the function. > > For surprise removal, device state is set to pci_channel_io_perm_failure in > pciehp_unconfigure_device(), means device is already gone (disconnected) > call pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will > return true to break the function not to send ATS invalidation request to Ditto. > the disconnected device blindly, thus avoid the further long time waiting > triggers the hard lockup. > > safe removal & surprise removal > > pciehp_ist() > pciehp_handle_presence_or_link_change() > pciehp_disable_slot() > remove_board() > pciehp_unconfigure_device(presence)
On 12/27/2023 9:11 PM, Bjorn Helgaas wrote: > I suggest using "ATS Invalidate Request" in the subject as well. > Otherwise we have to figure out whether "device-TLB invalidate > request" is the same as "ATS Invalidate Request". > > If they are the same, just use the same words. > > On Tue, Dec 26, 2023 at 09:59:22PM -0500, Ethan Zhao wrote: >> Except those aggressive hotplug cases - surprise remove a hotplug device >> while its safe removal is requested and handled in process by: >> >> 1. pull it out directly. >> 2. turn off its power. >> 3. bring the link down. >> 4. just died there that moment. >> >> etc, in a word, 'gone' or 'disconnected'. >> >> Mostly are regular normal safe removal and surprise removal unplug. >> these hot unplug handling process could be optimized for fix the ATS >> invalidation hang issue by calling pci_dev_is_disconnected() in function >> devtlb_invalidation_with_pasid() to check target device state to avoid >> sending meaningless ATS invalidation request to iommu when device is gone. >> (see IMPLEMENTATION NOTE in PCIe spec r6.1 section 10.3.1) > Suggest "ATS Invalidate Request", capitalized exactly that way so we > know it's a specific name of something defined in the PCIe spec. > >> For safe removal, device wouldn't be removed untill the whole software >> handling process is done, it wouldn't trigger the hard lock up issue >> caused by too long ATS invalidation timeout wait. in safe removal path, > Ditto. > > Capitalize "In the safe removal ..." since it starts a new sentence. > >> device state isn't set to pci_channel_io_perm_failure in >> pciehp_unconfigure_device() by checking 'presence' parameter, calling >> pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will return >> false there, wouldn't break the function. >> >> For surprise removal, device state is set to pci_channel_io_perm_failure in >> pciehp_unconfigure_device(), means device is already gone (disconnected) >> call pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will >> return true to break the function not to send ATS invalidation request to > Ditto. Okay. Thanks, Ethan >> the disconnected device blindly, thus avoid the further long time waiting >> triggers the hard lockup. >> >> safe removal & surprise removal >> >> pciehp_ist() >> pciehp_handle_presence_or_link_change() >> pciehp_disable_slot() >> remove_board() >> pciehp_unconfigure_device(presence)
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c index 1c87fb1b1039..a08bdbec90eb 100644 --- a/drivers/iommu/intel/pasid.c +++ b/drivers/iommu/intel/pasid.c @@ -481,6 +481,9 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu, if (!info || !info->ats_enabled) return; + if (pci_dev_is_disconnected(to_pci_dev(dev))) + return; + sid = info->bus << 8 | info->devfn; qdep = info->ats_qdep; pfsid = info->pfsid;