Message ID: 20211025012503.33172-1-jandryuk@gmail.com
State:      New, archived
Series:     PCI/MSI: Fix masking MSI/MSI-X on Xen PV
On Sun, 2021-10-24 at 21:25 -0400, Jason Andryuk wrote:
> commit fcacdfbef5a1 ("PCI/MSI: Provide a new set of mask and unmask
> functions") introduced functions pci_msi_update_mask() and
> pci_msix_write_vector_ctrl() that are missing the checks for
> pci_msi_ignore_mask that exist in commit 446a98b19fd6 ("PCI/MSI: Use
> new mask/unmask functions"). The checks are in place at the high-level
> __pci_msi_mask_desc()/__pci_msi_unmask_desc(), but some functions call
> the helpers directly.
>
> Push the pci_msi_ignore_mask check down to the functions that make
> the actual writes. This keeps the logic local to the writes that need
> to be bypassed.
>
> With Xen PV, the hypervisor is responsible for masking and unmasking the
> interrupts, which pci_msi_ignore_mask is used to indicate.

This isn't just for Xen PV; Xen HVM guests let Xen unmask the MSI for
them too.
On Mon, Oct 25, 2021 at 08:44:52AM +0100, David Woodhouse wrote:
> On Sun, 2021-10-24 at 21:25 -0400, Jason Andryuk wrote:
> > [...]
> > With Xen PV, the hypervisor is responsible for masking and unmasking the
> > interrupts, which pci_msi_ignore_mask is used to indicate.
>
> This isn't just for Xen PV; Xen HVM guests let Xen unmask the MSI for
> them too.

It's kind of optional for HVM guests, as it depends on
XENFEAT_hvm_pirqs, which sadly gets unconditionally set for HVM
guests, thus dropping any benefits from having hardware-assisted APIC
virtualization or posted interrupts support.

AFAICT pci_msi_ignore_mask is not (correctly) set for HVM guests when
MSI interrupts are routed over event channels.

Regards, Roger.
On Mon, 2021-10-25 at 13:43 +0200, Roger Pau Monné wrote:
> It's kind of optional for HVM guests, as it depends on
> XENFEAT_hvm_pirqs, which sadly gets unconditionally set for HVM
> guests, thus dropping any benefits from having hardware assisted APIC
> virtualization or posted interrupts support.

Indeed. After implementing PIRQ support for Xen guests running under
KVM, I spent a "happy" couple of days swearing at it because it
actually *worked* if something would just *unmask* the f***ing MSI, but
the guest inexplicably (to me) didn't do that.

Took me a while to work out that Xen itself is *snooping* on the MSI
table writes even while they are *masked*, to capture the magic MSI
message (with vector==0) which means it's actually a PIRQ# in the
destination ID bits, and then magically unmask the MSI when the guest
binds that PIRQ to an event channel.

I did not enjoy implementing that part.

FWIW the *guest* could potentially be smarter here and elect not to use
PIRQs when hardware-assisted vAPIC is present. Aren't there some bits
in the CPUID that Xen advertises, which indicate that?
On Sun, Oct 24, 2021 at 9:26 PM Jason Andryuk <jandryuk@gmail.com> wrote:
> [...]
> This change avoids lockups in amdgpu drivers under Xen during boot.
>
> Fixes: commit 446a98b19fd6 ("PCI/MSI: Use new mask/unmask functions")
> Reported-by: Josef Johansson <josef@oderland.se>
> Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
> ---

I should have written that this is untested. If this is the desired
approach, Josef should test that it solves his boot hangs.

Regards,
Jason
On Mon, Oct 25, 2021 at 3:44 AM David Woodhouse <dwmw2@infradead.org> wrote:
> On Sun, 2021-10-24 at 21:25 -0400, Jason Andryuk wrote:
> > [...]
> > With Xen PV, the hypervisor is responsible for masking and unmasking the
> > interrupts, which pci_msi_ignore_mask is used to indicate.
>
> This isn't just for Xen PV; Xen HVM guests let Xen unmask the MSI for
> them too.

Ah, that makes sense that Xen handles both. I was repeating another
commit message's statement.

Oh, it looks like pci_msi_ignore_mask is PV-specific.

Regards,
Jason
On Mon, Oct 25, 2021 at 12:53:31PM +0100, David Woodhouse wrote:
> On Mon, 2021-10-25 at 13:43 +0200, Roger Pau Monné wrote:
> > [...]
>
> Indeed. After implementing PIRQ support for Xen guests running under
> KVM, I spent a "happy" couple of days swearing at it because it
> actually *worked* if something would just *unmask* the f***ing MSI, but
> the guest inexplicably (to me) didn't do that.
>
> Took me a while to work out that Xen itself is *snooping* on the MSI
> table writes even while they are *masked*, to capture the magic MSI
> message (with vector==0) which means it's actually a PIRQ# in the
> destination ID bits, and then magically unmask the MSI when the guest
> binds that PIRQ to an event channel.
>
> I did not enjoy implementing that part.

I can see that. It's even better because none of this is actually
documented.

> FWIW the *guest* could potentially be smarter here and elect not to use
> PIRQs when hardware-assisted vAPIC is present. Aren't there some bits
> in the CPUID that Xen advertises, which indicate that?

Yes, it's in leaf 0x40000x04. FWIW, I would also be fine with removing
XENFEAT_hvm_pirqs, as I don't think diverging from the hardware
specifications gives us much benefit. We avoid a couple of vm exits
for sure, but the cost of having all those modifications in guest
OSes is not worth it.

Roger.
On Mon, 2021-10-25 at 14:58 +0200, Roger Pau Monné wrote:
> [...]
> I can see that. It's even better because none of this is actually
> documented.

Indeed. I still haven't worked out if/how Xen actually *masks* the
corresponding MSI-X again. It can't do so when the evtchn is masked,
since that's just a bit in the shinfo page. So while the evtchn is
masked, the MSI can still be screaming into the void?

Perhaps it does so when the PIRQ is unbound from the evtchn?

> Yes, it's in leaf 0x40000x04. FWIW, I would also be fine with removing
> XENFEAT_hvm_pirqs, as I don't think diverging from the hardware
> specifications gives us much benefit. We avoid a couple of vm exits
> for sure, but the cost of having all those modifications in guest
> OSes is not worth it.

These days with posted interrupts, it doesn't even save us any vmexits;
it's all that additional guest complexity just to give us *more*
vmexits than we would have had :)
On Mon, Oct 25, 2021 at 02:02:38PM +0100, David Woodhouse wrote:
> On Mon, 2021-10-25 at 14:58 +0200, Roger Pau Monné wrote:
> > [...]
>
> Indeed. I still haven't worked out if/how Xen actually *masks* the
> corresponding MSI-X again. It can't do so when the evtchn is masked,
> since that's just a bit in the shinfo page. So while the evtchn is
> masked, the MSI can still be screaming into the void?

I think so. It's quite weird, because as a side effect of this mangling
Xen is transforming an edge-triggered interrupt into a level-triggered
one, as masked event channels belonging to MSI vectors will get set to
pending. So it's not entirely screaming into the void, because it will
(wrongly) get set as pending when masked.

> Perhaps it does so when the PIRQ is unbound from the evtchn?

> These days with posted interrupts, it doesn't even save us any vmexits;
> it's all that additional guest complexity just to give us *more*
> vmexits than we would have had :)

Oh indeed. I was thinking about hardware without APIC hw virtualization
or posted interrupts.

Roger.
On 10/25/21 14:27, Jason Andryuk wrote:
> On Sun, Oct 24, 2021 at 9:26 PM Jason Andryuk <jandryuk@gmail.com> wrote:
>> [...]
> I should have written that this is untested. If this is the desired
> approach, Josef should test that it solves his boot hangs.
>
> Regards,
> Jason

I've tested this today: both the above patch, and my own below, where
I'm patching inside __pci_write_msi_msg, which is the outcome of the
patch above.

I found that both solve the boot hang, but both have severe effects on
suspend/resume later on. The patch below, without the change inside
__pci_write_msi_msg, seems to be ideal: it solves the boot problems
without causing too many others.

It seems to me that there are undocumented dragons here. Doing more
tests later today.
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 4b4792940e86..e97eea1bc93a 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -148,6 +148,9 @@ static noinline void pci_msi_update_mask(struct msi_desc *desc, u32 clear, u32 s
 	raw_spinlock_t *lock = &desc->dev->msi_lock;
 	unsigned long flags;
 
+	if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual)
+		return;
+
 	raw_spin_lock_irqsave(lock, flags);
 	desc->msi_mask &= ~clear;
 	desc->msi_mask |= set;
@@ -186,6 +189,9 @@ static void pci_msix_write_vector_ctrl(struct msi_desc *desc, u32 ctrl)
 
 static inline void pci_msix_mask(struct msi_desc *desc)
 {
+	if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual)
+		return;
+
 	desc->msix_ctrl |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
 	pci_msix_write_vector_ctrl(desc, desc->msix_ctrl);
 	/* Flush write to device */
@@ -194,15 +200,15 @@ static inline void pci_msix_mask(struct msi_desc *desc)
 
 static inline void pci_msix_unmask(struct msi_desc *desc)
 {
+	if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual)
+		return;
+
 	desc->msix_ctrl &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT;
 	pci_msix_write_vector_ctrl(desc, desc->msix_ctrl);
 }
 
 static void __pci_msi_mask_desc(struct msi_desc *desc, u32 mask)
 {
-	if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual)
-		return;
-
 	if (desc->msi_attrib.is_msix)
 		pci_msix_mask(desc);
 	else if (desc->msi_attrib.maskbit)
@@ -211,9 +217,6 @@ static void __pci_msi_mask_desc(struct msi_desc *desc, u32 mask)
 
 static void __pci_msi_unmask_desc(struct msi_desc *desc, u32 mask)
 {
-	if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual)
-		return;
-
 	if (desc->msi_attrib.is_msix)
 		pci_msix_unmask(desc);
 	else if (desc->msi_attrib.maskbit)
@@ -307,7 +310,7 @@ void __pci_write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
 	 * entry while the entry is unmasked, the result is
 	 * undefined."
 	 */
-	if (unmasked)
+	if (unmasked && !pci_msi_ignore_mask)
 		pci_msix_write_vector_ctrl(entry, ctrl | PCI_MSIX_ENTRY_CTRL_MASKBIT);
 
 	writel(msg->address_lo, base + PCI_MSIX_ENTRY_LOWER_ADDR);
@@ -450,8 +453,9 @@ static void __pci_restore_msix_state(struct pci_dev *dev)
 				PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL);
 	arch_restore_msi_irqs(dev);
-	for_each_pci_msi_entry(entry, dev)
-		pci_msix_write_vector_ctrl(entry, entry->msix_ctrl);
+	if (!(pci_msi_ignore_mask || entry->msi_attrib.is_virtual))
+		for_each_pci_msi_entry(entry, dev)
+			pci_msix_write_vector_ctrl(entry, entry->msix_ctrl);
 
 	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
 }
On 10/25/21 18:46, Josef Johansson wrote:
> On 10/25/21 14:27, Jason Andryuk wrote:
>> [...]
> I've tested this today, both the above patch, but also my own below
> where I'm patching inside __pci_write_msi_msg,
> which is the outcome of the patch above.

I tested a lot of kernels today. To create a good baseline I compiled
without any of our patches here, and with my config flags set:

CONFIG_AMD_PMC=y
# CONFIG_HSA_AMD is not set
# CONFIG_CRYPTO_DEV_CCP is not set

The kernel stopped as before, and hung.

Test number 2 was to boot with amdgpu.msi=0. This still resulted in a
bad boot, since all the xhci drivers complained. We can be sure that
it's not amdgpu per se.

Test number 3 was with Jason's patch. It worked, but suspend/resume is
not working well. Generally it's not behaving like other kernels do,
which means it actually changes the behavior. Now with test 4 I tried
that thought: maybe this is still a good change? I'm deprived of a good
baseline in all this, so it's very hard to navigate between all the
variables.

Test number 4 was with Jason's patch plus the amdgpu patch below. It
worked, even suspend/resume, two times, but then it all crashed and
burned with quite interesting stack traces. Is amdgpu doing it wrong
here, or is it just me nitpicking?

index cc2e0c9cfe0a..f125597eb991 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -279,17 +279,8 @@ static bool amdgpu_msi_ok(struct amdgpu_device *adev)
 
 static void amdgpu_restore_msix(struct amdgpu_device *adev)
 {
-	u16 ctrl;
-
-	pci_read_config_word(adev->pdev, adev->pdev->msix_cap + PCI_MSIX_FLAGS, &ctrl);
-	if (!(ctrl & PCI_MSIX_FLAGS_ENABLE))
-		return;
-
-	/* VF FLR */
-	ctrl &= ~PCI_MSIX_FLAGS_ENABLE;
-	pci_write_config_word(adev->pdev, adev->pdev->msix_cap + PCI_MSIX_FLAGS, ctrl);
-	ctrl |= PCI_MSIX_FLAGS_ENABLE;
-	pci_write_config_word(adev->pdev, adev->pdev->msix_cap + PCI_MSIX_FLAGS, ctrl);
+	// checks if msix is enabled also
+	pci_restore_msi_state(adev->pdev);
 }
 
 /**

During the tests I fiddled with dpm settings, and they do have an
effect on the graphical output during suspend/resume. So maybe there
are hardware problems at play here as well.

I also looked through the code before and after Thomas' changes, and I
can't see that this patch should make any functional difference
compared to before the MSI series of patches. It's even such that
is_virtual should be checked within vector_ctrl.

I find Jason's patch quite nice, since it really places the checks in
few places, making it easier not to slip up. Compared to my attempt,
which even failed because I forgot one more place to put the checks.

With that said, I would really like some more tests on this with
different chipsets, on Xen. Any takers?

What I'm seeing is that there's no readl() in pci_msix_unmask(); there
was one in the code path before. I'm very much unsure whether there
should be one there, though.

We could really do a better job at the documentation for
pci_msi_ignore_mask, at least in msi.c. Maybe that should be a
different patch, adding some comments so that driver folks really see
the benefits of using the built-in restore functions, e.g.

This became a much bigger project than I thought; thanks all for
chiming in on it.
On Sun, Oct 24 2021 at 21:25, Jason Andryuk wrote:
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 4b4792940e86..478536bafc39 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -148,6 +148,9 @@ static noinline void pci_msi_update_mask(struct msi_desc *desc, u32 clear, u32 s
>  	raw_spinlock_t *lock = &desc->dev->msi_lock;
>  	unsigned long flags;
>  
> +	if (pci_msi_ignore_mask)
> +		return;
> +
>  	raw_spin_lock_irqsave(lock, flags);
>  	desc->msi_mask &= ~clear;
>  	desc->msi_mask |= set;
> @@ -181,6 +184,9 @@ static void pci_msix_write_vector_ctrl(struct msi_desc *desc, u32 ctrl)
>  {
>  	void __iomem *desc_addr = pci_msix_desc_addr(desc);
>  
> +	if (pci_msi_ignore_mask)
> +		return;
> +
>  	writel(ctrl, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
>  }
>  
> @@ -200,7 +206,7 @@ static inline void pci_msix_unmask(struct msi_desc *desc)
>  
>  static void __pci_msi_mask_desc(struct msi_desc *desc, u32 mask)
>  {
> -	if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual)
> +	if (desc->msi_attrib.is_virtual)
>  		return;
>  
>  	if (desc->msi_attrib.is_msix)
> @@ -211,7 +217,7 @@ static void __pci_msi_mask_desc(struct msi_desc *desc, u32 mask)
>  
>  static void __pci_msi_unmask_desc(struct msi_desc *desc, u32 mask)
>  {
> -	if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual)
> +	if (desc->msi_attrib.is_virtual)
>  		return;
>  
>  	if (desc->msi_attrib.is_msix)

No, really. This is horrible and incomplete.

The right thing to do is to move the check back into the low-level
accessors and remove it from the call sites, simply because the
low-level accessors can be reached not only from the mask/unmask
functions.

But the above also fails to respect msi_attrib.maskbit...

I'll send out a proper fix in a few.

Thanks,

        tglx
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 4b4792940e86..478536bafc39 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -148,6 +148,9 @@ static noinline void pci_msi_update_mask(struct msi_desc *desc, u32 clear, u32 s
 	raw_spinlock_t *lock = &desc->dev->msi_lock;
 	unsigned long flags;
 
+	if (pci_msi_ignore_mask)
+		return;
+
 	raw_spin_lock_irqsave(lock, flags);
 	desc->msi_mask &= ~clear;
 	desc->msi_mask |= set;
@@ -181,6 +184,9 @@ static void pci_msix_write_vector_ctrl(struct msi_desc *desc, u32 ctrl)
 {
 	void __iomem *desc_addr = pci_msix_desc_addr(desc);
 
+	if (pci_msi_ignore_mask)
+		return;
+
 	writel(ctrl, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
 }
 
@@ -200,7 +206,7 @@ static inline void pci_msix_unmask(struct msi_desc *desc)
 
 static void __pci_msi_mask_desc(struct msi_desc *desc, u32 mask)
 {
-	if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual)
+	if (desc->msi_attrib.is_virtual)
 		return;
 
 	if (desc->msi_attrib.is_msix)
@@ -211,7 +217,7 @@ static void __pci_msi_mask_desc(struct msi_desc *desc, u32 mask)
 
 static void __pci_msi_unmask_desc(struct msi_desc *desc, u32 mask)
 {
-	if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual)
+	if (desc->msi_attrib.is_virtual)
 		return;
 
 	if (desc->msi_attrib.is_msix)
commit fcacdfbef5a1 ("PCI/MSI: Provide a new set of mask and unmask
functions") introduced functions pci_msi_update_mask() and
pci_msix_write_vector_ctrl() that are missing the checks for
pci_msi_ignore_mask that exist in commit 446a98b19fd6 ("PCI/MSI: Use
new mask/unmask functions"). The checks are in place at the high-level
__pci_msi_mask_desc()/__pci_msi_unmask_desc(), but some functions call
the helpers directly.

Push the pci_msi_ignore_mask check down to the functions that make
the actual writes. This keeps the logic local to the writes that need
to be bypassed.

With Xen PV, the hypervisor is responsible for masking and unmasking
the interrupts, which pci_msi_ignore_mask is used to indicate.

This change avoids lockups in amdgpu drivers under Xen during boot.

Fixes: 446a98b19fd6 ("PCI/MSI: Use new mask/unmask functions")
Reported-by: Josef Johansson <josef@oderland.se>
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
---
 drivers/pci/msi.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)