Message ID: 1456929089-17414-4-git-send-email-quan.xu@intel.com (mailing list archive)
State: New, archived
[I've removed some of the people that shouldn't be involved in this
discussion any longer from the Cc-list]

On Wed, 2016-03-02 at 22:31 +0800, Quan Xu wrote:
> Signed-off-by: Quan Xu <quan.xu@intel.com>
>
So, this patch looks ok to me. I spotted something, though, that I
think needs some attention. Since I'm jumping on this series only now,
if this has been discussed before and I missed it, sorry for the noise.

> @@ -788,10 +787,10 @@ static bool_t __init
> set_iommu_interrupt_handler(struct amd_iommu *iommu)
>          return 0;
>      }
>
> -    spin_lock_irqsave(&pcidevs_lock, flags);
> +    pcidevs_lock();
>
So, spin_lock_irqsave() does:
  local_irq_save()
    local_save_flags()
    local_irq_disable()
  _spin_lock()

i.e., it saves the flags and disables interrupts.

pcidevs_lock() does:
  spin_lock_recursive()
    ... // handle recursion
    _spin_lock()

i.e., it does not disable interrupts.

And therefore it is possible that we are actually skipping disabling
interrupts (if they're not disabled already), isn't it?

And, of course, the same reasoning --mutatis mutandis-- applies to the
unlock side of things.

>      iommu->msi.dev = pci_get_pdev(iommu->seg, PCI_BUS(iommu->bdf),
>                                    PCI_DEVFN2(iommu->bdf));
> -    spin_unlock_irqrestore(&pcidevs_lock, flags);
> +    pcidevs_unlock();
>
i.e., spin_unlock_irqrestore() restores the flags, including the
interrupt enabled/disabled status, which means it can re-enable
interrupts or not, depending on whether they were enabled at the time
of the previous spin_lock_irqsave(); pcidevs_unlock() just doesn't
affect interrupt enabling/disabling at all.

So, if the original code is correct in using
spin_lock_irqsave()/spin_unlock_irqrestore(), I think we need
_irqsave() and _irqrestore() variants of recursive spinlocks, in order
to deal with this case.

However, from a quick inspection, it looks to me that:
 - this can only be called (during initialization) with interrupts
   enabled, so at least saving & restoring flags shouldn't be necessary
   (unless I missed where they can be disabled in the call chain from
   iommu_setup() toward set_iommu_interrupt_handler());
 - this protects pci_get_pdev(); looking at other places where
   pci_get_pdev() is called, I don't think it is really necessary to
   disable interrupts.

If I'm right, it means that the original code could well have been
using just plain spin_lock() and spin_unlock(), and it would then be
fine to turn them into pcidevs_lock() and pcidevs_unlock(), and so
there is no need to add more spin_[un]lock_recursive() variants.

That would also mean that the patch is indeed ok, but I'd add a
mention of this in the changelog.

Regards,
Dario
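[Editorial note: a minimal C sketch of the semantic difference Dario
describes above -- simplified for illustration, not the literal Xen
macro expansions:]

    /* spin_lock_irqsave(): saves the IRQ flags, disables interrupts,
     * then takes the lock.  The paired unlock restores the saved
     * flags, i.e. re-enables interrupts only if they were enabled
     * before the lock was taken. */
    unsigned long flags;
    local_irq_save(flags);      /* save flags + disable IRQs */
    _spin_lock(&pcidevs_lock);
    /* ... critical section, IRQs off ... */
    _spin_unlock(&pcidevs_lock);
    local_irq_restore(flags);   /* IRQ state back to what it was */

    /* pcidevs_lock(): only handles recursion, never touches the
     * IRQ state -- interrupts stay however the caller left them. */
    spin_lock_recursive(&_pcidevs_lock);
    /* ... critical section, IRQ state unchanged ... */
    spin_unlock_recursive(&_pcidevs_lock);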
On March 04, 2016 7:59am, <dario.faggioli@citrix.com> wrote:
> On Wed, 2016-03-02 at 22:31 +0800, Quan Xu wrote:
> > Signed-off-by: Quan Xu <quan.xu@intel.com>
> >
> So, this patch looks ok to me.
>
> I spotted something, though, that I think needs some attention.
>
> Since I'm jumping on this series only now, if this has been discussed
> before and I missed it, sorry for the noise.
>
Dario, Welcome~ it's never too late. :)

> > @@ -788,10 +787,10 @@ static bool_t __init
> > set_iommu_interrupt_handler(struct amd_iommu *iommu)
> >          return 0;
> >      }
> >
> > -    spin_lock_irqsave(&pcidevs_lock, flags);
> > +    pcidevs_lock();
> >
> So, spin_lock_irqsave() does:
>   local_irq_save()
>     local_save_flags()
>     local_irq_disable()
>   _spin_lock()
>
> i.e., it saves the flags and disables interrupts.
>
> pcidevs_lock() does:
>   spin_lock_recursive()
>     ... // handle recursion
>     _spin_lock()
>
> i.e., it does not disable interrupts.
>
> And therefore it is possible that we are actually skipping disabling
> interrupts (if they're not disabled already), isn't it?
>
> And, of course, the same reasoning --mutatis mutandis-- applies to the
> unlock side of things.
>
> >      iommu->msi.dev = pci_get_pdev(iommu->seg, PCI_BUS(iommu->bdf),
> >                                    PCI_DEVFN2(iommu->bdf));
> > -    spin_unlock_irqrestore(&pcidevs_lock, flags);
> > +    pcidevs_unlock();
> >
> i.e., spin_unlock_irqrestore() restores the flags, including the
> interrupt enabled/disabled status, which means it can re-enable
> interrupts or not, depending on whether they were enabled at the time
> of the previous spin_lock_irqsave(); pcidevs_unlock() just doesn't
> affect interrupt enabling/disabling at all.
>
> So, if the original code is correct in using
> spin_lock_irqsave()/spin_unlock_irqrestore(), I think we need
> _irqsave() and _irqrestore() variants of recursive spinlocks, in order
> to deal with this case.
>
> However, from a quick inspection, it looks to me that:
>  - this can only be called (during initialization) with interrupts
>    enabled, so at least saving & restoring flags shouldn't be necessary
>    (unless I missed where they can be disabled in the call chain from
>    iommu_setup() toward set_iommu_interrupt_handler());
>  - this protects pci_get_pdev(); looking at other places where
>    pci_get_pdev() is called, I don't think it is really necessary to
>    disable interrupts.
>
> If I'm right, it means that the original code could well have been
> using just plain spin_lock() and spin_unlock(), and it would then be
> fine to turn them into pcidevs_lock() and pcidevs_unlock(), and so
> there is no need to add more spin_[un]lock_recursive() variants.
>
> That would also mean that the patch is indeed ok,

Yes, I fully agree with your analysis and conclusion. I tried to
implement _irqsave() and _irqrestore() variants of recursive
spinlocks, but I found that there is no need to add them.

Also, I'd highlight the modification below:

-    if ( !spin_trylock(&pcidevs_lock) )
-        return -ERESTART;
-
+    pcidevs_lock();

IMO, it is right too.

> but I'd add a mention of this in the changelog.

In git log?

Quan
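[Editorial note: the _irqsave()/_irqrestore() variants Quan says he
prototyped and then dropped would plausibly have looked like the
following -- hypothetical code, never merged; the names are invented
here for illustration:]

    /* Hypothetical IRQ-saving wrappers around the recursive lock.
     * Each nesting level keeps its own flags word, so an inner
     * restore (which saw IRQs already disabled) leaves them disabled,
     * and only the outermost restore can re-enable them. */
    #define pcidevs_lock_irqsave(f)                \
        do {                                       \
            local_irq_save(f);                     \
            spin_lock_recursive(&_pcidevs_lock);   \
        } while ( 0 )

    #define pcidevs_unlock_irqrestore(f)           \
        do {                                       \
            spin_unlock_recursive(&_pcidevs_lock); \
            local_irq_restore(f);                  \
        } while ( 0 )

As the thread concludes, no caller actually needs the pcidevs lock
with interrupts disabled, so these were never added.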
>>> On 04.03.16 at 03:45, <quan.xu@intel.com> wrote:
> On March 04, 2016 7:59am, <dario.faggioli@citrix.com> wrote:
>> On Wed, 2016-03-02 at 22:31 +0800, Quan Xu wrote:
>> > @@ -788,10 +787,10 @@ static bool_t __init
>> > set_iommu_interrupt_handler(struct amd_iommu *iommu)
>> >          return 0;
>> >      }
>> >
>> > -    spin_lock_irqsave(&pcidevs_lock, flags);
>> > +    pcidevs_lock();
>> >
>> So, spin_lock_irqsave() does:
>>   local_irq_save()
>>     local_save_flags()
>>     local_irq_disable()
>>   _spin_lock()
>>
>> i.e., it saves the flags and disables interrupts.
>>
>> pcidevs_lock() does:
>>   spin_lock_recursive()
>>     ... // handle recursion
>>     _spin_lock()
>>
>> i.e., it does not disable interrupts.
>>
[...]
>>
>> If I'm right, it means that the original code could well have been
>> using just plain spin_lock() and spin_unlock(), and it would then be
>> fine to turn them into pcidevs_lock() and pcidevs_unlock(), and so
>> there is no need to add more spin_[un]lock_recursive() variants.
>>
>> That would also mean that the patch is indeed ok,
>
> Yes, I fully agree with your analysis and conclusion.
> I tried to implement _irqsave() and _irqrestore() variants of recursive
> spinlocks, but I found that there is no need to add them.
>
> Also, I'd highlight the modification below:
> -    if ( !spin_trylock(&pcidevs_lock) )
> -        return -ERESTART;
> -
> +    pcidevs_lock();
>
> IMO, it is right too.

Well, I'll have to see where exactly this is (pulling such out of
context is pretty unhelpful), but I suspect it can't be replaced like
this.

>> but I'd add a mention of this in the changelog.
>
> In git log?

To be honest, changes like this would better not be buried in a big
rework like the one here. Make it a prereq patch, where you then will
kind of automatically describe why it's correct. (I agree current code
is bogus, and we're not hitting the respective BUG_ON() in check_lock()
only because spin_debug gets set to true only after most __init code
has run.)

Jan
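[Editorial note: the check_lock()/spin_debug interaction Jan refers to,
as a simplified sketch from memory rather than the verbatim Xen source:
each lock records whether it is ever taken with IRQs disabled and
BUG()s on an inconsistent mix -- but only once spin_debug has been
switched on, which happens after most __init code has already run:]

    static void check_lock(struct lock_debug *debug)
    {
        int irq_safe = !local_irq_is_enabled();

        if ( unlikely(!spin_debug) )   /* still off during early boot */
            return;

        if ( unlikely(debug->irq_safe != irq_safe) )
        {
            /* The first acquisition records the lock's IRQ
             * discipline; any later acquisition with the opposite
             * IRQ state trips the BUG_ON(). */
            int seen = cmpxchg(&debug->irq_safe, -1, irq_safe);

            BUG_ON(seen == !irq_safe);
        }
    }

This is why the bogus spin_lock_irqsave() on pcidevs_lock went
unnoticed: the inconsistent acquisition happens in __init code, before
the check is armed.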
On March 04, 2016 5:29pm, <JBeulich@suse.com> wrote:
> >>> On 04.03.16 at 03:45, <quan.xu@intel.com> wrote:
> > On March 04, 2016 7:59am, <dario.faggioli@citrix.com> wrote:
> >> On Wed, 2016-03-02 at 22:31 +0800, Quan Xu wrote:
[...]
> > Yes, I fully agree with your analysis and conclusion.
> > I tried to implement _irqsave() and _irqrestore() variants of
> > recursive spinlocks, but I found that there is no need to add them.
> >
> > Also, I'd highlight the modification below:
> > -    if ( !spin_trylock(&pcidevs_lock) )
> > -        return -ERESTART;
> > -
> > +    pcidevs_lock();
> >
> > IMO, it is right too.
>
> Well, I'll have to see where exactly this is (pulling such out of
> context is pretty unhelpful), but I suspect it can't be replaced like
> this.
>
Jan, I am looking forward to your review.
btw, it is in assign_device(), in the xen/drivers/passthrough/pci.c
file.

> >> but I'd add a mention of this in the changelog.
> >
> > In git log?
>
> To be honest, changes like this would better not be buried in a big
> rework like the one here. Make it a prereq patch, where you then will
> kind of automatically describe why it's correct. (I agree current code
> is bogus, and we're not hitting the respective BUG_ON() in check_lock()
> only because spin_debug gets set to true only after most __init code
> has run.)

Agreed, I will make it a prereq patch in v7.

Quan
On Fri, 2016-03-04 at 11:54 +0000, Xu, Quan wrote:
> On March 04, 2016 5:29pm, <JBeulich@suse.com> wrote:
> > On March 04, 2016 7:59am, <dario.faggioli@citrix.com> wrote:
> > > Also, I'd highlight the modification below:
> > > -    if ( !spin_trylock(&pcidevs_lock) )
> > > -        return -ERESTART;
> > > -
> > > +    pcidevs_lock();
> > >
> > > IMO, it is right too.
> > Well, I'll have to see where exactly this is (pulling such out of
> > context is pretty unhelpful), but I suspect it can't be replaced
> > like this.
> >
> Jan, I am looking forward to your review.
> btw, it is in assign_device(), in the xen/drivers/passthrough/pci.c
> file.
>
Mmm... if multiple cpus call assign_device(), and the calls race, the
behavior before and after the patch looks indeed different to me.

In fact, in current code, the cpus that find the lock busy already
would quit the function immediately, and a continuation is created. On
the other hand, with the patch, they would spin and actually get the
lock, one after the other (if there's more of them), at some point.

Please double check my reasoning, but it looks to me that what happens
when the hypercall is restarted (i.e., in current code) and what
happens if we just let others take the lock and execute the function
(i.e., with the patch applied) is indeed different.

I suggest you try to figure out whether that is actually the case.
Once you've done so, feel free to report here and ask for help finding
a solution, if you don't see one.

> > > > but I'd add a mention of this in the changelog.
> > > In git log?
> > To be honest, changes like this would better not be buried in a big
> > rework like the one here. Make it a prereq patch, where you then
> > will kind of automatically describe why it's correct. (I agree
> > current code is bogus, and we're not hitting the respective
> > BUG_ON() in check_lock() only because spin_debug gets set to true
> > only after most __init code has run.)
> Agreed, I will make it a prereq patch in v7.
>
Ok. In general, I agree with Jan.

In this case, I suggested just mentioning it in the changelog, as we
currently basically have a bug, and I think it would be fine to have
something like "Doing what we do also serves as a fix for a bug found
in xxx yyy".

But it's indeed Jan's call, and his solution is certainly cleaner. Not
to mention that, in the case of assign_device(), fixing that would
most likely require being done in a patch of its own anyway.

Regards,
Dario
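[Editorial note: the behavioral difference Dario points out, side by
side -- illustrative snippets only:]

    /* Current code: a racing caller backs off immediately; the
     * hypercall is marked for restart and re-issued later, so the
     * CPU is never parked spinning on the lock. */
    if ( !spin_trylock(&pcidevs_lock) )
        return -ERESTART;

    /* Patched code: a racing caller blocks until the lock is free
     * and then runs assign_device() anyway -- every contender
     * eventually executes the body, and none of them restarts. */
    pcidevs_lock();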
>>> On 04.03.16 at 14:59, <dario.faggioli@citrix.com> wrote:
> On Fri, 2016-03-04 at 11:54 +0000, Xu, Quan wrote:
>> On March 04, 2016 5:29pm, <JBeulich@suse.com> wrote:
>> > To be honest, changes like this would better not be buried in a big
>> > rework like the one here. Make it a prereq patch, where you then
>> > will kind of automatically describe why it's correct. (I agree
>> > current code is bogus, and we're not hitting the respective
>> > BUG_ON() in check_lock() only because spin_debug gets set to true
>> > only after most __init code has run.)
>> Agreed, I will make it a prereq patch in v7.
>>
> Ok. In general, I agree with Jan.
>
> In this case, I suggested just mentioning it in the changelog, as we
> currently basically have a bug, and I think it would be fine to have
> something like "Doing what we do also serves as a fix for a bug found
> in xxx yyy".
>
> But it's indeed Jan's call, and his solution is certainly cleaner.

Well, one of the reasons to separate out bug fixes is to make them
backportable.

Jan
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index bf62a88..21cc161 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -427,9 +427,9 @@ long arch_do_domctl(
         ret = -ESRCH;
         if ( iommu_enabled )
         {
-            spin_lock(&pcidevs_lock);
+            pcidevs_lock();
             ret = pt_irq_create_bind(d, bind);
-            spin_unlock(&pcidevs_lock);
+            pcidevs_unlock();
         }
         if ( ret < 0 )
             printk(XENLOG_G_ERR "pt_irq_create_bind failed (%ld) for dom%d\n",
@@ -452,9 +452,9 @@ long arch_do_domctl(
 
         if ( iommu_enabled )
         {
-            spin_lock(&pcidevs_lock);
+            pcidevs_lock();
             ret = pt_irq_destroy_bind(d, bind);
-            spin_unlock(&pcidevs_lock);
+            pcidevs_unlock();
         }
         if ( ret < 0 )
             printk(XENLOG_G_ERR "pt_irq_destroy_bind failed (%ld) for dom%d\n",
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index ac838a9..8e0817b 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -388,7 +388,7 @@ int msixtbl_pt_register(struct domain *d, struct pirq *pirq, uint64_t gtable)
     struct msixtbl_entry *entry, *new_entry;
     int r = -EINVAL;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
     ASSERT(spin_is_locked(&d->event_lock));
 
     /*
@@ -443,7 +443,7 @@ void msixtbl_pt_unregister(struct domain *d, struct pirq *pirq)
     struct pci_dev *pdev;
     struct msixtbl_entry *entry;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
     ASSERT(spin_is_locked(&d->event_lock));
 
     irq_desc = pirq_spin_lock_irq_desc(pirq, NULL);
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index bf2e822..68bdf19 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -1955,7 +1955,7 @@ int map_domain_pirq(
     struct pci_dev *pdev;
     unsigned int nr = 0;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
 
     ret = -ENODEV;
     if ( !cpu_has_apic )
@@ -2100,7 +2100,7 @@ int unmap_domain_pirq(struct domain *d, int pirq)
     if ( (pirq < 0) || (pirq >= d->nr_pirqs) )
         return -EINVAL;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
     ASSERT(spin_is_locked(&d->event_lock));
 
     info = pirq_info(d, pirq);
@@ -2226,7 +2226,7 @@ void free_domain_pirqs(struct domain *d)
 {
     int i;
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     spin_lock(&d->event_lock);
 
     for ( i = 0; i < d->nr_pirqs; i++ )
@@ -2234,7 +2234,7 @@ void free_domain_pirqs(struct domain *d)
         unmap_domain_pirq(d, i);
 
     spin_unlock(&d->event_lock);
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 }
 
 static void dump_irqs(unsigned char key)
diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index 3dbb84d..6e5e33e 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -694,7 +694,7 @@ static int msi_capability_init(struct pci_dev *dev,
     u8 slot = PCI_SLOT(dev->devfn);
     u8 func = PCI_FUNC(dev->devfn);
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
     pos = pci_find_cap_offset(seg, bus, slot, func, PCI_CAP_ID_MSI);
     if ( !pos )
         return -ENODEV;
@@ -852,7 +852,7 @@ static int msix_capability_init(struct pci_dev *dev,
     u8 func = PCI_FUNC(dev->devfn);
     bool_t maskall = msix->host_maskall;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
     control = pci_conf_read16(seg, bus, slot, func, msix_control_reg(pos));
 
     /*
@@ -1042,7 +1042,7 @@ static int __pci_enable_msi(struct msi_info *msi, struct msi_desc **desc)
     struct pci_dev *pdev;
     struct msi_desc *old_desc;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
     pdev = pci_get_pdev(msi->seg, msi->bus, msi->devfn);
     if ( !pdev )
         return -ENODEV;
@@ -1103,7 +1103,7 @@ static int __pci_enable_msix(struct msi_info *msi, struct msi_desc **desc)
     u8 func = PCI_FUNC(msi->devfn);
     struct msi_desc *old_desc;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
     pdev = pci_get_pdev(msi->seg, msi->bus, msi->devfn);
     pos = pci_find_cap_offset(msi->seg, msi->bus, slot, func, PCI_CAP_ID_MSIX);
     if ( !pdev || !pos )
@@ -1205,7 +1205,7 @@ int pci_prepare_msix(u16 seg, u8 bus, u8 devfn, bool_t off)
     if ( !pos )
         return -ENODEV;
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     pdev = pci_get_pdev(seg, bus, devfn);
     if ( !pdev )
         rc = -ENODEV;
@@ -1224,7 +1224,7 @@ int pci_prepare_msix(u16 seg, u8 bus, u8 devfn, bool_t off)
         rc = msix_capability_init(pdev, pos, NULL, NULL,
                                   multi_msix_capable(control));
     }
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 
     return rc;
 }
@@ -1235,7 +1235,7 @@ int pci_prepare_msix(u16 seg, u8 bus, u8 devfn, bool_t off)
  */
 int pci_enable_msi(struct msi_info *msi, struct msi_desc **desc)
 {
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
 
     if ( !use_msi )
         return -EPERM;
@@ -1351,7 +1351,7 @@ int pci_restore_msi_state(struct pci_dev *pdev)
     unsigned int type = 0, pos = 0;
     u16 control = 0;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
 
     if ( !use_msi )
         return -EOPNOTSUPP;
diff --git a/xen/arch/x86/pci.c b/xen/arch/x86/pci.c
index 5bcecbb..892278d 100644
--- a/xen/arch/x86/pci.c
+++ b/xen/arch/x86/pci.c
@@ -82,13 +82,13 @@ int pci_conf_write_intercept(unsigned int seg, unsigned int bdf,
     if ( reg < 64 || reg >= 256 )
         return 0;
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
 
     pdev = pci_get_pdev(seg, PCI_BUS(bdf), PCI_DEVFN2(bdf));
     if ( pdev )
         rc = pci_msi_conf_write_intercept(pdev, reg, size, data);
 
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 
     return rc;
 }
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 57b7800..1cb9b58 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -167,7 +167,7 @@ int physdev_map_pirq(domid_t domid, int type, int *index, int *pirq_p,
         goto free_domain;
     }
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     /* Verify or get pirq. */
     spin_lock(&d->event_lock);
     pirq = domain_irq_to_pirq(d, irq);
@@ -237,7 +237,7 @@ int physdev_map_pirq(domid_t domid, int type, int *index, int *pirq_p,
 
  done:
     spin_unlock(&d->event_lock);
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
     if ( ret != 0 )
         switch ( type )
         {
@@ -275,11 +275,11 @@ int physdev_unmap_pirq(domid_t domid, int pirq)
         goto free_domain;
     }
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     spin_lock(&d->event_lock);
     ret = unmap_domain_pirq(d, pirq);
     spin_unlock(&d->event_lock);
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 
  free_domain:
     rcu_unlock_domain(d);
@@ -689,10 +689,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( copy_from_guest(&restore_msi, arg, 1) != 0 )
             break;
 
-        spin_lock(&pcidevs_lock);
+        pcidevs_lock();
         pdev = pci_get_pdev(0, restore_msi.bus, restore_msi.devfn);
         ret = pdev ? pci_restore_msi_state(pdev) : -ENODEV;
-        spin_unlock(&pcidevs_lock);
+        pcidevs_unlock();
         break;
     }
 
@@ -704,10 +704,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( copy_from_guest(&dev, arg, 1) != 0 )
             break;
 
-        spin_lock(&pcidevs_lock);
+        pcidevs_lock();
         pdev = pci_get_pdev(dev.seg, dev.bus, dev.devfn);
         ret = pdev ? pci_restore_msi_state(pdev) : -ENODEV;
-        spin_unlock(&pcidevs_lock);
+        pcidevs_unlock();
         break;
     }
diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index 85e853f..401edb9 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -426,7 +426,7 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
                 break;
             }
 
-            spin_lock(&pcidevs_lock);
+            pcidevs_lock();
             pdev = pci_get_pdev(dev.seg, dev.bus, dev.devfn);
             if ( !pdev )
                 node = XEN_INVALID_DEV;
@@ -434,7 +434,7 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
                 node = XEN_INVALID_NODE_ID;
             else
                 node = pdev->node;
-            spin_unlock(&pcidevs_lock);
+            pcidevs_unlock();
 
             if ( copy_to_guest_offset(ti->nodes, i, &node, 1) )
             {
diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthrough/amd/iommu_init.c
index 5635650..8bb47b2 100644
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -673,9 +673,9 @@ void parse_ppr_log_entry(struct amd_iommu *iommu, u32 entry[])
     bus = PCI_BUS(device_id);
     devfn = PCI_DEVFN2(device_id);
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     pdev = pci_get_real_pdev(iommu->seg, bus, devfn);
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 
     if ( pdev )
         guest_iommu_add_ppr_log(pdev->domain, entry);
@@ -778,7 +778,6 @@ static bool_t __init set_iommu_interrupt_handler(struct amd_iommu *iommu)
 {
     int irq, ret;
     hw_irq_controller *handler;
-    unsigned long flags;
     u16 control;
 
     irq = create_irq(NUMA_NO_NODE);
@@ -788,10 +787,10 @@ static bool_t __init set_iommu_interrupt_handler(struct amd_iommu *iommu)
         return 0;
     }
 
-    spin_lock_irqsave(&pcidevs_lock, flags);
+    pcidevs_lock();
     iommu->msi.dev = pci_get_pdev(iommu->seg, PCI_BUS(iommu->bdf),
                                   PCI_DEVFN2(iommu->bdf));
-    spin_unlock_irqrestore(&pcidevs_lock, flags);
+    pcidevs_unlock();
     if ( !iommu->msi.dev )
     {
         AMD_IOMMU_DEBUG("IOMMU: no pdev for %04x:%02x:%02x.%u\n",
diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c
index 78862c9..9d91dfc 100644
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -593,7 +593,7 @@ static int update_paging_mode(struct domain *d, unsigned long gfn)
         hd->arch.paging_mode = level;
         hd->arch.root_table = new_root;
 
-        if ( !spin_is_locked(&pcidevs_lock) )
+        if ( !pcidevs_is_locked() )
             AMD_IOMMU_DEBUG("%s Try to access pdev_list "
                             "without aquiring pcidevs_lock.\n", __func__);
 
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index 8d9f358..1a72c87 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -158,7 +158,7 @@ static void amd_iommu_setup_domain_device(
 
     spin_unlock_irqrestore(&iommu->lock, flags);
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
 
     if ( pci_ats_device(iommu->seg, bus, pdev->devfn) &&
          !pci_ats_enabled(iommu->seg, bus, pdev->devfn) )
@@ -345,7 +345,7 @@ void amd_iommu_disable_domain_device(struct domain *domain,
     }
     spin_unlock_irqrestore(&iommu->lock, flags);
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
 
     if ( devfn == pdev->devfn &&
          pci_ats_device(iommu->seg, bus, devfn) &&
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 27b3ca7..d7e94e1 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -48,7 +48,7 @@ struct pci_seg {
     } bus2bridge[MAX_BUSES];
 };
 
-spinlock_t pcidevs_lock = SPIN_LOCK_UNLOCKED;
+static spinlock_t _pcidevs_lock = SPIN_LOCK_UNLOCKED;
 static struct radix_tree_root pci_segments;
 
 static inline struct pci_seg *get_pseg(u16 seg)
@@ -103,6 +103,21 @@ static int pci_segments_iterate(
     return rc;
 }
 
+void pcidevs_lock(void)
+{
+    spin_lock_recursive(&_pcidevs_lock);
+}
+
+void pcidevs_unlock(void)
+{
+    spin_unlock_recursive(&_pcidevs_lock);
+}
+
+int pcidevs_is_locked(void)
+{
+    return spin_is_locked(&_pcidevs_lock);
+}
+
 void __init pt_pci_init(void)
 {
     radix_tree_init(&pci_segments);
@@ -412,14 +427,14 @@ int __init pci_hide_device(int bus, int devfn)
     struct pci_dev *pdev;
     int rc = -ENOMEM;
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     pdev = alloc_pdev(get_pseg(0), bus, devfn);
     if ( pdev )
     {
        _pci_hide_device(pdev);
         rc = 0;
     }
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 
     return rc;
 }
@@ -456,7 +471,7 @@ struct pci_dev *pci_get_pdev(int seg, int bus, int devfn)
     struct pci_seg *pseg = get_pseg(seg);
     struct pci_dev *pdev = NULL;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
     ASSERT(seg != -1 || bus == -1);
     ASSERT(bus != -1 || devfn == -1);
 
@@ -581,9 +596,9 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
         pdev_type = "extended function";
     else if (info->is_virtfn)
     {
-        spin_lock(&pcidevs_lock);
+        pcidevs_lock();
         pdev = pci_get_pdev(seg, info->physfn.bus, info->physfn.devfn);
-        spin_unlock(&pcidevs_lock);
+        pcidevs_unlock();
         if ( !pdev )
             pci_add_device(seg, info->physfn.bus, info->physfn.devfn,
                            NULL, node);
@@ -601,7 +616,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
 
     ret = -ENOMEM;
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     pseg = alloc_pseg(seg);
     if ( !pseg )
         goto out;
@@ -703,7 +718,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
         pci_enable_acs(pdev);
 
  out:
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
     if ( !ret )
     {
         printk(XENLOG_DEBUG "PCI add %s %04x:%02x:%02x.%u\n", pdev_type,
@@ -735,7 +750,7 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
     if ( !pseg )
         return -ENODEV;
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
         if ( pdev->bus == bus && pdev->devfn == devfn )
         {
@@ -749,7 +764,7 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
             break;
         }
 
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
     return ret;
 }
 
@@ -807,11 +822,11 @@ int pci_release_devices(struct domain *d)
     u8 bus, devfn;
     int ret;
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     ret = pci_clean_dpci_irqs(d);
     if ( ret )
     {
-        spin_unlock(&pcidevs_lock);
+        pcidevs_unlock();
         return ret;
     }
     while ( (pdev = pci_get_pdev_by_domain(d, -1, -1, -1)) )
@@ -823,7 +838,7 @@ int pci_release_devices(struct domain *d)
                d->domain_id, pdev->seg, bus,
                PCI_SLOT(devfn), PCI_FUNC(devfn));
     }
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 
     return 0;
 }
@@ -920,7 +935,7 @@ void pci_check_disable_device(u16 seg, u8 bus, u8 devfn)
     s_time_t now = NOW();
     u16 cword;
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     pdev = pci_get_real_pdev(seg, bus, devfn);
     if ( pdev )
     {
@@ -931,7 +946,7 @@ void pci_check_disable_device(u16 seg, u8 bus, u8 devfn)
         if ( ++pdev->fault.count < PT_FAULT_THRESHOLD )
             pdev = NULL;
     }
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 
     if ( !pdev )
         return;
@@ -988,9 +1003,9 @@ int __init scan_pci_devices(void)
 {
     int ret;
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     ret = pci_segments_iterate(_scan_pci_devices, NULL);
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 
     return ret;
 }
@@ -1054,17 +1069,17 @@ static int __hwdom_init _setup_hwdom_pci_devices(struct pci_seg *pseg, void *arg
 
             if ( iommu_verbose )
             {
-                spin_unlock(&pcidevs_lock);
+                pcidevs_unlock();
                 process_pending_softirqs();
-                spin_lock(&pcidevs_lock);
+                pcidevs_lock();
             }
         }
 
         if ( !iommu_verbose )
        {
-            spin_unlock(&pcidevs_lock);
+            pcidevs_unlock();
             process_pending_softirqs();
-            spin_lock(&pcidevs_lock);
+            pcidevs_lock();
         }
     }
 
@@ -1076,9 +1091,9 @@ void __hwdom_init setup_hwdom_pci_devices(
 {
     struct setup_hwdom ctxt = { .d = d, .handler = handler };
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     pci_segments_iterate(_setup_hwdom_pci_devices, &ctxt);
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 }
 
 #ifdef CONFIG_ACPI
@@ -1206,9 +1221,9 @@ static int _dump_pci_devices(struct pci_seg *pseg, void *arg)
 static void dump_pci_devices(unsigned char ch)
 {
     printk("==== PCI devices ====\n");
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     pci_segments_iterate(_dump_pci_devices, NULL);
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 }
 
 struct keyhandler dump_pci_devices_keyhandler = {
@@ -1248,7 +1263,7 @@ int iommu_add_device(struct pci_dev *pdev)
     if ( !pdev->domain )
         return -EINVAL;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
 
     hd = domain_hvm_iommu(pdev->domain);
     if ( !iommu_enabled || !hd->platform_ops )
@@ -1277,7 +1292,7 @@ int iommu_enable_device(struct pci_dev *pdev)
     if ( !pdev->domain )
         return -EINVAL;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
 
     hd = domain_hvm_iommu(pdev->domain);
     if ( !iommu_enabled || !hd->platform_ops ||
@@ -1326,9 +1341,9 @@ static int device_assigned(u16 seg, u8 bus, u8 devfn)
 {
     struct pci_dev *pdev;
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     pdev = pci_get_pdev_by_domain(hardware_domain, seg, bus, devfn);
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 
     return pdev ? 0 : -EBUSY;
 }
@@ -1350,13 +1365,11 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
                         p2m_get_hostp2m(d)->global_logdirty)) )
         return -EXDEV;
 
-    if ( !spin_trylock(&pcidevs_lock) )
-        return -ERESTART;
-
+    pcidevs_lock();
     rc = iommu_construct(d);
     if ( rc )
     {
-        spin_unlock(&pcidevs_lock);
+        pcidevs_unlock();
         return rc;
     }
 
@@ -1387,7 +1400,7 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
 done:
     if ( !has_arch_pdevs(d) && need_iommu(d) )
         iommu_teardown(d);
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 
     return rc;
 }
@@ -1402,7 +1415,7 @@ int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
     if ( !iommu_enabled || !hd->platform_ops )
         return -EINVAL;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
     pdev = pci_get_pdev_by_domain(d, seg, bus, devfn);
     if ( !pdev )
         return -ENODEV;
@@ -1457,7 +1470,7 @@ static int iommu_get_device_group(
 
     group_id = ops->get_device_group_id(seg, bus, devfn);
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     for_each_pdev( d, pdev )
     {
         if ( (pdev->seg != seg) ||
@@ -1476,14 +1489,14 @@ static int iommu_get_device_group(
 
             if ( unlikely(copy_to_guest_offset(buf, i, &bdf, 1)) )
             {
-                spin_unlock(&pcidevs_lock);
+                pcidevs_unlock();
                 return -1;
             }
             i++;
         }
     }
 
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 
     return i;
 }
@@ -1611,9 +1624,9 @@ int iommu_do_pci_domctl(
         bus = PCI_BUS(machine_sbdf);
         devfn = PCI_DEVFN2(machine_sbdf);
 
-        spin_lock(&pcidevs_lock);
+        pcidevs_lock();
         ret = deassign_device(d, seg, bus, devfn);
-        spin_unlock(&pcidevs_lock);
+        pcidevs_unlock();
         if ( ret )
             printk(XENLOG_G_ERR
                    "deassign %04x:%02x:%02x.%u from dom%d failed (%d)\n",
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 6e62a42..a0e3922 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1307,7 +1307,7 @@ int domain_context_mapping_one(
     int agaw;
     int rc = 0;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
     spin_lock(&iommu->lock);
     maddr = bus_to_context_maddr(iommu, bus);
     context_entries = (struct context_entry *)map_vtd_domain_page(maddr);
@@ -1456,7 +1456,7 @@ static int domain_context_mapping(
     if ( !drhd )
         return -ENODEV;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
 
     switch ( pdev->type )
     {
@@ -1539,7 +1539,7 @@ int domain_context_unmap_one(
     int iommu_domid;
     int rc = 0;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
     spin_lock(&iommu->lock);
 
     maddr = bus_to_context_maddr(iommu, bus);
@@ -1861,7 +1861,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
     struct mapped_rmrr *mrmrr;
     struct hvm_iommu *hd = domain_hvm_iommu(d);
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
     ASSERT(rmrr->base_address < rmrr->end_address);
 
     /*
@@ -1926,7 +1926,7 @@ static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev)
     u16 bdf;
     int ret, i;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
 
     if ( !pdev->domain )
         return -EINVAL;
@@ -2154,7 +2154,7 @@ static void __hwdom_init setup_hwdom_rmrr(struct domain *d)
     u16 bdf;
     int ret, i;
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     for_each_rmrr_device ( rmrr, bdf, i )
     {
         /*
@@ -2168,7 +2168,7 @@ static void __hwdom_init setup_hwdom_rmrr(struct domain *d)
             dprintk(XENLOG_ERR VTDPREFIX,
                     "IOMMU: mapping reserved region failed\n");
     }
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 }
 
 int __init intel_vtd_setup(void)
diff --git a/xen/drivers/video/vga.c b/xen/drivers/video/vga.c
index 40e5963..61c6b13 100644
--- a/xen/drivers/video/vga.c
+++ b/xen/drivers/video/vga.c
@@ -117,9 +117,9 @@ void __init video_endboot(void)
             const struct pci_dev *pdev;
             u8 b = bus, df = devfn, sb;
 
-            spin_lock(&pcidevs_lock);
+            pcidevs_lock();
             pdev = pci_get_pdev(0, bus, devfn);
-            spin_unlock(&pcidevs_lock);
+            pcidevs_unlock();
 
             if ( !pdev ||
                  pci_conf_read16(0, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index a5aef55..017aa0b 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -94,7 +94,9 @@ struct pci_dev {
  * interrupt handling related (the mask bit register).
  */
 
-extern spinlock_t pcidevs_lock;
+void pcidevs_lock(void);
+void pcidevs_unlock(void);
+int pcidevs_is_locked(void);
 
 bool_t pci_known_segment(u16 seg);
 bool_t pci_device_detect(u16 seg, u8 bus, u8 dev, u8 func);
Signed-off-by: Quan Xu <quan.xu@intel.com>
---
 xen/arch/x86/domctl.c                       |  8 +--
 xen/arch/x86/hvm/vmsi.c                     |  4 +-
 xen/arch/x86/irq.c                          |  8 +--
 xen/arch/x86/msi.c                          | 16 ++---
 xen/arch/x86/pci.c                          |  4 +-
 xen/arch/x86/physdev.c                      | 16 ++---
 xen/common/sysctl.c                         |  4 +-
 xen/drivers/passthrough/amd/iommu_init.c    |  9 ++-
 xen/drivers/passthrough/amd/iommu_map.c     |  2 +-
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  4 +-
 xen/drivers/passthrough/pci.c               | 93 ++++++++++++++++-------------
 xen/drivers/passthrough/vtd/iommu.c         | 14 ++---
 xen/drivers/video/vga.c                     |  4 +-
 xen/include/xen/pci.h                       |  4 +-
 14 files changed, 102 insertions(+), 88 deletions(-)