Message ID:   20130418083347.GA16526@redhat.com (mailing list archive)
State:        New, archived
Delegated to: Bjorn Helgaas
On Thu, Apr 18, 2013 at 12:40:09PM +0300, Jack Morgenstein wrote:
> On Thursday 18 April 2013 11:33, Michael S. Tsirkin wrote:
> > But for pci_sriov_enable, the situation is actually very simple:
> > VFs almost never use the same driver as the PF so the warning
> > is bogus there.
>
> What about the case where the VF driver IS the same as the PF driver?

Then it can deadlock, e.g. if the driver takes a global mutex. But that is
an internal driver issue then; you can trigger a deadlock through hardware
too, e.g. if VF initialization blocks until the PF is fully initialized.
I think that is not the case for Mellanox, is it?

This is what I refer to: it would be nice to fix nested probing in general,
but it seems disabling the warning is the best we can do for 3.9 since
it causes false positives.
On Thursday 18 April 2013 11:33, Michael S. Tsirkin wrote:
> But for pci_sriov_enable, the situation is actually very simple:
> VFs almost never use the same driver as the PF so the warning
> is bogus there.

What about the case where the VF driver IS the same as the PF driver?

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thursday 18 April 2013 11:48, Michael S. Tsirkin wrote:
> On Thu, Apr 18, 2013 at 12:40:09PM +0300, Jack Morgenstein wrote:
> > What about the case where the VF driver IS the same as the PF driver?
>
> Then it can deadlock, e.g. if the driver takes a global mutex. But that is
> an internal driver issue then; you can trigger a deadlock through hardware
> too, e.g. if VF initialization blocks until the PF is fully initialized.
> I think that is not the case for Mellanox, is it?

Correct, the Mellanox driver does not deadlock.

> This is what I refer to: it would be nice to fix nested probing in general,
> but it seems disabling the warning is the best we can do for 3.9 since
> it causes false positives.
On Thu, Apr 18, 2013 at 05:49:20PM +0300, Or Gerlitz wrote:
> On 18/04/2013 11:33, Michael S. Tsirkin wrote:
> > So like this on top. Tejun, you didn't add your S.O.B and patch
> > description; if this helps as we expect, they will be needed.
> > [... patch "pci: use work_on_cpu_nested for nested SRIOV" quoted in
> > full below ...]
>
> As you wrote to me later, missing here is SINGLE_DEPTH_NESTING as the
> last param to work_on_cpu_nested.
>
> So now I used Tejun's patch and Michael's patch on top of the net.git
> as of commit 2e0cbf2cc2c9371f0aa198857d799175ffe231a6
> "net: mvmdio: add select PHYLIB" from April 13 -- and I still see
> this... so we're not there yet
>
> [... "BUG: bad unlock balance detected!" lockdep trace quoted in full
> below ...]

Tejun, what do you say my patch is used for 3.9, and we can revisit for
3.10? The release is almost here. If yes please send your Ack.
On 18/04/2013 11:33, Michael S. Tsirkin wrote:
> On Sun, Apr 14, 2013 at 06:43:39AM -0700, Tejun Heo wrote:
>> On Sun, Apr 14, 2013 at 03:58:55PM +0300, Or Gerlitz wrote:
>>> So the patch eliminated the lockdep warning for the mlx4 nested probing
>>> sequence, but introduced a lockdep warning for
>>> 00:13.0 PIC: Intel Corporation 7500/5520/5500/X58 I/O Hub I/OxAPIC
>>> Interrupt Controller (rev 22)
>>
>> Oops, the patch in itself doesn't really change anything. The caller
>> should use a different subclass for the nested invocation, just like
>> spin_lock_nested() and friends. Sorry about not being clear.
>> Michael, can you please help?
>>
>> Thanks.
>>
>> --
>> tejun
>
> So like this on top. Tejun, you didn't add your S.O.B and patch
> description; if this helps as we expect, they will be needed.
>
> ---->
>
> pci: use work_on_cpu_nested for nested SRIOV
>
> Since 3.9-rc1 the mlx driver started triggering a lockdep warning.
>
> The issue is that a driver, in its probe function, calls
> pci_sriov_enable, so a PF device probe causes a VF probe (AKA nested
> probe). Each probe is run by pci_device_probe, which (normally) runs it
> through work_on_cpu (this is to get the right NUMA node for memory
> allocated by the driver). In turn, work_on_cpu does this internally:
>
>         schedule_work_on(cpu, &wfc.work);
>         flush_work(&wfc.work);
>
> So if you are running probe on CPU1, and cause another
> probe on the same CPU, this will try to flush the
> workqueue from inside the same workqueue, which triggers
> a lockdep warning.
>
> Nested probing might be tricky to get right generally.
>
> But for pci_sriov_enable, the situation is actually very simple:
> VFs almost never use the same driver as the PF, so the warning
> is bogus there.
>
> This is hardly elegant as it might shut up some real warnings if a buggy
> driver actually probes itself in a nested way, but looks to me like an
> appropriate quick fix for 3.9.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>
> ---
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 1fa1e48..9c836ef 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -286,9 +286,9 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
>  	int cpu;
>
>  	get_online_cpus();
> -	cpu = cpumask_any_and(cpumask_of_node(node), cpu_online_mask);
> -	if (cpu < nr_cpu_ids)
> -		error = work_on_cpu(cpu, local_pci_probe, &ddi);
> +	cpu = cpumask_first_and(cpumask_of_node(node), cpu_online_mask);
> +	if (cpu != raw_smp_processor_id() && cpu < nr_cpu_ids)
> +		error = work_on_cpu_nested(cpu, local_pci_probe, &ddi);

As you wrote to me later, missing here is SINGLE_DEPTH_NESTING as the
last param to work_on_cpu_nested.

>  	else
>  		error = local_pci_probe(&ddi);
>  	put_online_cpus();

So now I used Tejun's patch and Michael's patch on top of the net.git
as of commit 2e0cbf2cc2c9371f0aa198857d799175ffe231a6
"net: mvmdio: add select PHYLIB" from April 13 -- and I still see
this... so we're not there yet

=====================================
[ BUG: bad unlock balance detected! ]
3.9.0-rc6+ #56 Not tainted
-------------------------------------
swapper/0/1 is trying to release lock ((&wfc.work)) at:
[<ffffffff81220167>] pci_device_probe+0x117/0x120
but there are no more locks to release!

other info that might help us debug this:
2 locks held by swapper/0/1:
 #0:  (&__lockdep_no_validate__){......}, at: [<ffffffff812da443>] __driver_attach+0x53/0xb0
 #1:  (&__lockdep_no_validate__){......}, at: [<ffffffff812da451>] __driver_attach+0x61/0xb0

stack backtrace:
Pid: 1, comm: swapper/0 Not tainted 3.9.0-rc6+ #56
Call Trace:
 [<ffffffff81220167>] ? pci_device_probe+0x117/0x120
 [<ffffffff81093529>] print_unlock_imbalance_bug+0xf9/0x100
 [<ffffffff8109616f>] lock_set_class+0x27f/0x7c0
 [<ffffffff81091d9e>] ? mark_held_locks+0x9e/0x130
 [<ffffffff81220167>] ? pci_device_probe+0x117/0x120
 [<ffffffff81066aeb>] work_on_cpu_nested+0x8b/0xc0
 [<ffffffff810633c0>] ? keventd_up+0x20/0x20
 [<ffffffff8121f420>] ? pci_pm_prepare+0x60/0x60
 [<ffffffff81220167>] pci_device_probe+0x117/0x120
 [<ffffffff812da0fa>] ? driver_sysfs_add+0x7a/0xb0
 [<ffffffff812da24f>] driver_probe_device+0x8f/0x230
 [<ffffffff812da493>] __driver_attach+0xa3/0xb0
 [<ffffffff812da3f0>] ? driver_probe_device+0x230/0x230
 [<ffffffff812da3f0>] ? driver_probe_device+0x230/0x230
 [<ffffffff812d86fc>] bus_for_each_dev+0x8c/0xb0
 [<ffffffff812da079>] driver_attach+0x19/0x20
 [<ffffffff812d91a0>] bus_add_driver+0x1f0/0x250
 [<ffffffff818bd596>] ? dmi_pcie_pme_disable_msi+0x21/0x21
 [<ffffffff812daadf>] driver_register+0x6f/0x150
 [<ffffffff818bd596>] ? dmi_pcie_pme_disable_msi+0x21/0x21
 [<ffffffff8122026f>] __pci_register_driver+0x5f/0x70
 [<ffffffff818bd5ff>] pcie_portdrv_init+0x69/0x7a
 [<ffffffff810001fd>] do_one_initcall+0x3d/0x170
 [<ffffffff81895943>] kernel_init_freeable+0x10d/0x19c
 [<ffffffff818959d2>] ? kernel_init_freeable+0x19c/0x19c
 [<ffffffff8145a040>] ? rest_init+0x160/0x160
 [<ffffffff8145a049>] kernel_init+0x9/0xf0
 [<ffffffff8146ca6c>] ret_from_fork+0x7c/0xb0
 [<ffffffff8145a040>] ? rest_init+0x160/0x160

ioapic: probe of 0000:00:13.0 failed with error -22
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
On Thu, Apr 18, 2013 at 04:54:58PM +0300, Michael S. Tsirkin wrote:
> Tejun, what do you say my patch is used for 3.9,
> and we can revisit for 3.10.
> The release is almost here.
> If yes please send your Ack.

Yeap, let's do that.

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.
On Thu, Apr 18, 2013 at 12:19 PM, Tejun Heo <tj@kernel.org> wrote:
> On Thu, Apr 18, 2013 at 04:54:58PM +0300, Michael S. Tsirkin wrote:
>> Tejun, what do you say my patch is used for 3.9,
>> and we can revisit for 3.10.
>> The release is almost here.
>> If yes please send your Ack.
>
> Yeap, let's do that.
>
> Acked-by: Tejun Heo <tj@kernel.org>

Michael, can you post a new version with Tejun's ack? IIRC, this was
in drivers/pci, but I haven't been following this and am not sure
exactly what you want applied. Thanks.

Bjorn
On Thu, Apr 18, 2013 at 4:54 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
[...]
> Tejun, what do you say my patch is used for 3.9, and we can revisit for 3.10.
> The release is almost here. If yes please send your Ack.

Michael,

I assume you mean pull into 3.9 both Tejun's and your patch, correct?
I wasn't sure what this really buys us... we got rid of the
false-positive lockdep warning which takes place during nested probe,
and got another lockdep warning during the probe of the interrupt
controller.

Or.
On Thu, Apr 18, 2013 at 09:41:31PM +0300, Or Gerlitz wrote:
> I assume you mean pull into 3.9 both Tejun's and your patch, correct?
> I wasn't sure what this really buys us... we got rid of the
> false-positive lockdep warning which takes place during nested probe,
> and got another lockdep warning during the probe of the interrupt
> controller.
>
> Or.

No, I mean my original patch.
On Thu, Apr 18, 2013 at 12:25:59PM -0600, Bjorn Helgaas wrote:
> Michael, can you post a new version with Tejun's ack? IIRC, this was
> in drivers/pci, but I haven't been following this and am not sure
> exactly what you want applied. Thanks.
>
> Bjorn

Done. The subject is:

  [PATCHv2 for-3.9] pci: avoid work_on_cpu for nested SRIOV

It's the same patch with Tejun's ack and a minor correction in the
commit message.
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 1fa1e48..9c836ef 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -286,9 +286,9 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
 	int cpu;
 
 	get_online_cpus();
-	cpu = cpumask_any_and(cpumask_of_node(node), cpu_online_mask);
-	if (cpu < nr_cpu_ids)
-		error = work_on_cpu(cpu, local_pci_probe, &ddi);
+	cpu = cpumask_first_and(cpumask_of_node(node), cpu_online_mask);
+	if (cpu != raw_smp_processor_id() && cpu < nr_cpu_ids)
+		error = work_on_cpu_nested(cpu, local_pci_probe, &ddi);
 	else
 		error = local_pci_probe(&ddi);
 	put_online_cpus();