From patchwork Mon Nov 18 18:14:40 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bjorn Helgaas X-Patchwork-Id: 3197551 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: X-Original-To: patchwork-linux-pci@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork1.web.kernel.org (Postfix) with ESMTP id 17EAD9F43F for ; Mon, 18 Nov 2013 18:15:04 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id AA227205BE for ; Mon, 18 Nov 2013 18:15:02 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 2EB9D205BC for ; Mon, 18 Nov 2013 18:15:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751797Ab3KRSO6 (ORCPT ); Mon, 18 Nov 2013 13:14:58 -0500 Received: from mail-ie0-f175.google.com ([209.85.223.175]:62812 "EHLO mail-ie0-f175.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751468Ab3KRSOo (ORCPT ); Mon, 18 Nov 2013 13:14:44 -0500 Received: by mail-ie0-f175.google.com with SMTP id u16so9054182iet.20 for ; Mon, 18 Nov 2013 10:14:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=1ZpLf2em/BwJ3PDDvWrKj77pLctNTMJ+poeYWjxaNnc=; b=PJ2KrcRMsdtpz6a8b2Cye2q9Q2OdLOtlXIvoV+4Nx6cmtOnYZyDEZC52R+sKA114f4 wA+G4MguGqGIpXozotRM2C2DUPz+8ZDWhYNcRxB5lTQRzCam5PJz1+fXwdOYVhhTiW7r kan8EupEMwF8x773MzU8RCoNIU46ndBy8gXtq1w/GehWK2NgtLdWFX7xvgM1Ss1QF5wu 39CqGrZEv+uUtXg805OtN7T33UVwHEXOhvNKnNqUzzAD8QRIgA6tTZl566sAbLDUGnwa QMtn7qU6oEDJpQMad+Utf+wn5WnbXlg70OiI1zyd6NS1YetoGC84KxPnWg9x7Hp61dDC KFkg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-type:content-disposition:in-reply-to :user-agent; bh=1ZpLf2em/BwJ3PDDvWrKj77pLctNTMJ+poeYWjxaNnc=; b=LaUaN4AaPlmeQCg9/acz93HdL4op+f3qyFBZahhHqdDneywca0do8ph1UjivlrNHpT lewCP4poQQ3E+aLCMYWxKNe9/dGlaf+otEMo/Csq5Zi9Ls1md2QlgD4zfovxLGDr44BV FsSpsT59AafjtJB+10H14dLimYQskzA3x3jYOo4elS6chck5M+CpIMt+OfnfiKdgd7jS XMKrEYy8J6ULETFKQfDNwIgdhv4gcXUW5KwAKbBEiP2hBOtYFCXf1Musp0ibhnTWSSyI Ka3db1gbwZ0GldETFZ8azpgtXX/mflXMzSUGVrJjNxMugdVPJ667mhJXhtE4PCkJuf1R I6jw== X-Gm-Message-State: ALoCoQlpXWlYVI2gfuewkICTZLG0XE7aElnmQFzWOsw4DZES6RXblmHC27tej7Atvuiyz+UuQNIWz46ev2WnXVYaEcuSU27hszPfbEkuBxYcnWmQYjiowS2Gm779b51f6zBs3qIbOjTGOWELsD9oXU0B/1BGg1D2Gd+OmkqnnzXCXT0NbTiDQYhtosD8X3fcsZNinj9OVMPJe5100JF2jPotWQJ7J8sg+g== X-Received: by 10.50.41.38 with SMTP id c6mr15459991igl.47.1384798483891; Mon, 18 Nov 2013 10:14:43 -0800 (PST) Received: from google.com ([172.16.48.5]) by mx.google.com with ESMTPSA id ft2sm14805090igb.5.2013.11.18.10.14.42 for (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Mon, 18 Nov 2013 10:14:43 -0800 (PST) Date: Mon, 18 Nov 2013 11:14:40 -0700 From: Bjorn Helgaas To: Tejun Heo Cc: Hugh Dickins , Steven Rostedt , Li Zefan , Markus Blank-Burian , Michal Hocko , Johannes Weiner , David Rientjes , Ying Han , Greg Thelen , Michel Lespinasse , cgroups@vger.kernel.org, "Srivatsa S. Bhat" , Lai Jiangshan , "linux-kernel@vger.kernel.org" , "Rafael J. Wysocki" , Alexander Duyck , Yinghai Lu , "linux-pci@vger.kernel.org" Subject: Re: Possible regression with cgroups in 3.11 Message-ID: <20131118181440.GA2996@google.com> References: <5270BFE7.4000602@huawei.com> <20131031130647.0ff6f2c7@gandalf.local.home> <20131113032804.GB19394@mtj.dyndns.org> <20131113073806.GA23244@mtj.dyndns.org> <20131116002820.GA31073@google.com> <20131116045356.GA5618@mtj.dyndns.org> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20131116045356.GA5618@mtj.dyndns.org> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org X-Spam-Status: No, score=-4.1 required=5.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED, DKIM_SIGNED, FSL_HELO_FAKE, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, T_DKIM_INVALID, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On Sat, Nov 16, 2013 at 01:53:56PM +0900, Tejun Heo wrote: > Hello, Bjorn. > > On Fri, Nov 15, 2013 at 05:28:20PM -0700, Bjorn Helgaas wrote: > > It would be better to fix PCI so we don't call VF driver .probe() methods > > from inside a PF driver .probe() method, but that's a bigger project. > > Yeah, if pci doesn't need the recursion, we can simply revert restore > the lockdep annoation on work_on_cpu(). > > > @@ -293,7 +293,9 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev, > > its local memory on the right node without any need to > > change it. */ > > node = dev_to_node(&dev->dev); > > - if (node >= 0) { > > + preempt_disable(); > > + > > + if (node >= 0 && node != numa_node_id()) { > > A bit of comment here would be nice but yeah I think this should work. > Can you please also queue the revert of c2fda509667b ("workqueue: > allow work_on_cpu() to be called recursively") after this patch? > Please feel free to add my acked-by. OK, below are the two patches (Alex's fix + the revert) I propose to merge. Unless there are objections, I'll ask Linus to pull these before v3.13-rc1. Bjorn commit 84f23f99b507c2c9247f47d3db0f71a3fd65e3a3 Author: Alexander Duyck Date: Mon Nov 18 10:59:59 2013 -0700 PCI: Avoid unnecessary CPU switch when calling driver .probe() method If we are already on a CPU local to the device, call the driver .probe() method directly without using work_on_cpu(). This is a workaround for a lockdep warning in the following scenario: pci_call_probe work_on_cpu(cpu, local_pci_probe, ...) driver .probe pci_enable_sriov ... pci_bus_add_device ... pci_call_probe work_on_cpu(cpu, local_pci_probe, ...) It would be better to fix PCI so we don't call VF driver .probe() methods from inside a PF driver .probe() method, but that's a bigger project. [bhelgaas: disable preemption, open bugzilla, rework comments & changelog] Link: https://bugzilla.kernel.org/show_bug.cgi?id=65071 Link: http://lkml.kernel.org/r/CAE9FiQXYQEAZ=0sG6+2OdffBqfLS9MpoN1xviRR9aDbxPxcKxQ@mail.gmail.com Link: http://lkml.kernel.org/r/20130624195942.40795.27292.stgit@ahduyck-cp1.jf.intel.com Signed-off-by: Alexander Duyck Signed-off-by: Bjorn Helgaas Acked-by: Tejun Heo Tested-by: Yinghai Lu Acked-by: Yinghai Lu --- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 9042fdbd7244..add04e70ac2a 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -288,12 +288,24 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev, int error, node; struct drv_dev_and_id ddi = { drv, dev, id }; - /* Execute driver initialization on node where the device's - bus is attached to. This way the driver likely allocates - its local memory on the right node without any need to - change it. */ + /* + * Execute driver initialization on node where the device is + * attached. This way the driver likely allocates its local memory + * on the right node. + */ node = dev_to_node(&dev->dev); - if (node >= 0) { + preempt_disable(); + + /* + * On NUMA systems, we are likely to call a PF probe function using + * work_on_cpu(). If that probe calls pci_enable_sriov() (which + * adds the VF devices via pci_bus_add_device()), we may re-enter + * this function to call the VF probe function. Calling + * work_on_cpu() again will cause a lockdep warning. Since VFs are + * always on the same node as the PF, we can work around this by + * avoiding work_on_cpu() when we're already on the correct node. + */ + if (node >= 0 && node != numa_node_id()) { int cpu; get_online_cpus(); @@ -305,6 +317,8 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev, put_online_cpus(); } else error = local_pci_probe(&ddi); + + preempt_enable(); return error; } commit 2dde5285d06370b2004613ee4fd253e95622af20 Author: Bjorn Helgaas Date: Mon Nov 18 11:00:29 2013 -0700 Revert "workqueue: allow work_on_cpu() to be called recursively" This reverts commit c2fda509667b0fda4372a237f5a59ea4570b1627. c2fda509667b removed lockdep annotation from work_on_cpu() to work around the PCI path that calls work_on_cpu() from within a work_on_cpu() work item (PF driver .probe() method -> pci_enable_sriov() -> add VFs -> VF driver .probe method). 84f23f99b507 ("PCI: Avoid unnecessary CPU switch when calling driver .probe() method) avoids that recursive work_on_cpu() use in a different way, so this revert restores the work_on_cpu() lockdep annotation. Signed-off-by: Bjorn Helgaas Acked-by: Tejun Heo diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 987293d03ebc..5690b8eabfbc 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -2840,19 +2840,6 @@ already_gone: return false; } -static bool __flush_work(struct work_struct *work) -{ - struct wq_barrier barr; - - if (start_flush_work(work, &barr)) { - wait_for_completion(&barr.done); - destroy_work_on_stack(&barr.work); - return true; - } else { - return false; - } -} - /** * flush_work - wait for a work to finish executing the last queueing instance * @work: the work to flush @@ -2866,10 +2853,18 @@ static bool __flush_work(struct work_struct *work) */ bool flush_work(struct work_struct *work) { + struct wq_barrier barr; + lock_map_acquire(&work->lockdep_map); lock_map_release(&work->lockdep_map); - return __flush_work(work); + if (start_flush_work(work, &barr)) { + wait_for_completion(&barr.done); + destroy_work_on_stack(&barr.work); + return true; + } else { + return false; + } } EXPORT_SYMBOL_GPL(flush_work); @@ -4814,14 +4809,7 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg) INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn); schedule_work_on(cpu, &wfc.work); - - /* - * The work item is on-stack and can't lead to deadlock through - * flushing. Use __flush_work() to avoid spurious lockdep warnings - * when work_on_cpu()s are nested. - */ - __flush_work(&wfc.work); - + flush_work(&wfc.work); return wfc.ret; } EXPORT_SYMBOL_GPL(work_on_cpu);