From patchwork Sat Nov 16 00:28:20 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bjorn Helgaas X-Patchwork-Id: 3191591 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: X-Original-To: patchwork-linux-pci@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork2.web.kernel.org (Postfix) with ESMTP id DDB61C045B for ; Sat, 16 Nov 2013 00:28:31 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 17B2C20915 for ; Sat, 16 Nov 2013 00:28:31 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 2676E20901 for ; Sat, 16 Nov 2013 00:28:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753753Ab3KPA23 (ORCPT ); Fri, 15 Nov 2013 19:28:29 -0500 Received: from mail-ie0-f172.google.com ([209.85.223.172]:51720 "EHLO mail-ie0-f172.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753307Ab3KPA2Y (ORCPT ); Fri, 15 Nov 2013 19:28:24 -0500 Received: by mail-ie0-f172.google.com with SMTP id to1so5995101ieb.3 for ; Fri, 15 Nov 2013 16:28:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=AQpaWZrVQjX/JwJc0TXMnuHtcYrrKZ4mMIwtKUstktM=; b=f/7Qh8lipcYP4um3mI4jX9rSUzxZkzo+kY18JT0x7ReHtyuHkD/b3o2vdPzCBhDo6X 3dWcP7leHQvKwWhD+2OVZKyR3QJKbyuPmTJRU1t/XxVI/JLKDrT59e3IoO+i/oNQ8RqD zP4bMnQALyS2Blp63lCTEC2BxOZaTw68qGFB93MOIQThnFAqsTXlaAHC1KZBsYNThcEL goZYMt8jwcGcxXZg6BLmZC9A4tNsw18v+tfgeexZAYtAqOLDtJxjUlgH2+2lBJFUKl1u 5cLGIeowLizUH0bb+A4pERLENfpTVNV5z6wBNfqxYVcwOxWaFY6DrNfdtRHvat/tqosE G1JA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-type:content-disposition:in-reply-to :user-agent; bh=AQpaWZrVQjX/JwJc0TXMnuHtcYrrKZ4mMIwtKUstktM=; b=iUQWSf6oMiHcEwGhLQ6y32hMssuivxwefN/ATXfH3kqjM5fyv2Hi3TKfK4as2EUe3U tuhN4eBna0WuZcXDal+0e3ZwppJGuKm4ij14HGGvrmCene3BkssfIbp4H+8Qx1eATfQ/ iM2OYZTBLvMP4sDUQpTNHHQgbzJADl/9nR78MbAr2y246ueOYTgFKXn13B6h0/axyKhA o4I3LiMO7N7FFOeszV/yMGEqFdwjzMIRK/jCruW3ltcOaXwvW/Qh2lt1nYKNVtqhSRQm wvKHXwwn9OFEjMl9Ar6ZF3EB0jlyqOyaFJ/p6aY4c2EXrSra03qen5mINpHZRt8/K9SY yDMA== X-Gm-Message-State: ALoCoQnPGQFfsWV+620TqZnhlRPT4YBZ4zk38q53cTTq3MEcDs0nSiV/hP5EsrrBW3wTio+4/223d2eyCmTHlrwH2FDfSTmWxu+mbXMRTabehB3Lzd2zVazfxkMtS76U3CrneexsbsKsYW7Y/jK+xDNF54Yw7TjrMFSqnbVjJ37dzBvFUun3s8Hrh3Qn50bvE9hifyM5z1+UokIanTi2UPSls8UDyFJqqg== X-Received: by 10.50.41.41 with SMTP id c9mr6138043igl.59.1384561704182; Fri, 15 Nov 2013 16:28:24 -0800 (PST) Received: from google.com ([172.16.52.167]) by mx.google.com with ESMTPSA id w4sm182002igb.5.2013.11.15.16.28.22 for (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Fri, 15 Nov 2013 16:28:23 -0800 (PST) Date: Fri, 15 Nov 2013 17:28:20 -0700 From: Bjorn Helgaas To: Tejun Heo Cc: Hugh Dickins , Steven Rostedt , Li Zefan , Markus Blank-Burian , Michal Hocko , Johannes Weiner , David Rientjes , Ying Han , Greg Thelen , Michel Lespinasse , cgroups@vger.kernel.org, "Srivatsa S. Bhat" , Lai Jiangshan , "linux-kernel@vger.kernel.org" , "Rafael J. Wysocki" , Alexander Duyck , Yinghai Lu , "linux-pci@vger.kernel.org" Subject: Re: Possible regression with cgroups in 3.11 Message-ID: <20131116002820.GA31073@google.com> References: <525CB337.8050105@huawei.com> <5270BFE7.4000602@huawei.com> <20131031130647.0ff6f2c7@gandalf.local.home> <20131113032804.GB19394@mtj.dyndns.org> <20131113073806.GA23244@mtj.dyndns.org> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20131113073806.GA23244@mtj.dyndns.org> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org X-Spam-Status: No, score=-3.6 required=5.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED, DKIM_SIGNED, FSL_HELO_FAKE, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, T_DKIM_INVALID, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On Wed, Nov 13, 2013 at 04:38:06PM +0900, Tejun Heo wrote: > Hey, guys. > > cc'ing people from "workqueue, pci: INFO: possible recursive locking > detected" thread. > > http://thread.gmane.org/gmane.linux.kernel/1525779 > > So, to resolve that issue, we ripped out lockdep annotation from > work_on_cpu() and cgroup is now experiencing deadlock involving > work_on_cpu(). It *could* be that workqueue is actually broken or > memcg is looping but it doesn't seem like a very good idea to not have > lockdep annotation around work_on_cpu(). > > IIRC, there was one pci code path which called work_on_cpu() > recursively. Would it be possible for that path to use something like > work_on_cpu_nested(XXX, depth) so that we can retain lockdep > annotation on work_on_cpu()? I'm open to changing the way pci_call_probe() works, but my opinion is that the PCI path that causes trouble is a broken design, and we shouldn't complicate the work_on_cpu() interface just to accommodate that broken design. The problem is that when a PF .probe() method that calls pci_enable_sriov(), we add new VF devices and call *their* .probe() methods before the PF .probe() method completes. That is ugly and error-prone. When we call .probe() methods for the VFs, we're obviously already on the correct node, because the VFs are on the same node as the PF, so I think the best short-term fix is Alexander's patch to avoid work_on_cpu() when we're already on the correct node -- something like the (untested) patch below. Bjorn PCI: Avoid unnecessary CPU switch when calling driver .probe() method From: Bjorn Helgaas If we are already on a CPU local to the device, call the driver .probe() method directly without using work_on_cpu(). This is a workaround for a lockdep warning in the following scenario: pci_call_probe work_on_cpu(cpu, local_pci_probe, ...) driver .probe pci_enable_sriov ... pci_bus_add_device ... pci_call_probe work_on_cpu(cpu, local_pci_probe, ...) It would be better to fix PCI so we don't call VF driver .probe() methods from inside a PF driver .probe() method, but that's a bigger project. This patch is due to Alexander Duyck ; I merely added the preemption disable. Link: https://bugzilla.kernel.org/show_bug.cgi?id=65071 Link: http://lkml.kernel.org/r/CAE9FiQXYQEAZ=0sG6+2OdffBqfLS9MpoN1xviRR9aDbxPxcKxQ@mail.gmail.com Link: http://lkml.kernel.org/r/20130624195942.40795.27292.stgit@ahduyck-cp1.jf.intel.com Signed-off-by: Bjorn Helgaas --- drivers/pci/pci-driver.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 454853507b7e..accae06aa79a 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -293,7 +293,9 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev, its local memory on the right node without any need to change it. */ node = dev_to_node(&dev->dev); - if (node >= 0) { + preempt_disable(); + + if (node >= 0 && node != numa_node_id()) { int cpu; get_online_cpus(); @@ -305,6 +307,8 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev, put_online_cpus(); } else error = local_pci_probe(&ddi); + + preempt_enable(); return error; }