From patchwork Tue Jan 22 18:39:05 2019
X-Patchwork-Submitter: Alexander Duyck
X-Patchwork-Id: 10776089
Subject: [driver-core PATCH v10 0/9] Add NUMA aware async_schedule calls
From: Alexander Duyck
To: linux-kernel@vger.kernel.org, gregkh@linuxfoundation.org
Cc: len.brown@intel.com, bvanassche@acm.org, linux-pm@vger.kernel.org,
 alexander.h.duyck@linux.intel.com, linux-nvdimm@lists.01.org,
 jiangshanlai@gmail.com, mcgrof@kernel.org, pavel@ucw.cz,
 zwisler@kernel.org, tj@kernel.org, akpm@linux-foundation.org,
 rafael@kernel.org
Date: Tue, 22 Jan 2019 10:39:05 -0800
Message-ID: <154818223154.18753.12374915684623789884.stgit@ahduyck-desk1.amr.corp.intel.com>
User-Agent: StGit/unknown-version

This patch set provides functionality that will help to improve the
locality of the async_schedule calls used to provide deferred
initialization.

This patch set originally started out focused on just the one call to
async_schedule_domain in the nvdimm tree that was being used to defer
the device_add call. However, after doing some digging I realized the
scope of this was much broader than I had originally planned. As such I
went through and reworked the underlying infrastructure, down to
replacing the queue_work call itself with a function of my own, and
opted to try and provide a NUMA aware solution that would work for a
broader audience.

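To make the idea of NUMA aware queueing concrete, the following is a
minimal userspace sketch of how a preferred NUMA node might be turned
into a CPU choice before work is queued. It is not the code from these
patches: struct topology, select_cpu_near, CPU_UNBOUND and NO_NODE are
hypothetical names, and the fallback policy (calling CPU first, then
the first online CPU on the node, then no preference) is an assumption
made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define CPU_UNBOUND (-1)  /* hypothetical sentinel: no CPU preference */
#define NO_NODE     (-1)  /* hypothetical sentinel: no node preference */

/* Toy topology, not a kernel structure. */
struct topology {
    const int *cpu_node;    /* cpu_node[i] is the NUMA node of CPU i */
    const int *cpu_online;  /* cpu_online[i] is nonzero if CPU i is usable */
    size_t nr_cpus;
};

/*
 * Pick a CPU for work that should run close to `node`.
 * Order of preference (assumed for this sketch):
 *   1. the calling CPU, if it is online and already on the node
 *   2. the first online CPU that belongs to the node
 *   3. CPU_UNBOUND, leaving placement to the scheduler
 */
int select_cpu_near(const struct topology *topo, int node, size_t current_cpu)
{
    assert(topo != NULL);
    assert(current_cpu < topo->nr_cpus);

    if (node == NO_NODE)
        return CPU_UNBOUND;

    if (topo->cpu_online[current_cpu] && topo->cpu_node[current_cpu] == node)
        return (int)current_cpu;

    for (size_t cpu = 0; cpu < topo->nr_cpus; cpu++) {
        if (topo->cpu_online[cpu] && topo->cpu_node[cpu] == node)
            return (int)cpu;
    }

    return CPU_UNBOUND;
}
```

A caller would hand the device's node to select_cpu_near and queue the
work on the returned CPU, or fall back to ordinary unbound queueing
when CPU_UNBOUND comes back.
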
In addition I have added several tweaks and/or clean-ups to the front of
the patch set. Patches 1 through 3 address a number of issues that were
keeping the existing async_schedule calls from showing the performance
that they could, either because they did not scale on a per-device
basis or because of issues that could result in a potential race. For
example, patch 3 addresses the fact that we were calling async_schedule
once per driver instead of once per device; without fixing this first,
we would still have ended up with devices being probed on a non-local
node.

I have also updated the kernel module used to test async driver probing
so that it can expose the original issue I was attempting to address.
It will fail on a system if asynchronous work takes longer than it
takes to load a single device and a single driver with a device already
added. It will also fail if the NUMA node that the driver is loaded on
does not match the NUMA node the device is associated with.

RFC->v1:
    Dropped nvdimm patch to submit later. It relies on code in
    libnvdimm development tree.
    Simplified queue_work_near to just convert node into a CPU.
    Split up drivers core and PM core patches.
v1->v2:
    Renamed queue_work_near to queue_work_node
    Added WARN_ON_ONCE if we use queue_work_node with per-cpu workqueue
v2->v3:
    Added Acked-by for queue_work_node patch
    Continued rename from _near to _node to be consistent with queue_work_node
    Renamed async_schedule_near_domain to async_schedule_node_domain
    Renamed async_schedule_near to async_schedule_node
    Added kerneldoc for new async_schedule_XXX functions
    Updated patch description for patch 4 to include data on potential gains
v3->v4:
    Added patch to consolidate use of need_parent_lock
    Make asynchronous driver probing explicit about use of drvdata
v4->v5:
    Added patch to move async_synchronize_full to address deadlock
    Added bit async_probe to act as mutex for probe/remove calls
    Added back nvdimm patch as code it relies on is now in Linus's tree
    Incorporated review comments on parent & device locking consolidation
    Rebased on latest linux-next
v5->v6:
    Drop the "This patch" or "This change" from start of patch descriptions.
    Drop unnecessary parentheses in first patch
    Use same wording for "selecting a CPU" in comments added in first patch
    Added kernel documentation for async_probe member of device
    Fixed up comments for async_schedule calls in patch 2
    Moved code related to setting async driver out of device.h and into dd.c
    Added Reviewed-by for several patches
v6->v7:
    Fixed typo which had kernel doc refer to "lock" when I meant "unlock"
    Dropped "bool X:1" to "u8 X:1" from patch description
    Added async_driver to device_private structure to store driver
    Dropped unnecessary code shuffle from async_probe patch
    Reordered patches to move fixes up to front
    Added Reviewed-by for several patches
    Updated cover page and patch descriptions throughout the set
v7->v8:
    Replaced async_probe value with dead, only apply dead in device_del
    Dropped Reviewed-by from patch 2 due to significant changes
    Added Reviewed-by for patches reviewed by Luis Chamberlain
v8->v9:
    Dropped patch 1 as it was applied, shifted remaining patches by 1
    Added new patch 9 that adds test framework for NUMA and sequential init
    Tweaked what is now patch 1, and added Reviewed-by from Dan Williams
v9->v10:
    Moved "dead" from device struct to device_private struct
    Added Reviewed-by from Rafael to patch 1
    Rebased on latest linux-next

---

Alexander Duyck (9):
      driver core: Establish order of operations for device_add and device_del via bitflag
      device core: Consolidate locking and unlocking of parent and device
      driver core: Probe devices asynchronously instead of the driver
      workqueue: Provide queue_work_node to queue work near a given NUMA node
      async: Add support for queueing on specific NUMA node
      driver core: Attach devices on CPU local to device node
      PM core: Use new async_schedule_dev command
      libnvdimm: Schedule device registration on node local to the device
      driver core: Rewrite test_async_driver_probe to cover serialization and NUMA affinity


 drivers/base/base.h                         |    8 +
 drivers/base/bus.c                          |   46 +----
 drivers/base/core.c                         |   11 +
 drivers/base/dd.c                           |  160 +++++++++++++----
 drivers/base/power/main.c                   |   12 +
 drivers/base/test/test_async_driver_probe.c |  261 +++++++++++++++++++++------
 drivers/nvdimm/bus.c                        |   11 +
 include/linux/async.h                       |   82 ++++++++
 include/linux/workqueue.h                   |    2
 kernel/async.c                              |   53 +++--
 kernel/workqueue.c                          |   84 +++++++++
 11 files changed, 564 insertions(+), 166 deletions(-)

--