From patchwork Thu Nov 8 18:06:45 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Duyck X-Patchwork-Id: 10674803 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 953D014E2 for ; Thu, 8 Nov 2018 18:06:48 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 86A6F2DE5F for ; Thu, 8 Nov 2018 18:06:48 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 791182DE79; Thu, 8 Nov 2018 18:06:48 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CFB4D2DE5F for ; Thu, 8 Nov 2018 18:06:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727349AbeKIDn0 (ORCPT ); Thu, 8 Nov 2018 22:43:26 -0500 Received: from mga11.intel.com ([192.55.52.93]:3631 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726684AbeKIDn0 (ORCPT ); Thu, 8 Nov 2018 22:43:26 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga102.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Nov 2018 10:06:45 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,480,1534834800"; d="scan'208";a="87734982" Received: from ahduyck-desk1.jf.intel.com ([10.7.198.76]) by orsmga007.jf.intel.com with ESMTP; 08 Nov 2018 10:06:45 -0800 Subject: [driver-core PATCH v6 1/9] workqueue: Provide queue_work_node to queue work near a given NUMA node From: 
Alexander Duyck To: linux-kernel@vger.kernel.org, gregkh@linuxfoundation.org Cc: linux-nvdimm@lists.01.org, tj@kernel.org, akpm@linux-foundation.org, linux-pm@vger.kernel.org, jiangshanlai@gmail.com, rafael@kernel.org, len.brown@intel.com, pavel@ucw.cz, zwisler@kernel.org, dan.j.williams@intel.com, dave.jiang@intel.com, bvanassche@acm.org, alexander.h.duyck@linux.intel.com Date: Thu, 08 Nov 2018 10:06:45 -0800 Message-ID: <154170040562.12967.17831655390715808287.stgit@ahduyck-desk1.jf.intel.com> In-Reply-To: <154170028986.12967.2108024712555179678.stgit@ahduyck-desk1.jf.intel.com> References: <154170028986.12967.2108024712555179678.stgit@ahduyck-desk1.jf.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: linux-pm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Provide a new function, queue_work_node, which is meant to schedule work on a "random" CPU of the requested NUMA node. The main motivation for this is to help asynchronous init improve boot times for devices that are local to a specific node. For now we just default to the first CPU that is in the intersection of the cpumask of the node and the online cpumask. The only exception is when the current CPU is already local to the node, in which case we just use the current CPU. This should work for our purposes as we are currently only using this for unbound work, so the CPU will be translated to a node anyway instead of being used directly. As we are only using the first CPU to represent the NUMA node for now, I am limiting the scope of the function so that it can only be used with unbound workqueues.
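The selection rule described above can be sketched in userspace C. This is only an illustration under stated assumptions: the eight-CPU, two-node topology, the bitmask standing in for cpu_online_mask, and every sketch_* name are hypothetical, and the sentinel -1 merely stands in for WORK_CPU_UNBOUND. The checks for wq_numa_enabled and node_online() that the patch performs are left out.

```c
#include <assert.h>

/* Hypothetical topology for illustration: CPUs 0-3 on node 0, CPUs 4-7 on node 1. */
#define SKETCH_NR_CPUS     8
#define SKETCH_NR_NODES    2
#define SKETCH_CPU_UNBOUND (-1)	/* stand-in for WORK_CPU_UNBOUND */

/* Map a CPU to its (assumed) NUMA node. */
static int sketch_cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	return cpu / 4;
}

/*
 * Pick a CPU for work aimed at @node.
 * @current_cpu: CPU the caller is running on
 * @online_mask: bit n set means CPU n is online
 */
static int sketch_select_cpu_near(int node, int current_cpu,
				  unsigned int online_mask)
{
	int cpu;

	/* Invalid node: leave the choice to the scheduler. */
	if (node < 0 || node >= SKETCH_NR_NODES)
		return SKETCH_CPU_UNBOUND;

	/* Already running on the requested node: stay on this CPU. */
	if (sketch_cpu_to_node(current_cpu) == node)
		return current_cpu;

	/* Otherwise take the first online CPU that belongs to the node. */
	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		if ((online_mask & (1u << cpu)) &&
		    sketch_cpu_to_node(cpu) == node)
			return cpu;
	}

	/* No online CPU on the node: defer the decision. */
	return SKETCH_CPU_UNBOUND;
}
```

Because the result is always the first online CPU of the node (or the current CPU), a per-cpu workqueue would pile all work onto one CPU, which is why the function is restricted to unbound workqueues.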
Acked-by: Tejun Heo Reviewed-by: Bart Van Assche Signed-off-by: Alexander Duyck Acked-by: Dan Williams --- include/linux/workqueue.h | 2 + kernel/workqueue.c | 84 +++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 86 insertions(+) diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h index 60d673e15632..1f50c1e586e7 100644 --- a/include/linux/workqueue.h +++ b/include/linux/workqueue.h @@ -463,6 +463,8 @@ int workqueue_set_unbound_cpumask(cpumask_var_t cpumask); extern bool queue_work_on(int cpu, struct workqueue_struct *wq, struct work_struct *work); +extern bool queue_work_node(int node, struct workqueue_struct *wq, + struct work_struct *work); extern bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq, struct delayed_work *work, unsigned long delay); extern bool mod_delayed_work_on(int cpu, struct workqueue_struct *wq, diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 0280deac392e..5df1a0ef6f90 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -1492,6 +1492,90 @@ bool queue_work_on(int cpu, struct workqueue_struct *wq, } EXPORT_SYMBOL(queue_work_on); +/** + * workqueue_select_cpu_near - Select a CPU based on NUMA node + * @node: NUMA node ID that we want to select a CPU from + * + * This function will attempt to find a "random" cpu available on a given + * node. If there are no CPUs available on the given node it will return + * WORK_CPU_UNBOUND indicating that we should just schedule to any + * available CPU if we need to schedule this work. 
+ */ +static int workqueue_select_cpu_near(int node) +{ + int cpu; + + /* No point in doing this if NUMA isn't enabled for workqueues */ + if (!wq_numa_enabled) + return WORK_CPU_UNBOUND; + + /* Delay binding to CPU if node is not valid or online */ + if (node < 0 || node >= MAX_NUMNODES || !node_online(node)) + return WORK_CPU_UNBOUND; + + /* Use local node/cpu if we are already there */ + cpu = raw_smp_processor_id(); + if (node == cpu_to_node(cpu)) + return cpu; + + /* Use "random" otherwise known as "first" online CPU of node */ + cpu = cpumask_any_and(cpumask_of_node(node), cpu_online_mask); + + /* If CPU is valid return that, otherwise just defer */ + return cpu < nr_cpu_ids ? cpu : WORK_CPU_UNBOUND; +} + +/** + * queue_work_node - queue work on a "random" cpu for a given NUMA node + * @node: NUMA node that we are targeting the work for + * @wq: workqueue to use + * @work: work to queue + * + * We queue the work to a "random" CPU within a given NUMA node. The basic + * idea here is to provide a way to somehow associate work with a given + * NUMA node. + * + * This function will only make a best effort attempt at getting this onto + * the right NUMA node. If no node is requested or the requested node is + * offline then we just fall back to standard queue_work behavior. + * + * Currently the "random" CPU ends up being the first available CPU in the + * intersection of cpu_online_mask and the cpumask of the node, unless we + * are running on the node. In that case we just use the current CPU. + * + * Return: %false if @work was already on a queue, %true otherwise. + */ +bool queue_work_node(int node, struct workqueue_struct *wq, + struct work_struct *work) +{ + unsigned long flags; + bool ret = false; + + /* + * This current implementation is specific to unbound workqueues. + * Specifically we only return the first available CPU for a given + * node instead of cycling through individual CPUs within the node.
+ * + * If this is used with a per-cpu workqueue then the logic in + * workqueue_select_cpu_near would need to be updated to allow for + * some round robin type logic. + */ + WARN_ON_ONCE(!(wq->flags & WQ_UNBOUND)); + + local_irq_save(flags); + + if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) { + int cpu = workqueue_select_cpu_near(node); + + __queue_work(cpu, wq, work); + ret = true; + } + + local_irq_restore(flags); + return ret; +} +EXPORT_SYMBOL_GPL(queue_work_node); + void delayed_work_timer_fn(struct timer_list *t) { struct delayed_work *dwork = from_timer(dwork, t, timer); From patchwork Thu Nov 8 18:06:50 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Duyck X-Patchwork-Id: 10674811 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 12B3014E2 for ; Thu, 8 Nov 2018 18:07:00 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 034FC2DE5F for ; Thu, 8 Nov 2018 18:07:00 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id EB64F2DE67; Thu, 8 Nov 2018 18:06:59 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 13E1E2DE5F for ; Thu, 8 Nov 2018 18:06:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727071AbeKIDnb (ORCPT ); Thu, 8 Nov 2018 22:43:31 -0500 Received: from mga09.intel.com ([134.134.136.24]:45012 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with 
ESMTP id S1726922AbeKIDnb (ORCPT ); Thu, 8 Nov 2018 22:43:31 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Nov 2018 10:06:51 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,480,1534834800"; d="scan'208";a="90542656" Received: from ahduyck-desk1.jf.intel.com ([10.7.198.76]) by orsmga008.jf.intel.com with ESMTP; 08 Nov 2018 10:06:50 -0800 Subject: [driver-core PATCH v6 2/9] async: Add support for queueing on specific NUMA node From: Alexander Duyck To: linux-kernel@vger.kernel.org, gregkh@linuxfoundation.org Cc: linux-nvdimm@lists.01.org, tj@kernel.org, akpm@linux-foundation.org, linux-pm@vger.kernel.org, jiangshanlai@gmail.com, rafael@kernel.org, len.brown@intel.com, pavel@ucw.cz, zwisler@kernel.org, dan.j.williams@intel.com, dave.jiang@intel.com, bvanassche@acm.org, alexander.h.duyck@linux.intel.com Date: Thu, 08 Nov 2018 10:06:50 -0800 Message-ID: <154170041079.12967.13132220574997822111.stgit@ahduyck-desk1.jf.intel.com> In-Reply-To: <154170028986.12967.2108024712555179678.stgit@ahduyck-desk1.jf.intel.com> References: <154170028986.12967.2108024712555179678.stgit@ahduyck-desk1.jf.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: linux-pm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Introduce four new variants of the async_schedule_ functions that allow scheduling on a specific NUMA node. The first two functions, async_schedule_node and async_schedule_node_domain, take over the work of async_schedule and async_schedule_domain while adding NUMA node specific functionality. The original functions become inline function definitions that call the new functions while passing NUMA_NO_NODE.
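The wrapper arrangement can be illustrated with a small userspace sketch. All sketch_* names are hypothetical stand-ins, not the kernel API: the core function runs the callback immediately instead of queueing it, cookies are a plain counter, and SKETCH_NO_NODE stands in for NUMA_NO_NODE. The last wrapper mirrors the device-based variants of this patch, which take the node from the device itself.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NO_NODE (-1)	/* stand-in for NUMA_NO_NODE */

typedef void (*sketch_func_t)(void *data);

/* Hypothetical device carrying only the field this sketch needs. */
struct sketch_device {
	int numa_node;
};

/* Records the node passed to the most recent core call, for inspection. */
static int sketch_last_node = SKETCH_NO_NODE;

/*
 * Core entry point: every variant funnels into this one function.
 * The real code queues work near @node; here the callback simply runs.
 */
static unsigned long sketch_schedule_node(sketch_func_t func, void *data,
					  int node)
{
	static unsigned long next_cookie = 1;

	assert(func != NULL);
	sketch_last_node = node;
	func(data);
	return next_cookie++;
}

/* Legacy-style wrapper: no node preference. */
static inline unsigned long sketch_schedule(sketch_func_t func, void *data)
{
	return sketch_schedule_node(func, data, SKETCH_NO_NODE);
}

/* Device wrapper: the device is both the argument and the node hint. */
static inline unsigned long sketch_schedule_dev(sketch_func_t func,
						struct sketch_device *dev)
{
	return sketch_schedule_node(func, dev, dev->numa_node);
}

/* Trivial callback used to exercise the wrappers. */
static void sketch_noop(void *data)
{
	(void)data;
}
```

Keeping one core function and expressing the older entry points as inline wrappers means existing callers need no changes, while node-aware callers opt in explicitly.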
The second two functions are async_schedule_dev and async_schedule_dev_domain which provide NUMA specific functionality when passing a device as the data member and that device has a NUMA node other than NUMA_NO_NODE. The main motivation behind this is to address the need to be able to schedule device specific init work on specific NUMA nodes in order to improve performance of memory initialization. Signed-off-by: Alexander Duyck Reviewed-by: Bart Van Assche Reviewed-by: Dan Williams --- include/linux/async.h | 82 +++++++++++++++++++++++++++++++++++++++++++++++-- kernel/async.c | 53 +++++++++++++++++--------------- 2 files changed, 108 insertions(+), 27 deletions(-) diff --git a/include/linux/async.h b/include/linux/async.h index 6b0226bdaadc..f81d6dbffe68 100644 --- a/include/linux/async.h +++ b/include/linux/async.h @@ -14,6 +14,8 @@ #include #include +#include +#include typedef u64 async_cookie_t; typedef void (*async_func_t) (void *data, async_cookie_t cookie); @@ -37,9 +39,83 @@ struct async_domain { struct async_domain _name = { .pending = LIST_HEAD_INIT(_name.pending), \ .registered = 0 } -extern async_cookie_t async_schedule(async_func_t func, void *data); -extern async_cookie_t async_schedule_domain(async_func_t func, void *data, - struct async_domain *domain); +async_cookie_t async_schedule_node(async_func_t func, void *data, + int node); +async_cookie_t async_schedule_node_domain(async_func_t func, void *data, + int node, + struct async_domain *domain); + +/** + * async_schedule - schedule a function for asynchronous execution + * @func: function to execute asynchronously + * @data: data pointer to pass to the function + * + * Returns an async_cookie_t that may be used for checkpointing later. + * Note: This function may be called from atomic or non-atomic contexts. 
+ */ +static inline async_cookie_t async_schedule(async_func_t func, void *data) +{ + return async_schedule_node(func, data, NUMA_NO_NODE); +} + +/** + * async_schedule_domain - schedule a function for asynchronous execution within a certain domain + * @func: function to execute asynchronously + * @data: data pointer to pass to the function + * @domain: the domain + * + * Returns an async_cookie_t that may be used for checkpointing later. + * @domain may be used in the async_synchronize_*_domain() functions to + * wait within a certain synchronization domain rather than globally. + * Note: This function may be called from atomic or non-atomic contexts. + */ +static inline async_cookie_t +async_schedule_domain(async_func_t func, void *data, + struct async_domain *domain) +{ + return async_schedule_node_domain(func, data, NUMA_NO_NODE, domain); +} + +/** + * async_schedule_dev - A device specific version of async_schedule + * @func: function to execute asynchronously + * @dev: device argument to be passed to function + * + * Returns an async_cookie_t that may be used for checkpointing later. + * @dev is used as both the argument for the function and to provide NUMA + * context for where to run the function. By doing this we can try to + * provide for the best possible outcome by operating on the device on the + * CPUs closest to the device. + * Note: This function may be called from atomic or non-atomic contexts. + */ +static inline async_cookie_t +async_schedule_dev(async_func_t func, struct device *dev) +{ + return async_schedule_node(func, dev, dev_to_node(dev)); +} + +/** + * async_schedule_dev_domain - A device specific version of async_schedule_domain + * @func: function to execute asynchronously + * @dev: device argument to be passed to function + * @domain: the domain + * + * Returns an async_cookie_t that may be used for checkpointing later. + * @dev is used as both the argument for the function and to provide NUMA + * context for where to run the function. 
By doing this we can try to + * provide for the best possible outcome by operating on the device on the + * CPUs closest to the device. + * @domain may be used in the async_synchronize_*_domain() functions to + * wait within a certain synchronization domain rather than globally. + * Note: This function may be called from atomic or non-atomic contexts. + */ +static inline async_cookie_t +async_schedule_dev_domain(async_func_t func, struct device *dev, + struct async_domain *domain) +{ + return async_schedule_node_domain(func, dev, dev_to_node(dev), domain); +} + void async_unregister_domain(struct async_domain *domain); extern void async_synchronize_full(void); extern void async_synchronize_full_domain(struct async_domain *domain); diff --git a/kernel/async.c b/kernel/async.c index a893d6170944..f6bd0d9885e1 100644 --- a/kernel/async.c +++ b/kernel/async.c @@ -149,7 +149,25 @@ static void async_run_entry_fn(struct work_struct *work) wake_up(&async_done); } -static async_cookie_t __async_schedule(async_func_t func, void *data, struct async_domain *domain) +/** + * async_schedule_node_domain - NUMA specific version of async_schedule_domain + * @func: function to execute asynchronously + * @data: data pointer to pass to the function + * @node: NUMA node that we want to schedule this on or close to + * @domain: the domain + * + * Returns an async_cookie_t that may be used for checkpointing later. + * @domain may be used in the async_synchronize_*_domain() functions to + * wait within a certain synchronization domain rather than globally. + * + * Note: This function may be called from atomic or non-atomic contexts. + * + * The node requested will be honored on a best effort basis. If the node + * has no CPUs associated with it then the work is distributed among all + * available CPUs. 
+ */ +async_cookie_t async_schedule_node_domain(async_func_t func, void *data, + int node, struct async_domain *domain) { struct async_entry *entry; unsigned long flags; @@ -195,43 +213,30 @@ static async_cookie_t __async_schedule(async_func_t func, void *data, struct asy current->flags |= PF_USED_ASYNC; /* schedule for execution */ - queue_work(system_unbound_wq, &entry->work); + queue_work_node(node, system_unbound_wq, &entry->work); return newcookie; } +EXPORT_SYMBOL_GPL(async_schedule_node_domain); /** - * async_schedule - schedule a function for asynchronous execution + * async_schedule_node - NUMA specific version of async_schedule * @func: function to execute asynchronously * @data: data pointer to pass to the function + * @node: NUMA node that we want to schedule this on or close to * * Returns an async_cookie_t that may be used for checkpointing later. * Note: This function may be called from atomic or non-atomic contexts. - */ -async_cookie_t async_schedule(async_func_t func, void *data) -{ - return __async_schedule(func, data, &async_dfl_domain); -} -EXPORT_SYMBOL_GPL(async_schedule); - -/** - * async_schedule_domain - schedule a function for asynchronous execution within a certain domain - * @func: function to execute asynchronously - * @data: data pointer to pass to the function - * @domain: the domain * - * Returns an async_cookie_t that may be used for checkpointing later. - * @domain may be used in the async_synchronize_*_domain() functions to - * wait within a certain synchronization domain rather than globally. A - * synchronization domain is specified via @domain. Note: This function - * may be called from atomic or non-atomic contexts. + * The node requested will be honored on a best effort basis. If the node + * has no CPUs associated with it then the work is distributed among all + * available CPUs. 
*/ -async_cookie_t async_schedule_domain(async_func_t func, void *data, - struct async_domain *domain) +async_cookie_t async_schedule_node(async_func_t func, void *data, int node) { - return __async_schedule(func, data, domain); + return async_schedule_node_domain(func, data, node, &async_dfl_domain); } -EXPORT_SYMBOL_GPL(async_schedule_domain); +EXPORT_SYMBOL_GPL(async_schedule_node); /** * async_synchronize_full - synchronize all asynchronous function calls From patchwork Thu Nov 8 18:06:55 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Duyck X-Patchwork-Id: 10674835 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 24D1C14E2 for ; Thu, 8 Nov 2018 18:07:39 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 138982DE5F for ; Thu, 8 Nov 2018 18:07:39 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 07F1C2DE79; Thu, 8 Nov 2018 18:07:39 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6AC2E2DE5F for ; Thu, 8 Nov 2018 18:07:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726895AbeKIDnt (ORCPT ); Thu, 8 Nov 2018 22:43:49 -0500 Received: from mga03.intel.com ([134.134.136.65]:26896 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726974AbeKIDnt (ORCPT ); Thu, 8 Nov 2018 22:43:49 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False 
Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Nov 2018 10:06:56 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,480,1534834800"; d="scan'208";a="248120303" Received: from ahduyck-desk1.jf.intel.com ([10.7.198.76]) by orsmga004.jf.intel.com with ESMTP; 08 Nov 2018 10:06:55 -0800 Subject: [driver-core PATCH v6 3/9] device core: Consolidate locking and unlocking of parent and device From: Alexander Duyck To: linux-kernel@vger.kernel.org, gregkh@linuxfoundation.org Cc: linux-nvdimm@lists.01.org, tj@kernel.org, akpm@linux-foundation.org, linux-pm@vger.kernel.org, jiangshanlai@gmail.com, rafael@kernel.org, len.brown@intel.com, pavel@ucw.cz, zwisler@kernel.org, dan.j.williams@intel.com, dave.jiang@intel.com, bvanassche@acm.org, alexander.h.duyck@linux.intel.com Date: Thu, 08 Nov 2018 10:06:55 -0800 Message-ID: <154170041590.12967.4367086895437513524.stgit@ahduyck-desk1.jf.intel.com> In-Reply-To: <154170028986.12967.2108024712555179678.stgit@ahduyck-desk1.jf.intel.com> References: <154170028986.12967.2108024712555179678.stgit@ahduyck-desk1.jf.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: linux-pm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Try to consolidate all of the locking and unlocking of both the parent and device when attaching or removing a driver from a given device. To do that I first consolidated the lock pattern into two functions __device_driver_lock and __device_driver_unlock. After doing that I then created functions specific to attaching and detaching the driver while acquiring these locks. By doing this I was able to reduce the number of spots where we touch need_parent_lock from 12 down to 4. Reviewed-by: Bart Van Assche Reviewed-by: Rafael J. 
Wysocki Signed-off-by: Alexander Duyck Reviewed-by: Dan Williams --- drivers/base/base.h | 2 + drivers/base/bus.c | 23 ++----------- drivers/base/dd.c | 91 ++++++++++++++++++++++++++++++++++++++++----------- 3 files changed, 77 insertions(+), 39 deletions(-) diff --git a/drivers/base/base.h b/drivers/base/base.h index 7a419a7a6235..3f22ebd6117a 100644 --- a/drivers/base/base.h +++ b/drivers/base/base.h @@ -124,6 +124,8 @@ extern int driver_add_groups(struct device_driver *drv, const struct attribute_group **groups); extern void driver_remove_groups(struct device_driver *drv, const struct attribute_group **groups); +int device_driver_attach(struct device_driver *drv, struct device *dev); +void device_driver_detach(struct device *dev); extern char *make_class_name(const char *name, struct kobject *kobj); diff --git a/drivers/base/bus.c b/drivers/base/bus.c index 8bfd27ec73d6..8a630f9bd880 100644 --- a/drivers/base/bus.c +++ b/drivers/base/bus.c @@ -184,11 +184,7 @@ static ssize_t unbind_store(struct device_driver *drv, const char *buf, dev = bus_find_device_by_name(bus, NULL, buf); if (dev && dev->driver == drv) { - if (dev->parent && dev->bus->need_parent_lock) - device_lock(dev->parent); - device_release_driver(dev); - if (dev->parent && dev->bus->need_parent_lock) - device_unlock(dev->parent); + device_driver_detach(dev); err = count; } put_device(dev); @@ -211,13 +207,7 @@ static ssize_t bind_store(struct device_driver *drv, const char *buf, dev = bus_find_device_by_name(bus, NULL, buf); if (dev && dev->driver == NULL && driver_match_device(drv, dev)) { - if (dev->parent && bus->need_parent_lock) - device_lock(dev->parent); - device_lock(dev); - err = driver_probe_device(drv, dev); - device_unlock(dev); - if (dev->parent && bus->need_parent_lock) - device_unlock(dev->parent); + err = device_driver_attach(drv, dev); if (err > 0) { /* success */ @@ -769,13 +759,8 @@ EXPORT_SYMBOL_GPL(bus_rescan_devices); */ int device_reprobe(struct device *dev) { - if 
(dev->driver) { - if (dev->parent && dev->bus->need_parent_lock) - device_lock(dev->parent); - device_release_driver(dev); - if (dev->parent && dev->bus->need_parent_lock) - device_unlock(dev->parent); - } + if (dev->driver) + device_driver_detach(dev); return bus_rescan_devices_helper(dev, NULL); } EXPORT_SYMBOL_GPL(device_reprobe); diff --git a/drivers/base/dd.c b/drivers/base/dd.c index 169412ee4ae8..76c40fe69463 100644 --- a/drivers/base/dd.c +++ b/drivers/base/dd.c @@ -864,6 +864,60 @@ void device_initial_probe(struct device *dev) __device_attach(dev, true); } +/* + * __device_driver_lock - acquire locks needed to manipulate dev->drv + * @dev: Device we will update driver info for + * @parent: Parent device. Needed if the bus requires parent lock + * + * This function will take the required locks for manipulating dev->drv. + * Normally this will just be the @dev lock, but when called for a USB + * interface, @parent lock will be held as well. + */ +static void __device_driver_lock(struct device *dev, struct device *parent) +{ + if (parent && dev->bus->need_parent_lock) + device_lock(parent); + device_lock(dev); +} + +/* + * __device_driver_unlock - release locks needed to manipulate dev->drv + * @dev: Device we will update driver info for + * @parent: Parent device. Needed if the bus requires parent lock + * + * This function will release the required locks for manipulating dev->drv. + * Normally this will just be the @dev lock, but when called for a + * USB interface, @parent lock will be released as well. + */ +static void __device_driver_unlock(struct device *dev, struct device *parent) +{ + device_unlock(dev); + if (parent && dev->bus->need_parent_lock) + device_unlock(parent); +} + +/** + * device_driver_attach - attach a specific driver to a specific device + * @drv: Driver to attach + * @dev: Device to attach it to + * + * Manually attach driver to a device. Will acquire both @dev lock and + * @dev->parent lock if needed.
+ */ +int device_driver_attach(struct device_driver *drv, struct device *dev) +{ + int ret = 0; + + __device_driver_lock(dev, dev->parent); + + if (!dev->driver) + ret = driver_probe_device(drv, dev); + + __device_driver_unlock(dev, dev->parent); + + return ret; +} + static int __driver_attach(struct device *dev, void *data) { struct device_driver *drv = data; @@ -891,14 +945,7 @@ static int __driver_attach(struct device *dev, void *data) return ret; } /* ret > 0 means positive match */ - if (dev->parent && dev->bus->need_parent_lock) - device_lock(dev->parent); - device_lock(dev); - if (!dev->driver) - driver_probe_device(drv, dev); - device_unlock(dev); - if (dev->parent && dev->bus->need_parent_lock) - device_unlock(dev->parent); + device_driver_attach(drv, dev); return 0; } @@ -932,15 +979,11 @@ static void __device_release_driver(struct device *dev, struct device *parent) async_synchronize_full(); while (device_links_busy(dev)) { - device_unlock(dev); - if (parent) - device_unlock(parent); + __device_driver_unlock(dev, parent); device_links_unbind_consumers(dev); - if (parent) - device_lock(parent); - device_lock(dev); + __device_driver_lock(dev, parent); /* * A concurrent invocation of the same function might * have released the driver successfully while this one @@ -993,16 +1036,12 @@ void device_release_driver_internal(struct device *dev, struct device_driver *drv, struct device *parent) { - if (parent && dev->bus->need_parent_lock) - device_lock(parent); + __device_driver_lock(dev, parent); - device_lock(dev); if (!drv || drv == dev->driver) __device_release_driver(dev, parent); - device_unlock(dev); - if (parent && dev->bus->need_parent_lock) - device_unlock(parent); + __device_driver_unlock(dev, parent); } /** @@ -1027,6 +1066,18 @@ void device_release_driver(struct device *dev) } EXPORT_SYMBOL_GPL(device_release_driver); +/** + * device_driver_detach - detach driver from a specific device + * @dev: device to detach driver from + * + * Detach driver from 
device. Will acquire both @dev lock and @dev->parent + * lock if needed. + */ +void device_driver_detach(struct device *dev) +{ + device_release_driver_internal(dev, NULL, dev->parent); +} + /** * driver_detach - detach driver from all devices it controls. * @drv: driver. From patchwork Thu Nov 8 18:07:01 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Duyck X-Patchwork-Id: 10674813 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D327A14E2 for ; Thu, 8 Nov 2018 18:07:02 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C5C4D2DE5F for ; Thu, 8 Nov 2018 18:07:02 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id B98F02DE67; Thu, 8 Nov 2018 18:07:02 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5ABCD2DE5F for ; Thu, 8 Nov 2018 18:07:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726922AbeKIDnl (ORCPT ); Thu, 8 Nov 2018 22:43:41 -0500 Received: from mga07.intel.com ([134.134.136.100]:2500 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726684AbeKIDnl (ORCPT ); Thu, 8 Nov 2018 22:43:41 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Nov 2018 10:07:01 -0800 X-ExtLoop1: 1 X-IronPort-AV: 
E=Sophos;i="5.54,480,1534834800"; d="scan'208";a="106592055" Received: from ahduyck-desk1.jf.intel.com ([10.7.198.76]) by orsmga001.jf.intel.com with ESMTP; 08 Nov 2018 10:07:01 -0800 Subject: [driver-core PATCH v6 4/9] driver core: Move async_synchronize_full call From: Alexander Duyck To: linux-kernel@vger.kernel.org, gregkh@linuxfoundation.org Cc: linux-nvdimm@lists.01.org, tj@kernel.org, akpm@linux-foundation.org, linux-pm@vger.kernel.org, jiangshanlai@gmail.com, rafael@kernel.org, len.brown@intel.com, pavel@ucw.cz, zwisler@kernel.org, dan.j.williams@intel.com, dave.jiang@intel.com, bvanassche@acm.org, alexander.h.duyck@linux.intel.com Date: Thu, 08 Nov 2018 10:07:01 -0800 Message-ID: <154170042103.12967.5841784115552956171.stgit@ahduyck-desk1.jf.intel.com> In-Reply-To: <154170028986.12967.2108024712555179678.stgit@ahduyck-desk1.jf.intel.com> References: <154170028986.12967.2108024712555179678.stgit@ahduyck-desk1.jf.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: linux-pm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Move the async_synchronize_full call out of __device_release_driver and into driver_detach. The idea behind this is that the async_synchronize_full call will only guarantee that any existing async operations are flushed. This doesn't do anything to guarantee that a hotplug event that may occur while we are doing the release of the driver will not be asynchronously scheduled. By moving this into the driver_detach path we can avoid potential deadlocks as we aren't holding the device lock at this point and we should not have the driver we want to flush loaded so the flush will take care of any asynchronous events the driver we are detaching might have scheduled. 
Reviewed-by: Bart Van Assche
Signed-off-by: Alexander Duyck
---
 drivers/base/dd.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 76c40fe69463..e74cefeb5b69 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -975,9 +975,6 @@ static void __device_release_driver(struct device *dev, struct device *parent)
 
 	drv = dev->driver;
 	if (drv) {
-		if (driver_allows_async_probing(drv))
-			async_synchronize_full();
-
 		while (device_links_busy(dev)) {
 			__device_driver_unlock(dev, parent);
 
@@ -1087,6 +1084,9 @@ void driver_detach(struct device_driver *drv)
 	struct device_private *dev_prv;
 	struct device *dev;
 
+	if (driver_allows_async_probing(drv))
+		async_synchronize_full();
+
 	for (;;) {
 		spin_lock(&drv->p->klist_devices.k_lock);
 		if (list_empty(&drv->p->klist_devices.k_list)) {

From patchwork Thu Nov 8 18:07:06 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Duyck X-Patchwork-Id: 10674837 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 96E7E14E2 for ; Thu, 8 Nov 2018 18:07:41 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 89B472DE5F for ; Thu, 8 Nov 2018 18:07:41 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7E0C02DE79; Thu, 8 Nov 2018 18:07:41 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 023502DE5F for ; Thu, 8 Nov 2018 18:07:41 +0000 (UTC) Received: 
(majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727527AbeKIDnq (ORCPT ); Thu, 8 Nov 2018 22:43:46 -0500 Received: from mga07.intel.com ([134.134.136.100]:2500 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726974AbeKIDnq (ORCPT ); Thu, 8 Nov 2018 22:43:46 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Nov 2018 10:07:06 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,480,1534834800"; d="scan'208";a="106592079" Received: from ahduyck-desk1.jf.intel.com ([10.7.198.76]) by orsmga001.jf.intel.com with ESMTP; 08 Nov 2018 10:07:06 -0800 Subject: [driver-core PATCH v6 5/9] driver core: Establish clear order of operations for deferred probe and remove From: Alexander Duyck To: linux-kernel@vger.kernel.org, gregkh@linuxfoundation.org Cc: linux-nvdimm@lists.01.org, tj@kernel.org, akpm@linux-foundation.org, linux-pm@vger.kernel.org, jiangshanlai@gmail.com, rafael@kernel.org, len.brown@intel.com, pavel@ucw.cz, zwisler@kernel.org, dan.j.williams@intel.com, dave.jiang@intel.com, bvanassche@acm.org, alexander.h.duyck@linux.intel.com Date: Thu, 08 Nov 2018 10:07:06 -0800 Message-ID: <154170042611.12967.14256799141542560553.stgit@ahduyck-desk1.jf.intel.com> In-Reply-To: <154170028986.12967.2108024712555179678.stgit@ahduyck-desk1.jf.intel.com> References: <154170028986.12967.2108024712555179678.stgit@ahduyck-desk1.jf.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: linux-pm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Add an additional bit flag to the device struct named async_probe. This additional flag allows us to guarantee ordering between probe and remove operations. 
This allows us to guarantee that if we execute a remove operation or a driver load attempt on a given interface it will not attempt to update the driver member asynchronously following the earlier operation. Previously this guarantee was not present and could result in us attempting to remove a driver from an interface only to have it show up later when it is asynchronously loaded. One change I made in addition is I replaced the use of "bool X:1" to define the bitfield to a "u8 X:1" setup in order to resolve some checkpatch warnings. Signed-off-by: Alexander Duyck Reviewed-by: Bart Van Assche --- drivers/base/dd.c | 104 +++++++++++++++++++++++++++--------------------- include/linux/device.h | 3 + 2 files changed, 62 insertions(+), 45 deletions(-) diff --git a/drivers/base/dd.c b/drivers/base/dd.c index e74cefeb5b69..ed19cf0d6f9a 100644 --- a/drivers/base/dd.c +++ b/drivers/base/dd.c @@ -472,6 +472,8 @@ static int really_probe(struct device *dev, struct device_driver *drv) drv->bus->name, __func__, drv->name, dev_name(dev)); WARN_ON(!list_empty(&dev->devres_head)); + /* clear async_probe flag as we are no longer deferring driver load */ + dev->async_probe = false; re_probe: dev->driver = drv; @@ -771,6 +773,10 @@ static void __device_attach_async_helper(void *_dev, async_cookie_t cookie) device_lock(dev); + /* nothing to do if async_probe has been cleared */ + if (!dev->async_probe) + goto out_unlock; + if (dev->parent) pm_runtime_get_sync(dev->parent); @@ -781,7 +787,7 @@ static void __device_attach_async_helper(void *_dev, async_cookie_t cookie) if (dev->parent) pm_runtime_put(dev->parent); - +out_unlock: device_unlock(dev); put_device(dev); @@ -826,6 +832,7 @@ static int __device_attach(struct device *dev, bool allow_async) */ dev_dbg(dev, "scheduling asynchronous probe\n"); get_device(dev); + dev->async_probe = true; async_schedule(__device_attach_async_helper, dev); } else { pm_request_idle(dev); @@ -971,62 +978,69 @@ EXPORT_SYMBOL_GPL(driver_attach); */ static 
void __device_release_driver(struct device *dev, struct device *parent) { - struct device_driver *drv; + struct device_driver *drv = dev->driver; - drv = dev->driver; - if (drv) { - while (device_links_busy(dev)) { - __device_driver_unlock(dev, parent); + /* + * In the event that we are asked to release the driver on an + * interface that is still waiting on a probe we can just terminate + * the probe by setting async_probe to false. When the async call + * is finally completed it will see this state and just exit. + */ + dev->async_probe = false; + if (!drv) + return; - device_links_unbind_consumers(dev); + while (device_links_busy(dev)) { + __device_driver_unlock(dev, parent); - __device_driver_lock(dev, parent); - /* - * A concurrent invocation of the same function might - * have released the driver successfully while this one - * was waiting, so check for that. - */ - if (dev->driver != drv) - return; - } + device_links_unbind_consumers(dev); - pm_runtime_get_sync(dev); - pm_runtime_clean_up_links(dev); + __device_driver_lock(dev, parent); + /* + * A concurrent invocation of the same function might + * have released the driver successfully while this one + * was waiting, so check for that. 
+ */ + if (dev->driver != drv) + return; + } - driver_sysfs_remove(dev); + pm_runtime_get_sync(dev); + pm_runtime_clean_up_links(dev); - if (dev->bus) - blocking_notifier_call_chain(&dev->bus->p->bus_notifier, - BUS_NOTIFY_UNBIND_DRIVER, - dev); + driver_sysfs_remove(dev); - pm_runtime_put_sync(dev); + if (dev->bus) + blocking_notifier_call_chain(&dev->bus->p->bus_notifier, + BUS_NOTIFY_UNBIND_DRIVER, + dev); - if (dev->bus && dev->bus->remove) - dev->bus->remove(dev); - else if (drv->remove) - drv->remove(dev); + pm_runtime_put_sync(dev); - device_links_driver_cleanup(dev); - arch_teardown_dma_ops(dev); + if (dev->bus && dev->bus->remove) + dev->bus->remove(dev); + else if (drv->remove) + drv->remove(dev); - devres_release_all(dev); - dev->driver = NULL; - dev_set_drvdata(dev, NULL); - if (dev->pm_domain && dev->pm_domain->dismiss) - dev->pm_domain->dismiss(dev); - pm_runtime_reinit(dev); - dev_pm_set_driver_flags(dev, 0); + device_links_driver_cleanup(dev); + arch_teardown_dma_ops(dev); + + devres_release_all(dev); + dev->driver = NULL; + dev_set_drvdata(dev, NULL); + if (dev->pm_domain && dev->pm_domain->dismiss) + dev->pm_domain->dismiss(dev); + pm_runtime_reinit(dev); + dev_pm_set_driver_flags(dev, 0); - klist_remove(&dev->p->knode_driver); - device_pm_check_callbacks(dev); - if (dev->bus) - blocking_notifier_call_chain(&dev->bus->p->bus_notifier, - BUS_NOTIFY_UNBOUND_DRIVER, - dev); + klist_remove(&dev->p->knode_driver); + device_pm_check_callbacks(dev); + if (dev->bus) + blocking_notifier_call_chain(&dev->bus->p->bus_notifier, + BUS_NOTIFY_UNBOUND_DRIVER, + dev); - kobject_uevent(&dev->kobj, KOBJ_UNBIND); - } + kobject_uevent(&dev->kobj, KOBJ_UNBIND); } void device_release_driver_internal(struct device *dev, diff --git a/include/linux/device.h b/include/linux/device.h index 1b25c7a43f4c..4d2eb2c74149 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -957,6 +957,8 @@ struct dev_links_info { * device. 
* @dma_coherent: this particular device is dma coherent, even if the * architecture supports non-coherent devices. + * @async_probe: This device has an asynchronous probe event pending. Should + * only be updated while holding device lock. * * At the lowest level, every device in a Linux system is represented by an * instance of struct device. The device structure contains the information @@ -1051,6 +1053,7 @@ struct device { defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) bool dma_coherent:1; #endif + bool async_probe:1; }; static inline struct device *kobj_to_dev(struct kobject *kobj) From patchwork Thu Nov 8 18:07:11 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Duyck X-Patchwork-Id: 10674833 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B518A14BD for ; Thu, 8 Nov 2018 18:07:36 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A70072DE5F for ; Thu, 8 Nov 2018 18:07:36 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9ABFB2DE79; Thu, 8 Nov 2018 18:07:36 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 127482DE5F for ; Thu, 8 Nov 2018 18:07:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727602AbeKIDnw (ORCPT ); Thu, 8 Nov 2018 22:43:52 -0500 Received: from mga18.intel.com ([134.134.136.126]:12587 "EHLO mga18.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id 
S1726974AbeKIDnv (ORCPT ); Thu, 8 Nov 2018 22:43:51 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga106.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Nov 2018 10:07:11 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,480,1534834800"; d="scan'208";a="106592093" Received: from ahduyck-desk1.jf.intel.com ([10.7.198.76]) by orsmga001.jf.intel.com with ESMTP; 08 Nov 2018 10:07:11 -0800 Subject: [driver-core PATCH v6 6/9] driver core: Probe devices asynchronously instead of the driver From: Alexander Duyck To: linux-kernel@vger.kernel.org, gregkh@linuxfoundation.org Cc: linux-nvdimm@lists.01.org, tj@kernel.org, akpm@linux-foundation.org, linux-pm@vger.kernel.org, jiangshanlai@gmail.com, rafael@kernel.org, len.brown@intel.com, pavel@ucw.cz, zwisler@kernel.org, dan.j.williams@intel.com, dave.jiang@intel.com, bvanassche@acm.org, alexander.h.duyck@linux.intel.com Date: Thu, 08 Nov 2018 10:07:11 -0800 Message-ID: <154170043123.12967.3591757325647337726.stgit@ahduyck-desk1.jf.intel.com> In-Reply-To: <154170028986.12967.2108024712555179678.stgit@ahduyck-desk1.jf.intel.com> References: <154170028986.12967.2108024712555179678.stgit@ahduyck-desk1.jf.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: linux-pm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Probe devices asynchronously instead of the driver. This results in us seeing the same behavior if the device is registered before the driver or after. This way we can avoid serializing the initialization should the driver not be loaded until after the devices have already been added. The motivation behind this is that if we have a set of devices that take a significant amount of time to load we can greatly reduce the time to load by processing them in parallel instead of one at a time. 
In addition, each device can exist on a different node so placing a single thread on one CPU to initialize all of the devices for a given driver can result in poor performance on a system with multiple nodes. I am using the driver_data member of the device struct to store the driver pointer while we wait on the deferred probe call. This should be safe to do as the value will either be set to NULL on a failed probe or driver load followed by unload, or the driver value itself will be set on a successful driver load. In addition I have used the async_probe flag to add additional protection as it will be cleared if someone overwrites the driver_data member as a part of loading the driver. Signed-off-by: Alexander Duyck Reviewed-by: Bart Van Assche --- drivers/base/bus.c | 23 ++-------------- drivers/base/dd.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++ include/linux/device.h | 10 ++++++- 3 files changed, 80 insertions(+), 21 deletions(-) diff --git a/drivers/base/bus.c b/drivers/base/bus.c index 8a630f9bd880..0cd2eadd0816 100644 --- a/drivers/base/bus.c +++ b/drivers/base/bus.c @@ -606,17 +606,6 @@ static ssize_t uevent_store(struct device_driver *drv, const char *buf, } static DRIVER_ATTR_WO(uevent); -static void driver_attach_async(void *_drv, async_cookie_t cookie) -{ - struct device_driver *drv = _drv; - int ret; - - ret = driver_attach(drv); - - pr_debug("bus: '%s': driver %s async attach completed: %d\n", - drv->bus->name, drv->name, ret); -} - /** * bus_add_driver - Add a driver to the bus. * @drv: driver. 
@@ -649,15 +638,9 @@ int bus_add_driver(struct device_driver *drv) klist_add_tail(&priv->knode_bus, &bus->p->klist_drivers); if (drv->bus->p->drivers_autoprobe) { - if (driver_allows_async_probing(drv)) { - pr_debug("bus: '%s': probing driver %s asynchronously\n", - drv->bus->name, drv->name); - async_schedule(driver_attach_async, drv); - } else { - error = driver_attach(drv); - if (error) - goto out_unregister; - } + error = driver_attach(drv); + if (error) + goto out_unregister; } module_add_driver(drv->owner, drv); diff --git a/drivers/base/dd.c b/drivers/base/dd.c index ed19cf0d6f9a..f4e84d639c69 100644 --- a/drivers/base/dd.c +++ b/drivers/base/dd.c @@ -808,6 +808,7 @@ static int __device_attach(struct device *dev, bool allow_async) ret = 1; else { dev->driver = NULL; + dev_set_drvdata(dev, NULL); ret = 0; } } else { @@ -925,6 +926,48 @@ int device_driver_attach(struct device_driver *drv, struct device *dev) return ret; } +static inline struct device_driver *dev_get_drv_async(const struct device *dev) +{ + return dev->async_probe ? dev->driver_data : NULL; +} + +static inline void dev_set_drv_async(struct device *dev, + struct device_driver *drv) +{ + /* + * Set async_probe to true indicating we are waiting for this data to be + * loaded as a potential driver. + */ + dev->driver_data = drv; + dev->async_probe = true; +} + +static void __driver_attach_async_helper(void *_dev, async_cookie_t cookie) +{ + struct device *dev = _dev; + struct device_driver *drv; + + __device_driver_lock(dev, dev->parent); + + /* + * If someone attempted to bind a driver either successfully or + * unsuccessfully before we got here we should just skip the driver + * probe call. 
+ */ + drv = dev_get_drv_async(dev); + if (drv && !dev->driver) + driver_probe_device(drv, dev); + + /* We made our attempt at an async_probe, clear the flag */ + dev->async_probe = false; + + __device_driver_unlock(dev, dev->parent); + + put_device(dev); + + dev_dbg(dev, "async probe completed\n"); +} + static int __driver_attach(struct device *dev, void *data) { struct device_driver *drv = data; @@ -952,6 +995,25 @@ static int __driver_attach(struct device *dev, void *data) return ret; } /* ret > 0 means positive match */ + if (driver_allows_async_probing(drv)) { + /* + * Instead of probing the device synchronously we will + * probe it asynchronously to allow for more parallelism. + * + * We only take the device lock here in order to guarantee + * that the dev->driver and driver_data fields are protected + */ + dev_dbg(dev, "scheduling asynchronous probe\n"); + device_lock(dev); + if (!dev->driver) { + get_device(dev); + dev_set_drv_async(dev, drv); + async_schedule(__driver_attach_async_helper, dev); + } + device_unlock(dev); + return 0; + } + device_driver_attach(drv, dev); return 0; @@ -1049,6 +1111,12 @@ void device_release_driver_internal(struct device *dev, { __device_driver_lock(dev, parent); + /* + * We shouldn't need to add a check for any pending async_probe here + * because the only caller that will pass us a driver, driver_detach, + * should have been called after the driver was removed from the bus + * and will call async_synchronize_full before we get to this point. + */ if (!drv || drv == dev->driver) __device_release_driver(dev, parent); diff --git a/include/linux/device.h b/include/linux/device.h index 4d2eb2c74149..2305eb886006 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -910,7 +910,9 @@ struct dev_links_info { * variants, which GPIO pins act in what additional roles, and so * on. This shrinks the "Board Support Packages" (BSPs) and * minimizes board-specific #ifdefs in drivers. 
- * @driver_data: Private pointer for driver specific info. + * @driver_data: Private pointer for driver specific info if driver is + * non-NULL. Pointer to deferred driver to be attached if driver + * is NULL. * @links: Links to suppliers and consumers of this device. * @power: For device power management. * See Documentation/driver-api/pm/devices.rst for details. @@ -1118,6 +1120,12 @@ static inline void *dev_get_drvdata(const struct device *dev) static inline void dev_set_drvdata(struct device *dev, void *data) { + /* + * clear async_probe to prevent us from attempting to read driver_data + * as a driver. We can reset this to true for the one case where we are + * using this to record an actual driver. + */ + dev->async_probe = false; dev->driver_data = data; } From patchwork Thu Nov 8 18:07:16 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Duyck X-Patchwork-Id: 10674823 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7457114E2 for ; Thu, 8 Nov 2018 18:07:21 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 678242DE81 for ; Thu, 8 Nov 2018 18:07:21 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 5B9D02DE99; Thu, 8 Nov 2018 18:07:21 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 08A7D2DE81 for ; Thu, 8 Nov 2018 18:07:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727681AbeKIDn5 (ORCPT ); 
Thu, 8 Nov 2018 22:43:57 -0500 Received: from mga02.intel.com ([134.134.136.20]:21314 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726974AbeKIDn5 (ORCPT ); Thu, 8 Nov 2018 22:43:57 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Nov 2018 10:07:16 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,480,1534834800"; d="scan'208";a="272479072" Received: from ahduyck-desk1.jf.intel.com ([10.7.198.76]) by orsmga005.jf.intel.com with ESMTP; 08 Nov 2018 10:07:16 -0800 Subject: [driver-core PATCH v6 7/9] driver core: Attach devices on CPU local to device node From: Alexander Duyck To: linux-kernel@vger.kernel.org, gregkh@linuxfoundation.org Cc: linux-nvdimm@lists.01.org, tj@kernel.org, akpm@linux-foundation.org, linux-pm@vger.kernel.org, jiangshanlai@gmail.com, rafael@kernel.org, len.brown@intel.com, pavel@ucw.cz, zwisler@kernel.org, dan.j.williams@intel.com, dave.jiang@intel.com, bvanassche@acm.org, alexander.h.duyck@linux.intel.com Date: Thu, 08 Nov 2018 10:07:16 -0800 Message-ID: <154170043632.12967.5339750954964165831.stgit@ahduyck-desk1.jf.intel.com> In-Reply-To: <154170028986.12967.2108024712555179678.stgit@ahduyck-desk1.jf.intel.com> References: <154170028986.12967.2108024712555179678.stgit@ahduyck-desk1.jf.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: linux-pm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Call the asynchronous probe routines on a CPU local to the device node. By doing this we should be able to improve our initialization time significantly as we can avoid having to access the device from a remote node which may introduce higher latency. 

For example, in the case of initializing memory for NVDIMM this can have a significant impact, as initializing 3TB on a remote node can take up to 39 seconds while initializing it on a local node only takes 23 seconds. It is situations like this where we will see the biggest improvement.

Reviewed-by: Bart Van Assche
Signed-off-by: Alexander Duyck
Reviewed-by: Dan Williams
---
 drivers/base/dd.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index f4e84d639c69..1660eeb1fc9d 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -834,7 +834,7 @@ static int __device_attach(struct device *dev, bool allow_async)
 		dev_dbg(dev, "scheduling asynchronous probe\n");
 		get_device(dev);
 		dev->async_probe = true;
-		async_schedule(__device_attach_async_helper, dev);
+		async_schedule_dev(__device_attach_async_helper, dev);
 	} else {
 		pm_request_idle(dev);
 	}
@@ -1008,7 +1008,7 @@ static int __driver_attach(struct device *dev, void *data)
 		if (!dev->driver) {
 			get_device(dev);
 			dev_set_drv_async(dev, drv);
-			async_schedule(__driver_attach_async_helper, dev);
+			async_schedule_dev(__driver_attach_async_helper, dev);
 		}
 		device_unlock(dev);
 		return 0;

From patchwork Thu Nov 8 18:07:21 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Duyck X-Patchwork-Id: 10674827 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 04A21175A for ; Thu, 8 Nov 2018 18:07:26 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E7B252DE79 for ; Thu, 8 Nov 2018 18:07:25 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id DC0BB2DE96; Thu, 8 Nov 2018 18:07:25 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on 
pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8B9B32DE79 for ; Thu, 8 Nov 2018 18:07:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727722AbeKIDoC (ORCPT ); Thu, 8 Nov 2018 22:44:02 -0500 Received: from mga05.intel.com ([192.55.52.43]:46195 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726915AbeKIDoB (ORCPT ); Thu, 8 Nov 2018 22:44:01 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Nov 2018 10:07:21 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,480,1534834800"; d="scan'208";a="272479086" Received: from ahduyck-desk1.jf.intel.com ([10.7.198.76]) by orsmga005.jf.intel.com with ESMTP; 08 Nov 2018 10:07:21 -0800 Subject: [driver-core PATCH v6 8/9] PM core: Use new async_schedule_dev command From: Alexander Duyck To: linux-kernel@vger.kernel.org, gregkh@linuxfoundation.org Cc: linux-nvdimm@lists.01.org, tj@kernel.org, akpm@linux-foundation.org, linux-pm@vger.kernel.org, jiangshanlai@gmail.com, rafael@kernel.org, len.brown@intel.com, pavel@ucw.cz, zwisler@kernel.org, dan.j.williams@intel.com, dave.jiang@intel.com, bvanassche@acm.org, alexander.h.duyck@linux.intel.com Date: Thu, 08 Nov 2018 10:07:21 -0800 Message-ID: <154170044143.12967.7385910787360171117.stgit@ahduyck-desk1.jf.intel.com> In-Reply-To: <154170028986.12967.2108024712555179678.stgit@ahduyck-desk1.jf.intel.com> References: <154170028986.12967.2108024712555179678.stgit@ahduyck-desk1.jf.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: linux-pm-owner@vger.kernel.org Precedence: bulk List-ID: 
X-Mailing-List: linux-pm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP

Use the device specific version of the async_schedule commands to defer various tasks related to power management. By doing this we should see a slight improvement in performance as any device that is sensitive to latency/locality in the setup will now be initializing on the node closest to the device.

Reviewed-by: Bart Van Assche
Reviewed-by: Rafael J. Wysocki
Signed-off-by: Alexander Duyck
Reviewed-by: Dan Williams
---
 drivers/base/power/main.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index a690fd400260..ebb8b61b52e9 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -726,7 +726,7 @@ void dpm_noirq_resume_devices(pm_message_t state)
 		reinit_completion(&dev->power.completion);
 		if (is_async(dev)) {
 			get_device(dev);
-			async_schedule(async_resume_noirq, dev);
+			async_schedule_dev(async_resume_noirq, dev);
 		}
 	}
 
@@ -883,7 +883,7 @@ void dpm_resume_early(pm_message_t state)
 		reinit_completion(&dev->power.completion);
 		if (is_async(dev)) {
 			get_device(dev);
-			async_schedule(async_resume_early, dev);
+			async_schedule_dev(async_resume_early, dev);
 		}
 	}
 
@@ -1047,7 +1047,7 @@ void dpm_resume(pm_message_t state)
 		reinit_completion(&dev->power.completion);
 		if (is_async(dev)) {
 			get_device(dev);
-			async_schedule(async_resume, dev);
+			async_schedule_dev(async_resume, dev);
 		}
 	}
 
@@ -1366,7 +1366,7 @@ static int device_suspend_noirq(struct device *dev)
 
 	if (is_async(dev)) {
 		get_device(dev);
-		async_schedule(async_suspend_noirq, dev);
+		async_schedule_dev(async_suspend_noirq, dev);
 		return 0;
 	}
 	return __device_suspend_noirq(dev, pm_transition, false);
@@ -1569,7 +1569,7 @@ static int device_suspend_late(struct device *dev)
 
 	if (is_async(dev)) {
 		get_device(dev);
-		async_schedule(async_suspend_late, dev);
+		async_schedule_dev(async_suspend_late, dev);
 		return 0;
 	}
 
@@ -1833,7 +1833,7 @@ static int device_suspend(struct device *dev)
 
 	if (is_async(dev)) {
 		get_device(dev);
-		async_schedule(async_suspend, dev);
+		async_schedule_dev(async_suspend, dev);
 		return 0;
 	}
 

From patchwork Thu Nov 8 18:07:26 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Duyck X-Patchwork-Id: 10674831 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DBE86175A for ; Thu, 8 Nov 2018 18:07:28 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C96FA2DE5F for ; Thu, 8 Nov 2018 18:07:28 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id BD3CA2DE79; Thu, 8 Nov 2018 18:07:28 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5DD692DE5F for ; Thu, 8 Nov 2018 18:07:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727265AbeKIDoH (ORCPT ); Thu, 8 Nov 2018 22:44:07 -0500 Received: from mga07.intel.com ([134.134.136.100]:2556 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726915AbeKIDoG (ORCPT ); Thu, 8 Nov 2018 22:44:06 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Nov 2018 10:07:26 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,480,1534834800"; d="scan'208";a="89636469" Received: from ahduyck-desk1.jf.intel.com ([10.7.198.76]) by 
orsmga006.jf.intel.com with ESMTP; 08 Nov 2018 10:07:26 -0800 Subject: [driver-core PATCH v6 9/9] libnvdimm: Schedule device registration on node local to the device From: Alexander Duyck To: linux-kernel@vger.kernel.org, gregkh@linuxfoundation.org Cc: linux-nvdimm@lists.01.org, tj@kernel.org, akpm@linux-foundation.org, linux-pm@vger.kernel.org, jiangshanlai@gmail.com, rafael@kernel.org, len.brown@intel.com, pavel@ucw.cz, zwisler@kernel.org, dan.j.williams@intel.com, dave.jiang@intel.com, bvanassche@acm.org, alexander.h.duyck@linux.intel.com Date: Thu, 08 Nov 2018 10:07:26 -0800 Message-ID: <154170044652.12967.17419321472770956712.stgit@ahduyck-desk1.jf.intel.com> In-Reply-To: <154170028986.12967.2108024712555179678.stgit@ahduyck-desk1.jf.intel.com> References: <154170028986.12967.2108024712555179678.stgit@ahduyck-desk1.jf.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: linux-pm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Force the device registration for nvdimm devices to be closer to the actual device. This is achieved by using either the NUMA node ID of the region, or of the parent. By doing this we can have everything above the region based on the region, and everything below the region based on the nvdimm bus. By guaranteeing NUMA locality I see an improvement of as high as 25% for per-node init of a system with 12TB of persistent memory. 
Reviewed-by: Bart Van Assche Signed-off-by: Alexander Duyck --- drivers/nvdimm/bus.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c index f1fb39921236..b1e193541874 100644 --- a/drivers/nvdimm/bus.c +++ b/drivers/nvdimm/bus.c @@ -23,6 +23,7 @@ #include #include #include +#include #include #include #include @@ -513,11 +514,15 @@ void __nd_device_register(struct device *dev) set_dev_node(dev, to_nd_region(dev)->numa_node); dev->bus = &nvdimm_bus_type; - if (dev->parent) + if (dev->parent) { get_device(dev->parent); + if (dev_to_node(dev) == NUMA_NO_NODE) + set_dev_node(dev, dev_to_node(dev->parent)); + } get_device(dev); - async_schedule_domain(nd_async_device_register, dev, - &nd_async_domain); + + async_schedule_dev_domain(nd_async_device_register, dev, + &nd_async_domain); } void nd_device_register(struct device *dev)
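
The node fallback added to __nd_device_register can be modelled in user space. What follows is a minimal sketch under stated assumptions rather than the libnvdimm code: `struct toy_dev` and `toy_resolve_node` are hypothetical names, and `TOY_NUMA_NO_NODE` is defined locally as -1 to mirror the kernel's NUMA_NO_NODE.

```c
#include <assert.h>
#include <stddef.h>

/* Local mirror of the kernel's NUMA_NO_NODE so the sketch builds in user space. */
#define TOY_NUMA_NO_NODE (-1)

/* Hypothetical stand-in for struct device: only the fields the sketch needs. */
struct toy_dev {
	struct toy_dev *parent;
	int numa_node;
};

/*
 * Pick the node that registration should be scheduled on: keep the node the
 * device already has, otherwise inherit the node of the parent if there is
 * one. Returns the node that ends up stored in the device.
 */
int toy_resolve_node(struct toy_dev *dev)
{
	assert(dev != NULL);
	if (dev->parent && dev->numa_node == TOY_NUMA_NO_NODE)
		dev->numa_node = dev->parent->numa_node;
	return dev->numa_node;
}
```

A device that already carries a node keeps it, a child without one inherits the node of its parent, and a device with neither stays at the "no node" value, in which case the scheduling code has no locality hint to act on.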