[4/4] nvdimm: Trigger the device probe on a cpu local to the device
diff mbox series

Message ID 20180910234400.4068.15541.stgit@localhost.localdomain
State New, archived
Headers show
Series
  • Address issues slowing persistent memory initialization
Related show

Commit Message

Alexander Duyck Sept. 10, 2018, 11:44 p.m. UTC
From: Alexander Duyck <alexander.h.duyck@intel.com>

This patch is based off of the pci_call_probe function used to initialize
PCI devices. The general idea here is to move the probe call to a location
that is local to the memory being initialized. By doing this we can shave
significant time off of the total time needed for initialization.

With this patch applied I see a significant reduction in overall init time
as without it the init varied between 23 and 37 seconds to initialize a 3GB
node. With this patch applied the variance is only between 23 and 26
seconds to initialize each node.

I hope to refine this further in the future by combining this logic into
the async_schedule_domain code that is already in use. By doing that it
would likely make this functionality redundant.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
 drivers/nvdimm/bus.c |   45 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 44 insertions(+), 1 deletion(-)

Comments

Alexander Duyck Sept. 11, 2018, 12:37 a.m. UTC | #1
On Mon, Sep 10, 2018 at 4:44 PM Alexander Duyck
<alexander.duyck@gmail.com> wrote:
>
> From: Alexander Duyck <alexander.h.duyck@intel.com>
>
> This patch is based off of the pci_call_probe function used to initialize
> PCI devices. The general idea here is to move the probe call to a location
> that is local to the memory being initialized. By doing this we can shave
> significant time off of the total time needed for initialization.
>
> With this patch applied I see a significant reduction in overall init time
> as without it the init varied between 23 and 37 seconds to initialize a 3GB
> node. With this patch applied the variance is only between 23 and 26
> seconds to initialize each node.

Same mistake here as in patch 1. It is 3TB, not 3GB. I will fix for
the next version.

- Alex
Dan Williams Sept. 12, 2018, 5:48 a.m. UTC | #2
On Mon, Sep 10, 2018 at 4:44 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> From: Alexander Duyck <alexander.h.duyck@intel.com>
>
> This patch is based off of the pci_call_probe function used to initialize
> PCI devices. The general idea here is to move the probe call to a location
> that is local to the memory being initialized. By doing this we can shave
> significant time off of the total time needed for initialization.
>
> With this patch applied I see a significant reduction in overall init time
> as without it the init varied between 23 and 37 seconds to initialize a 3GB
> node. With this patch applied the variance is only between 23 and 26
> seconds to initialize each node.
>
> I hope to refine this further in the future by combining this logic into
> the async_schedule_domain code that is already in use. By doing that it
> would likely make this functionality redundant.

Yeah, it is a bit sad that we schedule an async thread only to move it
back somewhere else.

Could we trivially achieve the same with an
async_schedule_domain_on_cpu() variant? It seems we can and the
workqueue core will "Do the right thing".

I now notice that async uses the system_unbound_wq and work_on_cpu()
uses the system_wq.  I don't think we want long running nvdimm work on
system_wq.
Pasha Tatashin Sept. 12, 2018, 1:44 p.m. UTC | #3
On 9/10/18 7:44 PM, Alexander Duyck wrote:
> From: Alexander Duyck <alexander.h.duyck@intel.com>
> 
> This patch is based off of the pci_call_probe function used to initialize
> PCI devices. The general idea here is to move the probe call to a location
> that is local to the memory being initialized. By doing this we can shave
> significant time off of the total time needed for initialization.
> 
> With this patch applied I see a significant reduction in overall init time
> as without it the init varied between 23 and 37 seconds to initialize a 3GB
> node. With this patch applied the variance is only between 23 and 26
> seconds to initialize each node.
> 
> I hope to refine this further in the future by combining this logic into
> the async_schedule_domain code that is already in use. By doing that it
> would likely make this functionality redundant.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>

Looks good to me. The previous fast runs were because there we were
getting lucky and executed in the right latency groups, right? Now, we
bound the execution time to be always fast.

Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>

Thank you,
Pavel

Patch
diff mbox series

diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index 8aae6dcc839f..5b73953176b1 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -27,6 +27,7 @@ 
 #include <linux/io.h>
 #include <linux/mm.h>
 #include <linux/nd.h>
+#include <linux/cpu.h>
 #include "nd-core.h"
 #include "nd.h"
 #include "pfn.h"
@@ -90,6 +91,48 @@  static void nvdimm_bus_probe_end(struct nvdimm_bus *nvdimm_bus)
 	nvdimm_bus_unlock(&nvdimm_bus->dev);
 }
 
+struct nvdimm_drv_dev {
+	struct nd_device_driver *nd_drv;
+	struct device *dev;
+};
+
+static long __nvdimm_call_probe(void *_nddd)
+{
+	struct nvdimm_drv_dev *nddd = _nddd;
+	struct nd_device_driver *nd_drv = nddd->nd_drv;
+
+	return nd_drv->probe(nddd->dev);
+}
+
+static int nvdimm_call_probe(struct nd_device_driver *nd_drv,
+			     struct device *dev)
+{
+	struct nvdimm_drv_dev nddd = { nd_drv, dev };
+	int rc, node, cpu;
+
+	/*
+	 * Execute driver initialization on node where the device is
+	 * attached.  This way the driver will be able to access local
+	 * memory instead of having to initialize memory across nodes.
+	 */
+	node = dev_to_node(dev);
+
+	cpu_hotplug_disable();
+
+	if (node < 0 || node >= MAX_NUMNODES || !node_online(node))
+		cpu = nr_cpu_ids;
+	else
+		cpu = cpumask_any_and(cpumask_of_node(node), cpu_online_mask);
+
+	if (cpu < nr_cpu_ids)
+		rc = work_on_cpu(cpu, __nvdimm_call_probe, &nddd);
+	else
+		rc = __nvdimm_call_probe(&nddd);
+
+	cpu_hotplug_enable();
+	return rc;
+}
+
 static int nvdimm_bus_probe(struct device *dev)
 {
 	struct nd_device_driver *nd_drv = to_nd_device_driver(dev->driver);
@@ -104,7 +147,7 @@  static int nvdimm_bus_probe(struct device *dev)
 			dev->driver->name, dev_name(dev));
 
 	nvdimm_bus_probe_start(nvdimm_bus);
-	rc = nd_drv->probe(dev);
+	rc = nvdimm_call_probe(nd_drv, dev);
 	if (rc == 0)
 		nd_region_probe_success(nvdimm_bus, dev);
 	else