| Message ID | 20180910234400.4068.15541.stgit@localhost.localdomain (mailing list archive) |
|---|---|
| State | New, archived |
| Series | Address issues slowing persistent memory initialization |
On Mon, Sep 10, 2018 at 4:44 PM Alexander Duyck <alexander.duyck@gmail.com> wrote:
>
> From: Alexander Duyck <alexander.h.duyck@intel.com>
>
> This patch is based off of the pci_call_probe function used to initialize
> PCI devices. The general idea here is to move the probe call to a location
> that is local to the memory being initialized. By doing this we can shave
> significant time off of the total time needed for initialization.
>
> With this patch applied I see a significant reduction in overall init time
> as without it the init varied between 23 and 37 seconds to initialize a 3GB
> node. With this patch applied the variance is only between 23 and 26
> seconds to initialize each node.

Same mistake here as in patch 1. It is 3TB, not 3GB. I will fix for the
next version.

- Alex
On Mon, Sep 10, 2018 at 4:44 PM, Alexander Duyck <alexander.duyck@gmail.com> wrote:
> From: Alexander Duyck <alexander.h.duyck@intel.com>
>
> This patch is based off of the pci_call_probe function used to initialize
> PCI devices. The general idea here is to move the probe call to a location
> that is local to the memory being initialized. By doing this we can shave
> significant time off of the total time needed for initialization.
>
> With this patch applied I see a significant reduction in overall init time
> as without it the init varied between 23 and 37 seconds to initialize a 3GB
> node. With this patch applied the variance is only between 23 and 26
> seconds to initialize each node.
>
> I hope to refine this further in the future by combining this logic into
> the async_schedule_domain code that is already in use. By doing that it
> would likely make this functionality redundant.

Yeah, it is a bit sad that we schedule an async thread only to move it
back somewhere else. Could we trivially achieve the same with an
async_schedule_domain_on_cpu() variant? It seems we can and the
workqueue core will "Do the right thing".

I now notice that async uses the system_unbound_wq and work_on_cpu()
uses the system_wq. I don't think we want long running nvdimm work on
system_wq.
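[Editor's note] To make the concern above concrete, here is a minimal sketch of the kind of helper being discussed: a work_on_cpu()-style wrapper that queues on system_unbound_wq instead of system_wq, so long-running probe work does not occupy the per-CPU system_wq pool. The names work_on_cpu_unbound and unbound_cpu_work are hypothetical, not an existing kernel API; note that for an unbound workqueue the cpu argument only selects the NUMA-node worker pool rather than strictly binding to that CPU, which is all the probe path needs.

```c
#include <linux/workqueue.h>

/* Hypothetical helper, not an existing kernel API. */
struct unbound_cpu_work {
	struct work_struct work;
	long (*fn)(void *);
	void *arg;
	long ret;
};

static void unbound_cpu_work_fn(struct work_struct *work)
{
	struct unbound_cpu_work *ucw =
		container_of(work, struct unbound_cpu_work, work);

	ucw->ret = ucw->fn(ucw->arg);
}

/*
 * Like work_on_cpu(), but queue on system_unbound_wq so a long-running
 * probe does not tie up the per-CPU system_wq pool.  The cpu argument
 * steers the work to that CPU's node pool rather than pinning it.
 */
static long work_on_cpu_unbound(int cpu, long (*fn)(void *), void *arg)
{
	struct unbound_cpu_work ucw = { .fn = fn, .arg = arg };

	INIT_WORK_ONSTACK(&ucw.work, unbound_cpu_work_fn);
	queue_work_on(cpu, system_unbound_wq, &ucw.work);
	flush_work(&ucw.work);
	destroy_work_on_stack(&ucw.work);

	return ucw.ret;
}
```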
On 9/10/18 7:44 PM, Alexander Duyck wrote:
> From: Alexander Duyck <alexander.h.duyck@intel.com>
>
> This patch is based off of the pci_call_probe function used to initialize
> PCI devices. The general idea here is to move the probe call to a location
> that is local to the memory being initialized. By doing this we can shave
> significant time off of the total time needed for initialization.
>
> With this patch applied I see a significant reduction in overall init time
> as without it the init varied between 23 and 37 seconds to initialize a 3GB
> node. With this patch applied the variance is only between 23 and 26
> seconds to initialize each node.
>
> I hope to refine this further in the future by combining this logic into
> the async_schedule_domain code that is already in use. By doing that it
> would likely make this functionality redundant.
>
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>

Looks good to me. The previous fast runs were because we were getting
lucky and executing in the right latency groups, right? Now, we bound
the execution so that it is always fast.

Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>

Thank you,
Pavel
```diff
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index 8aae6dcc839f..5b73953176b1 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -27,6 +27,7 @@
 #include <linux/io.h>
 #include <linux/mm.h>
 #include <linux/nd.h>
+#include <linux/cpu.h>
 #include "nd-core.h"
 #include "nd.h"
 #include "pfn.h"
@@ -90,6 +91,48 @@ static void nvdimm_bus_probe_end(struct nvdimm_bus *nvdimm_bus)
 	nvdimm_bus_unlock(&nvdimm_bus->dev);
 }
 
+struct nvdimm_drv_dev {
+	struct nd_device_driver *nd_drv;
+	struct device *dev;
+};
+
+static long __nvdimm_call_probe(void *_nddd)
+{
+	struct nvdimm_drv_dev *nddd = _nddd;
+	struct nd_device_driver *nd_drv = nddd->nd_drv;
+
+	return nd_drv->probe(nddd->dev);
+}
+
+static int nvdimm_call_probe(struct nd_device_driver *nd_drv,
+			     struct device *dev)
+{
+	struct nvdimm_drv_dev nddd = { nd_drv, dev };
+	int rc, node, cpu;
+
+	/*
+	 * Execute driver initialization on node where the device is
+	 * attached. This way the driver will be able to access local
+	 * memory instead of having to initialize memory across nodes.
+	 */
+	node = dev_to_node(dev);
+
+	cpu_hotplug_disable();
+
+	if (node < 0 || node >= MAX_NUMNODES || !node_online(node))
+		cpu = nr_cpu_ids;
+	else
+		cpu = cpumask_any_and(cpumask_of_node(node), cpu_online_mask);
+
+	if (cpu < nr_cpu_ids)
+		rc = work_on_cpu(cpu, __nvdimm_call_probe, &nddd);
+	else
+		rc = __nvdimm_call_probe(&nddd);
+
+	cpu_hotplug_enable();
+	return rc;
+}
+
 static int nvdimm_bus_probe(struct device *dev)
 {
 	struct nd_device_driver *nd_drv = to_nd_device_driver(dev->driver);
@@ -104,7 +147,7 @@ static int nvdimm_bus_probe(struct device *dev)
 			dev->driver->name, dev_name(dev));
 
 	nvdimm_bus_probe_start(nvdimm_bus);
-	rc = nd_drv->probe(dev);
+	rc = nvdimm_call_probe(nd_drv, dev);
 	if (rc == 0)
 		nd_region_probe_success(nvdimm_bus, dev);
 	else
```
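[Editor's note] For readers unfamiliar with work_on_cpu(): it runs its callback synchronously on the chosen CPU and returns the callback's long result, so nvdimm_call_probe() keeps the same synchronous behaviour as calling nd_drv->probe(dev) directly; only the CPU, and therefore the NUMA node, doing the work changes. The snippet below is a hypothetical, self-contained demonstration of that selection pattern; show_where_we_run() and demo_node_local_call() are illustrative names, not part of the patch.

```c
#include <linux/cpumask.h>
#include <linux/printk.h>
#include <linux/smp.h>
#include <linux/topology.h>
#include <linux/workqueue.h>

/* Hypothetical demo callback: report where the work actually ran. */
static long show_where_we_run(void *unused)
{
	pr_info("probe helper running on cpu %d (node %d)\n",
		raw_smp_processor_id(), numa_node_id());
	return 0;
}

/*
 * Hypothetical demo of the selection pattern used by nvdimm_call_probe():
 * pick any online CPU on the target node and run there synchronously; if
 * the node has no online CPUs, fall back to the current CPU.
 */
static void demo_node_local_call(int node)
{
	int cpu = cpumask_any_and(cpumask_of_node(node), cpu_online_mask);

	if (cpu < nr_cpu_ids)
		work_on_cpu(cpu, show_where_we_run, NULL);
	else
		show_where_we_run(NULL);
}
```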