[driver-core,v8,0/9] Add NUMA aware async_schedule calls

Message ID: 154403054034.11544.3978949383914046587.stgit@ahduyck-desk1.jf.intel.com

Message

Alexander Duyck Dec. 5, 2018, 5:25 p.m. UTC
This patch set improves the locality of the async_schedule calls used to
provide deferred initialization.

This patch set originally focused on just the one call to
async_schedule_domain in the nvdimm tree that was being used to defer the
device_add call. However, after doing some digging I realized the scope of
the problem was much broader than I had originally thought. As a result I
reworked the underlying infrastructure, going as far as replacing the
queue_work call itself with a function of my own, and opted to provide a
NUMA aware solution that would work for a broader audience.
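
To make that concrete, below is a minimal sketch of the idea, assuming the
interfaces introduced later in this set; the helper name defer_init_near is
hypothetical and only for illustration:

    #include <linux/device.h>
    #include <linux/numa.h>
    #include <linux/workqueue.h>

    /* Hypothetical caller: run deferred init work near the device. */
    static void defer_init_near(struct device *dev, struct work_struct *work)
    {
            int node = dev_to_node(dev);    /* may be NUMA_NO_NODE */

            /*
             * queue_work_node() (added in patch 5) tries to select a CPU
             * on the requested node and falls back to the normal CPU
             * selection otherwise. It is intended for unbound workqueues,
             * hence system_unbound_wq here.
             */
            queue_work_node(node, system_unbound_wq, work);
    }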

In addition, I have added several tweaks and clean-ups to the front of the
patch set. Patches 1 through 4 address a number of issues that were keeping
the existing async_schedule calls from delivering the performance they
could, either because the calls did not scale on a per-device basis or
because of issues that could result in a potential deadlock. For example,
patch 4 addresses the fact that we were calling async_schedule once per
driver instead of once per device; without fixing that first, devices would
still have ended up being probed on a non-local node.
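
As a simplified before/after sketch of the patch 4 change (based on the
mainline code at the time, trimmed for illustration):

    /* Before: one async thread per driver walked all of its devices. */
    if (driver_allows_async_probing(drv))
            async_schedule(driver_attach_async, drv);

    /*
     * After: one async thread per device, which the later patches can
     * then schedule on a CPU local to the device's NUMA node.
     */
    if (driver_allows_async_probing(drv))
            async_schedule_dev(__driver_attach_async_helper, dev);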

RFC->v1:
    Dropped nvdimm patch to submit later.
        It relies on code in libnvdimm development tree.
    Simplified queue_work_near to just convert node into a CPU.
    Split up drivers core and PM core patches.
v1->v2:
    Renamed queue_work_near to queue_work_node
    Added WARN_ON_ONCE if we use queue_work_node with per-cpu workqueue
v2->v3:
    Added Acked-by for queue_work_node patch
    Continued rename from _near to _node to be consistent with queue_work_node
        Renamed async_schedule_near_domain to async_schedule_node_domain
        Renamed async_schedule_near to async_schedule_node
    Added kerneldoc for new async_schedule_XXX functions
    Updated patch description for patch 4 to include data on potential gains
v3->v4:
    Added patch to consolidate use of need_parent_lock
    Make asynchronous driver probing explicit about use of drvdata
v4->v5:
    Added patch to move async_synchronize_full to address deadlock
    Added bit async_probe to act as a mutex for probe/remove calls
    Added back nvdimm patch as code it relies on is now in Linus's tree
    Incorporated review comments on parent & device locking consolidation
    Rebased on latest linux-next
v5->v6:
    Dropped "This patch" or "This change" from the start of patch descriptions.
    Dropped unnecessary parentheses in first patch
    Used same wording for "selecting a CPU" in comments added in first patch
    Added kernel documentation for async_probe member of device
    Fixed up comments for async_schedule calls in patch 2
    Moved code related to setting the async driver out of device.h and into dd.c
    Added Reviewed-by for several patches
v6->v7:
    Fixed typo which had kernel doc refer to "lock" when I meant "unlock"
    Dropped "bool X:1" to "u8 X:1" from patch description
    Added async_driver to device_private structure to store driver
    Dropped unnecessary code shuffle from async_probe patch
    Reordered patches to move fixes up to front
    Added Reviewed-by for several patches
    Updated cover page and patch descriptions throughout the set
v7->v8:
    Replaced async_probe value with dead, only apply dead in device_del
    Dropped Reviewed-by from patch 2 due to significant changes
    Added Reviewed-by for patches reviewed by Luis Chamberlain

---

Alexander Duyck (9):
      driver core: Move async_synchronize_full call
      driver core: Establish order of operations for device_add and device_del via bitflag
      device core: Consolidate locking and unlocking of parent and device
      driver core: Probe devices asynchronously instead of the driver
      workqueue: Provide queue_work_node to queue work near a given NUMA node
      async: Add support for queueing on specific NUMA node
      driver core: Attach devices on CPU local to device node
      PM core: Use new async_schedule_dev command
      libnvdimm: Schedule device registration on node local to the device
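
For orientation, the new entry points the series adds look roughly as follows
(a sketch assembled from the patches above; see the individual patches for
the exact definitions):

    /* patch 5: kernel/workqueue.c */
    bool queue_work_node(int node, struct workqueue_struct *wq,
                         struct work_struct *work);

    /* patch 6: include/linux/async.h */
    async_cookie_t async_schedule_node(async_func_t func, void *data,
                                       int node);
    async_cookie_t async_schedule_node_domain(async_func_t func, void *data,
                                              int node,
                                              struct async_domain *domain);

    /* node-aware convenience wrapper used by patches 7 and 8 */
    static inline async_cookie_t
    async_schedule_dev(async_func_t func, struct device *dev)
    {
            return async_schedule_node(func, dev, dev_to_node(dev));
    }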


 drivers/base/base.h       |    4 +
 drivers/base/bus.c        |   46 ++------------
 drivers/base/core.c       |   11 +++
 drivers/base/dd.c         |  152 ++++++++++++++++++++++++++++++++++++++-------
 drivers/base/power/main.c |   12 ++--
 drivers/nvdimm/bus.c      |   11 ++-
 include/linux/async.h     |   82 +++++++++++++++++++++++-
 include/linux/device.h    |    5 +
 include/linux/workqueue.h |    2 +
 kernel/async.c            |   53 +++++++++-------
 kernel/workqueue.c        |   84 +++++++++++++++++++++++++
 11 files changed, 362 insertions(+), 100 deletions(-)

--

Comments

Luis Chamberlain Dec. 10, 2018, 7:22 p.m. UTC | #1
On Wed, Dec 05, 2018 at 09:25:13AM -0800, Alexander Duyck wrote:
> This patch set improves the locality of the async_schedule calls used to
> provide deferred initialization.
> 
> This patch set originally focused on just the one call to
> async_schedule_domain in the nvdimm tree that was being used to defer the
> device_add call. However, after doing some digging I realized the scope of
> the problem was much broader than I had originally thought. As a result I
> reworked the underlying infrastructure, going as far as replacing the
> queue_work call itself with a function of my own, and opted to provide a
> NUMA aware solution that would work for a broader audience.
> 
> In addition, I have added several tweaks and clean-ups to the front of the
> patch set. Patches 1 through 4 address a number of issues that were keeping
> the existing async_schedule calls from delivering the performance they
> could, either because the calls did not scale on a per-device basis or
> because of issues that could result in a potential deadlock. For example,
> patch 4 addresses the fact that we were calling async_schedule once per
> driver instead of once per device; without fixing that first, devices would
> still have ended up being probed on a non-local node.

No tests were added. Again, I think it would be good to add test
cases that exercise the old mechanism, illustrate the new one, and
ensure we don't regress, both now and moving forward.

This is all too critical a path for the kernel, and these changes
are rather intrusive. I'd really like to see test code for it now
rather than later.

  Luis
Alexander Duyck Dec. 10, 2018, 11:25 p.m. UTC | #2
On Mon, 2018-12-10 at 11:22 -0800, Luis Chamberlain wrote:
> On Wed, Dec 05, 2018 at 09:25:13AM -0800, Alexander Duyck wrote:
> > This patch set improves the locality of the async_schedule calls used to
> > provide deferred initialization.
> >
> > This patch set originally focused on just the one call to
> > async_schedule_domain in the nvdimm tree that was being used to defer the
> > device_add call. However, after doing some digging I realized the scope of
> > the problem was much broader than I had originally thought. As a result I
> > reworked the underlying infrastructure, going as far as replacing the
> > queue_work call itself with a function of my own, and opted to provide a
> > NUMA aware solution that would work for a broader audience.
> >
> > In addition, I have added several tweaks and clean-ups to the front of the
> > patch set. Patches 1 through 4 address a number of issues that were keeping
> > the existing async_schedule calls from delivering the performance they
> > could, either because the calls did not scale on a per-device basis or
> > because of issues that could result in a potential deadlock. For example,
> > patch 4 addresses the fact that we were calling async_schedule once per
> > driver instead of once per device; without fixing that first, devices would
> > still have ended up being probed on a non-local node.
> 
> No tests were added. Again, I think it would be good to add test
> cases that exercise the old mechanism, illustrate the new one, and
> ensure we don't regress, both now and moving forward.
>
> This is all too critical a path for the kernel, and these changes
> are rather intrusive. I'd really like to see test code for it now
> rather than later.
> 
>   Luis

Sorry about that. I was more focused on the rewrite of patch 2 and
overlooked the comment about lib/test_kmod.c.

I'll look into it and see if I can squeeze it in for v9.

Thanks.

- Alex
Luis Chamberlain Dec. 10, 2018, 11:35 p.m. UTC | #3
On Mon, Dec 10, 2018 at 03:25:04PM -0800, Alexander Duyck wrote:
> On Mon, 2018-12-10 at 11:22 -0800, Luis Chamberlain wrote:
> > On Wed, Dec 05, 2018 at 09:25:13AM -0800, Alexander Duyck wrote:
> > > This patch set improves the locality of the async_schedule calls used to
> > > provide deferred initialization.
> > >
> > > This patch set originally focused on just the one call to
> > > async_schedule_domain in the nvdimm tree that was being used to defer the
> > > device_add call. However, after doing some digging I realized the scope of
> > > the problem was much broader than I had originally thought. As a result I
> > > reworked the underlying infrastructure, going as far as replacing the
> > > queue_work call itself with a function of my own, and opted to provide a
> > > NUMA aware solution that would work for a broader audience.
> > >
> > > In addition, I have added several tweaks and clean-ups to the front of the
> > > patch set. Patches 1 through 4 address a number of issues that were keeping
> > > the existing async_schedule calls from delivering the performance they
> > > could, either because the calls did not scale on a per-device basis or
> > > because of issues that could result in a potential deadlock. For example,
> > > patch 4 addresses the fact that we were calling async_schedule once per
> > > driver instead of once per device; without fixing that first, devices would
> > > still have ended up being probed on a non-local node.
> > 
> > No tests were added. Again, I think it would be good to add test
> > cases that exercise the old mechanism, illustrate the new one, and
> > ensure we don't regress, both now and moving forward.
> >
> > This is all too critical a path for the kernel, and these changes
> > are rather intrusive. I'd really like to see test code for it now
> > rather than later.
> > 
> >   Luis
> 
> Sorry about that. I was more focused on the rewrite of patch 2 and
> overlooked the comment about lib/test_kmod.c.
> 
> I'll look into it and see if I can squeeze it in for v9.

Superb!

  Luis