diff mbox series

[PATCHv5,04/10] node: Link memory nodes to their compute nodes

Message ID 20190124230724.10022-5-keith.busch@intel.com (mailing list archive)
State Changes Requested, archived
Headers show
Series Heterogeneuos memory node attributes | expand

Commit Message

Keith Busch Jan. 24, 2019, 11:07 p.m. UTC
Systems may be constructed with various specialized nodes. Some nodes
may provide memory, some provide compute devices that access and use
that memory, and others may provide both. Nodes that provide memory are
referred to as memory targets, and nodes that can initiate memory access
are referred to as memory initiators.

Memory targets will often have varying access characteristics from
different initiators, and platforms may have ways to express those
relationships. In preparation for these systems, provide interfaces for
the kernel to export the memory relationship among different nodes memory
targets and their initiators with symlinks to each other.

If a system provides access locality for each initiator-target pair, nodes
may be grouped into ranked access classes relative to other nodes. The
new interface allows a subsystem to register relationships of varying
classes if available and desired to be exported.

A memory initiator may have multiple memory targets in the same access
class. The target memory's initiators in a given class indicate the
nodes access characteristics share the same performance relative to other
linked initiator nodes. Each target within an initiator's access class,
though, do not necessarily perform the same as each other.

A memory target node may have multiple memory initiators. All linked
initiators in a target's class have the same access characteristics to
that target.

The following example show the nodes' new sysfs hierarchy for a memory
target node 'Y' with access class 0 from initiator node 'X':

  # symlinks -v /sys/devices/system/node/nodeX/access0/
  relative: /sys/devices/system/node/nodeX/access0/targets/nodeY -> ../../nodeY

  # symlinks -v /sys/devices/system/node/nodeY/access0/
  relative: /sys/devices/system/node/nodeY/access0/initiators/nodeX -> ../../nodeX

The new attributes are added to the sysfs stable documentation.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 Documentation/ABI/stable/sysfs-devices-node |  25 ++++-
 drivers/base/node.c                         | 142 +++++++++++++++++++++++++++-
 include/linux/node.h                        |   7 +-
 3 files changed, 171 insertions(+), 3 deletions(-)

Comments

Rafael J. Wysocki Feb. 5, 2019, 12:33 p.m. UTC | #1
On Fri, Jan 25, 2019 at 12:08 AM Keith Busch <keith.busch@intel.com> wrote:
>
> Systems may be constructed with various specialized nodes. Some nodes
> may provide memory, some provide compute devices that access and use
> that memory, and others may provide both. Nodes that provide memory are
> referred to as memory targets, and nodes that can initiate memory access
> are referred to as memory initiators.
>
> Memory targets will often have varying access characteristics from
> different initiators, and platforms may have ways to express those
> relationships. In preparation for these systems, provide interfaces for
> the kernel to export the memory relationship among different nodes memory
> targets and their initiators with symlinks to each other.
>
> If a system provides access locality for each initiator-target pair, nodes
> may be grouped into ranked access classes relative to other nodes. The
> new interface allows a subsystem to register relationships of varying
> classes if available and desired to be exported.
>
> A memory initiator may have multiple memory targets in the same access
> class. The target memory's initiators in a given class indicate the
> nodes access characteristics share the same performance relative to other
> linked initiator nodes. Each target within an initiator's access class,
> though, do not necessarily perform the same as each other.
>
> A memory target node may have multiple memory initiators. All linked
> initiators in a target's class have the same access characteristics to
> that target.
>
> The following example show the nodes' new sysfs hierarchy for a memory
> target node 'Y' with access class 0 from initiator node 'X':
>
>   # symlinks -v /sys/devices/system/node/nodeX/access0/
>   relative: /sys/devices/system/node/nodeX/access0/targets/nodeY -> ../../nodeY
>
>   # symlinks -v /sys/devices/system/node/nodeY/access0/
>   relative: /sys/devices/system/node/nodeY/access0/initiators/nodeX -> ../../nodeX
>
> The new attributes are added to the sysfs stable documentation.
>
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
>  Documentation/ABI/stable/sysfs-devices-node |  25 ++++-
>  drivers/base/node.c                         | 142 +++++++++++++++++++++++++++-
>  include/linux/node.h                        |   7 +-
>  3 files changed, 171 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node
> index 3e90e1f3bf0a..fb843222a281 100644
> --- a/Documentation/ABI/stable/sysfs-devices-node
> +++ b/Documentation/ABI/stable/sysfs-devices-node
> @@ -90,4 +90,27 @@ Date:                December 2009
>  Contact:       Lee Schermerhorn <lee.schermerhorn@hp.com>
>  Description:
>                 The node's huge page size control/query attributes.
> -               See Documentation/admin-guide/mm/hugetlbpage.rst
> \ No newline at end of file
> +               See Documentation/admin-guide/mm/hugetlbpage.rst
> +
> +What:          /sys/devices/system/node/nodeX/accessY/
> +Date:          December 2018
> +Contact:       Keith Busch <keith.busch@intel.com>
> +Description:
> +               The node's relationship to other nodes for access class "Y".
> +
> +What:          /sys/devices/system/node/nodeX/accessY/initiators/
> +Date:          December 2018
> +Contact:       Keith Busch <keith.busch@intel.com>
> +Description:
> +               The directory containing symlinks to memory initiator
> +               nodes that have class "Y" access to this target node's
> +               memory. CPUs and other memory initiators in nodes not in
> +               the list accessing this node's memory may have different
> +               performance.
> +
> +What:          /sys/devices/system/node/nodeX/classY/targets/
> +Date:          December 2018
> +Contact:       Keith Busch <keith.busch@intel.com>
> +Description:
> +               The directory containing symlinks to memory targets that
> +               this initiator node has class "Y" access.
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index 86d6cd92ce3d..6f4097680580 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -17,6 +17,7 @@
>  #include <linux/nodemask.h>
>  #include <linux/cpu.h>
>  #include <linux/device.h>
> +#include <linux/pm_runtime.h>
>  #include <linux/swap.h>
>  #include <linux/slab.h>
>
> @@ -59,6 +60,94 @@ static inline ssize_t node_read_cpulist(struct device *dev,
>  static DEVICE_ATTR(cpumap,  S_IRUGO, node_read_cpumask, NULL);
>  static DEVICE_ATTR(cpulist, S_IRUGO, node_read_cpulist, NULL);
>
> +/**
> + * struct node_access_nodes - Access class device to hold user visible
> + *                           relationships to other nodes.
> + * @dev:       Device for this memory access class
> + * @list_node: List element in the node's access list
> + * @access:    The access class rank
> + */
> +struct node_access_nodes {
> +       struct device           dev;

I'm not sure if the entire struct device is needed here.

It looks like what you need is the kobject part of it only and you can
use a kobject directly here:

struct kobject        kobj;

Then, you can register that under the node's kobject using
kobject_init_and_add() and you can create attr groups under a kobject
using sysfs_create_groups(), which is exactly what device_add_groups()
does.

That would allow you to avoid allocating extra memory to hold the
entire device structure and the extra empty "power" subdirectory added
by device registration would not be there.

> +       struct list_head        list_node;
> +       unsigned                access;
> +};

Apart from the above, the patch looks good to me.
Keith Busch Feb. 5, 2019, 2:48 p.m. UTC | #2
On Tue, Feb 05, 2019 at 04:33:27AM -0800, Rafael J. Wysocki wrote:
> On Fri, Jan 25, 2019 at 12:08 AM Keith Busch <keith.busch@intel.com> wrote:
> > +/**
> > + * struct node_access_nodes - Access class device to hold user visible
> > + *                           relationships to other nodes.
> > + * @dev:       Device for this memory access class
> > + * @list_node: List element in the node's access list
> > + * @access:    The access class rank
> > + */
> > +struct node_access_nodes {
> > +       struct device           dev;
> 
> I'm not sure if the entire struct device is needed here.
> 
> It looks like what you need is the kobject part of it only and you can
> use a kobject directly here:
> 
> struct kobject        kobj;
> 
> Then, you can register that under the node's kobject using
> kobject_init_and_add() and you can create attr groups under a kobject
> using sysfs_create_groups(), which is exactly what device_add_groups()
> does.
> 
> That would allow you to avoid allocating extra memory to hold the
> entire device structure and the extra empty "power" subdirectory added
> by device registration would not be there.

This is conflicting with Greg's feedback from the first version of
this series:

  https://lore.kernel.org/lkml/20181126190619.GA32595@kroah.com/

Do you still recommend using kobject?
Greg KH Feb. 5, 2019, 2:52 p.m. UTC | #3
On Tue, Feb 05, 2019 at 01:33:27PM +0100, Rafael J. Wysocki wrote:
> > +/**
> > + * struct node_access_nodes - Access class device to hold user visible
> > + *                           relationships to other nodes.
> > + * @dev:       Device for this memory access class
> > + * @list_node: List element in the node's access list
> > + * @access:    The access class rank
> > + */
> > +struct node_access_nodes {
> > +       struct device           dev;
> 
> I'm not sure if the entire struct device is needed here.
> 
> It looks like what you need is the kobject part of it only and you can
> use a kobject directly here:
> 
> struct kobject        kobj;
> 
> Then, you can register that under the node's kobject using
> kobject_init_and_add() and you can create attr groups under a kobject
> using sysfs_create_groups(), which is exactly what device_add_groups()
> does.
> 
> That would allow you to avoid allocating extra memory to hold the
> entire device structure and the extra empty "power" subdirectory added
> by device registration would not be there.

When you use a "raw" kobject then userspace tools do not see the devices
and attributes in libraries like udev.  So unless userspace does not
care about this at all, you should use a 'struct device' where ever
possible.  The memory "savings" usually just isn't worth it unless you
have a _lot_ of objects being created here.

Who is going to use all of this new information?

thanks,

greg k-h
Rafael J. Wysocki Feb. 5, 2019, 3:17 p.m. UTC | #4
On Tue, Feb 5, 2019 at 3:52 PM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Tue, Feb 05, 2019 at 01:33:27PM +0100, Rafael J. Wysocki wrote:
> > > +/**
> > > + * struct node_access_nodes - Access class device to hold user visible
> > > + *                           relationships to other nodes.
> > > + * @dev:       Device for this memory access class
> > > + * @list_node: List element in the node's access list
> > > + * @access:    The access class rank
> > > + */
> > > +struct node_access_nodes {
> > > +       struct device           dev;
> >
> > I'm not sure if the entire struct device is needed here.
> >
> > It looks like what you need is the kobject part of it only and you can
> > use a kobject directly here:
> >
> > struct kobject        kobj;
> >
> > Then, you can register that under the node's kobject using
> > kobject_init_and_add() and you can create attr groups under a kobject
> > using sysfs_create_groups(), which is exactly what device_add_groups()
> > does.
> >
> > That would allow you to avoid allocating extra memory to hold the
> > entire device structure and the extra empty "power" subdirectory added
> > by device registration would not be there.
>
> When you use a "raw" kobject then userspace tools do not see the devices
> and attributes in libraries like udev.

And why would they need it in this particular case?

> So unless userspace does not care about this at all,

Which I think is the case here, isn't it?

> you should use a 'struct device' where ever
> possible.  The memory "savings" usually just isn't worth it unless you
> have a _lot_ of objects being created here.
>
> Who is going to use all of this new information?

Somebody who wants to know how the memory in the system is laid out AFAICS.
Jonathan Cameron Feb. 6, 2019, 12:26 p.m. UTC | #5
On Thu, 24 Jan 2019 16:07:18 -0700
Keith Busch <keith.busch@intel.com> wrote:

> Systems may be constructed with various specialized nodes. Some nodes
> may provide memory, some provide compute devices that access and use
> that memory, and others may provide both. Nodes that provide memory are
> referred to as memory targets, and nodes that can initiate memory access
> are referred to as memory initiators.
> 
> Memory targets will often have varying access characteristics from
> different initiators, and platforms may have ways to express those
> relationships. In preparation for these systems, provide interfaces for
> the kernel to export the memory relationship among different nodes memory
> targets and their initiators with symlinks to each other.
> 
> If a system provides access locality for each initiator-target pair, nodes
> may be grouped into ranked access classes relative to other nodes. The
> new interface allows a subsystem to register relationships of varying
> classes if available and desired to be exported.
> 
> A memory initiator may have multiple memory targets in the same access
> class. The target memory's initiators in a given class indicate the
> nodes access characteristics share the same performance relative to other
> linked initiator nodes. Each target within an initiator's access class,
> though, do not necessarily perform the same as each other.
> 
> A memory target node may have multiple memory initiators. All linked
> initiators in a target's class have the same access characteristics to
> that target.
> 
> The following example show the nodes' new sysfs hierarchy for a memory
> target node 'Y' with access class 0 from initiator node 'X':
> 
>   # symlinks -v /sys/devices/system/node/nodeX/access0/
>   relative: /sys/devices/system/node/nodeX/access0/targets/nodeY -> ../../nodeY
> 
>   # symlinks -v /sys/devices/system/node/nodeY/access0/
>   relative: /sys/devices/system/node/nodeY/access0/initiators/nodeX -> ../../nodeX
> 
> The new attributes are added to the sysfs stable documentation.
> 
> Signed-off-by: Keith Busch <keith.busch@intel.com>

A few comments inline.

> ---
>  Documentation/ABI/stable/sysfs-devices-node |  25 ++++-
>  drivers/base/node.c                         | 142 +++++++++++++++++++++++++++-
>  include/linux/node.h                        |   7 +-
>  3 files changed, 171 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node
> index 3e90e1f3bf0a..fb843222a281 100644
> --- a/Documentation/ABI/stable/sysfs-devices-node
> +++ b/Documentation/ABI/stable/sysfs-devices-node
> @@ -90,4 +90,27 @@ Date:		December 2009
>  Contact:	Lee Schermerhorn <lee.schermerhorn@hp.com>
>  Description:
>  		The node's huge page size control/query attributes.
> -		See Documentation/admin-guide/mm/hugetlbpage.rst
> \ No newline at end of file
> +		See Documentation/admin-guide/mm/hugetlbpage.rst
> +
> +What:		/sys/devices/system/node/nodeX/accessY/
> +Date:		December 2018
> +Contact:	Keith Busch <keith.busch@intel.com>
> +Description:
> +		The node's relationship to other nodes for access class "Y".
> +
> +What:		/sys/devices/system/node/nodeX/accessY/initiators/
> +Date:		December 2018
> +Contact:	Keith Busch <keith.busch@intel.com>
> +Description:
> +		The directory containing symlinks to memory initiator
> +		nodes that have class "Y" access to this target node's
> +		memory. CPUs and other memory initiators in nodes not in
> +		the list accessing this node's memory may have different
> +		performance.

Also seems to contain the characteristics of those accesses.

> +
> +What:		/sys/devices/system/node/nodeX/classY/targets/
> +Date:		December 2018
> +Contact:	Keith Busch <keith.busch@intel.com>
> +Description:
> +		The directory containing symlinks to memory targets that
> +		this initiator node has class "Y" access.
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index 86d6cd92ce3d..6f4097680580 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -17,6 +17,7 @@
>  #include <linux/nodemask.h>
>  #include <linux/cpu.h>
>  #include <linux/device.h>
> +#include <linux/pm_runtime.h>
>  #include <linux/swap.h>
>  #include <linux/slab.h>
>  
> @@ -59,6 +60,94 @@ static inline ssize_t node_read_cpulist(struct device *dev,
>  static DEVICE_ATTR(cpumap,  S_IRUGO, node_read_cpumask, NULL);
>  static DEVICE_ATTR(cpulist, S_IRUGO, node_read_cpulist, NULL);
>  
> +/**
> + * struct node_access_nodes - Access class device to hold user visible
> + * 			      relationships to other nodes.
> + * @dev:	Device for this memory access class
> + * @list_node:	List element in the node's access list
> + * @access:	The access class rank
> + */
> +struct node_access_nodes {
> +	struct device		dev;
> +	struct list_head	list_node;
> +	unsigned		access;
> +};
> +#define to_access_nodes(dev) container_of(dev, struct node_access_nodes, dev)
> +
> +static struct attribute *node_init_access_node_attrs[] = {
> +	NULL,
> +};
> +
> +static struct attribute *node_targ_access_node_attrs[] = {
> +	NULL,
> +};
> +
> +static const struct attribute_group initiators = {
> +	.name	= "initiators",
> +	.attrs	= node_init_access_node_attrs,
> +};
> +
> +static const struct attribute_group targets = {
> +	.name	= "targets",
> +	.attrs	= node_targ_access_node_attrs,
> +};
> +
> +static const struct attribute_group *node_access_node_groups[] = {
> +	&initiators,
> +	&targets,
> +	NULL,
> +};
> +
> +static void node_remove_accesses(struct node *node)
> +{
> +	struct node_access_nodes *c, *cnext;
> +
> +	list_for_each_entry_safe(c, cnext, &node->access_list, list_node) {
> +		list_del(&c->list_node);
> +		device_unregister(&c->dev);
> +	}
> +}
> +
> +static void node_access_release(struct device *dev)
> +{
> +	kfree(to_access_nodes(dev));
> +}
> +
> +static struct node_access_nodes *node_init_node_access(struct node *node,
> +						       unsigned access)
> +{
> +	struct node_access_nodes *access_node;
> +	struct device *dev;
> +
> +	list_for_each_entry(access_node, &node->access_list, list_node)
> +		if (access_node->access == access)
> +			return access_node;
> +
> +	access_node = kzalloc(sizeof(*access_node), GFP_KERNEL);
> +	if (!access_node)
> +		return NULL;
> +
> +	access_node->access = access;
> +	dev = &access_node->dev;
> +	dev->parent = &node->dev;
> +	dev->release = node_access_release;
> +	dev->groups = node_access_node_groups;
> +	if (dev_set_name(dev, "access%u", access))
> +		goto free;
> +
> +	if (device_register(dev))
> +		goto free_name;
> +
> +	pm_runtime_no_callbacks(dev);
> +	list_add_tail(&access_node->list_node, &node->access_list);
> +	return access_node;
> +free_name:
> +	kfree_const(dev->kobj.name);
> +free:
> +	kfree(access_node);
> +	return NULL;
> +}
> +
>  #define K(x) ((x) << (PAGE_SHIFT - 10))
>  static ssize_t node_read_meminfo(struct device *dev,
>  			struct device_attribute *attr, char *buf)
> @@ -340,7 +429,7 @@ static int register_node(struct node *node, int num)
>  void unregister_node(struct node *node)
>  {
>  	hugetlb_unregister_node(node);		/* no-op, if memoryless node */
> -
> +	node_remove_accesses(node);
>  	device_unregister(&node->dev);
>  }
>  
> @@ -372,6 +461,56 @@ int register_cpu_under_node(unsigned int cpu, unsigned int nid)
>  				 kobject_name(&node_devices[nid]->dev.kobj));
>  }
>  
> +/**
> + * register_memory_node_under_compute_node - link memory node to its compute
> + *					     node for a given access class.
> + * @mem_node:	Memory node number
> + * @cpu_node:	Cpu  node number
> + * @access:	Access class to register
> + *
> + * Description:
> + * 	For use with platforms that may have separate memory and compute nodes.
I would drop that first line as it also applies on systems where this isn't
true and will be there if we want hmat simply for the better stats.

> + * 	This function will export node relationships linking which memory
> + * 	initiator nodes can access memory targets at a given ranked access
> + * 	class.
> + */
> +int register_memory_node_under_compute_node(unsigned int mem_nid,
> +					    unsigned int cpu_nid,
> +					    unsigned access)
> +{
> +	struct node *init_node, *targ_node;
> +	struct node_access_nodes *initiator, *target;
> +	int ret;
> +
> +	if (!node_online(cpu_nid) || !node_online(mem_nid))
> +		return -ENODEV;

What do we do under memory/node hotplug?  More than likely that will
apply in such systems (it does in mine for starters).
Clearly to do the full story we would need _HMT support etc but
we can do the prebaked version by having hmat entries for nodes
that aren't online yet (like we do for SRAT).

Perhaps one for a follow up patch set.  However, I'd like an
pr_info to indicate that the node is listed but not online currently.

> +
> +	init_node = node_devices[cpu_nid];
> +	targ_node = node_devices[mem_nid];
> +	initiator = node_init_node_access(init_node, access);
> +	target = node_init_node_access(targ_node, access);
> +	if (!initiator || !target)
> +		return -ENOMEM;
If one of these fails and the other doesn't + the one that succeeded
did an init, don't we end up leaking a device here?  I'd expect this
function to not leave things hanging if it has an error. It should
unwind anything it has done.  It has been added to the list so
could be cleaned up later, but I'm not seeing that code. 

These only get cleaned up when the node is removed.

> +
> +	ret = sysfs_add_link_to_group(&initiator->dev.kobj, "targets",
> +				      &targ_node->dev.kobj,
> +				      dev_name(&targ_node->dev));
> +	if (ret)
> +		return ret;
> +
> +	ret = sysfs_add_link_to_group(&target->dev.kobj, "initiators",
> +				      &init_node->dev.kobj,
> +				      dev_name(&init_node->dev));
> +	if (ret)
> +		goto err;
> +
> +	return 0;
> + err:
> +	sysfs_remove_link_from_group(&initiator->dev.kobj, "targets",
> +				     dev_name(&targ_node->dev));
> +	return ret;
> +}
> +
>  int unregister_cpu_under_node(unsigned int cpu, unsigned int nid)
>  {
>  	struct device *obj;
> @@ -580,6 +719,7 @@ int __register_one_node(int nid)
>  			register_cpu_under_node(cpu, nid);
>  	}
>  
> +	INIT_LIST_HEAD(&node_devices[nid]->access_list);
>  	/* initialize work queue for memory hot plug */
>  	init_node_hugetlb_work(nid);
>  
> diff --git a/include/linux/node.h b/include/linux/node.h
> index 257bb3d6d014..f34688a203c1 100644
> --- a/include/linux/node.h
> +++ b/include/linux/node.h
> @@ -17,11 +17,12 @@
>  
>  #include <linux/device.h>
>  #include <linux/cpumask.h>
> +#include <linux/list.h>
>  #include <linux/workqueue.h>
>  
>  struct node {
>  	struct device	dev;
> -
> +	struct list_head access_list;
>  #if defined(CONFIG_MEMORY_HOTPLUG_SPARSE) && defined(CONFIG_HUGETLBFS)
>  	struct work_struct	node_work;
>  #endif
> @@ -75,6 +76,10 @@ extern int register_mem_sect_under_node(struct memory_block *mem_blk,
>  extern int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
>  					   unsigned long phys_index);
>  
> +extern int register_memory_node_under_compute_node(unsigned int mem_nid,
> +						   unsigned int cpu_nid,
> +						   unsigned access);
> +
>  #ifdef CONFIG_HUGETLBFS
>  extern void register_hugetlbfs_with_node(node_registration_func_t doregister,
>  					 node_registration_func_t unregister);
Keith Busch Feb. 6, 2019, 4:12 p.m. UTC | #6
On Wed, Feb 06, 2019 at 04:26:35AM -0800, Jonathan Cameron wrote:
> On Thu, 24 Jan 2019 16:07:18 -0700
> Keith Busch <keith.busch@intel.com> wrote:
> > +What:		/sys/devices/system/node/nodeX/accessY/initiators/
> > +Date:		December 2018
> > +Contact:	Keith Busch <keith.busch@intel.com>
> > +Description:
> > +		The directory containing symlinks to memory initiator
> > +		nodes that have class "Y" access to this target node's
> > +		memory. CPUs and other memory initiators in nodes not in
> > +		the list accessing this node's memory may have different
> > +		performance.
> 
> Also seems to contain the characteristics of those accesses.

Right, but not in this patch. I will append the description for access
characterisitics in the patch that adds those attributes.
 
> > + * 	This function will export node relationships linking which memory
> > + * 	initiator nodes can access memory targets at a given ranked access
> > + * 	class.
> > + */
> > +int register_memory_node_under_compute_node(unsigned int mem_nid,
> > +					    unsigned int cpu_nid,
> > +					    unsigned access)
> > +{
> > +	struct node *init_node, *targ_node;
> > +	struct node_access_nodes *initiator, *target;
> > +	int ret;
> > +
> > +	if (!node_online(cpu_nid) || !node_online(mem_nid))
> > +		return -ENODEV;
> 
> What do we do under memory/node hotplug?  More than likely that will
> apply in such systems (it does in mine for starters).
> Clearly to do the full story we would need _HMT support etc but
> we can do the prebaked version by having hmat entries for nodes
> that aren't online yet (like we do for SRAT).
> 
> Perhaps one for a follow up patch set.  However, I'd like an
> pr_info to indicate that the node is listed but not online currently.

Yes, hot plug is planned to follow on to this series.

> > +
> > +	init_node = node_devices[cpu_nid];
> > +	targ_node = node_devices[mem_nid];
> > +	initiator = node_init_node_access(init_node, access);
> > +	target = node_init_node_access(targ_node, access);
> > +	if (!initiator || !target)
> > +		return -ENOMEM;
>
> If one of these fails and the other doesn't + the one that succeeded
> did an init, don't we end up leaking a device here?  I'd expect this
> function to not leave things hanging if it has an error. It should
> unwind anything it has done.  It has been added to the list so
> could be cleaned up later, but I'm not seeing that code. 
> 
> These only get cleaned up when the node is removed.

The intiator-target relationship is many-to-many, so we don't want to
free it just because we couldn't allocate its pairing node. The
exisiting one may still be paired to others we were able to allocate.
Jonathan Cameron Feb. 6, 2019, 4:47 p.m. UTC | #7
On Wed, 6 Feb 2019 09:12:27 -0700
Keith Busch <keith.busch@intel.com> wrote:

> On Wed, Feb 06, 2019 at 04:26:35AM -0800, Jonathan Cameron wrote:
> > On Thu, 24 Jan 2019 16:07:18 -0700
> > Keith Busch <keith.busch@intel.com> wrote:  
> > > +What:		/sys/devices/system/node/nodeX/accessY/initiators/
> > > +Date:		December 2018
> > > +Contact:	Keith Busch <keith.busch@intel.com>
> > > +Description:
> > > +		The directory containing symlinks to memory initiator
> > > +		nodes that have class "Y" access to this target node's
> > > +		memory. CPUs and other memory initiators in nodes not in
> > > +		the list accessing this node's memory may have different
> > > +		performance.  
> > 
> > Also seems to contain the characteristics of those accesses.  
> 
> Right, but not in this patch. I will append the description for access
> characterisitics in the patch that adds those attributes.
>  
> > > + * 	This function will export node relationships linking which memory
> > > + * 	initiator nodes can access memory targets at a given ranked access
> > > + * 	class.
> > > + */
> > > +int register_memory_node_under_compute_node(unsigned int mem_nid,
> > > +					    unsigned int cpu_nid,
> > > +					    unsigned access)
> > > +{
> > > +	struct node *init_node, *targ_node;
> > > +	struct node_access_nodes *initiator, *target;
> > > +	int ret;
> > > +
> > > +	if (!node_online(cpu_nid) || !node_online(mem_nid))
> > > +		return -ENODEV;  
> > 
> > What do we do under memory/node hotplug?  More than likely that will
> > apply in such systems (it does in mine for starters).
> > Clearly to do the full story we would need _HMT support etc but
> > we can do the prebaked version by having hmat entries for nodes
> > that aren't online yet (like we do for SRAT).
> > 
> > Perhaps one for a follow up patch set.  However, I'd like an
> > pr_info to indicate that the node is listed but not online currently.  
> 
> Yes, hot plug is planned to follow on to this series.
> 
> > > +
> > > +	init_node = node_devices[cpu_nid];
> > > +	targ_node = node_devices[mem_nid];
> > > +	initiator = node_init_node_access(init_node, access);
> > > +	target = node_init_node_access(targ_node, access);
> > > +	if (!initiator || !target)
> > > +		return -ENOMEM;  
> >
> > If one of these fails and the other doesn't + the one that succeeded
> > did an init, don't we end up leaking a device here?  I'd expect this
> > function to not leave things hanging if it has an error. It should
> > unwind anything it has done.  It has been added to the list so
> > could be cleaned up later, but I'm not seeing that code. 
> > 
> > These only get cleaned up when the node is removed.  
> 
> The intiator-target relationship is many-to-many, so we don't want to
> free it just because we couldn't allocate its pairing node. The
> exisiting one may still be paired to others we were able to allocate.

Reference count them?  We have lots of paths that can result in
creation any of which might need cleaning up. Sounds like a classic
case for reference counts.

Jonathan
Keith Busch Feb. 6, 2019, 11:09 p.m. UTC | #8
On Tue, Feb 05, 2019 at 04:17:09PM +0100, Rafael J. Wysocki wrote:
> <gregkh@linuxfoundation.org> wrote:
> >
> > When you use a "raw" kobject then userspace tools do not see the devices
> > and attributes in libraries like udev.
> 
> And why would they need it in this particular case?
> 
> > So unless userspace does not care about this at all,
> 
> Which I think is the case here, isn't it?
> 
> > you should use a 'struct device' where ever
> > possible.  The memory "savings" usually just isn't worth it unless you
> > have a _lot_ of objects being created here.
> >
> > Who is going to use all of this new information?
> 
> Somebody who wants to know how the memory in the system is laid out AFAICS.

Yes, this is for user space to make informed decisions about where it
wants to allocate/relocate hot and cold data with respect to particular
compute domains. So user space should care about these attributes,
and they won't always be static when memory hotplug support for these
attributes is added.

Does that change anything, or still recommending kobject? I don't have a
strong opinion either way and have both options coded and ready to
submit new version once I know which direction is most acceptable.
Rafael J. Wysocki Feb. 6, 2019, 11:48 p.m. UTC | #9
On Thu, Feb 7, 2019 at 12:10 AM Keith Busch <keith.busch@intel.com> wrote:
>
> On Tue, Feb 05, 2019 at 04:17:09PM +0100, Rafael J. Wysocki wrote:
> > <gregkh@linuxfoundation.org> wrote:
> > >
> > > When you use a "raw" kobject then userspace tools do not see the devices
> > > and attributes in libraries like udev.
> >
> > And why would they need it in this particular case?
> >
> > > So unless userspace does not care about this at all,
> >
> > Which I think is the case here, isn't it?
> >
> > > you should use a 'struct device' where ever
> > > possible.  The memory "savings" usually just isn't worth it unless you
> > > have a _lot_ of objects being created here.
> > >
> > > Who is going to use all of this new information?
> >
> > Somebody who wants to know how the memory in the system is laid out AFAICS.
>
> Yes, this is for user space to make informed decisions about where it
> wants to allocate/relocate hot and cold data with respect to particular
> compute domains. So user space should care about these attributes,
> and they won't always be static when memory hotplug support for these
> attributes is added.
>
> Does that change anything, or still recommending kobject? I don't have a
> strong opinion either way and have both options coded and ready to
> submit new version once I know which direction is most acceptable.

If you want to make dynamic changes to the sysfs directories under
this object, uevents generated by device registration and
unregstration may be useful.  However, they only trigger automatically
when you register and unregister, so presumably you'd need to do that
every time for the changes to trigger an update in user space.  Also,
whoever is interested in this data would need to listen to the uevents
to get notified.

OTOH, you can call kobject_uevent() for the "raw" kobjects too.

Anyway, if Greg really prefers struct device to be used here, that's
fine by me, but since the uevents in question are going to be part of
your user space I/F then, it may be good to take that into
consideration. :-)

After all, you need to know how you want the I/F to work.
Rafael J. Wysocki Feb. 7, 2019, 11:35 a.m. UTC | #10
On Fri, Jan 25, 2019 at 12:08 AM Keith Busch <keith.busch@intel.com> wrote:
>
> Systems may be constructed with various specialized nodes. Some nodes
> may provide memory, some provide compute devices that access and use
> that memory, and others may provide both. Nodes that provide memory are
> referred to as memory targets, and nodes that can initiate memory access
> are referred to as memory initiators.
>
> Memory targets will often have varying access characteristics from
> different initiators, and platforms may have ways to express those
> relationships. In preparation for these systems, provide interfaces for
> the kernel to export the memory relationship among different nodes memory
> targets and their initiators with symlinks to each other.
>
> If a system provides access locality for each initiator-target pair, nodes
> may be grouped into ranked access classes relative to other nodes. The
> new interface allows a subsystem to register relationships of varying
> classes if available and desired to be exported.
>
> A memory initiator may have multiple memory targets in the same access
> class. The target memory's initiators in a given class indicate the
> nodes access characteristics share the same performance relative to other
> linked initiator nodes. Each target within an initiator's access class,
> though, do not necessarily perform the same as each other.
>
> A memory target node may have multiple memory initiators. All linked
> initiators in a target's class have the same access characteristics to
> that target.
>
> The following example show the nodes' new sysfs hierarchy for a memory
> target node 'Y' with access class 0 from initiator node 'X':
>
>   # symlinks -v /sys/devices/system/node/nodeX/access0/
>   relative: /sys/devices/system/node/nodeX/access0/targets/nodeY -> ../../nodeY
>
>   # symlinks -v /sys/devices/system/node/nodeY/access0/
>   relative: /sys/devices/system/node/nodeY/access0/initiators/nodeX -> ../../nodeX
>
> The new attributes are added to the sysfs stable documentation.
>
> Signed-off-by: Keith Busch <keith.busch@intel.com>

Overall, if you decide to go for full struct device embedded in struct
node_access_nodes, feel free to add

Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

to this patch.

> ---
>  Documentation/ABI/stable/sysfs-devices-node |  25 ++++-
>  drivers/base/node.c                         | 142 +++++++++++++++++++++++++++-
>  include/linux/node.h                        |   7 +-
>  3 files changed, 171 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node
> index 3e90e1f3bf0a..fb843222a281 100644
> --- a/Documentation/ABI/stable/sysfs-devices-node
> +++ b/Documentation/ABI/stable/sysfs-devices-node
> @@ -90,4 +90,27 @@ Date:                December 2009
>  Contact:       Lee Schermerhorn <lee.schermerhorn@hp.com>
>  Description:
>                 The node's huge page size control/query attributes.
> -               See Documentation/admin-guide/mm/hugetlbpage.rst
> \ No newline at end of file
> +               See Documentation/admin-guide/mm/hugetlbpage.rst
> +
> +What:          /sys/devices/system/node/nodeX/accessY/
> +Date:          December 2018
> +Contact:       Keith Busch <keith.busch@intel.com>
> +Description:
> +               The node's relationship to other nodes for access class "Y".
> +
> +What:          /sys/devices/system/node/nodeX/accessY/initiators/
> +Date:          December 2018
> +Contact:       Keith Busch <keith.busch@intel.com>
> +Description:
> +               The directory containing symlinks to memory initiator
> +               nodes that have class "Y" access to this target node's
> +               memory. CPUs and other memory initiators in nodes not in
> +               the list accessing this node's memory may have different
> +               performance.
> +
> +What:          /sys/devices/system/node/nodeX/classY/targets/
> +Date:          December 2018
> +Contact:       Keith Busch <keith.busch@intel.com>
> +Description:
> +               The directory containing symlinks to memory targets that
> +               this initiator node has class "Y" access.
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index 86d6cd92ce3d..6f4097680580 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -17,6 +17,7 @@
>  #include <linux/nodemask.h>
>  #include <linux/cpu.h>
>  #include <linux/device.h>
> +#include <linux/pm_runtime.h>
>  #include <linux/swap.h>
>  #include <linux/slab.h>
>
> @@ -59,6 +60,94 @@ static inline ssize_t node_read_cpulist(struct device *dev,
>  static DEVICE_ATTR(cpumap,  S_IRUGO, node_read_cpumask, NULL);
>  static DEVICE_ATTR(cpulist, S_IRUGO, node_read_cpulist, NULL);
>
> +/**
> + * struct node_access_nodes - Access class device to hold user visible
> + *                           relationships to other nodes.
> + * @dev:       Device for this memory access class
> + * @list_node: List element in the node's access list
> + * @access:    The access class rank
> + */
> +struct node_access_nodes {
> +       struct device           dev;
> +       struct list_head        list_node;
> +       unsigned                access;
> +};
> +#define to_access_nodes(dev) container_of(dev, struct node_access_nodes, dev)
> +
> +static struct attribute *node_init_access_node_attrs[] = {
> +       NULL,
> +};
> +
> +static struct attribute *node_targ_access_node_attrs[] = {
> +       NULL,
> +};
> +
> +static const struct attribute_group initiators = {
> +       .name   = "initiators",
> +       .attrs  = node_init_access_node_attrs,
> +};
> +
> +static const struct attribute_group targets = {
> +       .name   = "targets",
> +       .attrs  = node_targ_access_node_attrs,
> +};
> +
> +static const struct attribute_group *node_access_node_groups[] = {
> +       &initiators,
> +       &targets,
> +       NULL,
> +};
> +
> +static void node_remove_accesses(struct node *node)
> +{
> +       struct node_access_nodes *c, *cnext;
> +
> +       list_for_each_entry_safe(c, cnext, &node->access_list, list_node) {
> +               list_del(&c->list_node);
> +               device_unregister(&c->dev);
> +       }
> +}
> +
> +static void node_access_release(struct device *dev)
> +{
> +       kfree(to_access_nodes(dev));
> +}
> +
> +static struct node_access_nodes *node_init_node_access(struct node *node,
> +                                                      unsigned access)
> +{
> +       struct node_access_nodes *access_node;
> +       struct device *dev;
> +
> +       list_for_each_entry(access_node, &node->access_list, list_node)
> +               if (access_node->access == access)
> +                       return access_node;
> +
> +       access_node = kzalloc(sizeof(*access_node), GFP_KERNEL);
> +       if (!access_node)
> +               return NULL;
> +
> +       access_node->access = access;
> +       dev = &access_node->dev;
> +       dev->parent = &node->dev;
> +       dev->release = node_access_release;
> +       dev->groups = node_access_node_groups;
> +       if (dev_set_name(dev, "access%u", access))
> +               goto free;
> +
> +       if (device_register(dev))
> +               goto free_name;
> +
> +       pm_runtime_no_callbacks(dev);
> +       list_add_tail(&access_node->list_node, &node->access_list);
> +       return access_node;
> +free_name:
> +       kfree_const(dev->kobj.name);
> +free:
> +       kfree(access_node);
> +       return NULL;
> +}
> +
>  #define K(x) ((x) << (PAGE_SHIFT - 10))
>  static ssize_t node_read_meminfo(struct device *dev,
>                         struct device_attribute *attr, char *buf)
> @@ -340,7 +429,7 @@ static int register_node(struct node *node, int num)
>  void unregister_node(struct node *node)
>  {
>         hugetlb_unregister_node(node);          /* no-op, if memoryless node */
> -
> +       node_remove_accesses(node);
>         device_unregister(&node->dev);
>  }
>
> @@ -372,6 +461,56 @@ int register_cpu_under_node(unsigned int cpu, unsigned int nid)
>                                  kobject_name(&node_devices[nid]->dev.kobj));
>  }
>
> +/**
> + * register_memory_node_under_compute_node - link memory node to its compute
> + *                                          node for a given access class.
> + * @mem_node:  Memory node number
> + * @cpu_node:  Cpu  node number
> + * @access:    Access class to register
> + *
> + * Description:
> + *     For use with platforms that may have separate memory and compute nodes.
> + *     This function will export node relationships linking which memory
> + *     initiator nodes can access memory targets at a given ranked access
> + *     class.
> + */
> +int register_memory_node_under_compute_node(unsigned int mem_nid,
> +                                           unsigned int cpu_nid,
> +                                           unsigned access)
> +{
> +       struct node *init_node, *targ_node;
> +       struct node_access_nodes *initiator, *target;
> +       int ret;
> +
> +       if (!node_online(cpu_nid) || !node_online(mem_nid))
> +               return -ENODEV;
> +
> +       init_node = node_devices[cpu_nid];
> +       targ_node = node_devices[mem_nid];
> +       initiator = node_init_node_access(init_node, access);
> +       target = node_init_node_access(targ_node, access);
> +       if (!initiator || !target)
> +               return -ENOMEM;
> +
> +       ret = sysfs_add_link_to_group(&initiator->dev.kobj, "targets",
> +                                     &targ_node->dev.kobj,
> +                                     dev_name(&targ_node->dev));
> +       if (ret)
> +               return ret;
> +
> +       ret = sysfs_add_link_to_group(&target->dev.kobj, "initiators",
> +                                     &init_node->dev.kobj,
> +                                     dev_name(&init_node->dev));
> +       if (ret)
> +               goto err;
> +
> +       return 0;
> + err:
> +       sysfs_remove_link_from_group(&initiator->dev.kobj, "targets",
> +                                    dev_name(&targ_node->dev));
> +       return ret;
> +}
> +
>  int unregister_cpu_under_node(unsigned int cpu, unsigned int nid)
>  {
>         struct device *obj;
> @@ -580,6 +719,7 @@ int __register_one_node(int nid)
>                         register_cpu_under_node(cpu, nid);
>         }
>
> +       INIT_LIST_HEAD(&node_devices[nid]->access_list);
>         /* initialize work queue for memory hot plug */
>         init_node_hugetlb_work(nid);
>
> diff --git a/include/linux/node.h b/include/linux/node.h
> index 257bb3d6d014..f34688a203c1 100644
> --- a/include/linux/node.h
> +++ b/include/linux/node.h
> @@ -17,11 +17,12 @@
>
>  #include <linux/device.h>
>  #include <linux/cpumask.h>
> +#include <linux/list.h>
>  #include <linux/workqueue.h>
>
>  struct node {
>         struct device   dev;
> -
> +       struct list_head access_list;
>  #if defined(CONFIG_MEMORY_HOTPLUG_SPARSE) && defined(CONFIG_HUGETLBFS)
>         struct work_struct      node_work;
>  #endif
> @@ -75,6 +76,10 @@ extern int register_mem_sect_under_node(struct memory_block *mem_blk,
>  extern int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
>                                            unsigned long phys_index);
>
> +extern int register_memory_node_under_compute_node(unsigned int mem_nid,
> +                                                  unsigned int cpu_nid,
> +                                                  unsigned access);
> +
>  #ifdef CONFIG_HUGETLBFS
>  extern void register_hugetlbfs_with_node(node_registration_func_t doregister,
>                                          node_registration_func_t unregister);
> --
> 2.14.4
>
diff mbox series

Patch

diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node
index 3e90e1f3bf0a..fb843222a281 100644
--- a/Documentation/ABI/stable/sysfs-devices-node
+++ b/Documentation/ABI/stable/sysfs-devices-node
@@ -90,4 +90,27 @@  Date:		December 2009
 Contact:	Lee Schermerhorn <lee.schermerhorn@hp.com>
 Description:
 		The node's huge page size control/query attributes.
-		See Documentation/admin-guide/mm/hugetlbpage.rst
\ No newline at end of file
+		See Documentation/admin-guide/mm/hugetlbpage.rst
+
+What:		/sys/devices/system/node/nodeX/accessY/
+Date:		December 2018
+Contact:	Keith Busch <keith.busch@intel.com>
+Description:
+		The node's relationship to other nodes for access class "Y".
+
+What:		/sys/devices/system/node/nodeX/accessY/initiators/
+Date:		December 2018
+Contact:	Keith Busch <keith.busch@intel.com>
+Description:
+		The directory containing symlinks to memory initiator
+		nodes that have class "Y" access to this target node's
+		memory. CPUs and other memory initiators in nodes not in
+		the list accessing this node's memory may have different
+		performance.
+
+What:		/sys/devices/system/node/nodeX/classY/targets/
+Date:		December 2018
+Contact:	Keith Busch <keith.busch@intel.com>
+Description:
+		The directory containing symlinks to memory targets that
+		this initiator node has class "Y" access.
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 86d6cd92ce3d..6f4097680580 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -17,6 +17,7 @@ 
 #include <linux/nodemask.h>
 #include <linux/cpu.h>
 #include <linux/device.h>
+#include <linux/pm_runtime.h>
 #include <linux/swap.h>
 #include <linux/slab.h>
 
@@ -59,6 +60,94 @@  static inline ssize_t node_read_cpulist(struct device *dev,
 static DEVICE_ATTR(cpumap,  S_IRUGO, node_read_cpumask, NULL);
 static DEVICE_ATTR(cpulist, S_IRUGO, node_read_cpulist, NULL);
 
+/**
+ * struct node_access_nodes - Access class device to hold user visible
+ * 			      relationships to other nodes.
+ * @dev:	Device for this memory access class
+ * @list_node:	List element in the node's access list
+ * @access:	The access class rank
+ */
+struct node_access_nodes {
+	struct device		dev;
+	struct list_head	list_node;
+	unsigned		access;
+};
+#define to_access_nodes(dev) container_of(dev, struct node_access_nodes, dev)
+
+static struct attribute *node_init_access_node_attrs[] = {
+	NULL,
+};
+
+static struct attribute *node_targ_access_node_attrs[] = {
+	NULL,
+};
+
+static const struct attribute_group initiators = {
+	.name	= "initiators",
+	.attrs	= node_init_access_node_attrs,
+};
+
+static const struct attribute_group targets = {
+	.name	= "targets",
+	.attrs	= node_targ_access_node_attrs,
+};
+
+static const struct attribute_group *node_access_node_groups[] = {
+	&initiators,
+	&targets,
+	NULL,
+};
+
+static void node_remove_accesses(struct node *node)
+{
+	struct node_access_nodes *c, *cnext;
+
+	list_for_each_entry_safe(c, cnext, &node->access_list, list_node) {
+		list_del(&c->list_node);
+		device_unregister(&c->dev);
+	}
+}
+
+static void node_access_release(struct device *dev)
+{
+	kfree(to_access_nodes(dev));
+}
+
+static struct node_access_nodes *node_init_node_access(struct node *node,
+						       unsigned access)
+{
+	struct node_access_nodes *access_node;
+	struct device *dev;
+
+	list_for_each_entry(access_node, &node->access_list, list_node)
+		if (access_node->access == access)
+			return access_node;
+
+	access_node = kzalloc(sizeof(*access_node), GFP_KERNEL);
+	if (!access_node)
+		return NULL;
+
+	access_node->access = access;
+	dev = &access_node->dev;
+	dev->parent = &node->dev;
+	dev->release = node_access_release;
+	dev->groups = node_access_node_groups;
+	if (dev_set_name(dev, "access%u", access))
+		goto free;
+
+	if (device_register(dev))
+		goto free_name;
+
+	pm_runtime_no_callbacks(dev);
+	list_add_tail(&access_node->list_node, &node->access_list);
+	return access_node;
+free_name:
+	kfree_const(dev->kobj.name);
+free:
+	kfree(access_node);
+	return NULL;
+}
+
 #define K(x) ((x) << (PAGE_SHIFT - 10))
 static ssize_t node_read_meminfo(struct device *dev,
 			struct device_attribute *attr, char *buf)
@@ -340,7 +429,7 @@  static int register_node(struct node *node, int num)
 void unregister_node(struct node *node)
 {
 	hugetlb_unregister_node(node);		/* no-op, if memoryless node */
-
+	node_remove_accesses(node);
 	device_unregister(&node->dev);
 }
 
@@ -372,6 +461,56 @@  int register_cpu_under_node(unsigned int cpu, unsigned int nid)
 				 kobject_name(&node_devices[nid]->dev.kobj));
 }
 
+/**
+ * register_memory_node_under_compute_node - link memory node to its compute
+ *					     node for a given access class.
+ * @mem_node:	Memory node number
+ * @cpu_node:	Cpu  node number
+ * @access:	Access class to register
+ *
+ * Description:
+ * 	For use with platforms that may have separate memory and compute nodes.
+ * 	This function will export node relationships linking which memory
+ * 	initiator nodes can access memory targets at a given ranked access
+ * 	class.
+ */
+int register_memory_node_under_compute_node(unsigned int mem_nid,
+					    unsigned int cpu_nid,
+					    unsigned access)
+{
+	struct node *init_node, *targ_node;
+	struct node_access_nodes *initiator, *target;
+	int ret;
+
+	if (!node_online(cpu_nid) || !node_online(mem_nid))
+		return -ENODEV;
+
+	init_node = node_devices[cpu_nid];
+	targ_node = node_devices[mem_nid];
+	initiator = node_init_node_access(init_node, access);
+	target = node_init_node_access(targ_node, access);
+	if (!initiator || !target)
+		return -ENOMEM;
+
+	ret = sysfs_add_link_to_group(&initiator->dev.kobj, "targets",
+				      &targ_node->dev.kobj,
+				      dev_name(&targ_node->dev));
+	if (ret)
+		return ret;
+
+	ret = sysfs_add_link_to_group(&target->dev.kobj, "initiators",
+				      &init_node->dev.kobj,
+				      dev_name(&init_node->dev));
+	if (ret)
+		goto err;
+
+	return 0;
+ err:
+	sysfs_remove_link_from_group(&initiator->dev.kobj, "targets",
+				     dev_name(&targ_node->dev));
+	return ret;
+}
+
 int unregister_cpu_under_node(unsigned int cpu, unsigned int nid)
 {
 	struct device *obj;
@@ -580,6 +719,7 @@  int __register_one_node(int nid)
 			register_cpu_under_node(cpu, nid);
 	}
 
+	INIT_LIST_HEAD(&node_devices[nid]->access_list);
 	/* initialize work queue for memory hot plug */
 	init_node_hugetlb_work(nid);
 
diff --git a/include/linux/node.h b/include/linux/node.h
index 257bb3d6d014..f34688a203c1 100644
--- a/include/linux/node.h
+++ b/include/linux/node.h
@@ -17,11 +17,12 @@ 
 
 #include <linux/device.h>
 #include <linux/cpumask.h>
+#include <linux/list.h>
 #include <linux/workqueue.h>
 
 struct node {
 	struct device	dev;
-
+	struct list_head access_list;
 #if defined(CONFIG_MEMORY_HOTPLUG_SPARSE) && defined(CONFIG_HUGETLBFS)
 	struct work_struct	node_work;
 #endif
@@ -75,6 +76,10 @@  extern int register_mem_sect_under_node(struct memory_block *mem_blk,
 extern int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
 					   unsigned long phys_index);
 
+extern int register_memory_node_under_compute_node(unsigned int mem_nid,
+						   unsigned int cpu_nid,
+						   unsigned access);
+
 #ifdef CONFIG_HUGETLBFS
 extern void register_hugetlbfs_with_node(node_registration_func_t doregister,
 					 node_registration_func_t unregister);