From patchwork Wed Jan 16 17:57:55 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 10766603
From: Keith Busch
To: linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org
Cc: Greg Kroah-Hartman, Rafael Wysocki, Dave Hansen, Dan Williams, Keith Busch
Subject: [PATCHv4 04/13] node: Link memory nodes to their compute nodes
Date: Wed, 16 Jan 2019 10:57:55 -0700
Message-Id:
<20190116175804.30196-5-keith.busch@intel.com>
X-Mailer: git-send-email 2.13.6
In-Reply-To: <20190116175804.30196-1-keith.busch@intel.com>
References: <20190116175804.30196-1-keith.busch@intel.com>

Systems may be constructed with various specialized nodes. Some nodes
may provide memory, some provide compute devices that access and use
that memory, and others may provide both. Nodes that provide memory are
referred to as memory targets, and nodes that can initiate memory access
are referred to as memory initiators.

Memory targets will often have varying access characteristics from
different initiators, and platforms may have ways to express those
relationships. In preparation for these systems, provide interfaces for
the kernel to export the relationship between memory targets and their
initiators with symlinks to each other's nodes, and export node lists
showing the same relationship.

If a system provides access locality for each initiator-target pair,
nodes may be grouped into ranked access classes relative to other nodes.
The new interface allows a subsystem to register relationships of
varying classes if available and desired to be exported. A lower class
number indicates a higher performing tier, with 0 being the best
performing class.

A memory initiator may have multiple memory targets in the same access
class. The initiator's memory targets in a given class indicate that the
initiator's access to them performs better than access from other
initiator nodes that are either unreported or in lower-performing
classes. The targets within an initiator's class, though, do not
necessarily perform the same as each other.

A memory target node may have multiple memory initiators. All linked
initiators in a target's class have the same access characteristics to
that target.

The following example shows the nodes' new sysfs hierarchy for a memory
target node 'Y' with class 0 access from initiator node 'X':

  # symlinks -v /sys/devices/system/node/nodeX/class0/
  relative: /sys/devices/system/node/nodeX/class0/targetY -> ../../nodeY

  # symlinks -v /sys/devices/system/node/nodeY/class0/
  relative: /sys/devices/system/node/nodeY/class0/initiatorX -> ../../nodeX

And the same information is reflected in the nodelist:

  # cat /sys/devices/system/node/nodeX/class0/target_nodelist
  Y

  # cat /sys/devices/system/node/nodeY/class0/initiator_nodelist
  X

Signed-off-by: Keith Busch
---
 drivers/base/node.c  | 127 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/node.h |   6 ++-
 2 files changed, 131 insertions(+), 2 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 86d6cd92ce3d..1da5072116ab 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -17,6 +17,7 @@
 #include
 #include
 #include
+#include
 #include
 #include

@@ -59,6 +60,91 @@ static inline ssize_t node_read_cpulist(struct device *dev,
 static DEVICE_ATTR(cpumap, S_IRUGO, node_read_cpumask, NULL);
 static DEVICE_ATTR(cpulist, S_IRUGO, node_read_cpulist, NULL);

+struct node_class_nodes {
+	struct device		dev;
+	struct list_head	list_node;
+	unsigned		class;
+	nodemask_t		initiator_nodes;
+	nodemask_t		target_nodes;
+};
+#define to_class_nodes(dev) container_of(dev, struct node_class_nodes, dev)
+
+static ssize_t initiator_nodelist_show(struct device *dev,
+				       struct device_attribute *attr, char *buf)
+{
+	struct node_class_nodes *c = to_class_nodes(dev);
+	return scnprintf(buf, PAGE_SIZE - 1, "%*pbl\n",
+			 nodemask_pr_args(&c->initiator_nodes));
+}
+static DEVICE_ATTR_RO(initiator_nodelist);
+
+static ssize_t target_nodelist_show(struct device *dev,
+				    struct device_attribute *attr, char *buf)
+{
+	struct node_class_nodes *c = to_class_nodes(dev);
+	return scnprintf(buf, PAGE_SIZE - 1, "%*pbl\n",
+			 nodemask_pr_args(&c->target_nodes));
+}
+static DEVICE_ATTR_RO(target_nodelist);
+
+static struct attribute *node_class_node_attrs[] = {
+	&dev_attr_initiator_nodelist.attr,
+	&dev_attr_target_nodelist.attr,
+	NULL,
+};
+ATTRIBUTE_GROUPS(node_class_node);
+
+static void node_remove_classes(struct node *node)
+{
+	struct node_class_nodes *c, *cnext;
+
+	list_for_each_entry_safe(c, cnext, &node->class_list, list_node) {
+		list_del(&c->list_node);
+		device_unregister(&c->dev);
+	}
+}
+
+static void node_class_release(struct device *dev)
+{
+	kfree(to_class_nodes(dev));
+}
+
+static struct node_class_nodes *node_init_node_class(struct device *parent,
+						     struct list_head *list,
+						     unsigned class)
+{
+	struct node_class_nodes *class_node;
+	struct device *dev;
+
+	list_for_each_entry(class_node, list, list_node)
+		if (class_node->class == class)
+			return class_node;
+
+	class_node = kzalloc(sizeof(*class_node), GFP_KERNEL);
+	if (!class_node)
+		return NULL;
+
+	class_node->class = class;
+	dev = &class_node->dev;
+	dev->parent = parent;
+	dev->release = node_class_release;
+	dev->groups = node_class_node_groups;
+	if (dev_set_name(dev, "class%u", class))
+		goto free;
+
+	if (device_register(dev))
+		goto free_name;
+
+	pm_runtime_no_callbacks(dev);
+	list_add_tail(&class_node->list_node, list);
+	return class_node;
+free_name:
+	kfree_const(dev->kobj.name);
+free:
+	kfree(class_node);
+	return NULL;
+}
+
 #define K(x) ((x) << (PAGE_SHIFT - 10))
 static ssize_t node_read_meminfo(struct device *dev,
 			struct device_attribute *attr, char *buf)
@@ -340,7 +426,7 @@ static int register_node(struct node *node, int num)
 void unregister_node(struct node *node)
 {
 	hugetlb_unregister_node(node);		/* no-op, if memoryless node */
-
+	node_remove_classes(node);
 	device_unregister(&node->dev);
 }

@@ -372,6 +458,44 @@ int register_cpu_under_node(unsigned int cpu, unsigned int nid)
 				 kobject_name(&node_devices[nid]->dev.kobj));
 }

+int register_memory_node_under_compute_node(unsigned int m, unsigned int p,
+					    unsigned class)
+{
+	struct node *init, *targ;
+	struct node_class_nodes *i, *t;
+	char initiator[20]; /* "initiator4294967295\0" */
+	char target[17]; /* "target4294967295\0" */
+	int ret;
+
+	if (!node_online(p) || !node_online(m))
+		return -ENODEV;
+
+	init = node_devices[p];
+	targ = node_devices[m];
+	i = node_init_node_class(&init->dev, &init->class_list, class);
+	t = node_init_node_class(&targ->dev, &targ->class_list, class);
+	if (!i || !t)
+		return -ENOMEM;
+
+	snprintf(initiator, sizeof(initiator), "initiator%u", p);
+	snprintf(target, sizeof(target), "target%u", m);
+	ret = sysfs_create_link(&i->dev.kobj, &targ->dev.kobj, target);
+	if (ret)
+		return ret;
+
+	ret = sysfs_create_link(&t->dev.kobj, &init->dev.kobj, initiator);
+	if (ret)
+		goto err;
+
+	node_set(m, i->target_nodes);
+	node_set(p, t->initiator_nodes);
+	return 0;
+ err:
+	sysfs_remove_link(&node_devices[p]->dev.kobj,
+			  kobject_name(&node_devices[m]->dev.kobj));
+	return ret;
+}
+
 int unregister_cpu_under_node(unsigned int cpu, unsigned int nid)
 {
 	struct device *obj;
@@ -580,6 +704,7 @@ int __register_one_node(int nid)
 			register_cpu_under_node(cpu, nid);
 	}

+	INIT_LIST_HEAD(&node_devices[nid]->class_list);
 	/* initialize work queue for memory hot plug */
 	init_node_hugetlb_work(nid);

diff --git a/include/linux/node.h b/include/linux/node.h
index 257bb3d6d014..8e3666c12ef2 100644
--- a/include/linux/node.h
+++ b/include/linux/node.h
@@ -17,11 +17,12 @@

 #include
 #include
+#include
 #include

 struct node {
 	struct device	dev;
-
+	struct list_head class_list;
 #if defined(CONFIG_MEMORY_HOTPLUG_SPARSE) && defined(CONFIG_HUGETLBFS)
 	struct work_struct	node_work;
 #endif
@@ -75,6 +76,9 @@ extern int register_mem_sect_under_node(struct memory_block *mem_blk,
 extern int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
 					   unsigned long phys_index);

+extern int register_memory_node_under_compute_node(unsigned int m, unsigned int p,
+						   unsigned class);
+
 #ifdef CONFIG_HUGETLBFS
 extern void register_hugetlbfs_with_node(node_registration_func_t doregister,
 					 node_registration_func_t unregister);
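
For reviewers who want to exercise the new attributes from user space: the
target_nodelist and initiator_nodelist files are formatted with "%*pbl",
which prints a bitmap as comma-separated ids and ranges (for example
"0-3,8"). The following is only a rough sketch of how a test program might
consume that output. The helper names parse_nodelist() and
read_target_nodelist() are hypothetical and not part of this patch, and the
64-node limit is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of the patch: parse a list such as
 * "0-3,8\n" into a 64-bit mask. Returns 0 on success, or -1 on an
 * unexpected character, a reversed range, or a node id >= 64.
 */
int parse_nodelist(const char *s, uint64_t *mask)
{
	assert(s != NULL && mask != NULL);
	*mask = 0;

	/* An empty list (only a newline) means no nodes are linked. */
	while (*s != '\0' && *s != '\n') {
		char *end;
		unsigned long lo, hi, n;

		if (*s < '0' || *s > '9')
			return -1;
		lo = strtoul(s, &end, 10);
		hi = lo;
		if (*end == '-') {
			/* "lo-hi" range: the upper bound must be a number. */
			s = end + 1;
			if (*s < '0' || *s > '9')
				return -1;
			hi = strtoul(s, &end, 10);
		}
		if (hi < lo || hi >= 64)
			return -1;
		for (n = lo; n <= hi; n++)
			*mask |= UINT64_C(1) << n;

		s = end;
		if (*s == ',')
			s++;
		else if (*s != '\0' && *s != '\n')
			return -1;
	}
	return 0;
}

/*
 * Hypothetical helper: read nodeN/classC/target_nodelist using the sysfs
 * layout shown in the commit message. Returns -1 if the file is missing.
 */
int read_target_nodelist(unsigned int nid, unsigned int access_class,
			 uint64_t *mask)
{
	char path[128], buf[256];
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/node/node%u/class%u/target_nodelist",
		 nid, access_class);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_nodelist(buf, mask);
}
```

On a kernel carrying this series, calling read_target_nodelist(X, 0, &mask)
should set the bit for every target node linked to initiator X in class 0.
On kernels without the classN directories it simply returns -1.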