From patchwork Tue Dec 11 01:03:04 2018
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 10722915
From: Keith Busch
To: linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org
Cc: Greg Kroah-Hartman , Rafael Wysocki , Dave Hansen , Dan Williams , Keith Busch
Subject: [PATCHv2 06/12] node: Add heterogenous memory performance
Date: Mon, 10 Dec 2018 18:03:04 -0700
Message-Id:
 <20181211010310.8551-7-keith.busch@intel.com>
In-Reply-To: <20181211010310.8551-1-keith.busch@intel.com>
References: <20181211010310.8551-1-keith.busch@intel.com>

Heterogeneous memory systems provide memory nodes with different latency
and bandwidth performance attributes. Create an interface for the kernel
to register these attributes, as seen by the memory's primary initiators,
under the memory node. If the system provides this information,
applications can query the node attributes when deciding which node to
request memory from.

The following example shows the new sysfs hierarchy for a node exporting
performance attributes:

  # tree /sys/devices/system/node/nodeY/primary_initiator_access
  /sys/devices/system/node/nodeY/primary_initiator_access
  |-- read_bandwidth
  |-- read_latency
  |-- write_bandwidth
  `-- write_latency

Bandwidth is reported in MB/s and latency in nanoseconds.

Memory accesses from an initiator node that is not one of the memory's
primary compute nodes may incur a performance penalty, so they may not
see the performance reported for the primary memory initiators.

As an example of what you may be able to do with this, say we have a
PCIe storage device, /dev/nvme0n1, attached to a particular node, and we
want to run IO to it using the fastest memory with primary access from
the same node as that PCIe device.
The following shell script achieves that goal:

  #!/bin/bash
  DEV_NODE=/sys/devices/system/node/node$(cat /sys/block/nvme0n1/device/device/numa_node)
  BEST_WRITE_BW=0
  BEST_MEM_NODE=0
  for i in $(ls -d ${DEV_NODE}/primary_target*); do
    tmp=$(cat ${i}/primary_initiator_access/write_bandwidth)
    if ((${tmp} > ${BEST_WRITE_BW})); then
      BEST_WRITE_BW=${tmp}
      BEST_MEM_NODE=$(echo ${i} | sed 's/^.*primary_target//g')
    fi
  done
  numactl --membind=${BEST_MEM_NODE} \
    --cpunodebind=$(cat ${DEV_NODE}/primary_cpu_nodelist) \
    -- fio --filename=/dev/nvme0n1 --bs=4k --name=access-test

Signed-off-by: Keith Busch
---
 drivers/base/Kconfig |  8 ++++++++
 drivers/base/node.c  | 44 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/node.h | 22 ++++++++++++++++++++++
 3 files changed, 74 insertions(+)

diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 3e63a900b330..6014980238e8 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -149,6 +149,14 @@ config DEBUG_TEST_DRIVER_REMOVE
 	  unusable. You should say N here unless you are explicitly looking to
 	  test this functionality.
 
+config HMEM_REPORTING
+	bool
+	default y
+	depends on NUMA
+	help
+	  Enable reporting for heterogeneous memory access attributes under
+	  their non-uniform memory nodes.
+
 source "drivers/base/test/Kconfig"
 
 config SYS_HYPERVISOR
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 50412ce3fd7d..768612c06c56 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -99,6 +99,50 @@ static DEVICE_ATTR_RO(primary_cpu_nodelist);
 static DEVICE_ATTR(cpumap, S_IRUGO, node_read_cpumask, NULL);
 static DEVICE_ATTR(cpulist, S_IRUGO, node_read_cpulist, NULL);
 
+#ifdef CONFIG_HMEM_REPORTING
+const struct attribute_group node_access_attrs_group;
+
+#define ACCESS_ATTR(name) \
+static ssize_t name##_show(struct device *dev, \
+			   struct device_attribute *attr, \
+			   char *buf) \
+{ \
+	return sprintf(buf, "%u\n", to_node(dev)->hmem_attrs.name); \
+} \
+static DEVICE_ATTR_RO(name);
+
+ACCESS_ATTR(read_bandwidth)
+ACCESS_ATTR(read_latency)
+ACCESS_ATTR(write_bandwidth)
+ACCESS_ATTR(write_latency)
+
+static struct attribute *access_attrs[] = {
+	&dev_attr_read_bandwidth.attr,
+	&dev_attr_read_latency.attr,
+	&dev_attr_write_bandwidth.attr,
+	&dev_attr_write_latency.attr,
+	NULL,
+};
+
+const struct attribute_group node_access_attrs_group = {
+	.name	= "primary_initiator_access",
+	.attrs	= access_attrs,
+};
+
+void node_set_perf_attrs(unsigned int nid, struct node_hmem_attrs *hmem_attrs)
+{
+	struct node *node;
+
+	if (WARN_ON_ONCE(!node_online(nid)))
+		return;
+	node = node_devices[nid];
+	node->hmem_attrs = *hmem_attrs;
+	if (sysfs_create_group(&node->dev.kobj, &node_access_attrs_group))
+		pr_info("failed to add performance attribute group to node %u\n",
+			nid);
+}
+#endif
+
 #define K(x) ((x) << (PAGE_SHIFT - 10))
 static ssize_t node_read_meminfo(struct device *dev,
 			struct device_attribute *attr, char *buf)
diff --git a/include/linux/node.h b/include/linux/node.h
index 3d06de045cbf..71abaf0d4f4b 100644
--- a/include/linux/node.h
+++ b/include/linux/node.h
@@ -17,8 +17,27 @@
 
 #include
 #include
+#include
 #include
 
+#ifdef CONFIG_HMEM_REPORTING
+/**
+ * struct node_hmem_attrs - heterogeneous memory performance attributes
+ *
+ * @read_bandwidth:	Read bandwidth in MB/s
+ * @write_bandwidth:	Write bandwidth in MB/s
+ * @read_latency:	Read latency in nanoseconds
+ * @write_latency:	Write latency in nanoseconds
+ */
+struct node_hmem_attrs {
+	unsigned int read_bandwidth;
+	unsigned int write_bandwidth;
+	unsigned int read_latency;
+	unsigned int write_latency;
+};
+void node_set_perf_attrs(unsigned int nid, struct node_hmem_attrs *hmem_attrs);
+#endif
+
 struct node {
 	struct device dev;
 	nodemask_t primary_mem_nodes;
@@ -27,6 +46,9 @@ struct node {
 #if defined(CONFIG_MEMORY_HOTPLUG_SPARSE) && defined(CONFIG_HUGETLBFS)
 	struct work_struct node_work;
 #endif
+#ifdef CONFIG_HMEM_REPORTING
+	struct node_hmem_attrs hmem_attrs;
+#endif
 };
 
 struct memory_block;
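For illustration only, the node selection done by the shell script in the commit message could be sketched as a small userspace C helper that reads the primary_initiator_access group this patch adds. This is a hypothetical sketch, not part of the patch or of any existing library: the names `hmem_perf`, `hmem_read_perf()` and `hmem_best_write_node()` are made up here, the sysfs root is passed in as a parameter (normally /sys/devices/system/node) so the code can be pointed at a test tree, and it assumes each attribute file holds a single unsigned decimal value followed by a newline, as printed by ACCESS_ATTR.

```c
#define _POSIX_C_SOURCE 200809L /* expose POSIX.1-2008 APIs such as mkdtemp() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Hypothetical userspace mirror of the four values exported per node:
 * bandwidth in MB/s, latency in nanoseconds. */
struct hmem_perf {
	unsigned int read_bandwidth;
	unsigned int read_latency;
	unsigned int write_bandwidth;
	unsigned int write_latency;
};

/* Read one unsigned decimal value from "<dir>/<name>"; 0 on success. */
static int read_uint(const char *dir, const char *name, unsigned int *val)
{
	char path[512];
	FILE *f;
	int ok;

	snprintf(path, sizeof(path), "%s/%s", dir, name);
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = fscanf(f, "%u", val) == 1;
	fclose(f);
	return ok ? 0 : -1;
}

/* Fill @out from <root>/node<nid>/primary_initiator_access/. Returns -1
 * if the group is missing (no performance data was registered for the
 * node) or any attribute cannot be parsed. */
int hmem_read_perf(const char *root, int nid, struct hmem_perf *out)
{
	char dir[256];
	struct stat st;

	assert(root != NULL && out != NULL);
	snprintf(dir, sizeof(dir), "%s/node%d/primary_initiator_access",
		 root, nid);
	if (stat(dir, &st) != 0 || !S_ISDIR(st.st_mode))
		return -1;
	if (read_uint(dir, "read_bandwidth", &out->read_bandwidth) ||
	    read_uint(dir, "read_latency", &out->read_latency) ||
	    read_uint(dir, "write_bandwidth", &out->write_bandwidth) ||
	    read_uint(dir, "write_latency", &out->write_latency))
		return -1;
	return 0;
}

/* Return the node in @nids with the highest write bandwidth, or -1 if
 * none of them exports the attribute group. Ties keep the earlier node. */
int hmem_best_write_node(const char *root, const int *nids, size_t n)
{
	int best = -1;
	unsigned int best_bw = 0;
	struct hmem_perf p;

	for (size_t i = 0; i < n; i++) {
		if (hmem_read_perf(root, nids[i], &p))
			continue;	/* no attributes for this node */
		if (best < 0 || p.write_bandwidth > best_bw) {
			best = nids[i];
			best_bw = p.write_bandwidth;
		}
	}
	return best;
}
```

A caller would pass the candidate target node numbers (for example, those taken from the primary_target* links in the script) and then bind memory as the script does with numactl. Unlike the script, which falls back to node 0 with a bandwidth of 0, this sketch returns -1 when none of the candidates reports performance attributes, leaving the fallback policy to the caller.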