From: jglisse@redhat.com
To: linux-mm@kvack.org
Cc: Andrew Morton, linux-kernel@vger.kernel.org, Jérôme Glisse,
    "Rafael J . Wysocki", Ross Zwisler, Dan Williams, Dave Hansen,
    Haggai Eran, Balbir Singh, "Aneesh Kumar K . V",
    Benjamin Herrenschmidt, Felix Kuehling, Philip Yang,
    Christian König, Paul Blinzer, Logan Gunthorpe, John Hubbard,
    Ralph Campbell, Michal Hocko, Jonathan Cameron, Mark Hairgrove,
    Vivek Kini, Mel Gorman, Dave Airlie, Ben Skeggs, Andrea Arcangeli
Subject: [RFC PATCH 01/14] mm/hms: heterogeneous memory system (sysfs infrastructure)
Date: Mon, 3 Dec 2018 18:34:56 -0500
Message-Id: <20181203233509.20671-2-jglisse@redhat.com>
In-Reply-To: <20181203233509.20671-1-jglisse@redhat.com>
References: <20181203233509.20671-1-jglisse@redhat.com>

From: Jérôme Glisse

A system with a complex memory topology needs a more versatile topology
description than just nodes, where a node is a collection of memory and
CPUs.

In a heterogeneous memory system we consider four types of object:
    - target: any kind of memory
    - initiator: any kind of device or CPU
    - link: any kind of link that connects targets and initiators
    - bridge: a bridge between two links (for some initiators)

Properties (like bandwidth, latency, bus width, ...) are defined per
bridge and per link. The properties of a link apply to all initiators
connected to that link. Not all initiators are connected to all links,
so not all initiators can access all target memory (this applies to
CPUs too, i.e. some CPUs might not be able to access all target memory).

Bridges allow initiators (that can use the bridge) to access targets
with which they do not share a direct link.

Through these four types of object we can describe any kind of system
memory topology.
To expose this to userspace we expose a new sysfs hierarchy (that
co-exists with the existing one):
    - /sys/bus/hms/target        all targets in the system
    - /sys/bus/hms/initiator     all initiators in the system
    - /sys/bus/hms/interconnect  all inter-connects in the system
    - /sys/bus/hms/bridge        all bridges in the system

Inside each link or bridge directory there are symlinks to the targets
and initiators that are connected to that bridge or link. Properties
are defined inside the link and bridge directories.

This patch only introduces the core HMS infrastructure; each object
type is added by an individual patch.

Signed-off-by: Jérôme Glisse
Cc: Rafael J. Wysocki
Cc: Ross Zwisler
Cc: Dan Williams
Cc: Dave Hansen
Cc: Haggai Eran
Cc: Balbir Singh
Cc: Aneesh Kumar K.V
Cc: Benjamin Herrenschmidt
Cc: Felix Kuehling
Cc: Philip Yang
Cc: Christian König
Cc: Paul Blinzer
Cc: Logan Gunthorpe
Cc: John Hubbard
Cc: Ralph Campbell
Cc: Michal Hocko
Cc: Jonathan Cameron
Cc: Mark Hairgrove
Cc: Vivek Kini
Cc: Mel Gorman
Cc: Dave Airlie
Cc: Ben Skeggs
Cc: Andrea Arcangeli
---
 Documentation/vm/hms.rst |  35 +++++++
 drivers/base/Kconfig     |  14 +++
 drivers/base/Makefile    |   1 +
 drivers/base/hms.c       | 199 +++++++++++++++++++++++++++++++++++++++
 drivers/base/init.c      |   2 +
 include/linux/hms.h      |  72 ++++++++++++++
 6 files changed, 323 insertions(+)
 create mode 100644 Documentation/vm/hms.rst
 create mode 100644 drivers/base/hms.c
 create mode 100644 include/linux/hms.h

diff --git a/Documentation/vm/hms.rst b/Documentation/vm/hms.rst
new file mode 100644
index 000000000000..dbf0f71918a9
--- /dev/null
+++ b/Documentation/vm/hms.rst
@@ -0,0 +1,35 @@
+.. _hms:
+
+=================================
+Heterogeneous Memory System (HMS)
+=================================
+
+Systems with a complex memory topology need a more versatile topology
+description than just nodes, where a node is a collection of memory and CPUs.
+In a heterogeneous memory system we consider four types of object::
+    - target: any kind of memory
+    - initiator: any kind of device or CPU
+    - inter-connect: any kind of link that connects targets and initiators
+    - bridge: a link between two inter-connects
+
+Properties (like bandwidth, latency, bus width, ...) are defined per bridge
+and per inter-connect. Properties of an inter-connect apply to all initiators
+which are linked to that inter-connect. Not all initiators are linked to all
+inter-connects, and thus not all initiators can access all memory (this
+applies to CPUs too, i.e. some CPUs might not be able to access all memory).
+
+Bridges allow initiators (that can use the bridge) to access targets with
+which they do not have a direct link (i.e. they do not share a common
+inter-connect with the target).
+
+Through these four types of object we can describe any kind of system memory
+topology. To expose this to userspace we expose a new sysfs hierarchy (that
+co-exists with the existing one)::
+    - /sys/bus/hms/target* all targets in the system
+    - /sys/bus/hms/initiator* all initiators in the system
+    - /sys/bus/hms/interconnect* all inter-connects in the system
+    - /sys/bus/hms/bridge* all bridges in the system
+
+Inside each bridge or inter-connect directory there are symlinks to targets
+and initiators that are linked to that bridge or inter-connect. Properties
+are defined inside the bridge and inter-connect directories.
diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 3e63a900b330..d46a7d47f316 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -276,4 +276,18 @@ config GENERIC_ARCH_TOPOLOGY
 	  appropriate scaling, sysfs interface for changing capacity values at
 	  runtime.
 
+config HMS
+	bool "Heterogeneous memory system"
+	depends on STAGING
+	default n
+	help
+	  THIS IS AN EXPERIMENTAL API DO NOT RELY ON IT ! IT IS UNSTABLE !
+
+	  Select HMS if you want to expose the heterogeneous memory system to
+	  user space. This will expose a new directory under /sys/bus/hms that
+	  provides a description of the heterogeneous memory system.
+
+	  See Documentation/vm/hms.rst for further information.
+
+
 endmenu
diff --git a/drivers/base/Makefile b/drivers/base/Makefile
index 704f44295810..92ebfacbf0dc 100644
--- a/drivers/base/Makefile
+++ b/drivers/base/Makefile
@@ -12,6 +12,7 @@ obj-y			+= power/
 obj-$(CONFIG_ISA_BUS_API)	+= isa.o
 obj-y				+= firmware_loader/
 obj-$(CONFIG_NUMA)	+= node.o
+obj-$(CONFIG_HMS)	+= hms.o
 obj-$(CONFIG_MEMORY_HOTPLUG_SPARSE) += memory.o
 ifeq ($(CONFIG_SYSFS),y)
 obj-$(CONFIG_MODULES)	+= module.o
diff --git a/drivers/base/hms.c b/drivers/base/hms.c
new file mode 100644
index 000000000000..a145f00a3683
--- /dev/null
+++ b/drivers/base/hms.c
@@ -0,0 +1,199 @@
+/*
+ * Copyright 2018 Red Hat Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * Authors:
+ * Jérôme Glisse
+ */
+/* Heterogeneous memory system (HMS) see Documentation/vm/hms.rst */
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+
+#define HMS_CLASS_NAME "hms"
+
+static DEFINE_MUTEX(hms_sysfs_mutex);
+
+static struct bus_type hms_subsys = {
+	.name = HMS_CLASS_NAME,
+	.dev_name = NULL,
+};
+
+void hms_object_release(struct hms_object *object)
+{
+	put_device(object->parent);
+}
+
+int hms_object_init(struct hms_object *object, struct device *parent,
+		    enum hms_type type, unsigned version,
+		    void (*device_release)(struct device *device),
+		    const struct attribute_group **device_group)
+{
+	static unsigned uid = 0;
+	int ret;
+
+	mutex_lock(&hms_sysfs_mutex);
+
+	/*
+	 * For now assume we are not going to have more than (2^31)-1 objects
+	 * in a system.
+	 *
+	 * FIXME: use something a little less naive.
+	 */
+	object->uid = uid++;
+
+	switch (type) {
+	case HMS_TARGET:
+		dev_set_name(&object->device, "v%u-%u-target",
+			     version, object->uid);
+		break;
+	case HMS_BRIDGE:
+		dev_set_name(&object->device, "v%u-%u-bridge",
+			     version, object->uid);
+		break;
+	case HMS_INITIATOR:
+		dev_set_name(&object->device, "v%u-%u-initiator",
+			     version, object->uid);
+		break;
+	case HMS_LINK:
+		dev_set_name(&object->device, "v%u-%u-link",
+			     version, object->uid);
+		break;
+	default:
+		mutex_unlock(&hms_sysfs_mutex);
+		return -EINVAL;
+	}
+
+	object->type = type;
+	object->version = version;
+	object->device.id = object->uid;
+	object->device.bus = &hms_subsys;
+	object->device.groups = device_group;
+	object->device.release = device_release;
+
+	ret = device_register(&object->device);
+	if (ret)
+		put_device(&object->device);
+	mutex_unlock(&hms_sysfs_mutex);
+
+	if (!ret && parent) {
+		object->parent = parent;
+		get_device(parent);
+
+		sysfs_create_link(&object->device.kobj, &parent->kobj,
+				  kobject_name(&parent->kobj));
+	}
+
+	return ret;
+}
+
+int hms_object_link(struct hms_object *objecta,
+		    struct hms_object *objectb)
+{
+	int ret;
+
+	ret = sysfs_create_link(&objecta->device.kobj,
+				&objectb->device.kobj,
+				kobject_name(&objectb->device.kobj));
+	if (ret)
+		return ret;
+	ret = sysfs_create_link(&objectb->device.kobj,
+				&objecta->device.kobj,
+				kobject_name(&objecta->device.kobj));
+	if (ret) {
+		sysfs_remove_link(&objecta->device.kobj,
+				  kobject_name(&objectb->device.kobj));
+		return ret;
+	}
+
+	return 0;
+}
+
+void hms_object_unlink(struct hms_object *objecta,
+		       struct hms_object *objectb)
+{
+	sysfs_remove_link(&objecta->device.kobj,
+			  kobject_name(&objectb->device.kobj));
+	sysfs_remove_link(&objectb->device.kobj,
+			  kobject_name(&objecta->device.kobj));
+}
+
+struct hms_object *hms_object_get(struct hms_object *object)
+{
+	if (object == NULL)
+		return NULL;
+
+	get_device(&object->device);
+	return object;
+}
+
+void hms_object_put(struct hms_object *object)
+{
+	put_device(&object->device);
+}
+
+void hms_object_unregister(struct hms_object *object)
+{
+	mutex_lock(&hms_sysfs_mutex);
+	device_unregister(&object->device);
+	mutex_unlock(&hms_sysfs_mutex);
+}
+
+struct hms_object *hms_object_find_locked(unsigned uid)
+{
+	struct device *device;
+
+	device = subsys_find_device_by_id(&hms_subsys, uid, NULL);
+	return device ? to_hms_object(device) : NULL;
+}
+
+struct hms_object *hms_object_find(unsigned uid)
+{
+	struct hms_object *object;
+
+	mutex_lock(&hms_sysfs_mutex);
+	object = hms_object_find_locked(uid);
+	mutex_unlock(&hms_sysfs_mutex);
+	return object;
+}
+
+
+static struct attribute *hms_root_attrs[] = {
+	NULL
+};
+
+static struct attribute_group hms_root_attr_group = {
+	.attrs = hms_root_attrs,
+};
+
+static const struct attribute_group *hms_root_attr_groups[] = {
+	&hms_root_attr_group,
+	NULL,
+};
+
+int __init hms_init(void)
+{
+	int ret;
+
+	ret = subsys_system_register(&hms_subsys, hms_root_attr_groups);
+	if (ret)
+		pr_err("%s() failed: %d\n", __func__, ret);
+
+	return ret;
+}
diff --git a/drivers/base/init.c b/drivers/base/init.c
index 908e6520e804..3b40d5899d66 100644
--- a/drivers/base/init.c
+++ b/drivers/base/init.c
@@ -8,6 +8,7 @@
 #include
 #include
 #include
+#include
 
 #include "base.h"
 
@@ -34,5 +35,6 @@ void __init driver_init(void)
 	platform_bus_init();
 	cpu_dev_init();
 	memory_dev_init();
+	hms_init();
 	container_dev_init();
 }
diff --git a/include/linux/hms.h b/include/linux/hms.h
new file mode 100644
index 000000000000..1ab288df0158
--- /dev/null
+++ b/include/linux/hms.h
@@ -0,0 +1,72 @@
+/*
+ * Copyright 2018 Red Hat Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * Authors:
+ * Jérôme Glisse
+ */
+/* Heterogeneous memory system (HMS) see Documentation/vm/hms.rst */
+#ifndef HMS_H
+#define HMS_H
+#if IS_ENABLED(CONFIG_HMS)
+
+
+#include
+
+
+#define to_hms_object(device) container_of(device, struct hms_object, device)
+
+enum hms_type {
+	HMS_BRIDGE,
+	HMS_INITIATOR,
+	HMS_LINK,
+	HMS_TARGET,
+};
+
+struct hms_object {
+	struct device *parent;
+	struct device device;
+	enum hms_type type;
+	unsigned version;
+	unsigned uid;
+};
+
+void hms_object_release(struct hms_object *object);
+int hms_object_init(struct hms_object *object, struct device *parent,
+		    enum hms_type type, unsigned version,
+		    void (*device_release)(struct device *device),
+		    const struct attribute_group **device_group);
+int hms_object_link(struct hms_object *objecta,
+		    struct hms_object *objectb);
+void hms_object_unlink(struct hms_object *objecta,
+		       struct hms_object *objectb);
+struct hms_object *hms_object_get(struct hms_object *object);
+void hms_object_put(struct hms_object *object);
+void hms_object_unregister(struct hms_object *object);
+struct hms_object *hms_object_find_locked(unsigned uid);
+struct hms_object *hms_object_find(unsigned uid);
+
+
+int hms_init(void);
+
+
+#else /* IS_ENABLED(CONFIG_HMS) */
+
+
+static inline int hms_init(void)
+{
+	return 0;
+}
+
+
+#endif /* IS_ENABLED(CONFIG_HMS) */
+#endif /* HMS_H */