From patchwork Mon Dec 3 23:34:56 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Jerome Glisse
X-Patchwork-Id: 10710889
From: jglisse@redhat.com
To: linux-mm@kvack.org
Cc: Andrew Morton, linux-kernel@vger.kernel.org, Jérôme Glisse,
 "Rafael J . Wysocki", Ross Zwisler, Dan Williams, Dave Hansen,
 Haggai Eran, Balbir Singh, "Aneesh Kumar K .
 V", Benjamin Herrenschmidt, Felix Kuehling, Philip Yang,
 Christian König, Paul Blinzer, Logan Gunthorpe, John Hubbard,
 Ralph Campbell, Michal Hocko, Jonathan Cameron, Mark Hairgrove,
 Vivek Kini, Mel Gorman, Dave Airlie, Ben Skeggs, Andrea Arcangeli
Subject: [RFC PATCH 01/14] mm/hms: heterogeneous memory system (sysfs infrastructure)
Date: Mon, 3 Dec 2018 18:34:56 -0500
Message-Id: <20181203233509.20671-2-jglisse@redhat.com>
In-Reply-To: <20181203233509.20671-1-jglisse@redhat.com>
References: <20181203233509.20671-1-jglisse@redhat.com>

From: Jérôme Glisse

Systems with complex memory topology need a more versatile memory
topology description than just nodes, where a node is a collection of
memory and CPUs. In a heterogeneous memory system we consider four
types of object:
  - target: any kind of memory
  - initiator: any kind of device or CPU
  - link: any kind of link that connects targets and initiators
  - bridge: a bridge between two links (for some initiators)

Properties (like bandwidth, latency, bus width, ...) are defined per
bridge and per link. The properties of a link apply to all initiators
that are connected to that link. Not all initiators are connected to
all links, thus not all initiators can access all target memory (this
applies to CPUs too, i.e. some CPUs might not be able to access all
target memory).

Bridges allow initiators (that can use the bridge) to access targets
to which they do not have a direct link.

Through these four types of object we can describe any kind of system
memory topology.
To expose this to userspace we add a new sysfs hierarchy (that
co-exists with the existing one):
  - /sys/bus/hms/target/        all targets in the system
  - /sys/bus/hms/initiator      all initiators in the system
  - /sys/bus/hms/interconnect   all inter-connects in the system
  - /sys/bus/hms/bridge         all bridges in the system

Inside each link or bridge directory there are symlinks to the targets
and initiators that are connected to that bridge or link. Properties
are defined inside the link and bridge directories.

This patch only introduces the core HMS infrastructure; each object
type is added by an individual patch.

Signed-off-by: Jérôme Glisse
Cc: Rafael J. Wysocki
Cc: Ross Zwisler
Cc: Dan Williams
Cc: Dave Hansen
Cc: Haggai Eran
Cc: Balbir Singh
Cc: Aneesh Kumar K.V
Cc: Benjamin Herrenschmidt
Cc: Felix Kuehling
Cc: Philip Yang
Cc: Christian König
Cc: Paul Blinzer
Cc: Logan Gunthorpe
Cc: John Hubbard
Cc: Ralph Campbell
Cc: Michal Hocko
Cc: Jonathan Cameron
Cc: Mark Hairgrove
Cc: Vivek Kini
Cc: Mel Gorman
Cc: Dave Airlie
Cc: Ben Skeggs
Cc: Andrea Arcangeli
---
 Documentation/vm/hms.rst |  35 +++++++
 drivers/base/Kconfig     |  14 +++
 drivers/base/Makefile    |   1 +
 drivers/base/hms.c       | 199 +++++++++++++++++++++++++++++++++++++++
 drivers/base/init.c      |   2 +
 include/linux/hms.h      |  72 ++++++++++++++
 6 files changed, 323 insertions(+)
 create mode 100644 Documentation/vm/hms.rst
 create mode 100644 drivers/base/hms.c
 create mode 100644 include/linux/hms.h

diff --git a/Documentation/vm/hms.rst b/Documentation/vm/hms.rst
new file mode 100644
index 000000000000..dbf0f71918a9
--- /dev/null
+++ b/Documentation/vm/hms.rst
@@ -0,0 +1,35 @@
+.. hms:
+
+=================================
+Heterogeneous Memory System (HMS)
+=================================
+
+System with complex memory topology needs a more versatile memory topology
+description than just node where a node is a collection of memory and CPU.
+In heterogeneous memory system we consider four types of object::
+  - target: which is any kind of memory
+  - initiator: any kind of device or CPU
+  - inter-connect: any kind of links that connects target and initiator
+  - bridge: a link between two inter-connects
+
+Properties (like bandwidth, latency, bus width, ...) are define per bridge
+and per inter-connect. Property of an inter-connect apply to all initiators
+which are link to that inter-connect. Not all initiators are link to all
+inter-connect and thus not all initiators can access all memory (this apply
+to CPU too ie some CPU might not be able to access all memory).
+
+Bridges allow initiators (that can use the bridge) to access target for
+which they do not have a direct link with (ie they do not share a common
+inter-connect with the target).
+
+Through this four types of object we can describe any kind of system memory
+topology. To expose this to userspace we expose a new sysfs hierarchy (that
+co-exist with the existing one)::
+  - /sys/bus/hms/target* all targets in the system
+  - /sys/bus/hms/initiator* all initiators in the system
+  - /sys/bus/hms/interconnect* all inter-connects in the system
+  - /sys/bus/hms/bridge* all bridges in the system
+
+Inside each bridge or inter-connect directory they are symlinks to targets
+and initiators that are linked to that bridge or inter-connect. Properties
+are defined inside bridge and inter-connect directory.
diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 3e63a900b330..d46a7d47f316 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -276,4 +276,18 @@ config GENERIC_ARCH_TOPOLOGY
 	  appropriate scaling, sysfs interface for changing capacity values at
 	  runtime.
 
+config HMS
+	bool "Heterogeneous memory system"
+	depends on STAGING
+	default n
+	help
+	  THIS IS AN EXPERIMENTAL API DO NOT RELY ON IT ! IT IS UNSTABLE !
+
+	  Select HMS if you want to expose heterogeneous memory system to user
+	  space. This will expose a new directory under /sys/bus/hms that
+	  provides a description of the heterogeneous memory system.
+
+	  See Documentation/vm/hms.rst for further information.
+
+
 endmenu
diff --git a/drivers/base/Makefile b/drivers/base/Makefile
index 704f44295810..92ebfacbf0dc 100644
--- a/drivers/base/Makefile
+++ b/drivers/base/Makefile
@@ -12,6 +12,7 @@ obj-y += power/
 obj-$(CONFIG_ISA_BUS_API) += isa.o
 obj-y += firmware_loader/
 obj-$(CONFIG_NUMA) += node.o
+obj-$(CONFIG_HMS) += hms.o
 obj-$(CONFIG_MEMORY_HOTPLUG_SPARSE) += memory.o
 ifeq ($(CONFIG_SYSFS),y)
 obj-$(CONFIG_MODULES) += module.o
diff --git a/drivers/base/hms.c b/drivers/base/hms.c
new file mode 100644
index 000000000000..a145f00a3683
--- /dev/null
+++ b/drivers/base/hms.c
@@ -0,0 +1,199 @@
+/*
+ * Copyright 2018 Red Hat Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * Authors:
+ * Jérôme Glisse
+ */
+/* Heterogeneous memory system (HMS) see Documentation/vm/hms.rst */
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+
+#define HMS_CLASS_NAME "hms"
+
+static DEFINE_MUTEX(hms_sysfs_mutex);
+
+static struct bus_type hms_subsys = {
+	.name = HMS_CLASS_NAME,
+	.dev_name = NULL,
+};
+
+void hms_object_release(struct hms_object *object)
+{
+	put_device(object->parent);
+}
+
+int hms_object_init(struct hms_object *object, struct device *parent,
+		    enum hms_type type, unsigned version,
+		    void (*device_release)(struct device *device),
+		    const struct attribute_group **device_group)
+{
+	static unsigned uid = 0;
+	int ret;
+
+	mutex_lock(&hms_sysfs_mutex);
+
+	/*
+	 * For now assume we are not going to have more than (2^31)-1 objects
+	 * in a system.
+	 *
+	 * FIXME use something little less naive ...
+	 */
+	object->uid = uid++;
+
+	switch (type) {
+	case HMS_TARGET:
+		dev_set_name(&object->device, "v%u-%u-target",
+			     version, object->uid);
+		break;
+	case HMS_BRIDGE:
+		dev_set_name(&object->device, "v%u-%u-bridge",
+			     version, object->uid);
+		break;
+	case HMS_INITIATOR:
+		dev_set_name(&object->device, "v%u-%u-initiator",
+			     version, object->uid);
+		break;
+	case HMS_LINK:
+		dev_set_name(&object->device, "v%u-%u-link",
+			     version, object->uid);
+		break;
+	default:
+		mutex_unlock(&hms_sysfs_mutex);
+		return -EINVAL;
+	}
+
+	object->type = type;
+	object->version = version;
+	object->device.id = object->uid;
+	object->device.bus = &hms_subsys;
+	object->device.groups = device_group;
+	object->device.release = device_release;
+
+	ret = device_register(&object->device);
+	if (ret)
+		put_device(&object->device);
+	mutex_unlock(&hms_sysfs_mutex);
+
+	if (!ret && parent) {
+		object->parent = parent;
+		get_device(parent);
+
+		sysfs_create_link(&object->device.kobj, &parent->kobj,
+				  kobject_name(&parent->kobj));
+	}
+
+	return ret;
+}
+
+int hms_object_link(struct hms_object *objecta,
+		    struct hms_object *objectb)
+{
+	int ret;
+
+	ret = sysfs_create_link(&objecta->device.kobj,
+				&objectb->device.kobj,
+				kobject_name(&objectb->device.kobj));
+	if (ret)
+		return ret;
+	ret = sysfs_create_link(&objectb->device.kobj,
+				&objecta->device.kobj,
+				kobject_name(&objecta->device.kobj));
+	if (ret) {
+		sysfs_remove_link(&objecta->device.kobj,
+				  kobject_name(&objectb->device.kobj));
+		return ret;
+	}
+
+	return 0;
+}
+
+void hms_object_unlink(struct hms_object *objecta,
+		       struct hms_object *objectb)
+{
+	sysfs_remove_link(&objecta->device.kobj,
+			  kobject_name(&objectb->device.kobj));
+	sysfs_remove_link(&objectb->device.kobj,
+			  kobject_name(&objecta->device.kobj));
+}
+
+struct hms_object *hms_object_get(struct hms_object *object)
+{
+	if (object == NULL)
+		return NULL;
+
+	get_device(&object->device);
+	return object;
+}
+
+void hms_object_put(struct hms_object *object)
+{
+	put_device(&object->device);
+}
+
+void hms_object_unregister(struct hms_object *object)
+{
+	mutex_lock(&hms_sysfs_mutex);
+	device_unregister(&object->device);
+	mutex_unlock(&hms_sysfs_mutex);
+}
+
+struct hms_object *hms_object_find_locked(unsigned uid)
+{
+	struct device *device;
+
+	device = subsys_find_device_by_id(&hms_subsys, uid, NULL);
+	return device ? to_hms_object(device) : NULL;
+}
+
+struct hms_object *hms_object_find(unsigned uid)
+{
+	struct hms_object *object;
+
+	mutex_lock(&hms_sysfs_mutex);
+	object = hms_object_find_locked(uid);
+	mutex_unlock(&hms_sysfs_mutex);
+	return object;
+}
+
+
+static struct attribute *hms_root_attrs[] = {
+	NULL
+};
+
+static struct attribute_group hms_root_attr_group = {
+	.attrs = hms_root_attrs,
+};
+
+static const struct attribute_group *hms_root_attr_groups[] = {
+	&hms_root_attr_group,
+	NULL,
+};
+
+int __init hms_init(void)
+{
+	int ret;
+
+	ret = subsys_system_register(&hms_subsys, hms_root_attr_groups);
+	if (ret)
+		pr_err("%s() failed: %d\n", __func__, ret);
+
+	return ret;
+}
diff --git a/drivers/base/init.c b/drivers/base/init.c
index 908e6520e804..3b40d5899d66 100644
--- a/drivers/base/init.c
+++ b/drivers/base/init.c
@@ -8,6 +8,7 @@
 #include
 #include
 #include
+#include
 
 #include "base.h"
 
@@ -34,5 +35,6 @@ void __init driver_init(void)
 	platform_bus_init();
 	cpu_dev_init();
 	memory_dev_init();
+	hms_init();
 	container_dev_init();
 }
diff --git a/include/linux/hms.h b/include/linux/hms.h
new file mode 100644
index 000000000000..1ab288df0158
--- /dev/null
+++ b/include/linux/hms.h
@@ -0,0 +1,72 @@
+/*
+ * Copyright 2018 Red Hat Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * Authors:
+ * Jérôme Glisse
+ */
+/* Heterogeneous memory system (HMS) see Documentation/vm/hms.rst */
+#ifndef HMS_H
+#define HMS_H
+#if IS_ENABLED(CONFIG_HMS)
+
+
+#include
+
+
+#define to_hms_object(device) container_of(device, struct hms_object, device)
+
+enum hms_type {
+	HMS_BRIDGE,
+	HMS_INITIATOR,
+	HMS_LINK,
+	HMS_TARGET,
+};
+
+struct hms_object {
+	struct device *parent;
+	struct device device;
+	enum hms_type type;
+	unsigned version;
+	unsigned uid;
+};
+
+void hms_object_release(struct hms_object *object);
+int hms_object_init(struct hms_object *object, struct device *parent,
+		    enum hms_type type, unsigned version,
+		    void (*device_release)(struct device *device),
+		    const struct attribute_group **device_group);
+int hms_object_link(struct hms_object *objecta,
+		    struct hms_object *objectb);
+void hms_object_unlink(struct hms_object *objecta,
+		       struct hms_object *objectb);
+struct hms_object *hms_object_get(struct hms_object *object);
+void hms_object_put(struct hms_object *object);
+void hms_object_unregister(struct hms_object *object);
+struct hms_object *hms_object_find_locked(unsigned uid);
+struct hms_object *hms_object_find(unsigned uid);
+
+
+int hms_init(void);
+
+
+#else /* IS_ENABLED(CONFIG_HMS) */
+
+
+static inline int hms_init(void)
+{
+	return 0;
+}
+
+
+#endif /* IS_ENABLED(CONFIG_HMS) */
+#endif /* HMS_H */

From patchwork Mon Dec 3 23:34:57 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Jerome Glisse
X-Patchwork-Id: 10710891
From: jglisse@redhat.com
To: linux-mm@kvack.org
Cc: Andrew Morton, linux-kernel@vger.kernel.org, Jérôme Glisse,
 "Rafael J . Wysocki", Ross Zwisler, Dan Williams, Dave Hansen,
 Haggai Eran, Balbir Singh, "Aneesh Kumar K .
 V", Benjamin Herrenschmidt, Felix Kuehling, Philip Yang,
 Christian König, Paul Blinzer, Logan Gunthorpe, John Hubbard,
 Ralph Campbell, Michal Hocko, Jonathan Cameron, Mark Hairgrove,
 Vivek Kini, Mel Gorman, Dave Airlie, Ben Skeggs, Andrea Arcangeli
Subject: [RFC PATCH 02/14] mm/hms: heterogeneous memory system (HMS) documentation
Date: Mon, 3 Dec 2018 18:34:57 -0500
Message-Id: <20181203233509.20671-3-jglisse@redhat.com>
In-Reply-To: <20181203233509.20671-1-jglisse@redhat.com>
References: <20181203233509.20671-1-jglisse@redhat.com>

From: Jérôme Glisse

Add documentation describing what HMS is and what it is for (see patch
content).

Signed-off-by: Jérôme Glisse
Cc: Rafael J. Wysocki
Cc: Ross Zwisler
Cc: Dan Williams
Cc: Dave Hansen
Cc: Haggai Eran
Cc: Balbir Singh
Cc: Aneesh Kumar K.V
Cc: Benjamin Herrenschmidt
Cc: Felix Kuehling
Cc: Philip Yang
Cc: Christian König
Cc: Paul Blinzer
Cc: Logan Gunthorpe
Cc: John Hubbard
Cc: Ralph Campbell
Cc: Michal Hocko
Cc: Jonathan Cameron
Cc: Mark Hairgrove
Cc: Vivek Kini
Cc: Mel Gorman
Cc: Dave Airlie
Cc: Ben Skeggs
Cc: Andrea Arcangeli
---
 Documentation/vm/hms.rst | 275 ++++++++++++++++++++++++++++++++++-----
 1 file changed, 246 insertions(+), 29 deletions(-)

diff --git a/Documentation/vm/hms.rst b/Documentation/vm/hms.rst
index dbf0f71918a9..bd7c9e8e7077 100644
--- a/Documentation/vm/hms.rst
+++ b/Documentation/vm/hms.rst
@@ -4,32 +4,249 @@
 Heterogeneous Memory System (HMS)
 =================================
 
-System with complex memory topology needs a more versatile memory topology
-description than just node where a node is a collection of memory and CPU.
-In heterogeneous memory system we consider four types of object::
-  - target: which is any kind of memory
-  - initiator: any kind of device or CPU
-  - inter-connect: any kind of links that connects target and initiator
-  - bridge: a link between two inter-connects
-
-Properties (like bandwidth, latency, bus width, ...) are define per bridge
-and per inter-connect. Property of an inter-connect apply to all initiators
-which are link to that inter-connect. Not all initiators are link to all
-inter-connect and thus not all initiators can access all memory (this apply
-to CPU too ie some CPU might not be able to access all memory).
-
-Bridges allow initiators (that can use the bridge) to access target for
-which they do not have a direct link with (ie they do not share a common
-inter-connect with the target).
-
-Through this four types of object we can describe any kind of system memory
-topology. To expose this to userspace we expose a new sysfs hierarchy (that
-co-exist with the existing one)::
-  - /sys/bus/hms/target* all targets in the system
-  - /sys/bus/hms/initiator* all initiators in the system
-  - /sys/bus/hms/interconnect* all inter-connects in the system
-  - /sys/bus/hms/bridge* all bridges in the system
-
-Inside each bridge or inter-connect directory they are symlinks to targets
-and initiators that are linked to that bridge or inter-connect. Properties
-are defined inside bridge and inter-connect directory.
+Heterogeneous memory systems are becoming more and more the norm. In
+those systems there is not only the main system memory for each node,
+but also device memory and/or a memory hierarchy to consider. Device
+memory can come from a device like a GPU, FPGA, ... or from a
+memory-only device (persistent memory, or high density memory device).
+
+Memory hierarchy is when you not only have the main memory but also
+other types of memory like HBM (High Bandwidth Memory, often stacked
+on the CPU die or GPU die), persistent memory or high density memory
+(i.e. something slower than regular DDR DIMM but much bigger).
+
+On top of this diversity of memories you also have to account for the
+system bus topology, i.e. how all CPUs and devices are connected to each
+other. Userspace does not care about the exact physical topology but
+cares about topology from a behavioral point of view, i.e. what are all
+the paths between an initiator (anything that can initiate memory access
+like a CPU, GPU, FPGA, network controller ...) and a target memory and
+what are the properties of each of those paths (bandwidth, latency,
+granularity, ...).
+
+This means that it is no longer sufficient to consider a flat view
+for each node in a system; for maximum performance we need to
+account for all of this new memory and also for the system topology.
+This is why this proposal is unlike the HMAT proposal [1], which
+tries to extend the existing NUMA for new types of memory.
+Here we are tackling a much more profound change that departs from NUMA.
+
+
+One of the reasons for this radical change is that the advance of
+accelerators like GPUs or FPGAs means the CPU is no longer the only
+place where computation happens. It is becoming more and more common
+for an application to use a mix of different accelerators to perform
+its computation. So we can no longer satisfy ourselves with a
+CPU-centric and flat view of a system like NUMA and NUMA distance.
+
+
+HMS tackles these problems through three aspects:
+    1 - Expose complex system topology and the various kinds of memory
+        to user space so that applications have a standard way and a
+        single place to get all the information they care about.
+    2 - A new API for user space to bind/provide hints to the kernel on
+        which memory to use for a range of virtual addresses (a new
+        mbind() syscall).
+    3 - Kernel side changes for vm policy to handle these changes
+
+
+The rest of this document is split into 3 sections. The first section
+talks about complex system topology: what it is, how it is used today
+and how to describe it tomorrow. The second section talks about the
+new API to bind/provide hints to the kernel for a range of virtual
+addresses. The third section talks about the new mechanism to track
+binds/hints provided by user space or device drivers inside the kernel.
+
+
+1) Complex system topology and representing them
+================================================
+
+Inside a node you can have a complex topology of memory. For instance
+you can have multiple HBM memories in a node, each HBM memory tied to a
+set of CPUs (all of which are in the same node). This means that you
+have a hierarchy of memory for CPUs: the local fast HBM, which is
+expected to be relatively small compared to main memory, and then the
+main memory. New memory technology might also deepen this hierarchy
+with another level of yet slower memory but gigantic in size (some
+persistent memory technology might fall into that category).
Another +example is device memory, and devices themselves can have a hierarchy +like HBM on top of the device core and main device memory. + +On top of that you can have multiple paths to access each memory and +each path can have different properties (latency, bandwidth, ...). +Also there is not always symmetry ie some memory might only be +accessible by some devices or CPUs ie not accessible by everyone. + +So a flat hierarchy for each node is not capable of representing this +kind of complexity. To simplify discussion and because we do not want +to single out CPU from device, from here on out we will use initiator +to refer to either CPU or device. An initiator is any kind of CPU or +device that can access memory (ie initiate memory access). + +At this point an example of such a system might help: + - 2 nodes and for each node: + - 1 CPU per node with 2 complexes of CPU cores per CPU + - one HBM memory for each complex of CPU cores (200GB/s) + - CPU core complexes are linked to each other (100GB/s) + - main memory is (90GB/s) + - 4 GPUs each with: + - HBM memory for each GPU (1000GB/s) (not CPU accessible) + - GDDR memory for each GPU (500GB/s) (CPU accessible) + - connected to CPU root controller (60GB/s) + - connected to other GPUs (even GPUs from the second + node) with GPU link (400GB/s) + +In this example we restrict ourselves to bandwidth and ignore bus width +or latency. This is just to simplify the discussion but obviously they +also factor in. + + +Userspace very much would like to know about this information, for +instance HPC folks have developed complex libraries to manage this and +there is wide research on the topic [2] [3] [4] [5]. Today most of +the work is done by hardcoding things for a specific platform, which is +somewhat acceptable for HPC folks where the platform stays the same +for a long period of time. + +Roughly speaking I see two broad use cases for topology information.
+First is for virtualization and vm where you want to segment your +hardware properly for each vm (binding memory, CPU and GPU that are +all close to each other). Second is for applications, many of which +can partition their workload to minimize exchange between partitions, +allowing each partition to be bound to a subset of devices and CPUs +that are close to each other (for maximum locality). Here it is much +more than just NUMA distance, you can leverage the memory hierarchy +and the system topology all together (see [2] [3] [4] [5] for more +references and details). + +So this is not exposing topology just for the sake of a cool graph in +userspace. There are active users today of such information and if we +want to grow and broaden the usage we should provide a unified API +to standardize how that information is accessible to everyone. + + +One proposal so far to handle new types of memory is to use CPU-less +nodes for those [6]. While the same idea can apply to device memory, it is +still hard to describe multiple paths with different properties in such a +scheme. While it is backward compatible and needs minimal changes, it +simply can not convey complex topology (think any kind of random +graph, not just a tree like graph). + +So HMS uses a new way to expose the system topology to userspace. It +relies on 4 types of objects: + - target: any kind of memory (main memory, HBM, device, ...) + - initiator: CPU or device (anything that can access memory) + - link: anything that links initiators and targets + - bridge: anything that allows a group of initiators to access + remote targets (ie targets they are not connected with directly + through a link) + +Properties like bandwidth, latency, ... are all set per bridge and +link. All initiators connected to a link can access any target memory +also connected to the same link and all with the same link properties.
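A minimal sketch of the rule above, in plain user space C rather than kernel code: every initiator on a link can reach every target on the same link, with that link's properties. The struct layout, the bitmask encoding of ids and the function names are assumptions made for illustration; nothing here is part of the HMS series.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one HMS link; ids are small indices into a bitmask. */
struct link_sketch {
	unsigned long bandwidth;  /* link property, e.g. GB/s */
	unsigned long initiators; /* bit i set: initiator i is on this link */
	unsigned long targets;    /* bit t set: target t is on this link */
};

/* True when the id fits in the bitmask and its bit is set. */
static bool mask_has(unsigned long mask, unsigned id)
{
	return id < sizeof(mask) * CHAR_BIT && ((mask >> id) & 1UL);
}

/*
 * Look for a link shared by the initiator and the target. On success store
 * the highest bandwidth among the shared links and return true.
 */
static bool link_direct_access(const struct link_sketch *links, size_t nlinks,
			       unsigned initiator, unsigned target,
			       unsigned long *bandwidth)
{
	bool found = false;
	size_t i;

	assert(links != NULL && bandwidth != NULL);

	for (i = 0; i < nlinks; i++) {
		if (!mask_has(links[i].initiators, initiator) ||
		    !mask_has(links[i].targets, target))
			continue;
		if (!found || links[i].bandwidth > *bandwidth)
			*bandwidth = links[i].bandwidth;
		found = true;
	}
	return found;
}
```

When no link is shared the function returns false, which is the case where the model would need a bridge object to reach the target.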
+ +Links do not need to match physical hardware ie you can have a single +physical link match one or multiple software exposed links. This +allows modeling devices connected to the same physical link (like PCIE +for instance) but not with the same characteristics (like number of lanes +or lane speed in PCIE). The reverse is also true ie having a single +software exposed link match multiple physical links. + +Bridges allow initiators to access remote links. A bridge connects two +links to each other and is also specific to a list of initiators (ie +not all initiators connected to each of the links can use the bridge). +Bridges have their own properties (bandwidth, latency, ...) so that +the actual value for each property is the lowest common +denominator between the bridge and each of the links. + + +This model can describe any kind of directed graph and thus +any kind of topology we might see in the future. +It is also easier to add new properties to each object type. + +Moreover it can be used to expose devices capable of doing peer to peer +between them. For that simply have all devices capable of peer to +peer share a common link, or use the bridge object if the peer to +peer capability is only one way for instance. + + +HMS uses the above scheme to expose system topology through sysfs under +/sys/bus/hms/ with: + - /sys/bus/hms/devices/v%version-%id-target/ : a target memory, + each has a UID and you can find the usual values in that folder (node id, + size, ...) + + - /sys/bus/hms/devices/v%version-%id-initiator/ : an initiator + (CPU or device), each has a HMS UID but also a CPU id for CPU + (which matches the CPU id in /sys/bus/cpu/). For a device you have a + path that can be the PCIE BUS ID for instance + + - /sys/bus/hms/devices/v%version-%id-link : a link, each has a + UID and a file per property (bandwidth, latency, ...), you also + find a symlink to every target and initiator connected to that + link.
+ + - /sys/bus/hms/devices/v%version-%id-bridge : a bridge, each has + a UID and a file per property (bandwidth, latency, ...), you + also find a symlink to all initiators that can use that bridge. + +To help with forward compatibility each object has a version value and +it is mandatory for user space to only use targets or initiators with a +version supported by the user space. For instance if user space only +knows about what version 1 means and sees a target with version 2 then +the user space must ignore that target as if it does not exist. + +Mandating that allows the addition of new properties that break +backward compatibility ie user space must know how this new property affects +the object to be able to use it safely. + +Main memory of each node is exposed under a common target. For now +device drivers are responsible for registering the memory they want to expose +through that scheme but in the future that information might come from +the system firmware (this is a different discussion). + + + +2) hbind() bind range of virtual address to heterogeneous memory +================================================================ + +So instead of using a bitmap, hbind() takes an array of uids and each uid +is a unique memory target inside the new memory topology description. +User space also provides an array of modifiers. Modifiers can be seen as +the flags parameter of mbind() but here we use an array so that user +space can not only supply a modifier but also a value with it. This should +allow the API to grow more features in the future. The kernel should return +-EINVAL if it is provided with an unknown modifier and just ignore the +call altogether, forcing the user space to restrict itself to modifiers +supported by the kernel it is running on (I know I am dreaming about well +behaved user space). + + +Note that none of this is exclusive of automatic memory placement like +autonuma. I also believe that we will see something similar to autonuma +for device memory.
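The versioning rule from section 1 can be sketched from the user space side. The following assumes that %version and %id in the sysfs directory names are printed as unsigned decimal numbers, which the text does not state; hms_name_usable() and SKETCH_MAX_VERSION are hypothetical names, not part of this series.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Highest object version this hypothetical tool understands. */
#define SKETCH_MAX_VERSION 1u

/*
 * Parse a directory name of the form v<version>-<id>-<kind>, for example
 * "v1-42-target". Returns 1 when the name is well formed, has the expected
 * kind and a version the tool understands; returns 0 when the object must
 * be ignored as if it did not exist.
 */
static int hms_name_usable(const char *name, const char *kind,
			   unsigned *version, unsigned *id)
{
	int consumed = 0;

	assert(name != NULL && kind != NULL);
	assert(version != NULL && id != NULL);

	/* %n is only stored if everything before it matched. */
	if (sscanf(name, "v%u-%u-%n", version, id, &consumed) != 2)
		return 0;
	if (consumed == 0)
		return 0;
	if (strcmp(name + consumed, kind) != 0)
		return 0;
	return *version <= SKETCH_MAX_VERSION;
}
```

A tool walking /sys/bus/hms/devices/ would call this on each entry name and skip every entry for which it returns 0, so that objects with a newer version are never acted upon.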
+ + +3) Tracking and applying heterogeneous memory policies +====================================================== + +The current memory policy infrastructure is node oriented. Instead of +changing that and risking breakage and regressions, HMS adds a new +heterogeneous policy tracking infrastructure. The expectation is +that existing applications can keep using mbind() and all existing +infrastructure undisturbed and unaffected, while new applications +will use the new API and should avoid mixing and matching both (as they +can achieve the same thing with the new API). + +Also the policy is not directly tied to the vma structure for a few +reasons: + - avoid having to split a vma for a policy that does not cover the full vma + - avoid changing too much vma code + - avoid growing the vma structure with an extra pointer + +The overall design is simple: on an hbind() call a hms policy structure +is created for the supplied range and hms uses the callback associated +with the target memory. This callback is provided by the device driver +for device memory or by core HMS for regular main memory. The callback +can decide to migrate the range to the target memories or do nothing +(this can be influenced by flags provided to hbind() too).
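To make the flow in section 3 concrete, here is a user space mock of the dispatch step. Only the migrate callback signature is taken from struct hms_target_hbind in this series; the policy structure, the trimmed struct hms_target and all *_sketch names are hypothetical, and the real implementation may look quite different.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct mm_struct;  /* opaque in this user space mock */
struct hms_target;

/* Same callback shape as struct hms_target_hbind in this series. */
struct hms_target_hbind {
	int (*migrate)(struct hms_target *target, struct mm_struct *mm,
		       unsigned long start, unsigned long end,
		       unsigned natoms, uint32_t *atoms);
};

/* Trimmed mock of struct hms_target: only what the sketch needs. */
struct hms_target {
	const struct hms_target_hbind *hbind;
};

/* Hypothetical policy record for one bound range (not from the series). */
struct hms_policy_sketch {
	unsigned long start;
	unsigned long end;
	struct hms_target *target;
};

/* Example callback that only counts calls; a real one might migrate pages. */
static unsigned sketch_migrate_calls;

static int sketch_migrate(struct hms_target *target, struct mm_struct *mm,
			  unsigned long start, unsigned long end,
			  unsigned natoms, uint32_t *atoms)
{
	(void)target; (void)mm; (void)start; (void)end;
	(void)natoms; (void)atoms;
	sketch_migrate_calls++;
	return 0;
}

static const struct hms_target_hbind sketch_hbind = {
	.migrate = sketch_migrate,
};

/*
 * Record the range in the policy, then hand it to the target's callback.
 * Returns -EINVAL for an empty range, 0 when the target has no callback,
 * otherwise whatever the callback returns.
 */
static int hms_policy_apply_sketch(struct hms_policy_sketch *policy,
				   struct hms_target *target,
				   struct mm_struct *mm,
				   unsigned long start, unsigned long end,
				   unsigned natoms, uint32_t *atoms)
{
	assert(policy != NULL && target != NULL);

	if (start >= end)
		return -EINVAL;

	policy->start = start;
	policy->end = end;
	policy->target = target;

	if (target->hbind == NULL || target->hbind->migrate == NULL)
		return 0;
	return target->hbind->migrate(target, mm, start, end, natoms, atoms);
}
```

The point of the sketch is the split of responsibilities: the policy only records what was asked for, while the decision to migrate or do nothing stays with the callback supplied for the target memory.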
From patchwork Mon Dec 3 23:34:58 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jerome Glisse X-Patchwork-Id: 10710893 From: jglisse@redhat.com To: linux-mm@kvack.org Cc: Andrew Morton , linux-kernel@vger.kernel.org, =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , "Rafael J . Wysocki" , Ross Zwisler , Dan Williams , Dave Hansen , Haggai Eran , Balbir Singh , "Aneesh Kumar K .
V" , Benjamin Herrenschmidt , Felix Kuehling , Philip Yang , =?utf-8?q?Christian_K=C3=B6nig?= , Paul Blinzer , Logan Gunthorpe , John Hubbard , Ralph Campbell , Michal Hocko , Jonathan Cameron , Mark Hairgrove , Vivek Kini , Mel Gorman , Dave Airlie , Ben Skeggs , Andrea Arcangeli Subject: [RFC PATCH 03/14] mm/hms: add target memory to heterogeneous memory system infrastructure Date: Mon, 3 Dec 2018 18:34:58 -0500 Message-Id: <20181203233509.20671-4-jglisse@redhat.com> In-Reply-To: <20181203233509.20671-1-jglisse@redhat.com> References: <20181203233509.20671-1-jglisse@redhat.com> From: Jérôme Glisse A target is some kind of memory, it can be regular main memory or some more specialized memory like CPU's HBM (High Bandwidth Memory) or some device's memory. Some target memory might not be accessible by all initiators (anything that can trigger memory access). For instance some device memory might not be accessible by the CPU. This is truly a heterogeneous system at its heart. Signed-off-by: Jérôme Glisse Cc: Rafael J.
Wysocki Cc: Ross Zwisler Cc: Dan Williams Cc: Dave Hansen Cc: Haggai Eran Cc: Balbir Singh Cc: Aneesh Kumar K.V Cc: Benjamin Herrenschmidt Cc: Felix Kuehling Cc: Philip Yang Cc: Christian König Cc: Paul Blinzer Cc: Logan Gunthorpe Cc: John Hubbard Cc: Ralph Campbell Cc: Michal Hocko Cc: Jonathan Cameron Cc: Mark Hairgrove Cc: Vivek Kini Cc: Mel Gorman Cc: Dave Airlie Cc: Ben Skeggs Cc: Andrea Arcangeli --- drivers/base/Makefile | 2 +- drivers/base/hms-target.c | 193 ++++++++++++++++++++++++++++++++++++++ include/linux/hms.h | 43 ++++++++- 3 files changed, 235 insertions(+), 3 deletions(-) create mode 100644 drivers/base/hms-target.c diff --git a/drivers/base/Makefile b/drivers/base/Makefile index 92ebfacbf0dc..8e8092145f18 100644 --- a/drivers/base/Makefile +++ b/drivers/base/Makefile @@ -12,7 +12,7 @@ obj-y += power/ obj-$(CONFIG_ISA_BUS_API) += isa.o obj-y += firmware_loader/ obj-$(CONFIG_NUMA) += node.o -obj-$(CONFIG_HMS) += hms.o +obj-$(CONFIG_HMS) += hms.o hms-target.o obj-$(CONFIG_MEMORY_HOTPLUG_SPARSE) += memory.o ifeq ($(CONFIG_SYSFS),y) obj-$(CONFIG_MODULES) += module.o diff --git a/drivers/base/hms-target.c b/drivers/base/hms-target.c new file mode 100644 index 000000000000..ce28dfe089a3 --- /dev/null +++ b/drivers/base/hms-target.c @@ -0,0 +1,193 @@ +/* + * Copyright 2018 Red Hat Inc. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation; either version 2 of + * the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * Authors: + * Jérôme Glisse + */ +/* Heterogeneous memory system (HMS) see Documentation/vm/hms.rst */ +#include +#include +#include +#include +#include +#include +#include +#include +#include + + +static DEFINE_MUTEX(hms_target_mutex); + + +static inline struct hms_target *hms_object_to_target(struct hms_object *object) +{ + if (object == NULL) + return NULL; + + if (object->type != HMS_TARGET) + return NULL; + return container_of(object, struct hms_target, object); +} + +static inline struct hms_target *device_to_hms_target(struct device *device) +{ + if (device == NULL) + return NULL; + + return hms_object_to_target(to_hms_object(device)); +} + +struct hms_target *hms_target_find_locked(unsigned uid) +{ + struct hms_object *object = hms_object_find_locked(uid); + struct hms_target *target; + + target = hms_object_to_target(object); + if (target) + return target; + hms_object_put(object); + return NULL; +} + +struct hms_target *hms_target_find(unsigned uid) +{ + struct hms_object *object = hms_object_find(uid); + struct hms_target *target; + + target = hms_object_to_target(object); + if (target) + return target; + hms_object_put(object); + return NULL; +} + +static void hms_target_release(struct device *device) +{ + struct hms_target *target = device_to_hms_target(device); + + hms_object_release(&target->object); + kfree(target); +} + +static ssize_t hms_target_show_size(struct device *device, + struct device_attribute *attr, + char *buf) +{ + struct hms_target *target = device_to_hms_target(device); + + if (target == NULL) + return -EINVAL; + + return sprintf(buf, "%lu\n", target->size); +} + +static ssize_t hms_target_show_nid(struct device *device, + struct device_attribute *attr, + char *buf) +{ + struct hms_target *target = device_to_hms_target(device); + + if (target == NULL) + return -EINVAL; + + return sprintf(buf, "%d\n", target->nid); +} + +static ssize_t hms_target_show_uid(struct device *device, + struct device_attribute *attr, + char *buf) +{
+ struct hms_target *target = device_to_hms_target(device); + + if (target == NULL) + return -EINVAL; + + return sprintf(buf, "%d\n", target->object.uid); +} + +static DEVICE_ATTR(size, 0444, hms_target_show_size, NULL); +static DEVICE_ATTR(nid, 0444, hms_target_show_nid, NULL); +static DEVICE_ATTR(uid, 0444, hms_target_show_uid, NULL); + +static struct attribute *hms_target_attrs[] = { + &dev_attr_size.attr, + &dev_attr_nid.attr, + &dev_attr_uid.attr, + NULL +}; + +static struct attribute_group hms_target_attr_group = { + .attrs = hms_target_attrs, +}; + +static const struct attribute_group *hms_target_attr_groups[] = { + &hms_target_attr_group, + NULL, +}; + +void hms_target_register(struct hms_target **targetp, struct device *parent, + int nid, const struct hms_target_hbind *hbind, + unsigned long size, unsigned version) +{ + struct hms_target *target; + + *targetp = NULL; + target = kzalloc(sizeof(*target), GFP_KERNEL); + if (target == NULL) + return; + + target->nid = nid; + target->size = size; + target->hbind = hbind; + + if (hms_object_init(&target->object, parent, HMS_TARGET, version, + hms_target_release, hms_target_attr_groups)) { + kfree(target); + target = NULL; + } + + *targetp = target; +} +EXPORT_SYMBOL(hms_target_register); + +void hms_target_add_memory(struct hms_target *target, unsigned long size) +{ + if (target) { + mutex_lock(&hms_target_mutex); + target->size += size; + mutex_unlock(&hms_target_mutex); + } +} +EXPORT_SYMBOL(hms_target_add_memory); + +void hms_target_remove_memory(struct hms_target *target, unsigned long size) +{ + if (target) { + mutex_lock(&hms_target_mutex); + target->size = size < target->size ? 
target->size - size : 0; + mutex_unlock(&hms_target_mutex); + } +} +EXPORT_SYMBOL(hms_target_remove_memory); + +void hms_target_unregister(struct hms_target **targetp) +{ + struct hms_target *target = *targetp; + + *targetp = NULL; + if (target == NULL) + return; + + hms_object_unregister(&target->object); +} +EXPORT_SYMBOL(hms_target_unregister); diff --git a/include/linux/hms.h b/include/linux/hms.h index 1ab288df0158..0568fdf6d479 100644 --- a/include/linux/hms.h +++ b/include/linux/hms.h @@ -17,10 +17,21 @@ /* Heterogeneous memory system (HMS) see Documentation/vm/hms.rst */ #ifndef HMS_H #define HMS_H -#if IS_ENABLED(CONFIG_HMS) - #include +#include + + +struct hms_target; + +struct hms_target_hbind { + int (*migrate)(struct hms_target *target, struct mm_struct *mm, + unsigned long start, unsigned long end, + unsigned natoms, uint32_t *atoms); +}; + + +#if IS_ENABLED(CONFIG_HMS) #define to_hms_object(device) container_of(device, struct hms_object, device) @@ -56,12 +67,40 @@ struct hms_object *hms_object_find_locked(unsigned uid); struct hms_object *hms_object_find(unsigned uid); +struct hms_target { + const struct hms_target_hbind *hbind; + struct hms_object object; + unsigned long size; + void *private; + int nid; +}; + +void hms_target_add_memory(struct hms_target *target, unsigned long size); +void hms_target_remove_memory(struct hms_target *target, unsigned long size); +void hms_target_register(struct hms_target **targetp, struct device *parent, + int nid, const struct hms_target_hbind *hbind, + unsigned long size, unsigned version); +void hms_target_unregister(struct hms_target **targetp); +struct hms_target *hms_target_find(unsigned uid); + +static inline void hms_target_put(struct hms_target *target) +{ + hms_object_put(&target->object); +} + + int hms_init(void); #else /* IS_ENABLED(CONFIG_HMS) */ +#define hms_target_add_memory(target, size) +#define hms_target_remove_memory(target, size) +#define hms_target_register(targetp, nid, size) +#define 
hms_target_unregister(targetp) + + static inline int hms_init(void) { return 0; From patchwork Mon Dec 3 23:34:59 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jerome Glisse X-Patchwork-Id: 10710895 From: jglisse@redhat.com To: linux-mm@kvack.org Cc: Andrew Morton , linux-kernel@vger.kernel.org, =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , "Rafael J .
Wysocki" , Ross Zwisler , Dan Williams , Dave Hansen , Haggai Eran , Balbir Singh , "Aneesh Kumar K . V" , Benjamin Herrenschmidt , Felix Kuehling , Philip Yang , =?utf-8?q?Christian_K=C3=B6nig?= , Paul Blinzer , Logan Gunthorpe , John Hubbard , Ralph Campbell , Michal Hocko , Jonathan Cameron , Mark Hairgrove , Vivek Kini , Mel Gorman , Dave Airlie , Ben Skeggs , Andrea Arcangeli Subject: [RFC PATCH 04/14] mm/hms: add initiator to heterogeneous memory system infrastructure Date: Mon, 3 Dec 2018 18:34:59 -0500 Message-Id: <20181203233509.20671-5-jglisse@redhat.com> In-Reply-To: <20181203233509.20671-1-jglisse@redhat.com> References: <20181203233509.20671-1-jglisse@redhat.com> From: Jérôme Glisse An initiator is anything that can initiate memory access, either a CPU or a device. Here CPUs and devices are treated as equals. See Documentation/vm/hms.rst for further details. Signed-off-by: Jérôme Glisse Cc: Rafael J.
Wysocki Cc: Ross Zwisler Cc: Dan Williams Cc: Dave Hansen Cc: Haggai Eran Cc: Balbir Singh Cc: Aneesh Kumar K.V Cc: Benjamin Herrenschmidt Cc: Felix Kuehling Cc: Philip Yang Cc: Christian König Cc: Paul Blinzer Cc: Logan Gunthorpe Cc: John Hubbard Cc: Ralph Campbell Cc: Michal Hocko Cc: Jonathan Cameron Cc: Mark Hairgrove Cc: Vivek Kini Cc: Mel Gorman Cc: Dave Airlie Cc: Ben Skeggs Cc: Andrea Arcangeli --- drivers/base/Makefile | 2 +- drivers/base/hms-initiator.c | 141 +++++++++++++++++++++++++++++++++++ include/linux/hms.h | 15 ++++ 3 files changed, 157 insertions(+), 1 deletion(-) create mode 100644 drivers/base/hms-initiator.c diff --git a/drivers/base/Makefile b/drivers/base/Makefile index 8e8092145f18..6a1b5ab667bd 100644 --- a/drivers/base/Makefile +++ b/drivers/base/Makefile @@ -12,7 +12,7 @@ obj-y += power/ obj-$(CONFIG_ISA_BUS_API) += isa.o obj-y += firmware_loader/ obj-$(CONFIG_NUMA) += node.o -obj-$(CONFIG_HMS) += hms.o hms-target.o +obj-$(CONFIG_HMS) += hms.o hms-target.o hms-initiator.o obj-$(CONFIG_MEMORY_HOTPLUG_SPARSE) += memory.o ifeq ($(CONFIG_SYSFS),y) obj-$(CONFIG_MODULES) += module.o diff --git a/drivers/base/hms-initiator.c b/drivers/base/hms-initiator.c new file mode 100644 index 000000000000..08aa519427d6 --- /dev/null +++ b/drivers/base/hms-initiator.c @@ -0,0 +1,141 @@ +/* + * Copyright 2018 Red Hat Inc. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation; either version 2 of + * the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * Authors: + * Jérôme Glisse + */ +/* Heterogeneous memory system (HMS) see Documentation/vm/hms.rst */ +#include +#include +#include +#include +#include +#include +#include +#include +#include + + +static inline struct hms_initiator *hms_object_to_initiator(struct hms_object *object) +{ + if (object == NULL) + return NULL; + + if (object->type != HMS_INITIATOR) + return NULL; + return container_of(object, struct hms_initiator, object); +} + +static inline struct hms_initiator *device_to_hms_initiator(struct device *device) +{ + if (device == NULL) + return NULL; + + return hms_object_to_initiator(to_hms_object(device)); +} + +struct hms_initiator *hms_initiator_find_locked(unsigned uid) +{ + struct hms_object *object = hms_object_find_locked(uid); + struct hms_initiator *initiator; + + initiator = hms_object_to_initiator(object); + if (initiator) + return initiator; + hms_object_put(object); + return NULL; +} + +struct hms_initiator *hms_initiator_find(unsigned uid) +{ + struct hms_object *object = hms_object_find(uid); + struct hms_initiator *initiator; + + initiator = hms_object_to_initiator(object); + if (initiator) + return initiator; + hms_object_put(object); + return NULL; +} + +static void hms_initiator_release(struct device *device) +{ + struct hms_initiator *initiator = device_to_hms_initiator(device); + + hms_object_release(&initiator->object); + kfree(initiator); +} + +static ssize_t hms_initiator_show_uid(struct device *device, + struct device_attribute *attr, + char *buf) +{ + struct hms_initiator *initiator = device_to_hms_initiator(device); + + if (initiator == NULL) + return -EINVAL; + + return sprintf(buf, "%d\n", initiator->object.uid); +} + +static DEVICE_ATTR(uid, 0444, hms_initiator_show_uid, NULL); + +static struct attribute *hms_initiator_attrs[] = { + &dev_attr_uid.attr, + NULL +}; + +static struct attribute_group hms_initiator_attr_group = { + .attrs = hms_initiator_attrs, +}; + +static const struct attribute_group 
*hms_initiator_attr_groups[] = { + &hms_initiator_attr_group, + NULL, +}; + +void hms_initiator_register(struct hms_initiator **initiatorp, + struct device *parent, int nid, + unsigned version) +{ + struct hms_initiator *initiator; + + *initiatorp = NULL; + initiator = kzalloc(sizeof(*initiator), GFP_KERNEL); + if (initiator == NULL) + return; + + initiator->nid = nid; + + if (hms_object_init(&initiator->object, parent, HMS_INITIATOR, version, + hms_initiator_release, hms_initiator_attr_groups)) + { + kfree(initiator); + initiator = NULL; + } + + *initiatorp = initiator; +} +EXPORT_SYMBOL(hms_initiator_register); + +void hms_initiator_unregister(struct hms_initiator **initiatorp) +{ + struct hms_initiator *initiator = *initiatorp; + + *initiatorp = NULL; + if (initiator == NULL) + return; + + hms_object_unregister(&initiator->object); +} +EXPORT_SYMBOL(hms_initiator_unregister); diff --git a/include/linux/hms.h b/include/linux/hms.h index 0568fdf6d479..7a2823493f63 100644 --- a/include/linux/hms.h +++ b/include/linux/hms.h @@ -67,6 +67,17 @@ struct hms_object *hms_object_find_locked(unsigned uid); struct hms_object *hms_object_find(unsigned uid); +struct hms_initiator { + struct hms_object object; + int nid; +}; + +void hms_initiator_register(struct hms_initiator **initiatorp, + struct device *parent, int nid, + unsigned version); +void hms_initiator_unregister(struct hms_initiator **initiatorp); + + struct hms_target { const struct hms_target_hbind *hbind; struct hms_object object; @@ -95,6 +106,10 @@ int hms_init(void); #else /* IS_ENABLED(CONFIG_HMS) */ +#define hms_initiator_register(initiatorp, parent, nid, version) +#define hms_initiator_unregister(initiatorp) + + #define hms_target_add_memory(target, size) #define hms_target_remove_memory(target, size) #define hms_target_register(targetp, nid, size) From patchwork Mon Dec 3 23:35:00 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jerome Glisse X-Patchwork-Id: 
10710897 From: jglisse@redhat.com To: linux-mm@kvack.org Cc: Andrew Morton , linux-kernel@vger.kernel.org, Jérôme Glisse , "Rafael J . Wysocki" , Ross Zwisler , Dan Williams , Dave Hansen , Haggai Eran , Balbir Singh , "Aneesh Kumar K . 
V" , Benjamin Herrenschmidt , Felix Kuehling , Philip Yang , Christian König , Paul Blinzer , Logan Gunthorpe , John Hubbard , Ralph Campbell , Michal Hocko , Jonathan Cameron , Mark Hairgrove , Vivek Kini , Mel Gorman , Dave Airlie , Ben Skeggs , Andrea Arcangeli Subject: [RFC PATCH 05/14] mm/hms: add link to heterogeneous memory system infrastructure Date: Mon, 3 Dec 2018 18:35:00 -0500 Message-Id: <20181203233509.20671-6-jglisse@redhat.com> In-Reply-To: <20181203233509.20671-1-jglisse@redhat.com> References: <20181203233509.20671-1-jglisse@redhat.com> From: Jérôme Glisse A link connects initiators (CPUs or devices) and target memories with each other. It does not necessarily match one to one with a physical interconnect, i.e. a given physical interconnect may be presented as multiple links, or multiple physical interconnects can be presented as just one link. What matters is that the properties associated with a link apply to all initiators and targets listed as connected to that link. For example, consider the PCIE bus: if all initiators can do peer to peer with each other, then it can be presented as just one link containing all the PCIE devices and the local CPU (i.e. the CPU from which the PCIE lanes are coming). If not all PCIE devices can do peer to peer, then a link per peer to peer group is created and the corresponding CPU is added to each. See the HMS documentation, Documentation/vm/hms.txt, for detail. Signed-off-by: Jérôme Glisse Cc: Rafael J. 
Wysocki Cc: Ross Zwisler Cc: Dan Williams Cc: Dave Hansen Cc: Haggai Eran Cc: Balbir Singh Cc: Aneesh Kumar K.V Cc: Benjamin Herrenschmidt Cc: Felix Kuehling Cc: Philip Yang Cc: Christian König Cc: Paul Blinzer Cc: Logan Gunthorpe Cc: John Hubbard Cc: Ralph Campbell Cc: Michal Hocko Cc: Jonathan Cameron Cc: Mark Hairgrove Cc: Vivek Kini Cc: Mel Gorman Cc: Dave Airlie Cc: Ben Skeggs Cc: Andrea Arcangeli --- drivers/base/Makefile | 2 +- drivers/base/hms-link.c | 183 ++++++++++++++++++++++++++++++++++++++++ include/linux/hms.h | 23 +++++ 3 files changed, 207 insertions(+), 1 deletion(-) create mode 100644 drivers/base/hms-link.c diff --git a/drivers/base/Makefile b/drivers/base/Makefile index 6a1b5ab667bd..b8ff678fdae9 100644 --- a/drivers/base/Makefile +++ b/drivers/base/Makefile @@ -12,7 +12,7 @@ obj-y += power/ obj-$(CONFIG_ISA_BUS_API) += isa.o obj-y += firmware_loader/ obj-$(CONFIG_NUMA) += node.o -obj-$(CONFIG_HMS) += hms.o hms-target.o hms-initiator.o +obj-$(CONFIG_HMS) += hms.o hms-target.o hms-initiator.o hms-link.o obj-$(CONFIG_MEMORY_HOTPLUG_SPARSE) += memory.o ifeq ($(CONFIG_SYSFS),y) obj-$(CONFIG_MODULES) += module.o diff --git a/drivers/base/hms-link.c b/drivers/base/hms-link.c new file mode 100644 index 000000000000..58f4fdd8977c --- /dev/null +++ b/drivers/base/hms-link.c @@ -0,0 +1,183 @@ +/* + * Copyright 2018 Red Hat Inc. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation; either version 2 of + * the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * Authors: + * Jérôme Glisse + */ +/* Heterogeneous memory system (HMS) see Documentation/vm/hms.rst */ +#include +#include +#include +#include +#include +#include +#include +#include +#include + + +struct hms_link *hms_object_to_link(struct hms_object *object) +{ + if (object == NULL) + return NULL; + + if (object->type != HMS_LINK) + return NULL; + return container_of(object, struct hms_link, object); +} + +static inline struct hms_link *device_to_hms_link(struct device *device) +{ + if (device == NULL) + return NULL; + + return hms_object_to_link(to_hms_object(device)); +} + +struct hms_link *hms_link_find_locked(unsigned uid) +{ + struct hms_object *object = hms_object_find_locked(uid); + struct hms_link *link; + + link = hms_object_to_link(object); + if (link) + return link; + hms_object_put(object); + return NULL; +} + +struct hms_link *hms_link_find(unsigned uid) +{ + struct hms_object *object = hms_object_find(uid); + struct hms_link *link; + + link = hms_object_to_link(object); + if (link) + return link; + hms_object_put(object); + return NULL; +} + +static void hms_link_release(struct device *device) + +{ + struct hms_link *link = device_to_hms_link(device); + + hms_object_release(&link->object); + kfree(link); +} + +static ssize_t hms_link_show_uid(struct device *device, + struct device_attribute *attr, + char *buf) +{ + struct hms_link *link = device_to_hms_link(device); + + if (link == NULL) + return -EINVAL; + + return sprintf(buf, "%d\n", link->object.uid); +} + +static DEVICE_ATTR(uid, 0444, hms_link_show_uid, NULL); + +static struct attribute *hms_link_attrs[] = { + &dev_attr_uid.attr, + NULL +}; + +static struct attribute_group hms_link_attr_group = { + .attrs = hms_link_attrs, +}; + +static const struct attribute_group *hms_link_attr_groups[] = { + &hms_link_attr_group, + NULL, +}; + +void hms_link_register(struct hms_link **linkp, struct device *parent, + unsigned version) +{ + struct hms_link *link; + + *linkp = NULL; + link = 
kzalloc(sizeof(*link), GFP_KERNEL); + if (link == NULL) + return; + + if (hms_object_init(&link->object, parent, HMS_LINK, version, + hms_link_release, hms_link_attr_groups)) { + kfree(link); + link = NULL; + } + + *linkp = link; +} +EXPORT_SYMBOL(hms_link_register); + +void hms_unlink_initiator(struct hms_link *link, + struct hms_initiator *initiator) +{ + if (link == NULL || initiator == NULL) + return; + if (link->object.type != HMS_LINK) + return; + if (initiator->object.type != HMS_INITIATOR) + return; + hms_object_unlink(&link->object, &initiator->object); +} +EXPORT_SYMBOL(hms_unlink_initiator); + +void hms_unlink_target(struct hms_link *link, struct hms_target *target) +{ + if (link == NULL || target == NULL) + return; + if (link->object.type != HMS_LINK || target->object.type != HMS_TARGET) + return; + hms_object_unlink(&link->object, &target->object); +} +EXPORT_SYMBOL(hms_unlink_target); + +int hms_link_initiator(struct hms_link *link, struct hms_initiator *initiator) +{ + if (link == NULL || initiator == NULL) + return -EINVAL; + if (link->object.type != HMS_LINK) + return -EINVAL; + if (initiator->object.type != HMS_INITIATOR) + return -EINVAL; + return hms_object_link(&link->object, &initiator->object); +} +EXPORT_SYMBOL(hms_link_initiator); + +int hms_link_target(struct hms_link *link, struct hms_target *target) +{ + if (link == NULL || target == NULL) + return -EINVAL; + if (link->object.type != HMS_LINK || target->object.type != HMS_TARGET) + return -EINVAL; + return hms_object_link(&link->object, &target->object); +} +EXPORT_SYMBOL(hms_link_target); + +void hms_link_unregister(struct hms_link **linkp) +{ + struct hms_link *link = *linkp; + + *linkp = NULL; + if (link == NULL) + return; + + hms_object_unregister(&link->object); +} +EXPORT_SYMBOL(hms_link_unregister); diff --git a/include/linux/hms.h b/include/linux/hms.h index 7a2823493f63..2a9e49a2d771 100644 --- a/include/linux/hms.h +++ b/include/linux/hms.h @@ -100,6 +100,21 @@ static inline 
void hms_target_put(struct hms_target *target) } +struct hms_link { + struct hms_object object; +}; + +struct hms_link *hms_object_to_link(struct hms_object *object); +void hms_unlink_initiator(struct hms_link *link, + struct hms_initiator *initiator); +void hms_unlink_target(struct hms_link *link, struct hms_target *target); +int hms_link_initiator(struct hms_link *link, struct hms_initiator *initiator); +int hms_link_target(struct hms_link *link, struct hms_target *target); +void hms_link_register(struct hms_link **linkp, struct device *parent, + unsigned version); +void hms_link_unregister(struct hms_link **linkp); + + int hms_init(void); @@ -116,6 +131,14 @@ int hms_init(void); #define hms_target_unregister(targetp) +#define hms_unlink_initiator(link, initiator) +#define hms_unlink_target(link, target) +#define hms_link_initiator(link, initiator) +#define hms_link_target(link, target) +#define hms_link_register(linkp, parent, version) +#define hms_link_unregister(linkp) + + static inline int hms_init(void) { return 0; From patchwork Mon Dec 3 23:35:01 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jerome Glisse X-Patchwork-Id: 10710899 From: jglisse@redhat.com To: linux-mm@kvack.org Cc: Andrew Morton , linux-kernel@vger.kernel.org, Jérôme Glisse , "Rafael J . Wysocki" , Ross Zwisler , Dan Williams , Dave Hansen , Haggai Eran , Balbir Singh , "Aneesh Kumar K . 
V" , Benjamin Herrenschmidt , Felix Kuehling , Philip Yang , Christian König , Paul Blinzer , Logan Gunthorpe , John Hubbard , Ralph Campbell , Michal Hocko , Jonathan Cameron , Mark Hairgrove , Vivek Kini , Mel Gorman , Dave Airlie , Ben Skeggs , Andrea Arcangeli Subject: [RFC PATCH 06/14] mm/hms: add bridge to heterogeneous memory system infrastructure Date: Mon, 3 Dec 2018 18:35:01 -0500 Message-Id: <20181203233509.20671-7-jglisse@redhat.com> In-Reply-To: <20181203233509.20671-1-jglisse@redhat.com> References: <20181203233509.20671-1-jglisse@redhat.com> From: Jérôme Glisse A bridge connects two links with each other and applies only to the listed initiators. Together with links, this allows describing any kind of system topology, i.e. any kind of directed graph. Moreover, with bridges userspace can choose different bridges to load balance bandwidth usage across multiple paths between target memories and initiators. Note that explicit path selection is not always under the control of userspace; some systems might do load balancing in hardware. See the HMS documentation, Documentation/vm/hms.txt, for detail. Signed-off-by: Jérôme Glisse Cc: Rafael J. 
Wysocki Cc: Ross Zwisler Cc: Dan Williams Cc: Dave Hansen Cc: Haggai Eran Cc: Balbir Singh Cc: Aneesh Kumar K.V Cc: Benjamin Herrenschmidt Cc: Felix Kuehling Cc: Philip Yang Cc: Christian König Cc: Paul Blinzer Cc: Logan Gunthorpe Cc: John Hubbard Cc: Ralph Campbell Cc: Michal Hocko Cc: Jonathan Cameron Cc: Mark Hairgrove Cc: Vivek Kini Cc: Mel Gorman Cc: Dave Airlie Cc: Ben Skeggs Cc: Andrea Arcangeli --- drivers/base/Makefile | 2 +- drivers/base/hms-bridge.c | 197 ++++++++++++++++++++++++++++++++++++++ include/linux/hms.h | 24 +++++ 3 files changed, 222 insertions(+), 1 deletion(-) create mode 100644 drivers/base/hms-bridge.c diff --git a/drivers/base/Makefile b/drivers/base/Makefile index b8ff678fdae9..62695fdcd32f 100644 --- a/drivers/base/Makefile +++ b/drivers/base/Makefile @@ -12,7 +12,7 @@ obj-y += power/ obj-$(CONFIG_ISA_BUS_API) += isa.o obj-y += firmware_loader/ obj-$(CONFIG_NUMA) += node.o -obj-$(CONFIG_HMS) += hms.o hms-target.o hms-initiator.o hms-link.o +obj-$(CONFIG_HMS) += hms.o hms-target.o hms-initiator.o hms-link.o hms-bridge.o obj-$(CONFIG_MEMORY_HOTPLUG_SPARSE) += memory.o ifeq ($(CONFIG_SYSFS),y) obj-$(CONFIG_MODULES) += module.o diff --git a/drivers/base/hms-bridge.c b/drivers/base/hms-bridge.c new file mode 100644 index 000000000000..64732e923fba --- /dev/null +++ b/drivers/base/hms-bridge.c @@ -0,0 +1,197 @@ +/* + * Copyright 2018 Red Hat Inc. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation; either version 2 of + * the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * Authors: + * Jérôme Glisse + */ +/* Heterogeneous memory system (HMS) see Documentation/vm/hms.rst */ +#include +#include +#include +#include +#include +#include +#include +#include +#include + + +static inline struct hms_bridge *hms_object_to_bridge(struct hms_object *object) +{ + if (object == NULL) + return NULL; + + if (object->type != HMS_BRIDGE) + return NULL; + return container_of(object, struct hms_bridge, object); +} + +static inline struct hms_bridge *device_to_hms_bridge(struct device *device) +{ + if (device == NULL) + return NULL; + + return hms_object_to_bridge(to_hms_object(device)); +} + +struct hms_bridge *hms_bridge_find_locked(unsigned uid) +{ + struct hms_object *object = hms_object_find_locked(uid); + struct hms_bridge *bridge; + + bridge = hms_object_to_bridge(object); + if (bridge) + return bridge; + hms_object_put(object); + return NULL; +} + +struct hms_bridge *hms_bridge_find(unsigned uid) +{ + struct hms_object *object = hms_object_find(uid); + struct hms_bridge *bridge; + + bridge = hms_object_to_bridge(object); + if (bridge) + return bridge; + hms_object_put(object); + return NULL; +} + +static void hms_bridge_release(struct device *device) +{ + struct hms_bridge *bridge = device_to_hms_bridge(device); + + hms_object_put(&bridge->linka->object); + hms_object_put(&bridge->linkb->object); + hms_object_release(&bridge->object); + kfree(bridge); +} + +static ssize_t hms_bridge_show_uid(struct device *device, + struct device_attribute *attr, + char *buf) +{ + struct hms_bridge *bridge = device_to_hms_bridge(device); + + if (bridge == NULL) + return -EINVAL; + + return sprintf(buf, "%d\n", bridge->object.uid); +} + +static DEVICE_ATTR(uid, 0444, hms_bridge_show_uid, NULL); + +static struct attribute *hms_bridge_attrs[] = { + &dev_attr_uid.attr, + NULL +}; + +static struct attribute_group hms_bridge_attr_group = { + .attrs = hms_bridge_attrs, +}; + +static const struct attribute_group *hms_bridge_attr_groups[] = { + 
&hms_bridge_attr_group, + NULL, +}; + +void hms_bridge_register(struct hms_bridge **bridgep, + struct device *parent, + struct hms_link *linka, + struct hms_link *linkb, + unsigned version) +{ + struct hms_bridge *bridge; + int ret; + + *bridgep = NULL; + + if (linka == NULL || linkb == NULL) + return; + linka = hms_object_to_link(hms_object_get(&linka->object)); + linkb = hms_object_to_link(hms_object_get(&linkb->object)); + if (linka == NULL || linkb == NULL) + goto error; + + bridge = kzalloc(sizeof(*bridge), GFP_KERNEL); + if (bridge == NULL) + goto error; + + if (hms_object_init(&bridge->object, parent, HMS_BRIDGE, version, + hms_bridge_release, hms_bridge_attr_groups)) { + kfree(bridge); + goto error; + } + + bridge->linka = linka; + bridge->linkb = linkb; + + ret = hms_object_link(&bridge->object, &linka->object); + if (ret) { + hms_bridge_unregister(&bridge); + return; + } + + ret = hms_object_link(&bridge->object, &linkb->object); + if (ret) { + hms_bridge_unregister(&bridge); + return; + } + + *bridgep = bridge; + return; + +error: + hms_object_put(&linka->object); + hms_object_put(&linkb->object); +} +EXPORT_SYMBOL(hms_bridge_register); + +void hms_unbridge_initiator(struct hms_bridge *bridge, + struct hms_initiator *initiator) +{ + if (bridge == NULL || initiator == NULL) + return; + if (bridge->object.type != HMS_BRIDGE) + return; + if (initiator->object.type != HMS_INITIATOR) + return; + hms_object_unlink(&bridge->object, &initiator->object); +} +EXPORT_SYMBOL(hms_unbridge_initiator); + +int hms_bridge_initiator(struct hms_bridge *bridge, + struct hms_initiator *initiator) +{ + if (bridge == NULL || initiator == NULL) + return -EINVAL; + if (bridge->object.type != HMS_BRIDGE) + return -EINVAL; + if (initiator->object.type != HMS_INITIATOR) + return -EINVAL; + return hms_object_link(&bridge->object, &initiator->object); +} +EXPORT_SYMBOL(hms_bridge_initiator); + +void hms_bridge_unregister(struct hms_bridge **bridgep) +{ + struct hms_bridge *bridge = 
*bridgep; + + *bridgep = NULL; + if (bridge == NULL) + return; + + hms_object_unregister(&bridge->object); +} +EXPORT_SYMBOL(hms_bridge_unregister); diff --git a/include/linux/hms.h b/include/linux/hms.h index 2a9e49a2d771..511b5363d8f2 100644 --- a/include/linux/hms.h +++ b/include/linux/hms.h @@ -115,6 +115,24 @@ void hms_link_register(struct hms_link **linkp, struct device *parent, void hms_link_unregister(struct hms_link **linkp); +struct hms_bridge { + struct hms_object object; + struct hms_link *linka; + struct hms_link *linkb; +}; + +void hms_unbridge_initiator(struct hms_bridge *bridge, + struct hms_initiator *initiator); +int hms_bridge_initiator(struct hms_bridge *bridge, + struct hms_initiator *initiator); +void hms_bridge_register(struct hms_bridge **bridgep, + struct device *parent, + struct hms_link *linka, + struct hms_link *linkb, + unsigned version); +void hms_bridge_unregister(struct hms_bridge **bridgep); + + int hms_init(void); @@ -139,6 +157,12 @@ int hms_init(void); #define hms_link_unregister(linkp) +#define hms_unbridge_initiator(bridge, initiator) +#define hms_bridge_initiator(bridge, initiator) +#define hms_bridge_register(bridgep, parent, linka, linkb, version) +#define hms_bridge_unregister(bridgep) + + static inline int hms_init(void) { return 0; From patchwork Mon Dec 3 23:35:02 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jerome Glisse X-Patchwork-Id: 10710901 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7389413BF for ; Mon, 3 Dec 2018 23:36:10 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6267129267 for ; Mon, 3 Dec 2018 23:36:10 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 54B3C2B222; Mon, 3 Dec 2018 23:36:10 +0000 (UTC) 
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9A0F329267 for ; Mon, 3 Dec 2018 23:36:09 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6CA6B6B6BB1; Mon, 3 Dec 2018 18:36:08 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 650CB6B6BB2; Mon, 3 Dec 2018 18:36:08 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4F1F86B6BB3; Mon, 3 Dec 2018 18:36:08 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-qt1-f199.google.com (mail-qt1-f199.google.com [209.85.160.199]) by kanga.kvack.org (Postfix) with ESMTP id 23AA56B6BB1 for ; Mon, 3 Dec 2018 18:36:08 -0500 (EST) Received: by mail-qt1-f199.google.com with SMTP id u32so15365478qte.1 for ; Mon, 03 Dec 2018 15:36:08 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; bh=zRntr0v24OrOu9F3uW12kta4v58xqWX3PVG2df0NLbw=; b=q9cTE/2xvKnP8aNwJvQMCbhUclbjV1EehrQZ5arIxju8Xxyh6/cwoXCnwCygI2cDnq C70xbg+AikvjZ+dQJrpt8xMfitRLFv9TekKnd0smCaVuCcXYcfqY4mPWigMQbm5ziOpy jEz8xUOSjBz0Q+JkQFUYSatNO5NbNwTiGRxrG/kL7XvhIaIervJP96HUbof3ihDt5y3X TGuMVxCPcY3yUQBcU1IIQ0xR1fwMbjmErdEzr/VtNGtwun7TkTEv12JsI+BvdS0OeyUy +lyh+ZSAUu7v6M8Ixuqa+mfonYRwaXpLyZXOMco8cGNQRXjJWxR49GOImuYyVxM0gsI1 8Xjw== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) 
smtp.mailfrom=jglisse@redhat.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com X-Gm-Message-State: AA+aEWYPO28MKxK4SLE4V/tXSx/qHC4PROyzqNrFjS9JonDG0feAaGzh tFRIr0BARvIsPOProlA1fKyb5pxkdIPHr9oHlYcXuh1NkTA3Ciwzwoz6UJewAhBnKmUoSDRRkHR ibAdtDgRFy/Asn1/+ahspOZcDm0Zl1NYKoXbpb2tegtfr5FLGknDxE1tFDrDMdIYBsg== X-Received: by 2002:a37:2f84:: with SMTP id v126mr16295805qkh.254.1543880167877; Mon, 03 Dec 2018 15:36:07 -0800 (PST) X-Google-Smtp-Source: AFSGD/XN3nM0QC08Zn5+t6RLGF2RVGLyRUcVuykXioi2+8uRRXeCHNMBLbkt0VDD2ws/ewmQM7D9 X-Received: by 2002:a37:2f84:: with SMTP id v126mr16295770qkh.254.1543880166776; Mon, 03 Dec 2018 15:36:06 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1543880166; cv=none; d=google.com; s=arc-20160816; b=0nfYyi2UlmLeqRVvbgHhdSrU81Rz6KgRPVqXwvmckYrUxKUPcIGuzrjEcHeEIZKlfO h7cLnO7kAysQcFhZqqHzqbHAzrFxSIz0Gw9738j9xEL76VzZe6TD0HpbdQL3+eVIbd4b TJ+AumiCN+BIYRwbwyrzmBfP3cwIJXWoLo5wkrNYG6lQKJ+D6TyyBlNasw2+zV3CjDH7 98zuaW373HRGM6glU5nrQxr/3+vO8asdi2yIRSLqwNG76SvTkoczqlkW2NgX5yNF+Z4A SltQLW4mJyp+huYzMZstONIe+aJSUAn6aG/ySuP9Fg81po6LtLESPsfyp5bmrJxOPOFc b2Ug== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from; bh=zRntr0v24OrOu9F3uW12kta4v58xqWX3PVG2df0NLbw=; b=V1P+AMATXvu6rEzSqdK9hVleDsIkVHoP9whPnMNxyboa0yJlE/uBS5CWgZW+Pr3XMG oSM/uRT3M1FfVegp6SGwg5F37DimCCiwCjw3P+9lTCxY3G9zbAxWC8bUV5FgpvENfjGv QYquY3OQMifsO3XE9Vwn1f9QhE7IN+LitW1jPGB+g9KnBg3rJY6JixjeQSlOisIMq3n/ Ht7+pAaUziw8MfpFEGzM3uVZ3GNpfEpZTF4bPurPV3BJ3Ff9qyWmU8BKBXhk9/vYQyWW j1WMMgvHQ1vVAD39P0xhYIn4Rz0JIcCUITsJSB38sx9EQFHX12uk5QiKv6HVTbobwrLj Z2XQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) smtp.mailfrom=jglisse@redhat.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from mx1.redhat.com (mx1.redhat.com. 
[209.132.183.28]) by mx.google.com with ESMTPS id 35si6223263qvm.133.2018.12.03.15.36.06 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 03 Dec 2018 15:36:06 -0800 (PST) Received-SPF: pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) client-ip=209.132.183.28; Authentication-Results: mx.google.com; spf=pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) smtp.mailfrom=jglisse@redhat.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id A90E4312E9FB; Mon, 3 Dec 2018 23:36:05 +0000 (UTC) Received: from localhost.localdomain.com (ovpn-120-188.rdu2.redhat.com [10.10.120.188]) by smtp.corp.redhat.com (Postfix) with ESMTP id 42AE2600C7; Mon, 3 Dec 2018 23:36:02 +0000 (UTC) From: jglisse@redhat.com To: linux-mm@kvack.org Cc: Andrew Morton , linux-kernel@vger.kernel.org, =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , "Rafael J . Wysocki" , Ross Zwisler , Dan Williams , Dave Hansen , Haggai Eran , Balbir Singh , "Aneesh Kumar K . 
V" , Benjamin Herrenschmidt , Felix Kuehling , Philip Yang , =?utf-8?q?Christian_K=C3=B6nig?= , Paul Blinzer , Logan Gunthorpe , John Hubbard , Ralph Campbell , Michal Hocko , Jonathan Cameron , Mark Hairgrove , Vivek Kini , Mel Gorman , Dave Airlie , Ben Skeggs , Andrea Arcangeli Subject: [RFC PATCH 07/14] mm/hms: register main memory with heterogeneous memory system Date: Mon, 3 Dec 2018 18:35:02 -0500 Message-Id: <20181203233509.20671-8-jglisse@redhat.com> In-Reply-To: <20181203233509.20671-1-jglisse@redhat.com> References: <20181203233509.20671-1-jglisse@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.40]); Mon, 03 Dec 2018 23:36:06 +0000 (UTC) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP From: Jérôme Glisse Register main memory as a target under the HMS scheme. Memory is registered per node (one target device per node). We also create a default link connecting main memory and the CPUs that are in the same node. For details see Documentation/vm/hms.rst. This allows applications to use a single API for both regular memory and device memory. Signed-off-by: Jérôme Glisse Cc: Rafael J. 
Wysocki Cc: Ross Zwisler Cc: Dan Williams Cc: Dave Hansen Cc: Haggai Eran Cc: Balbir Singh Cc: Aneesh Kumar K.V Cc: Benjamin Herrenschmidt Cc: Felix Kuehling Cc: Philip Yang Cc: Christian König Cc: Paul Blinzer Cc: Logan Gunthorpe Cc: John Hubbard Cc: Ralph Campbell Cc: Michal Hocko Cc: Jonathan Cameron Cc: Mark Hairgrove Cc: Vivek Kini Cc: Mel Gorman Cc: Dave Airlie Cc: Ben Skeggs Cc: Andrea Arcangeli --- drivers/base/node.c | 65 +++++++++++++++++++++++++++++++++++++++++++- include/linux/node.h | 6 ++++ 2 files changed, 70 insertions(+), 1 deletion(-) diff --git a/drivers/base/node.c b/drivers/base/node.c index 86d6cd92ce3d..05621ba3cf13 100644 --- a/drivers/base/node.c +++ b/drivers/base/node.c @@ -323,6 +323,11 @@ static int register_node(struct node *node, int num) if (error) put_device(&node->dev); else { + hms_link_register(&node->link, &node->dev, 0); + hms_target_register(&node->target, &node->dev, + num, NULL, 0, 0); + hms_link_target(node->link, node->target); + hugetlb_register_node(node); compaction_register_node(node); @@ -339,6 +344,9 @@ static int register_node(struct node *node, int num) */ void unregister_node(struct node *node) { + hms_target_unregister(&node->target); + hms_link_unregister(&node->link); + hugetlb_unregister_node(node); /* no-op, if memoryless node */ device_unregister(&node->dev); @@ -415,6 +423,9 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, void *arg) sect_end_pfn = section_nr_to_pfn(mem_blk->end_section_nr); sect_end_pfn += PAGES_PER_SECTION - 1; for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) { +#if defined(CONFIG_HMS) + unsigned long size = PAGE_SIZE; +#endif int page_nid; /* @@ -445,9 +456,35 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, void *arg) if (ret) return ret; - return sysfs_create_link_nowarn(&mem_blk->dev.kobj, + ret = sysfs_create_link_nowarn(&mem_blk->dev.kobj, &node_devices[nid]->dev.kobj, kobject_name(&node_devices[nid]->dev.kobj)); + if (ret) + return ret; + 
+#if defined(CONFIG_HMS) + /* + * Right now here i do not see any easier way to get the size + * in bytes of valid memory that is added to this node. + */ + for (++pfn; pfn <= sect_end_pfn; pfn++) { + if (!pfn_present(pfn)) { + pfn = round_down(pfn + PAGES_PER_SECTION, + PAGES_PER_SECTION) - 1; + continue; + } + page_nid = get_nid_for_pfn(pfn); + if (page_nid < 0) + continue; + if (page_nid != nid) + continue; + size += PAGE_SIZE; + } + + hms_target_add_memory(node_devices[nid]->target, size); +#endif + + return 0; } /* mem section does not span the specified node */ return 0; @@ -471,6 +508,10 @@ int unregister_mem_sect_under_nodes(struct memory_block *mem_blk, sect_start_pfn = section_nr_to_pfn(phys_index); sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1; for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) { +#if defined(CONFIG_HMS) + unsigned long size = 0; + int page_nid; +#endif int nid; nid = get_nid_for_pfn(pfn); @@ -484,6 +525,28 @@ int unregister_mem_sect_under_nodes(struct memory_block *mem_blk, kobject_name(&mem_blk->dev.kobj)); sysfs_remove_link(&mem_blk->dev.kobj, kobject_name(&node_devices[nid]->dev.kobj)); + +#if defined(CONFIG_HMS) + /* + * Right now here i do not see any easier way to get the size + * in bytes of valid memory that is added to this node. 
+ */ + for (; pfn <= sect_end_pfn; pfn++) { + if (!pfn_present(pfn)) { + pfn = round_down(pfn + PAGES_PER_SECTION, + PAGES_PER_SECTION) - 1; + continue; + } + page_nid = get_nid_for_pfn(pfn); + if (page_nid < 0) + continue; + if (page_nid != nid) + break; + size += PAGE_SIZE; + } + + hms_target_remove_memory(node_devices[nid]->target, size); +#endif } NODEMASK_FREE(unlinked_nodes); return 0; diff --git a/include/linux/node.h b/include/linux/node.h index 257bb3d6d014..297b01d3c1ed 100644 --- a/include/linux/node.h +++ b/include/linux/node.h @@ -15,6 +15,7 @@ #ifndef _LINUX_NODE_H_ #define _LINUX_NODE_H_ +#include #include #include #include @@ -22,6 +23,11 @@ struct node { struct device dev; +#if defined(CONFIG_HMS) + struct hms_target *target; + struct hms_link *link; +#endif + #if defined(CONFIG_MEMORY_HOTPLUG_SPARSE) && defined(CONFIG_HUGETLBFS) struct work_struct node_work; #endif From patchwork Mon Dec 3 23:35:03 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jerome Glisse X-Patchwork-Id: 10710903 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EF06814E2 for ; Mon, 3 Dec 2018 23:36:13 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E060329267 for ; Mon, 3 Dec 2018 23:36:13 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D363E2B222; Mon, 3 Dec 2018 23:36:13 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 32DDD29267 for ; Mon, 3 Dec 2018 
23:36:13 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id F2DDB6B6BB2; Mon, 3 Dec 2018 18:36:11 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id EB7616B6BB3; Mon, 3 Dec 2018 18:36:11 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D838E6B6BB4; Mon, 3 Dec 2018 18:36:11 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-qt1-f200.google.com (mail-qt1-f200.google.com [209.85.160.200]) by kanga.kvack.org (Postfix) with ESMTP id AC6D06B6BB2 for ; Mon, 3 Dec 2018 18:36:11 -0500 (EST) Received: by mail-qt1-f200.google.com with SMTP id z6so15158280qtj.21 for ; Mon, 03 Dec 2018 15:36:11 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; bh=Wrmg3a7a7IgWDNC76LFCnofF/CnLvE/OaTrE/8lHU7c=; b=B7Cb4itwYFKKBwPqmvgFI/w+nzOuBF6xFVYZCrH/307/nPD3YmGo0YmLdmBTHsUexx rohA0wRS+5vAB3C0seIZTy/p/KEZSkyTD9u34xd3eEF2QvUfRo/T+Aab2D3PbvKXPooX NwI63cFpgLKDwrZhk6G2qxxLQzzE0gBaUGCOGfKfL4w06Nem1YtHkHzBEHyfncYrnYby tjat9I02vdbM2pgx5y6kiwlrPG1liyma8S0xT0O6vv9ajOh17wdP7/hoWi1dXNsw3qEE 79GRTr0B9/kRBfFPSgEWkHdvkA1VWvB+A82SagpEQSq/KDOwgzEKezTBFSL/QItXWKWr CeHg== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) smtp.mailfrom=jglisse@redhat.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com X-Gm-Message-State: AA+aEWZAbz0tdxV7SZQCOGEYTOWU52WUWinU6VIFLlcHTjSrkpibV4Er Meyf8XWDuHDGwg7+TfPFZ8snyaMdT/2zw29ArosTNAR3D8HJOmXvDVJnaupy/0gnaTJTxtL6bd0 mxruF2/rJiqjhIDwAO8/IIm+I6H/URGtKxqVv4vfHw9IIoM0ib9HyyTRLePo9X9wvTA== X-Received: by 2002:ae9:dcc4:: with SMTP id 
q187mr16575857qkf.97.1543880171451; Mon, 03 Dec 2018 15:36:11 -0800 (PST) X-Google-Smtp-Source: AFSGD/Xi1OISmA5QTOaH/sNbCZktf2fybIyyR8KtbY4YA8pDCLZPgbxe1Ej8aOfQusH+8ayoGa7s X-Received: by 2002:ae9:dcc4:: with SMTP id q187mr16575822qkf.97.1543880170279; Mon, 03 Dec 2018 15:36:10 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1543880170; cv=none; d=google.com; s=arc-20160816; b=mnrYwzwlRTzF7EYE10EyA/NyVgemfTTW6hiybZrCDJSySKkHSJdXftfILmL2DBDfNH Lzs0bRGW4NXYZ2zr7RhKu6rr4vCmOEtGINnC+Y+3vmX7JyADI73IgS1HgQBvl+tQnztw UBtSTxMhMWwsMkiRawADrsB1U0F5XQUm5/YHHO7bKeEHrJ8uD5muZezAlenOhSWDMd7c zxWNCwHxuhTTn55OZmombwE7chYx6QxL8TZSVs+S/APpIyHE+mbSn9AQjLw00zrlJQ+3 SUwSm4YPByu/wsGGqYPqxkOVxG+O52GC5lpDQvRlOx4tWnOa+H6SXg3AsRQUjKHRQj2o Q/cw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from; bh=Wrmg3a7a7IgWDNC76LFCnofF/CnLvE/OaTrE/8lHU7c=; b=lW1rKvYQA1B2fA2Ol+p1w7ICvwwl8ZHgWd8eZHyn2pyHMtg0qkCBzOZPk/2NigHehD FViKqb6EihWh0u/EjKoo72nGUfHqpX9v6y+jVhoDxsKD/N11DkH6FfEfMAfOoOpt4Xac Hyz91VRjD0rU42v9sotd7y2jbxN4wF/Lz3Qv/cc63WWiesAD5rTsc7owDnziG395A+aC ibaCglTvYfBACO25tdEZaSDaCHnmDI6a0PJ8rZYGMsDFFyvORYjtxiKi2suIZiPGAWIj D8HbMBc0xYY1wIl09xMHuyspLvMPncuyIpP+bJZzDb2pW0qusAu4IFOcPE3aauILGcRz 1+fw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) smtp.mailfrom=jglisse@redhat.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from mx1.redhat.com (mx1.redhat.com. 
[209.132.183.28]) by mx.google.com with ESMTPS id n69si2780294qkn.55.2018.12.03.15.36.10 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 03 Dec 2018 15:36:10 -0800 (PST) Received-SPF: pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) client-ip=209.132.183.28; Authentication-Results: mx.google.com; spf=pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) smtp.mailfrom=jglisse@redhat.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 3106A356D5; Mon, 3 Dec 2018 23:36:09 +0000 (UTC) Received: from localhost.localdomain.com (ovpn-120-188.rdu2.redhat.com [10.10.120.188]) by smtp.corp.redhat.com (Postfix) with ESMTP id D853560566; Mon, 3 Dec 2018 23:36:05 +0000 (UTC) From: jglisse@redhat.com To: linux-mm@kvack.org Cc: Andrew Morton , linux-kernel@vger.kernel.org, =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , "Rafael J . Wysocki" , Ross Zwisler , Dan Williams , Dave Hansen , Haggai Eran , Balbir Singh , "Aneesh Kumar K . 
V" , Benjamin Herrenschmidt , Felix Kuehling , Philip Yang , =?utf-8?q?Christian_K=C3=B6nig?= , Paul Blinzer , Logan Gunthorpe , John Hubbard , Ralph Campbell , Michal Hocko , Jonathan Cameron , Mark Hairgrove , Vivek Kini , Mel Gorman , Dave Airlie , Ben Skeggs , Andrea Arcangeli Subject: [RFC PATCH 08/14] mm/hms: register main CPUs with heterogeneous memory system Date: Mon, 3 Dec 2018 18:35:03 -0500 Message-Id: <20181203233509.20671-9-jglisse@redhat.com> In-Reply-To: <20181203233509.20671-1-jglisse@redhat.com> References: <20181203233509.20671-1-jglisse@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.30]); Mon, 03 Dec 2018 23:36:09 +0000 (UTC) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP From: Jérôme Glisse Register CPUs as initiators under the HMS scheme. CPUs are registered per node (one initiator device per node per CPU). We also add each CPU to its node's default link so that it is connected to that node's main memory. For details see Documentation/vm/hms.rst. Signed-off-by: Jérôme Glisse Cc: Rafael J. 
Wysocki Cc: Ross Zwisler Cc: Dan Williams Cc: Dave Hansen Cc: Haggai Eran Cc: Balbir Singh Cc: Aneesh Kumar K.V Cc: Benjamin Herrenschmidt Cc: Felix Kuehling Cc: Philip Yang Cc: Christian König Cc: Paul Blinzer Cc: Logan Gunthorpe Cc: John Hubbard Cc: Ralph Campbell Cc: Michal Hocko Cc: Jonathan Cameron Cc: Mark Hairgrove Cc: Vivek Kini Cc: Mel Gorman Cc: Dave Airlie Cc: Ben Skeggs Cc: Andrea Arcangeli --- drivers/base/cpu.c | 5 +++++ drivers/base/node.c | 18 +++++++++++++++++- include/linux/cpu.h | 4 ++++ 3 files changed, 26 insertions(+), 1 deletion(-) diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c index eb9443d5bae1..160454bc5c38 100644 --- a/drivers/base/cpu.c +++ b/drivers/base/cpu.c @@ -76,6 +76,8 @@ void unregister_cpu(struct cpu *cpu) { int logical_cpu = cpu->dev.id; + hms_initiator_unregister(&cpu->initiator); + unregister_cpu_under_node(logical_cpu, cpu_to_node(logical_cpu)); device_unregister(&cpu->dev); @@ -392,6 +394,9 @@ int register_cpu(struct cpu *cpu, int num) dev_pm_qos_expose_latency_limit(&cpu->dev, PM_QOS_RESUME_LATENCY_NO_CONSTRAINT); + hms_initiator_register(&cpu->initiator, &cpu->dev, + cpu_to_node(num), 0); + return 0; } diff --git a/drivers/base/node.c b/drivers/base/node.c index 05621ba3cf13..43f1820cdadb 100644 --- a/drivers/base/node.c +++ b/drivers/base/node.c @@ -375,9 +375,19 @@ int register_cpu_under_node(unsigned int cpu, unsigned int nid) if (ret) return ret; - return sysfs_create_link(&obj->kobj, + ret = sysfs_create_link(&obj->kobj, &node_devices[nid]->dev.kobj, kobject_name(&node_devices[nid]->dev.kobj)); + if (ret) + return ret; + + if (IS_ENABLED(CONFIG_HMS)) { + struct cpu *cpu = container_of(obj, struct cpu, dev); + + hms_link_initiator(node_devices[nid]->link, cpu->initiator); + } + + return 0; } int unregister_cpu_under_node(unsigned int cpu, unsigned int nid) @@ -396,6 +406,12 @@ int unregister_cpu_under_node(unsigned int cpu, unsigned int nid) sysfs_remove_link(&obj->kobj, 
kobject_name(&node_devices[nid]->dev.kobj)); + if (IS_ENABLED(CONFIG_HMS)) { + struct cpu *cpu = container_of(obj, struct cpu, dev); + + hms_unlink_initiator(node_devices[nid]->link, cpu->initiator); + } + return 0; } diff --git a/include/linux/cpu.h b/include/linux/cpu.h index 218df7f4d3e1..1e3a777bfa3d 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -14,6 +14,7 @@ #ifndef _LINUX_CPU_H_ #define _LINUX_CPU_H_ +#include #include #include #include @@ -27,6 +28,9 @@ struct cpu { int node_id; /* The node which contains the CPU */ int hotpluggable; /* creates sysfs control file if hotpluggable */ struct device dev; +#if defined(CONFIG_HMS) + struct hms_initiator *initiator; +#endif }; extern void boot_cpu_init(void); From patchwork Mon Dec 3 23:35:04 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jerome Glisse X-Patchwork-Id: 10710905 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F075E14E2 for ; Mon, 3 Dec 2018 23:36:16 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E0DB62B21F for ; Mon, 3 Dec 2018 23:36:16 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D47D92B242; Mon, 3 Dec 2018 23:36:16 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0E4E32B21F for ; Mon, 3 Dec 2018 23:36:16 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1BEEE6B6BB3; Mon, 3 Dec 2018 18:36:14 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: 
by kanga.kvack.org (Postfix, from userid 40) id 1719B6B6BB4; Mon, 3 Dec 2018 18:36:14 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id F2F236B6BB5; Mon, 3 Dec 2018 18:36:13 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-qk1-f200.google.com (mail-qk1-f200.google.com [209.85.222.200]) by kanga.kvack.org (Postfix) with ESMTP id C6FC66B6BB3 for ; Mon, 3 Dec 2018 18:36:13 -0500 (EST) Received: by mail-qk1-f200.google.com with SMTP id v74so14804920qkb.21 for ; Mon, 03 Dec 2018 15:36:13 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; bh=8hFoGAyCjPU00zOBP6weebSI+EaYdnYXZx/EEMWWNOo=; b=dydK0kBGdIDOgQ2jShN74FoilnBmTBzxRprSxJinRwcO4oYXehH7zG3eXoxTlaPTbk VRZBWIzwJI/rQCVFBZ4z1V9tHGsL4Hg4U2ORr/VeooaHLIH6j1AuXwkfxdCBWfQXBUdv 8qJC4xnW+MlGvPBsMuYlV3Q2r76Zn+tsJf57Hbmth87GsNj/jDeZa2VuexBTnw6RgteH dizLi3YEnA10pXeO1tKvUsvZWqc8r485R1NlJraXz0Qoct1tM3MxqaLyDqPI+uuk+9Wx 1JHK4YK9sgvULp2wmBzFnWgDLG7bo60XQ2T+U20zG1Ny2GxCAHDTNZdn8AtZAkCc3Tqn P+Gg== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) smtp.mailfrom=jglisse@redhat.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com X-Gm-Message-State: AA+aEWZcqdJ5bSV5g5CFqLcdv0UWcVz5NZhXdEcTtKJYzQAxCCw1yb1y 0VkJ9V/eqbZBhy/JSnMMvA3zuANY/GSafPbJm9ySFtOGsl0u67k8RPa4DEKQFaJZqirhOXTfCRj V3hlZ8so9EdeHpQR8G7K6cVlKcOFuvC75sqAoJWpHXhEMoboI6L7RGCk9rIjPU9TGZg== X-Received: by 2002:a0c:f787:: with SMTP id s7mr17804894qvn.167.1543880173566; Mon, 03 Dec 2018 15:36:13 -0800 (PST) X-Google-Smtp-Source: AFSGD/XtlFx1Jvcl/HQxq0sRs4EjSlpAf0VikkDO7uERT6SU/HjF8+uXZKkJb8FY/ztFynFwDH6P X-Received: 
by 2002:a0c:f787:: with SMTP id s7mr17804861qvn.167.1543880172834; Mon, 03 Dec 2018 15:36:12 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1543880172; cv=none; d=google.com; s=arc-20160816; b=IKjB5PMfiPf6aPCqMi9oHU2oQmyeFaeFsTdQj6mGmYUoYmvTUMLYoi88cR+/RQN0Ga uWSSU56K3QcEfxbZ8H8Xax3fmrPYavS6atPUoPSGvp2yE+rHzq08Xtt1kn5G2kYlisPZ hPKctdukde3yxXuaxKvVhL3IjAoSDp0tvsH3Jik2NcUaca9TBarMU5SULEEnyaExAXzR XtJpTXmh1XwrMPj0t46ap+Eq/lbGZgshR+Lt4xxYhBGjkDSI6VPhKLWOPYdHJ38xx6Bx xjNyo3Cgbm/HLr2kRQFS7Q2SE3Z0YsLI5rhD6rF+8Yo59BsxLKW7k/bFzaw/YCCVwe1v 4ooA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from; bh=8hFoGAyCjPU00zOBP6weebSI+EaYdnYXZx/EEMWWNOo=; b=gefeDFY8Trz28sSarmX/n56GqrIAI8/yvXBN/8K20aunqenFFa4um94qAtd6kA1eRu LUGbF620I9NMAXDu74ulwoZ7mkTz+wcEggCLesKfvYxd1S/CB7pN37WbkRmKJ3OKJ6xV KdpaJvkLOpFknvG2r5FCFsT4FTtQ2VoIoetHA0ih6hAcOPtDMacoxt1Di+0zFyhpIlRP Jk0uNaD4SFgoH4P5IyAoUTrCvU7B2NAmMM2KevqXkYR1A/HnUGe741yEdFdupHsdV+Yh eoiHUqzuyMhPiTSOKlBgl0Sli0RUrtsI8sXaFHTiMg51EuW2blsUoJyAlXG34MmZJJH8 f/AQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) smtp.mailfrom=jglisse@redhat.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from mx1.redhat.com (mx1.redhat.com. 
[209.132.183.28]) by mx.google.com with ESMTPS id j1si1907qkj.111.2018.12.03.15.36.12 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 03 Dec 2018 15:36:12 -0800 (PST) Received-SPF: pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) client-ip=209.132.183.28; Authentication-Results: mx.google.com; spf=pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) smtp.mailfrom=jglisse@redhat.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id D31F187622; Mon, 3 Dec 2018 23:36:11 +0000 (UTC) Received: from localhost.localdomain.com (ovpn-120-188.rdu2.redhat.com [10.10.120.188]) by smtp.corp.redhat.com (Postfix) with ESMTP id 5A312600C1; Mon, 3 Dec 2018 23:36:09 +0000 (UTC) From: jglisse@redhat.com To: linux-mm@kvack.org Cc: Andrew Morton , linux-kernel@vger.kernel.org, =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , "Rafael J . Wysocki" , Ross Zwisler , Haggai Eran , Balbir Singh , "Aneesh Kumar K . 
V" , Felix Kuehling , Philip Yang , =?utf-8?q?Christian_K=C3=B6nig?= , Paul Blinzer , John Hubbard , Jonathan Cameron , Mark Hairgrove , Vivek Kini Subject: [RFC PATCH 09/14] mm/hms: hbind() for heterogeneous memory system (aka mbind() for HMS) Date: Mon, 3 Dec 2018 18:35:04 -0500 Message-Id: <20181203233509.20671-10-jglisse@redhat.com> In-Reply-To: <20181203233509.20671-1-jglisse@redhat.com> References: <20181203233509.20671-1-jglisse@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.26]); Mon, 03 Dec 2018 23:36:12 +0000 (UTC) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP From: Jérôme Glisse With the advance of heterogeneous computing and the new kinds of memory topology that are becoming more widespread (CPU HBM, persistent memory, ...), we no longer have just a flat memory topology inside a NUMA node. Instead there is a hierarchy of memory, for instance HBM for the CPU versus main memory. Moreover there is also device memory; a good example is GPUs, which have a large amount of memory (several gigabytes, and it keeps growing). In the face of this, the mbind() API is too limited to allow precise selection of which memory to use inside a node. This is why this patchset introduces a new API, hbind() (for heterogeneous bind), which allows binding to any kind of memory, whether it is some specific memory like the CPU's HBM in a node or some device memory. Instead of using a bitmap, hbind() takes an array of uids, where each uid identifies a unique memory target inside the new HMS topology description. Signed-off-by: Jérôme Glisse Cc: Rafael J. 
Wysocki Cc: Ross Zwisler Cc: Haggai Eran Cc: Balbir Singh Cc: Aneesh Kumar K.V Cc: Felix Kuehling Cc: Philip Yang Cc: Christian König Cc: Paul Blinzer Cc: John Hubbard Cc: Jonathan Cameron Cc: Mark Hairgrove Cc: Vivek Kini Cc: linux-mm@kvack.org --- include/uapi/linux/hbind.h | 46 +++++++++++ mm/Makefile | 1 + mm/hms.c | 158 +++++++++++++++++++++++++++++++++++++ 3 files changed, 205 insertions(+) create mode 100644 include/uapi/linux/hbind.h create mode 100644 mm/hms.c diff --git a/include/uapi/linux/hbind.h b/include/uapi/linux/hbind.h new file mode 100644 index 000000000000..a9aba17ab142 --- /dev/null +++ b/include/uapi/linux/hbind.h @@ -0,0 +1,46 @@ +/* + * Copyright 2018 Red Hat Inc. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation; either version 2 of + * the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * Authors: + * Jérôme Glisse + */ +/* Heterogeneous memory system (HMS) see Documentation/vm/hms.rst */ +#ifndef LINUX_UAPI_HBIND +#define LINUX_UAPI_HBIND + + +/* For now just freak out if it is bigger than a page. 
*/ +#define HBIND_MAX_TARGETS (4096 / 4) +#define HBIND_MAX_ATOMS (4096 / 4) + + +struct hbind_params { + uint64_t start; + uint64_t end; + uint32_t ntargets; + uint32_t natoms; + uint64_t targets; + uint64_t atoms; +}; + + +#define HBIND_ATOM_GET_DWORDS(v) (((v) >> 20) & 0xfff) +#define HBIND_ATOM_SET_DWORDS(v) (((v) & 0xfff) << 20) +#define HBIND_ATOM_GET_CMD(v) ((v) & 0xfffff) +#define HBIND_ATOM_SET_CMD(v) ((v) & 0xfffff) + + +#define HBIND_IOCTL _IOWR('H', 0x00, struct hbind_params) + + +#endif /* LINUX_UAPI_HBIND */ diff --git a/mm/Makefile b/mm/Makefile index d210cc9d6f80..0537a95f6cbd 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -99,3 +99,4 @@ obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o obj-$(CONFIG_PERCPU_STATS) += percpu-stats.o obj-$(CONFIG_HMM) += hmm.o obj-$(CONFIG_MEMFD_CREATE) += memfd.o +obj-$(CONFIG_HMS) += hms.o diff --git a/mm/hms.c b/mm/hms.c new file mode 100644 index 000000000000..bf328bd577dc --- /dev/null +++ b/mm/hms.c @@ -0,0 +1,158 @@ +/* + * Copyright 2018 Red Hat Inc. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation; either version 2 of + * the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * Authors: + * Jérôme Glisse + */ +/* Heterogeneous memory system (HMS) see Documentation/vm/hms.rst */ +#define pr_fmt(fmt) "hms: " fmt + +#include +#include +#include +#include +#include +#include +#include +#include + +#include + + +#define HBIND_FIX_ARRAY 64 + + +static ssize_t hbind_read(struct file *file, char __user *buf, + size_t count, loff_t *ppos) +{ + return -EINVAL; +} + +static ssize_t hbind_write(struct file *file, const char __user *buf, + size_t count, loff_t *ppos) +{ + return -EINVAL; +} + +static long hbind_ioctl(struct file *file, unsigned cmd, unsigned long arg) +{ + uint32_t *targets, *_dtargets = NULL, _ftargets[HBIND_FIX_ARRAY]; + uint32_t *atoms, *_datoms = NULL, _fatoms[HBIND_FIX_ARRAY]; + void __user *uarg = (void __user *)arg; + struct hbind_params params; + uint32_t i, ndwords; + int ret; + + switch(cmd) { + case HBIND_IOCTL: + break; + default: + return -EINVAL; + } + + ret = copy_from_user(&params, uarg, sizeof(params)); + if (ret) + return -EFAULT; + + /* Some sanity checks */ + params.start &= PAGE_MASK; + params.end = PAGE_ALIGN(params.end); + if (params.end <= params.start) + return -EINVAL; + + /* More sanity checks */ + if (params.ntargets > HBIND_MAX_TARGETS) + return -EINVAL; + + /* We need at least one atom. */ + if (!params.natoms || params.natoms > HBIND_MAX_ATOMS) + return -EINVAL; + + /* Let's allocate memory for parameters. */ + if (params.ntargets > HBIND_FIX_ARRAY) { + _dtargets = kzalloc(4 * params.ntargets, GFP_KERNEL); + if (_dtargets == NULL) + return -ENOMEM; + targets = _dtargets; + } else { + targets = _ftargets; + } + if (params.natoms > HBIND_FIX_ARRAY) { + _datoms = kzalloc(4 * params.natoms, GFP_KERNEL); + if (_datoms == NULL) { + ret = -ENOMEM; + goto out; + } + atoms = _datoms; + } else { + atoms = _fatoms; + } + + /* Let's fetch hbind() parameters. 
*/ + ret = copy_from_user(atoms, (void __user *)params.atoms, + 4 * params.natoms); + if (ret) + goto out; + ret = copy_from_user(targets, (void __user *)params.targets, + 4 * params.ntargets); + if (ret) + goto out; + + mmget(current->mm); + + /* Sanity checks atoms and execute them. */ + for (i = 0, ndwords = 1; i < params.natoms; i += ndwords) { + ndwords = 1 + HBIND_ATOM_GET_DWORDS(atoms[i]); + switch (HBIND_ATOM_GET_CMD(atoms[i])) { + default: + ret = -EINVAL; + goto out_mm; + } + } + +out_mm: + copy_to_user((void __user *)params.atoms, atoms, 4 * params.natoms); + mmput(current->mm); +out: + kfree(_dtargets); + kfree(_datoms); + return ret; +} + +const struct file_operations hbind_fops = { + .llseek = no_llseek, + .read = hbind_read, + .write = hbind_write, + .unlocked_ioctl = hbind_ioctl, + .owner = THIS_MODULE, +}; + +static struct miscdevice hbind_device = { + .minor = MISC_DYNAMIC_MINOR, + .fops = &hbind_fops, + .name = "hbind", +}; + +int __init hbind_init(void) +{ + pr_info("Heterogeneous memory system (HMS) hbind() driver\n"); + return misc_register(&hbind_device); +} + +void __exit hbind_fini(void) +{ + misc_deregister(&hbind_device); +} + +module_init(hbind_init); +module_exit(hbind_fini); From patchwork Mon Dec 3 23:35:05 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jerome Glisse X-Patchwork-Id: 10710907 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3352D13BF for ; Mon, 3 Dec 2018 23:36:21 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 21F3629267 for ; Mon, 3 Dec 2018 23:36:21 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 155FA2B222; Mon, 3 Dec 2018 23:36:21 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on 
From: jglisse@redhat.com
To: linux-mm@kvack.org
Cc: Andrew Morton, linux-kernel@vger.kernel.org, Jérôme Glisse,
 "Rafael J . Wysocki", Ross Zwisler, Dan Williams, Dave Hansen,
 Haggai Eran, Balbir Singh, "Aneesh Kumar K . V",
 Benjamin Herrenschmidt, Felix Kuehling, Philip Yang, Christian König,
 Paul Blinzer, Logan Gunthorpe, John Hubbard, Ralph Campbell,
 Michal Hocko, Jonathan Cameron, Mark Hairgrove, Vivek Kini, Mel Gorman,
 Dave Airlie, Ben Skeggs, Andrea Arcangeli
Subject: [RFC PATCH 10/14] mm/hbind: add heterogeneous memory policy tracking infrastructure
Date: Mon, 3 Dec 2018 18:35:05 -0500
Message-Id: <20181203233509.20671-11-jglisse@redhat.com>
In-Reply-To: <20181203233509.20671-1-jglisse@redhat.com>
References: <20181203233509.20671-1-jglisse@redhat.com>

From: Jérôme Glisse

This patch adds infrastructure to track heterogeneous memory policy
within the kernel. Policies are defined over ranges of virtual addresses
of a process and are attached to the corresponding mm_struct.

Users can reset a range of virtual addresses to the default policy by
issuing the hbind() default command for that range.

Signed-off-by: Jérôme Glisse
Cc: Rafael J. Wysocki
Cc: Ross Zwisler
Cc: Dan Williams
Cc: Dave Hansen
Cc: Haggai Eran
Cc: Balbir Singh
Cc: Aneesh Kumar K.V
Cc: Benjamin Herrenschmidt
Cc: Felix Kuehling
Cc: Philip Yang
Cc: Christian König
Cc: Paul Blinzer
Cc: Logan Gunthorpe
Cc: John Hubbard
Cc: Ralph Campbell
Cc: Michal Hocko
Cc: Jonathan Cameron
Cc: Mark Hairgrove
Cc: Vivek Kini
Cc: Mel Gorman
Cc: Dave Airlie
Cc: Ben Skeggs
Cc: Andrea Arcangeli
---
 include/linux/hms.h        |  46 ++++++
 include/linux/mm_types.h   |   6 +
 include/uapi/linux/hbind.h |   8 +
 kernel/fork.c              |   3 +
 mm/hms.c                   | 306 ++++++++++++++++++++++++++++++++++++-
 5 files changed, 368 insertions(+), 1 deletion(-)

diff --git a/include/linux/hms.h b/include/linux/hms.h
index 511b5363d8f2..f39c390b3afb 100644
--- a/include/linux/hms.h
+++ b/include/linux/hms.h
@@ -20,6 +20,8 @@
 #include
 #include
+#include
+#include
 struct hms_target;
@@ -34,6 +36,10 @@ struct hms_target_hbind {
 #if IS_ENABLED(CONFIG_HMS)
+#include
+#include
+
+
 #define to_hms_object(device) container_of(device, struct hms_object, device)
 enum hms_type {
@@ -133,6 +139,42 @@ void hms_bridge_register(struct hms_bridge **bridgep,
 void hms_bridge_unregister(struct hms_bridge **bridgep);
+struct hms_policy_targets {
+	struct hms_target **targets;
+	unsigned ntargets;
+	struct kref kref;
+};
+
+struct hms_policy_range {
+	struct hms_policy_targets *ptargets;
+	struct interval_tree_node node;
+	struct kref kref;
+};
+
+struct hms_policy {
+	struct rb_root_cached ranges;
+	struct rw_semaphore sem;
+	struct mmu_notifier mn;
+};
+
+static inline unsigned long hms_policy_range_start(struct hms_policy_range *r)
+{
+	return r->node.start;
+}
+
+static inline unsigned long hms_policy_range_end(struct hms_policy_range *r)
+{
+	return r->node.last + 1;
+}
+
+static inline void hms_policy_init(struct mm_struct *mm)
+{
+	mm->hpolicy = NULL;
+}
+
+void hms_policy_fini(struct mm_struct *mm);
+
+
 int hms_init(void);
@@ -163,6 +205,10 @@ int hms_init(void);
 #define hms_bridge_unregister(bridgep)
+#define hms_policy_init(mm)
+#define hms_policy_fini(mm)
+
+
 static inline int hms_init(void)
 {
 	return 0;
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 5ed8f6292a53..3da91767c689 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -26,6 +26,7 @@ typedef int vm_fault_t;
 struct address_space;
 struct mem_cgroup;
+struct hms_policy;
 struct hmm;
 /*
@@ -491,6 +492,11 @@ struct mm_struct {
 	/* HMM needs to track a few things per mm */
 	struct hmm *hmm;
 #endif
+
+#if IS_ENABLED(CONFIG_HMS)
+	/* Heterogeneous Memory System policy */
+	struct hms_policy *hpolicy;
+#endif
 } __randomize_layout;
 /*
diff --git a/include/uapi/linux/hbind.h b/include/uapi/linux/hbind.h
index a9aba17ab142..cc4687587f5a 100644
--- a/include/uapi/linux/hbind.h
+++ b/include/uapi/linux/hbind.h
@@ -39,6 +39,14 @@ struct hbind_params {
 #define HBIND_ATOM_GET_CMD(v) ((v) & 0xfffff)
 #define HBIND_ATOM_SET_CMD(v) ((v) & 0xfffff)
+/*
+ * HBIND_CMD_DEFAULT restore default policy ie undo any of the previous policy.
+ *
+ * Additional dwords:
+ * NONE (DWORDS MUST BE 0 !)
+ */
+#define HBIND_CMD_DEFAULT 0
+
 #define HBIND_IOCTL _IOWR('H', 0x00, struct hbind_params)
diff --git a/kernel/fork.c b/kernel/fork.c
index 07cddff89c7b..bc40edcadc69 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -38,6 +38,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -671,6 +672,7 @@ void __mmdrop(struct mm_struct *mm)
 	mm_free_pgd(mm);
 	destroy_context(mm);
 	hmm_mm_destroy(mm);
+	hms_policy_fini(mm);
 	mmu_notifier_mm_destroy(mm);
 	check_mm(mm);
 	put_user_ns(mm->user_ns);
@@ -989,6 +991,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	RCU_INIT_POINTER(mm->exe_file, NULL);
 	mmu_notifier_mm_init(mm);
 	hmm_mm_init(mm);
+	hms_policy_init(mm);
 	init_tlb_flush_pending(mm);
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
 	mm->pmd_huge_pte = NULL;
diff --git a/mm/hms.c b/mm/hms.c
index bf328bd577dc..be2c4e526f25 100644
--- a/mm/hms.c
+++ b/mm/hms.c
@@ -24,6 +24,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -31,7 +32,6 @@
 #define HBIND_FIX_ARRAY 64
-
 static ssize_t hbind_read(struct file *file, char __user *buf,
 			  size_t count, loff_t *ppos)
 {
@@ -44,6 +44,300 @@ static ssize_t hbind_write(struct file *file, const char __user *buf,
 	return -EINVAL;
 }
+
+static void hms_policy_targets_get(struct hms_policy_targets *ptargets)
+{
+	kref_get(&ptargets->kref);
+}
+
+static void hms_policy_targets_free(struct kref *kref)
+{
+	struct hms_policy_targets *ptargets;
+
+	ptargets = container_of(kref, struct hms_policy_targets, kref);
+	kfree(ptargets->targets);
+	kfree(ptargets);
+}
+
+static void hms_policy_targets_put(struct hms_policy_targets *ptargets)
+{
+	kref_put(&ptargets->kref, &hms_policy_targets_free);
+}
+
+static struct hms_policy_targets* hms_policy_targets_new(const uint32_t *targets,
+							  unsigned ntargets)
+{
+	struct hms_policy_targets *ptargets;
+	void *_targets;
+	unsigned i, c;
+
+	_targets = kzalloc(ntargets * sizeof(void *), GFP_KERNEL);
+	if (_targets == NULL)
+		return NULL;
+
+	ptargets = kmalloc(sizeof(*ptargets), GFP_KERNEL);
+	if (ptargets == NULL) {
+		kfree(_targets);
+		return NULL;
+	}
+
+	kref_init(&ptargets->kref);
+	ptargets->targets = _targets;
+	ptargets->ntargets = ntargets;
+
+	for (i = 0, c = 0; i < ntargets; ++i) {
+		ptargets->targets[c] = hms_target_find(targets[i]);
+		c += !!((long)ptargets->targets[i]);
+	}
+
+	/* Ignore NULL targets[i] */
+	ptargets->ntargets = c;
+
+	if (!c) {
+		/* No valid targets pointless to waste memory ... */
+		hms_policy_targets_put(ptargets);
+		return NULL;
+	}
+
+	return ptargets;
+}
+
+
+static void hms_policy_range_get(struct hms_policy_range *prange)
+{
+	kref_get(&prange->kref);
+}
+
+static void hms_policy_range_free(struct kref *kref)
+{
+	struct hms_policy_range *prange;
+
+	prange = container_of(kref, struct hms_policy_range, kref);
+	hms_policy_targets_put(prange->ptargets);
+	kfree(prange);
+}
+
+static void hms_policy_range_put(struct hms_policy_range *prange)
+{
+	kref_put(&prange->kref, &hms_policy_range_free);
+}
+
+static struct hms_policy_range *hms_policy_range_new(const uint32_t *targets,
+						      unsigned long start,
+						      unsigned long end,
+						      unsigned ntargets)
+{
+	struct hms_policy_targets *ptargets;
+	struct hms_policy_range *prange;
+
+	ptargets = hms_policy_targets_new(targets, ntargets);
+	if (ptargets == NULL)
+		return NULL;
+
+	prange = kmalloc(sizeof(*prange), GFP_KERNEL);
+	if (prange == NULL)
+		return NULL;
+
+	prange->node.start = start & PAGE_MASK;
+	prange->node.last = PAGE_ALIGN(end) - 1;
+	prange->ptargets = ptargets;
+	kref_init(&prange->kref);
+
+	return prange;
+}
+
+static struct hms_policy_range *
+hms_policy_range_dup(struct hms_policy_range *_prange)
+{
+	struct hms_policy_range *prange;
+
+	prange = kmalloc(sizeof(*prange), GFP_KERNEL);
+	if (prange == NULL)
+		return NULL;
+
+	hms_policy_targets_get(_prange->ptargets);
+	prange->node.start = _prange->node.start;
+	prange->node.last = _prange->node.last;
+	prange->ptargets = _prange->ptargets;
+	kref_init(&prange->kref);
+
+	return prange;
+}
+
+
+void hms_policy_fini(struct mm_struct *mm)
+{
+	struct hms_policy *hpolicy = READ_ONCE(mm->hpolicy);
+	struct interval_tree_node *node;
+
+	spin_lock(&mm->page_table_lock);
+	hpolicy = READ_ONCE(mm->hpolicy);
+	mm->hpolicy = NULL;
+	spin_unlock(&mm->page_table_lock);
+
+	/* No active heterogeneous policy structure so nothing to cleanup. */
+	if (hpolicy == NULL)
+		return;
+
+	mmu_notifier_unregister_no_release(&hpolicy->mn, mm);
+
+	down_write(&hpolicy->sem);
+	node = interval_tree_iter_first(&hpolicy->ranges, 0, -1UL);
+	while (node) {
+		struct hms_policy_range *prange;
+		struct interval_tree_node *next;
+
+		prange = container_of(node, struct hms_policy_range, node);
+		next = interval_tree_iter_next(node, 0, -1UL);
+		interval_tree_remove(node, &hpolicy->ranges);
+		hms_policy_range_put(prange);
+		node = next;
+	}
+	up_write(&hpolicy->sem);
+
+	kfree(hpolicy);
+}
+
+
+static int hbind_default_locked(struct hms_policy *hpolicy,
+				struct hbind_params *params)
+{
+	struct interval_tree_node *node;
+	unsigned long start, last;
+	int ret = 0;
+
+	start = params->start;
+	last = params->end - 1UL;
+
+	node = interval_tree_iter_first(&hpolicy->ranges, start, last);
+	while (node) {
+		struct hms_policy_range *prange;
+		struct interval_tree_node *next;
+
+		prange = container_of(node, struct hms_policy_range, node);
+		next = interval_tree_iter_next(node, start, last);
+		if (node->start < start && node->last > last) {
+			/* Node is split in 2 */
+			struct hms_policy_range *_prange;
+			_prange = hms_policy_range_dup(prange);
+			if (_prange == NULL) {
+				ret = -ENOMEM;
+				break;
+			}
+			prange->node.last = start - 1;
+			_prange->node.start = last + 1;
+			interval_tree_insert(&_prange->node, &hpolicy->ranges);
+			break;
+		} else if (node->start < start) {
+			prange->node.last = start - 1;
+		} else if (node->last > last) {
+			prange->node.start = last + 1;
+		} else {
+			/* Fully inside [start, last] */
+			interval_tree_remove(node, &hpolicy->ranges);
+		}
+
+		node = next;
+	}
+
+	return ret;
+}
+
+static int hbind_default(struct mm_struct *mm, struct hbind_params *params,
+			 const uint32_t *targets, uint32_t *atoms)
+{
+	struct hms_policy *hpolicy = READ_ONCE(mm->hpolicy);
+	int ret;
+
+	/* No active heterogeneous policy structure so no range to reset. */
+	if (hpolicy == NULL)
+		return 0;
+
+	down_write(&hpolicy->sem);
+	ret = hbind_default_locked(hpolicy, params);
+	up_write(&hpolicy->sem);
+
+	return ret;
+}
+
+
+static void hms_policy_notifier_release(struct mmu_notifier *mn,
+					struct mm_struct *mm)
+{
+	hms_policy_fini(mm);
+}
+
+static int hms_policy_notifier_invalidate_range_start(struct mmu_notifier *mn,
+				const struct mmu_notifier_range *range)
+{
+	if (range->event == MMU_NOTIFY_UNMAP) {
+		struct hbind_params params;
+
+		if (!range->blockable)
+			return -EBUSY;
+
+		params.natoms = 0;
+		params.ntargets = 0;
+		params.end = range->end;
+		params.start = range->start;
+		hbind_default(range->mm, &params, NULL, NULL);
+	}
+
+	return 0;
+}
+
+static const struct mmu_notifier_ops hms_policy_notifier_ops = {
+	.release = hms_policy_notifier_release,
+	.invalidate_range_start = hms_policy_notifier_invalidate_range_start,
+};
+
+static struct hms_policy *hms_policy_get(struct mm_struct *mm)
+{
+	struct hms_policy *hpolicy = READ_ONCE(mm->hpolicy);
+	bool mmu_notifier = false;
+
+	/*
+	 * The hpolicy struct can only be freed once the mm_struct goes away,
+	 * hence only pre-allocate if none is attach yet.
+	 */
+	if (hpolicy)
+		return hpolicy;
+
+	hpolicy = kzalloc(sizeof(*hpolicy), GFP_KERNEL);
+	if (hpolicy == NULL)
+		return NULL;
+
+	init_rwsem(&hpolicy->sem);
+
+	spin_lock(&mm->page_table_lock);
+	if (!mm->hpolicy) {
+		mm->hpolicy = hpolicy;
+		mmu_notifier = true;
+		hpolicy = NULL;
+	}
+	spin_unlock(&mm->page_table_lock);
+
+	if (mmu_notifier) {
+		int ret;
+
+		hpolicy->mn.ops = &hms_policy_notifier_ops;
+		ret = mmu_notifier_register(&hpolicy->mn, mm);
+		if (ret) {
+			spin_lock(&mm->page_table_lock);
+			hpolicy = mm->hpolicy;
+			mm->hpolicy = NULL;
+			spin_unlock(&mm->page_table_lock);
+		}
+	}
+
+	if (hpolicy)
+		kfree(hpolicy);
+
+	/* At this point mm->hpolicy is valid */
+	return mm->hpolicy;
+}
+
+
 static long hbind_ioctl(struct file *file, unsigned cmd, unsigned long arg)
 {
 	uint32_t *targets, *_dtargets = NULL, _ftargets[HBIND_FIX_ARRAY];
@@ -114,6 +408,16 @@ static long hbind_ioctl(struct file *file, unsigned cmd, unsigned long arg)
 	for (i = 0, ndwords = 1; i < params.natoms; i += ndwords) {
 		ndwords = 1 + HBIND_ATOM_GET_DWORDS(atoms[i]);
 		switch (HBIND_ATOM_GET_CMD(atoms[i])) {
+		case HBIND_CMD_DEFAULT:
+			if (ndwords != 1) {
+				ret = -EINVAL;
+				goto out_mm;
+			}
+			ret = hbind_default(current->mm, &params,
+					    targets, atoms);
+			if (ret)
+				goto out_mm;
+			break;
 		default:
 			ret = -EINVAL;
 			goto out_mm;

From patchwork Mon Dec 3 23:35:06 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Jerome Glisse
X-Patchwork-Id: 10710909
From: jglisse@redhat.com
To: linux-mm@kvack.org
Cc: Andrew Morton, linux-kernel@vger.kernel.org, Jérôme Glisse,
 "Rafael J . Wysocki", Ross Zwisler, Dan Williams, Dave Hansen,
 Haggai Eran, Balbir Singh, "Aneesh Kumar K . V",
 Benjamin Herrenschmidt, Felix Kuehling, Philip Yang, Christian König,
 Paul Blinzer, Logan Gunthorpe, John Hubbard, Ralph Campbell,
 Michal Hocko, Jonathan Cameron, Mark Hairgrove, Vivek Kini, Mel Gorman,
 Dave Airlie, Ben Skeggs, Andrea Arcangeli
Subject: [RFC PATCH 11/14] mm/hbind: add bind command to heterogeneous memory policy
Date: Mon, 3 Dec 2018 18:35:06 -0500
Message-Id: <20181203233509.20671-12-jglisse@redhat.com>
In-Reply-To: <20181203233509.20671-1-jglisse@redhat.com>
References: <20181203233509.20671-1-jglisse@redhat.com>

From: Jérôme Glisse

This patch adds a bind command to the hbind() ioctl, which allows
binding a range of virtual addresses to a given list of target memories.
New memory allocated in the range will try to use memory from the target
memory list.

Note that this patch does not modify the existing page fault path and
thus does not activate the new heterogeneous policy. Updating the CPU
page fault code path or the device page fault code path (HMM) will be
done in separate patches.

Here we only introduce helpers and infrastructure that will be used by
the page fault code path.

Signed-off-by: Jérôme Glisse
Cc: Rafael J. Wysocki
Cc: Ross Zwisler
Cc: Dan Williams
Cc: Dave Hansen
Cc: Haggai Eran
Cc: Balbir Singh
Cc: Aneesh Kumar K.V
Cc: Benjamin Herrenschmidt
Cc: Felix Kuehling
Cc: Philip Yang
Cc: Christian König
Cc: Paul Blinzer
Cc: Logan Gunthorpe
Cc: John Hubbard
Cc: Ralph Campbell
Cc: Michal Hocko
Cc: Jonathan Cameron
Cc: Mark Hairgrove
Cc: Vivek Kini
Cc: Mel Gorman
Cc: Dave Airlie
Cc: Ben Skeggs
Cc: Andrea Arcangeli
---
 include/uapi/linux/hbind.h | 10 ++++++++++
 mm/hms.c                   | 40 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+)

diff --git a/include/uapi/linux/hbind.h b/include/uapi/linux/hbind.h
index cc4687587f5a..7bb876954e3f 100644
--- a/include/uapi/linux/hbind.h
+++ b/include/uapi/linux/hbind.h
@@ -47,6 +47,16 @@ struct hbind_params {
  */
 #define HBIND_CMD_DEFAULT 0
+/*
+ * HBIND_CMD_BIND strict policy ie new allocations will comes from one of the
+ * listed targets until they run of memory. Other targets can be use if the
+ * none of the listed targets can be accessed by the initiator that did fault.
+ *
+ * Additional dwords:
+ * NONE (DWORDS MUST BE 0 !)
+ */
+#define HBIND_CMD_BIND 1
+
 #define HBIND_IOCTL _IOWR('H', 0x00, struct hbind_params)
diff --git a/mm/hms.c b/mm/hms.c
index be2c4e526f25..6be6f4acdd49 100644
--- a/mm/hms.c
+++ b/mm/hms.c
@@ -338,6 +338,36 @@ static struct hms_policy *hms_policy_get(struct mm_struct *mm)
 }
+
+static int hbind_bind(struct mm_struct *mm, struct hbind_params *params,
+		      const uint32_t *targets, uint32_t *atoms)
+{
+	struct hms_policy_range *prange;
+	struct hms_policy *hpolicy;
+	int ret;
+
+	hpolicy = hms_policy_get(mm);
+	if (hpolicy == NULL)
+		return -ENOMEM;
+
+	prange = hms_policy_range_new(targets, params->start, params->end,
+				      params->ntargets);
+	if (prange == NULL)
+		return -ENOMEM;
+
+	down_write(&hpolicy->sem);
+	ret = hbind_default_locked(hpolicy, params);
+	if (ret)
+		goto out;
+
+	interval_tree_insert(&prange->node, &hpolicy->ranges);
+
+out:
+	up_write(&hpolicy->sem);
+
+	return ret;
+}
+
+
 static long hbind_ioctl(struct file *file, unsigned cmd, unsigned long arg)
 {
 	uint32_t *targets, *_dtargets = NULL, _ftargets[HBIND_FIX_ARRAY];
@@ -418,6 +448,16 @@ static long hbind_ioctl(struct file *file, unsigned cmd, unsigned long arg)
 			if (ret)
 				goto out_mm;
 			break;
+		case HBIND_CMD_BIND:
+			if (ndwords != 1) {
+				ret = -EINVAL;
+				goto out_mm;
+			}
+			ret = hbind_bind(current->mm, &params,
+					 targets, atoms);
+			if (ret)
+				goto out_mm;
+			break;
 		default:
 			ret = -EINVAL;
 			goto out_mm;

From patchwork Mon Dec 3 23:35:07 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Jerome Glisse
X-Patchwork-Id: 10710911
dis=NONE) header.from=redhat.com Received: from mx1.redhat.com (mx1.redhat.com. [209.132.183.28]) by mx.google.com with ESMTPS id b129si3134457qke.179.2018.12.03.15.36.23 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 03 Dec 2018 15:36:23 -0800 (PST) Received-SPF: pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) client-ip=209.132.183.28; Authentication-Results: mx.google.com; spf=pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) smtp.mailfrom=jglisse@redhat.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 78B5D307EA90; Mon, 3 Dec 2018 23:36:22 +0000 (UTC) Received: from localhost.localdomain.com (ovpn-120-188.rdu2.redhat.com [10.10.120.188]) by smtp.corp.redhat.com (Postfix) with ESMTP id 296FE60566; Mon, 3 Dec 2018 23:36:19 +0000 (UTC) From: jglisse@redhat.com To: linux-mm@kvack.org Cc: Andrew Morton , linux-kernel@vger.kernel.org, =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , "Rafael J . Wysocki" , Ross Zwisler , Dan Williams , Dave Hansen , Haggai Eran , Balbir Singh , "Aneesh Kumar K . 
V" , Benjamin Herrenschmidt , Felix Kuehling , Philip Yang , =?utf-8?q?Christian_K=C3=B6nig?= , Paul Blinzer , Logan Gunthorpe , John Hubbard , Ralph Campbell , Michal Hocko , Jonathan Cameron , Mark Hairgrove , Vivek Kini , Mel Gorman , Dave Airlie , Ben Skeggs , Andrea Arcangeli Subject: [RFC PATCH 12/14] mm/hbind: add migrate command to hbind() ioctl Date: Mon, 3 Dec 2018 18:35:07 -0500 Message-Id: <20181203233509.20671-13-jglisse@redhat.com> In-Reply-To: <20181203233509.20671-1-jglisse@redhat.com> References: <20181203233509.20671-1-jglisse@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.44]); Mon, 03 Dec 2018 23:36:22 +0000 (UTC) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP From: Jérôme Glisse This patch adds a migrate command to the hbind() ioctl. User space can use this command to migrate a range of virtual addresses to a list of target memories. The command ignores any existing policy for the range and does not change the policy for the range. Signed-off-by: Jérôme Glisse Cc: Rafael J.
Wysocki Cc: Ross Zwisler Cc: Dan Williams Cc: Dave Hansen Cc: Haggai Eran Cc: Balbir Singh Cc: Aneesh Kumar K.V Cc: Benjamin Herrenschmidt Cc: Felix Kuehling Cc: Philip Yang Cc: Christian König Cc: Paul Blinzer Cc: Logan Gunthorpe Cc: John Hubbard Cc: Ralph Campbell Cc: Michal Hocko Cc: Jonathan Cameron Cc: Mark Hairgrove Cc: Vivek Kini Cc: Mel Gorman Cc: Dave Airlie Cc: Ben Skeggs Cc: Andrea Arcangeli --- include/uapi/linux/hbind.h | 9 ++++++++ mm/hms.c | 43 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 52 insertions(+) diff --git a/include/uapi/linux/hbind.h b/include/uapi/linux/hbind.h index 7bb876954e3f..ededbba22121 100644 --- a/include/uapi/linux/hbind.h +++ b/include/uapi/linux/hbind.h @@ -57,6 +57,15 @@ struct hbind_params { */ #define HBIND_CMD_BIND 1 +/* + * HBIND_CMD_MIGRATE moves existing memory to the listed target memory. This is + * best effort. + * + * Additional dwords: + * [0] result, i.e. the number of pages that have been migrated. + */ +#define HBIND_CMD_MIGRATE 2 + #define HBIND_IOCTL _IOWR('H', 0x00, struct hbind_params) diff --git a/mm/hms.c b/mm/hms.c index 6be6f4acdd49..6764908f47bf 100644 --- a/mm/hms.c +++ b/mm/hms.c @@ -368,6 +368,39 @@ static int hbind_bind(struct mm_struct *mm, struct hbind_params *params, } +static int hbind_migrate(struct mm_struct *mm, struct hbind_params *params, + const uint32_t *targets, uint32_t *atoms) +{ + unsigned long size, npages; + int ret = -EINVAL; + unsigned i; + + size = PAGE_ALIGN(params->end) - (params->start & PAGE_MASK); + npages = size >> PAGE_SHIFT; + + for (i = 0; i < params->ntargets; ++i) { + struct hms_target *target; + + target = hms_target_find(targets[i]); + if (target == NULL) + continue; + + ret = target->hbind->migrate(target, mm, params->start, + params->end, params->natoms, + atoms); + hms_target_put(target); + + if (ret) + continue; + + if (atoms[0] >= npages) + break; + } + + return ret; +} + + static long hbind_ioctl(struct file *file, unsigned cmd, unsigned long arg) { uint32_t
*targets, *_dtargets = NULL, _ftargets[HBIND_FIX_ARRAY]; @@ -458,6 +491,16 @@ static long hbind_ioctl(struct file *file, unsigned cmd, unsigned long arg) if (ret) goto out_mm; break; + case HBIND_CMD_MIGRATE: + if (ndwords != 2) { + ret = -EINVAL; + goto out_mm; + } + ret = hbind_migrate(current->mm, &params, + targets, atoms); + if (ret) + goto out_mm; + break; default: ret = -EINVAL; goto out_mm; From patchwork Mon Dec 3 23:35:08 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jerome Glisse X-Patchwork-Id: 10710913 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 26A1213BF for ; Mon, 3 Dec 2018 23:36:30 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1683229267 for ; Mon, 3 Dec 2018 23:36:30 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 08E8D2B222; Mon, 3 Dec 2018 23:36:30 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CDB0629267 for ; Mon, 3 Dec 2018 23:36:28 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 7668E6B6BB7; Mon, 3 Dec 2018 18:36:25 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 6CC606B6BB8; Mon, 3 Dec 2018 18:36:25 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 519716B6BB9; Mon, 3 Dec 2018 18:36:25 -0500 (EST) X-Original-To: linux-mm@kvack.org
X-Delivered-To: linux-mm@kvack.org Received: from mail-qt1-f200.google.com (mail-qt1-f200.google.com [209.85.160.200]) by kanga.kvack.org (Postfix) with ESMTP id 24FA96B6BB7 for ; Mon, 3 Dec 2018 18:36:25 -0500 (EST) Received: by mail-qt1-f200.google.com with SMTP id b26so15290307qtq.14 for ; Mon, 03 Dec 2018 15:36:25 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; bh=YJeTGw3BudeygPm8Y6qm28IwTKcnnzbfv+AoDd8Lj6U=; b=e4vrTZahcBdsy/DtDNBQjN7f6aEsf9ZIw3lXr7pxiQfGMs5Ci7kGegfrk4wLAPN0c8 2AvS50DCgbupg+rC54Pbf2rZ3E4tSvnJTeGRY6X+hWfnEQoVeDlTKHE0Vo5e5ELp7di4 K7EvfoV+8pnwzCMxonaTpJ/Eobq1rX9odoy9QfgeC+FwhsU7AHn6HsNCukmc/M8byFXA 9QRwWDPyYItve0rY3qcDIm3oREvI1pAxJ0xm+IWvHzF9oawXKgn8Be0QvNHhy0foSCiS Ip599QBuhPK6Ec8rQ8JU4p4cDmngYMAal6E5b0rULuwxxWS8cMCNIREuKtMU4CzITg6W /Qiw== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) smtp.mailfrom=jglisse@redhat.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com X-Gm-Message-State: AA+aEWan/ut4ZCYOcVHP3jJaKDeWwHlLpHzlPWIs9f9uLV0I53y7akBK gpbpH4WtF/ehtMY1x2TcYJE5otsnrfvckkCWYvHiOUVoTtBsEx3FeGmDWRqWXboO6VoHZz+/uM4 v/xssddUykfOgs+o/GTWqrrRC3UpxfdiwM1LZf7if+YSSSrmBd/iZ3F6hAzdwp9zhXQ== X-Received: by 2002:ac8:270d:: with SMTP id g13mr17634274qtg.168.1543880184908; Mon, 03 Dec 2018 15:36:24 -0800 (PST) X-Google-Smtp-Source: AFSGD/WwQo+4MkRH0K95PsOltV8DHGbsWfh8MD0VW1lFRrIt+9Y9hYhv1qDFdH6EkhD01cu6FPze X-Received: by 2002:ac8:270d:: with SMTP id g13mr17634241qtg.168.1543880184176; Mon, 03 Dec 2018 15:36:24 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1543880184; cv=none; d=google.com; s=arc-20160816; b=v9yLJGKGhjX5gxF6N2ZEqQcFH6qByRQZsGy7NgWXbsMDaW87uUkTyoqUpbDhCJF5TX S+91BROMK3fErRrC3/HRAzNvt1oigahNJSdI2arWUjBKuPM4s2XQuN52BufrG5JwgNI0 
0YD1dw8QWhdsflbaAGuGY4vCefAYX9UHoz4/nQbZ2AhNToS+JF4lqp/TlmgBtjlaATZF Xw3kdJEB3gXMHB1hm4cLratClHrAVjq/dk30Ob+fmC/OMErGbeVkqKGkihbaDObF8uXs VaUpYkeEum9pKUQj0P8wpUdRPtKYSUwRgbdzxI20I8RbKr4PX+whBVLAIw2lCDPQ94cB 8WqA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from; bh=YJeTGw3BudeygPm8Y6qm28IwTKcnnzbfv+AoDd8Lj6U=; b=S9jzTLiN5k4YydMQh0DcZBMNpXWHmladxn2rcJeg9ZhlkRdB3yZ9XI2lgpJtkHHY54 gFxz9tfaJFf7SBUjCEf6d/SUlRyIjdEtz7xwoRjOvuQg1AK9CCs8hEAJt8K1snDXSUNl BG63rDm3TCYUr+tKjO9fgou5ieYZRI5zsufsk/zSkHHylwSA6xHTLjOZo2GHkSr868HY CXyVagRO+Z7SrLEZDdBhDrNc8+d12GAbLR/6zrqxwcOrclaSe1s8Rla3V5+pwEpWWKtS oIMByjWyKQDTlDiFWInLJYaQl5piRCmvF+uPcKxIKVjKAZJmH2IQJ7lOWmIP+2CNTT/S uLow== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) smtp.mailfrom=jglisse@redhat.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from mx1.redhat.com (mx1.redhat.com. 
[209.132.183.28]) by mx.google.com with ESMTPS id u24si3487126qtc.86.2018.12.03.15.36.23 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 03 Dec 2018 15:36:24 -0800 (PST) Received-SPF: pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) client-ip=209.132.183.28; Authentication-Results: mx.google.com; spf=pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) smtp.mailfrom=jglisse@redhat.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 6332C81F0F; Mon, 3 Dec 2018 23:36:23 +0000 (UTC) Received: from localhost.localdomain.com (ovpn-120-188.rdu2.redhat.com [10.10.120.188]) by smtp.corp.redhat.com (Postfix) with ESMTP id A192F600C7; Mon, 3 Dec 2018 23:36:22 +0000 (UTC) From: jglisse@redhat.com To: linux-mm@kvack.org Cc: Andrew Morton , linux-kernel@vger.kernel.org, =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= Subject: [RFC PATCH 13/14] drm/nouveau: register GPU under heterogeneous memory system Date: Mon, 3 Dec 2018 18:35:08 -0500 Message-Id: <20181203233509.20671-14-jglisse@redhat.com> In-Reply-To: <20181203233509.20671-1-jglisse@redhat.com> References: <20181203233509.20671-1-jglisse@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.27]); Mon, 03 Dec 2018 23:36:23 +0000 (UTC) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP From: Jérôme Glisse This register NVidia GPU under heterogeneous memory system so that one can use the GPU memory with new syscall like 
hbind() for compute work load. Signed-off-by: Jérôme Glisse --- drivers/gpu/drm/nouveau/Kbuild | 1 + drivers/gpu/drm/nouveau/nouveau_hms.c | 80 +++++++++++++++++++++++++++ drivers/gpu/drm/nouveau/nouveau_hms.h | 46 +++++++++++++++ drivers/gpu/drm/nouveau/nouveau_svm.c | 6 ++ 4 files changed, 133 insertions(+) create mode 100644 drivers/gpu/drm/nouveau/nouveau_hms.c create mode 100644 drivers/gpu/drm/nouveau/nouveau_hms.h diff --git a/drivers/gpu/drm/nouveau/Kbuild b/drivers/gpu/drm/nouveau/Kbuild index a826a4df440d..9c1114b4d8a3 100644 --- a/drivers/gpu/drm/nouveau/Kbuild +++ b/drivers/gpu/drm/nouveau/Kbuild @@ -37,6 +37,7 @@ nouveau-y += nouveau_prime.o nouveau-y += nouveau_sgdma.o nouveau-y += nouveau_ttm.o nouveau-y += nouveau_vmm.o +nouveau-$(CONFIG_HMS) += nouveau_hms.o # DRM - modesetting nouveau-$(CONFIG_DRM_NOUVEAU_BACKLIGHT) += nouveau_backlight.o diff --git a/drivers/gpu/drm/nouveau/nouveau_hms.c b/drivers/gpu/drm/nouveau/nouveau_hms.c new file mode 100644 index 000000000000..52af9180e108 --- /dev/null +++ b/drivers/gpu/drm/nouveau/nouveau_hms.c @@ -0,0 +1,80 @@ +/* + * Copyright 2018 Red Hat Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. + */ +#include "nouveau_dmem.h" +#include "nouveau_drv.h" +#include "nouveau_hms.h" + +#include + +static int nouveau_hms_migrate(struct hms_target *target, struct mm_struct *mm, + unsigned long start, unsigned long end, + unsigned natoms, uint32_t *atoms) +{ + struct nouveau_hms *hms = target->private; + struct nouveau_drm *drm = hms->drm; + unsigned long addr; + int ret = 0; + + down_read(&mm->mmap_sem); + + for (addr = start; addr < end;) { + struct vm_area_struct *vma; + unsigned long next; + + vma = find_vma_intersection(mm, addr, end); + if (!vma) + break; + + next = min(vma->vm_end, end); + ret = nouveau_dmem_migrate_vma(drm, vma, addr, next); + // FIXME ponder more on what to do + addr = next; + } + + up_read(&mm->mmap_sem); + + return ret; +} + +const static struct hms_target_hbind nouveau_hms_target_hbind = { + .migrate = nouveau_hms_migrate, +}; + + +void nouveau_hms_init(struct nouveau_drm *drm, struct nouveau_hms *hms) +{ + unsigned long vram_size = drm->gem.vram_available; + struct device *parent; + + hms->drm = drm; + parent = drm->dev->pdev ? &drm->dev->pdev->dev : drm->dev->dev; + hms_target_register(&hms->target, parent, drm->dev->dev->numa_node, + &nouveau_hms_target_hbind, vram_size, 0); + if (hms->target) { + hms->target->private = hms; + } +} + +void nouveau_hms_fini(struct nouveau_drm *drm, struct nouveau_hms *hms) +{ + hms_target_unregister(&hms->target); +} diff --git a/drivers/gpu/drm/nouveau/nouveau_hms.h b/drivers/gpu/drm/nouveau/nouveau_hms.h new file mode 100644 index 000000000000..cda111d7044b --- /dev/null +++ b/drivers/gpu/drm/nouveau/nouveau_hms.h @@ -0,0 +1,46 @@ +/* + * Copyright 2018 Red Hat Inc. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. 
+ */ +#ifndef __NOUVEAU_HMS_H__ +#define __NOUVEAU_HMS_H__ + +#if IS_ENABLED(CONFIG_HMS) + +#include + +struct nouveau_hms { + struct hms_target *target; + struct nouveau_drm *drm; +}; + +void nouveau_hms_init(struct nouveau_drm *drm, struct nouveau_hms *hms); +void nouveau_hms_fini(struct nouveau_drm *drm, struct nouveau_hms *hms); + +#else /* IS_ENABLED(CONFIG_HMS) */ + +struct nouveau_hms { +}; + +#define nouveau_hms_init(drm, hms) +#define nouveau_hms_fini(drm, hms) + +#endif /* IS_ENABLED(CONFIG_HMS) */ +#endif diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c index 23435ee27892..26daa6d50766 100644 --- a/drivers/gpu/drm/nouveau/nouveau_svm.c +++ b/drivers/gpu/drm/nouveau/nouveau_svm.c @@ -23,6 +23,7 @@ #include "nouveau_drv.h" #include "nouveau_chan.h" #include "nouveau_dmem.h" +#include "nouveau_hms.h" #include #include @@ -44,6 +45,8 @@ struct nouveau_svm { int refs; struct list_head inst; + struct nouveau_hms hms; + struct nouveau_svm_fault_buffer { int id; struct nvif_object object; @@ -766,6 +769,7 @@ nouveau_svm_suspend(struct nouveau_drm *drm) void nouveau_svm_fini(struct nouveau_drm *drm) { + nouveau_hms_fini(drm, &drm->svm->hms); kfree(drm->svm); } @@ -776,6 +780,8 @@ nouveau_svm_init(struct nouveau_drm *drm) drm->svm->drm = drm; mutex_init(&drm->svm->mutex); INIT_LIST_HEAD(&drm->svm->inst); + + nouveau_hms_init(drm, &drm->svm->hms); } } From patchwork Mon Dec 3 23:35:09 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jerome Glisse X-Patchwork-Id: 10710915 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9B37513BF for ; Mon, 3 Dec 2018 23:36:32 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8AB6029267 for ; Mon, 3 Dec 2018 23:36:32 
+0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7E9E42B222; Mon, 3 Dec 2018 23:36:32 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8796A29267 for ; Mon, 3 Dec 2018 23:36:31 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id DF7286B6BB8; Mon, 3 Dec 2018 18:36:26 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id D7CD76B6BB9; Mon, 3 Dec 2018 18:36:26 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BCF276B6BBA; Mon, 3 Dec 2018 18:36:26 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-qt1-f198.google.com (mail-qt1-f198.google.com [209.85.160.198]) by kanga.kvack.org (Postfix) with ESMTP id 8D89A6B6BB8 for ; Mon, 3 Dec 2018 18:36:26 -0500 (EST) Received: by mail-qt1-f198.google.com with SMTP id w1so15013218qta.12 for ; Mon, 03 Dec 2018 15:36:26 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; bh=AMDL3mEsjx4hNBf1ZrwltROruS71RegkJQEKehEBnE0=; b=knwyHGS4MZHnGUM7yvsQV+9XgSeg4GJdAXEyeiSsZmTnB6cLqah/5xqcOyncxaaJQR VtH4YKxUZcVQQb42+feoml4h80C0W9+YeMISi8Z+/KwTebSzsJ3+N6RLtDQ1kh4gPqq8 FPdG+BH8F+4kdoK+OwVVkFJTssVI2jxIUS3ga0VkHVrMAxmwFZNBU0UDI6v7/IiitvIm ZyUM4h9wL93VIavb2TlXO/7hcYyIbm/K2rtZRWbT1gFAqSPOjQzSskKyxRZmSzzexQvm bKU3DCnaH7yhk74A0Qt0DGSkG+yYDzBVuypm2zSjLKLLNv5mU7E9U+l0pOV6SCVMoSpf UC6g== 
X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) smtp.mailfrom=jglisse@redhat.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com X-Gm-Message-State: AA+aEWZA5d6spoHrLvrVIXUXFNB8Zc/nftSf3VvJy/vkEiyyiXuI3paS 6k3ueWyGvQWafpR7poY/7aMBws7gbo2wrG6JcKk8ZCl9M8J/VKALIjMyVgFUtVVbs6QGrVj40gx DNAq+ec7hAXDW/IbdOl8k5bijsQATD69fpyp2om+9J2BT5U9TDox9lcoTdbcnuMXJxA== X-Received: by 2002:a0c:ec50:: with SMTP id n16mr18112088qvq.105.1543880186312; Mon, 03 Dec 2018 15:36:26 -0800 (PST) X-Google-Smtp-Source: AFSGD/Xgq9DKsy5PuOXEtVlATmBIVxFF2vzehP1CVnIgDOSe5zJf+zvY0Wjs2QIdIONG/ClmgLqE X-Received: by 2002:a0c:ec50:: with SMTP id n16mr18112032qvq.105.1543880185072; Mon, 03 Dec 2018 15:36:25 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1543880185; cv=none; d=google.com; s=arc-20160816; b=XKn6Yilcd6k+FRpfrv/lvBUgO8bbH3au28bWq26rbsRqBYBrAPAPnA8fU4CGeNv4d7 P+TPENK9/pTeuV9PYQzxgrAD7GqsIL2NTD7fbEn1zzScDKoJZvMpWB/YZvQPVQ1da5iS Zc4Hgjll+ye4sYdxbw34NWEwF7d/KYEylbzm5tauMJ60jpndyDYsqkX56dLrKgzOBP5o G9ONFjXp0QIm38ruCF23+zgYm/8t+oddvwSQSO7ZzCEfvpHdiF6kqylPd6z8HHzPRAtg N1yM1rlBqAGUaHFoUYNMLfWlFOvS5hewVyrEQnduTx33b91Ep29yFFL5fH9MvJgIv8BN JTnw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from; bh=AMDL3mEsjx4hNBf1ZrwltROruS71RegkJQEKehEBnE0=; b=ihWU2puYcH1RZSTWKQGA6q1EzwL/Ka54U0C1yozmice+OZibidrItIOE94o4XXPs6q VQxEEI19WhSqCbpgD169j6HRoHAOb6eKmvsBTVmpUG8OH8qducO7PldvQQZ+NQXUYYfT jKErlvuXZiIvbpW5p1rr14lByEF3atIwmXcHxofV5kwVQnAei3yWsAs88Tmz4k9FWT/L YsrcgKuA1qVVFhL7bmE9+VS9UjeQCXgGoXqdYpZ02Wl3Yb2f6wEyb5Z5wjxWdPY4ac6n X8zLPP3THEGzv77zGy6I4lbVzKynKsF1J2Rvm+6cnx06tfbZm5BCX+/a0of0kPWnxDHn VDPw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) 
smtp.mailfrom=jglisse@redhat.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from mx1.redhat.com (mx1.redhat.com. [209.132.183.28]) by mx.google.com with ESMTPS id a41si9428631qtb.19.2018.12.03.15.36.24 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 03 Dec 2018 15:36:25 -0800 (PST) Received-SPF: pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) client-ip=209.132.183.28; Authentication-Results: mx.google.com; spf=pass (google.com: domain of jglisse@redhat.com designates 209.132.183.28 as permitted sender) smtp.mailfrom=jglisse@redhat.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 4BF7C307886D; Mon, 3 Dec 2018 23:36:24 +0000 (UTC) Received: from localhost.localdomain.com (ovpn-120-188.rdu2.redhat.com [10.10.120.188]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8E91B600C7; Mon, 3 Dec 2018 23:36:23 +0000 (UTC) From: jglisse@redhat.com To: linux-mm@kvack.org Cc: Andrew Morton , linux-kernel@vger.kernel.org, =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= Subject: [RFC PATCH 14/14] test/hms: tests for heterogeneous memory system Date: Mon, 3 Dec 2018 18:35:09 -0500 Message-Id: <20181203233509.20671-15-jglisse@redhat.com> In-Reply-To: <20181203233509.20671-1-jglisse@redhat.com> References: <20181203233509.20671-1-jglisse@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.49]); Mon, 03 Dec 2018 23:36:24 +0000 (UTC) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP From: 
Jérôme Glisse Set of tests for heterogeneous memory system (migration, binding, ...) Signed-off-by: Jérôme Glisse --- tools/testing/hms/Makefile | 17 ++ tools/testing/hms/hbind-create-device-file.sh | 11 + tools/testing/hms/test-hms-migrate.c | 77 ++++++ tools/testing/hms/test-hms.c | 237 ++++++++++++++++++ tools/testing/hms/test-hms.h | 67 +++++ 5 files changed, 409 insertions(+) create mode 100644 tools/testing/hms/Makefile create mode 100755 tools/testing/hms/hbind-create-device-file.sh create mode 100644 tools/testing/hms/test-hms-migrate.c create mode 100644 tools/testing/hms/test-hms.c create mode 100644 tools/testing/hms/test-hms.h diff --git a/tools/testing/hms/Makefile b/tools/testing/hms/Makefile new file mode 100644 index 000000000000..57223a671cb0 --- /dev/null +++ b/tools/testing/hms/Makefile @@ -0,0 +1,17 @@ +# SPDX-License-Identifier: GPL-2.0 +LDFLAGS += -fsanitize=address -fsanitize=undefined +CFLAGS += -std=c99 -D_GNU_SOURCE -I. -I../../../include/uapi -g -Og -Wall +LDLIBS += -lpthread +TARGETS = test-hms-migrate +OFILES = test-hms + +targets: $(TARGETS) + +$(TARGETS): $(OFILES:%=%.o) $(TARGETS:%=%.c) + $(CC) $(CFLAGS) -o $@ $(OFILES:%=%.o) $@.c + +clean: + $(RM) $(TARGETS) *.o + +%.o: Makefile *.h %.c + $(CC) $(CFLAGS) -o $@ -c $(@:%.o=%.c) diff --git a/tools/testing/hms/hbind-create-device-file.sh b/tools/testing/hms/hbind-create-device-file.sh new file mode 100755 index 000000000000..60c2533cc85d --- /dev/null +++ b/tools/testing/hms/hbind-create-device-file.sh @@ -0,0 +1,11 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 + +major=10 +minor=$(awk "\$2==\"hbind\" {print \$1}" /proc/misc) + +echo hbind device minor is $minor, creating device file: +sudo rm /dev/hbind +sudo mknod /dev/hbind c $major $minor +sudo chmod 666 /dev/hbind +echo /dev/hbind created diff --git a/tools/testing/hms/test-hms-migrate.c b/tools/testing/hms/test-hms-migrate.c new file mode 100644 index 000000000000..b90f701c0b75 --- /dev/null +++ 
b/tools/testing/hms/test-hms-migrate.c
@@ -0,0 +1,77 @@
+/*
+ * Copyright 2018 Red Hat Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * Authors:
+ * Jérôme Glisse
+ */
+#include
+
+#include "test-hms.h"
+
+int main(int argc, char *argv[])
+{
+    struct hms_context ctx;
+    struct hms_object *target = NULL;
+    uint64_t targets[1], ntargets = 1;
+    unsigned long size = 64 << 10;
+    unsigned long start, end, i;
+    unsigned *ptr;
+    int ret;
+
+    if (argc != 2) {
+        printf("EE: usage: %s targetname\n", argv[0]);
+        return -1;
+    }
+
+    hms_context_init(&ctx);
+
+    /* Find target */
+    do {
+        target = hms_context_object_find_reference(&ctx, target, argv[1]);
+    } while (target && target->type != HMS_TARGET);
+    if (target == NULL) {
+        printf("EE: could not find %s target\n", argv[1]);
+        return -1;
+    }
+
+    /* Allocate memory */
+    ptr = hms_malloc(size);
+    for (i = 0; i < (size / 4); ++i) {
+        ptr[i] = i;
+    }
+
+    /* Migrate to target */
+    targets[0] = target->id;
+    start = (uintptr_t)ptr;
+    end = start + size;
+    ntargets = 1;
+    ret = hms_migrate(&ctx, start, end, targets, ntargets);
+    if (ret) {
+        printf("EE: migration failure (%d)\n", ret);
+    } else {
+        for (i = 0; i < (size / 4); ++i) {
+            if (ptr[i] != i) {
+                printf("EE: migration failure ptr[%lu] = %u\n", i, ptr[i]);
+                goto out;
+            }
+        }
+        printf("OK: migration successful\n");
+    }
+
+out:
+    /* Free */
+    hms_mfree(ptr, size);
+
+    hms_context_fini(&ctx);
+    return 0;
+}
diff --git a/tools/testing/hms/test-hms.c b/tools/testing/hms/test-hms.c
new file mode 100644
index 000000000000..0502f49198c4
--- /dev/null
+++ b/tools/testing/hms/test-hms.c
@@ -0,0 +1,237 @@
+/*
+ * Copyright 2018 Red Hat Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * Authors:
+ * Jérôme Glisse
+ */
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "test-hms.h"
+#include "linux/hbind.h"
+
+
+static unsigned long page_mask = 0;
+static int page_size = 0;
+static int page_shift = 0;
+
+static inline void page_shift_init(void)
+{
+    if (!page_shift) {
+        page_size = sysconf(_SC_PAGE_SIZE);
+
+        page_shift = ffs(page_size) - 1;
+        page_mask = ~((unsigned long)(page_size - 1));
+    }
+}
+
+static unsigned long page_align(unsigned long size)
+{
+    return (size + page_size - 1) & page_mask;
+}
+
+void hms_object_parse_dir(struct hms_object *object, const char *ctype)
+{
+    struct dirent *dirent;
+    char dirname[256];
+    DIR *dirp;
+
+    snprintf(dirname, 255, "/sys/bus/hms/devices/v%u-%u-%s",
+             object->version, object->id, ctype);
+    dirp = opendir(dirname);
+    if (dirp == NULL) {
+        return;
+    }
+    while ((dirent = readdir(dirp))) {
+        struct hms_reference *reference;
+
+        if (dirent->d_type != DT_LNK || !strcmp(dirent->d_name, "subsystem")) {
+            continue;
+        }
+
+        reference = malloc(sizeof(*reference));
+        strcpy(reference->name, dirent->d_name);
+        reference->object = NULL;
+
+        reference->next = object->references;
+        object->references = reference;
+    }
+    closedir(dirp);
+}
+
+void hms_object_free(struct hms_object *object)
+{
+    struct hms_reference *reference = object->references;
+
+    for (; reference; reference = object->references) {
+        object->references = reference->next;
+        free(reference);
+    }
+
+    free(object);
+}
+
+
+void hms_context_init(struct hms_context *ctx)
+{
+    struct dirent *dirent;
+    DIR *dirp;
+
+    ctx->objects = NULL;
+
+    /* Scan targets, initiators, links, bridges ... */
+    dirp = opendir("/sys/bus/hms/devices/");
+    if (dirp == NULL) {
+        printf("EE: could not open /sys/bus/hms/devices/\n");
+        exit(-1);
+    }
+    while ((dirent = readdir(dirp))) {
+        struct hms_object *object;
+        unsigned version, id;
+        enum hms_type type;
+        char ctype[256];
+
+        if (dirent->d_type != DT_LNK || dirent->d_name[0] != 'v') {
+            continue;
+        }
+        if (sscanf(dirent->d_name, "v%u-%u-%255s", &version, &id, ctype) != 3) {
+            continue;
+        }
+
+        if (!strcmp("link", ctype)) {
+            type = HMS_LINK;
+        } else if (!strcmp("bridge", ctype)) {
+            type = HMS_BRIDGE;
+        } else if (!strcmp("target", ctype)) {
+            type = HMS_TARGET;
+        } else if (!strcmp("initiator", ctype)) {
+            type = HMS_INITIATOR;
+        } else {
+            continue;
+        }
+
+        object = malloc(sizeof(*object));
+        object->references = NULL;
+        object->version = version;
+        object->type = type;
+        object->id = id;
+
+        object->next = ctx->objects;
+        ctx->objects = object;
+
+        hms_object_parse_dir(object, ctype);
+    }
+    closedir(dirp);
+
+    ctx->fd = open("/dev/hbind", O_RDWR);
+    if (ctx->fd < 0) {
+        printf("EE: could not open /dev/hbind\n");
+        exit(-1);
+    }
+}
+
+void hms_context_fini(struct hms_context *ctx)
+{
+    struct hms_object *object = ctx->objects;
+
+    for (; object; object = ctx->objects) {
+        ctx->objects = object->next;
+        hms_object_free(object);
+    }
+
+    close(ctx->fd);
+}
+
+struct hms_object *hms_context_object_find_reference(struct hms_context *ctx,
+                                                     struct hms_object *object,
+                                                     const char *name)
+{
+    object = object ? object->next : ctx->objects;
+    for (; object; object = object->next) {
+        struct hms_reference *reference = object->references;
+
+        for (; reference; reference = reference->next) {
+            if (!strcmp(reference->name, name)) {
+                return object;
+            }
+        }
+    }
+
+    return NULL;
+}
+
+
+int hms_migrate(struct hms_context *ctx,
+                unsigned long start,
+                unsigned long end,
+                uint64_t *targets,
+                unsigned ntargets)
+{
+    struct hbind_params params;
+    uint64_t atoms[2], natoms;
+    int ret;
+
+    atoms[0] = HBIND_ATOM_SET_CMD(HBIND_CMD_MIGRATE) |
+               HBIND_ATOM_SET_DWORDS(1);
+    atoms[1] = 0;
+    natoms = 2;
+
+    params.targets = (uintptr_t)targets;
+    params.atoms = (uintptr_t)atoms;
+
+    params.ntargets = ntargets;
+    params.natoms = natoms;
+    params.start = start;
+    params.end = end;
+
+    do {
+        ret = ioctl(ctx->fd, HBIND_IOCTL, &params);
+printf("ret %d atoms %d\n", ret, (int)atoms[1]);
+    } while (ret && (errno == EINTR));
+
+    /* Result of migration is in the atoms after cmd dword */
+printf("ret %d atoms %d\n", ret, (int)atoms[1]);
+    ret = ret ? ret : atoms[1];
+
+    return ret;
+}
+
+
+void *hms_malloc(unsigned long size)
+{
+    void *ptr;
+
+    page_shift_init();
+
+    ptr = mmap(0, page_align(size), PROT_READ | PROT_WRITE,
+               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+    if (ptr == MAP_FAILED) {
+        return NULL;
+    }
+    return ptr;
+}
+
+void hms_mfree(void *ptr, unsigned long size)
+{
+    munmap(ptr, page_align(size));
+}
diff --git a/tools/testing/hms/test-hms.h b/tools/testing/hms/test-hms.h
new file mode 100644
index 000000000000..b5d625e18d59
--- /dev/null
+++ b/tools/testing/hms/test-hms.h
@@ -0,0 +1,67 @@
+/*
+ * Copyright 2018 Red Hat Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * Authors:
+ * Jérôme Glisse
+ */
+#ifndef TEST_HMS_H
+#define TEST_HMS_H
+
+#include
+
+enum hms_type {
+    HMS_LINK = 0,
+    HMS_BRIDGE,
+    HMS_TARGET,
+    HMS_INITIATOR,
+};
+
+struct hms_reference {
+    char name[256];
+    struct hms_object *object;
+    struct hms_reference *next;
+};
+
+struct hms_object {
+    struct hms_reference *references;
+    struct hms_object *next;
+    unsigned version;
+    unsigned id;
+    enum hms_type type;
+};
+
+struct hms_context {
+    struct hms_object *objects;
+    int fd;
+};
+
+void hms_context_init(struct hms_context *ctx);
+void hms_context_fini(struct hms_context *ctx);
+struct hms_object *hms_context_object_find_reference(struct hms_context *ctx,
+                                                     struct hms_object *object,
+                                                     const char *name);
+
+
+int hms_migrate(struct hms_context *ctx,
+                unsigned long start,
+                unsigned long end,
+                uint64_t *targets,
+                unsigned ntargets);
+
+
+/* Provide page-aligned memory allocations */
+void *hms_malloc(unsigned long size);
+void hms_mfree(void *ptr, unsigned long size);
+
+
+#endif
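
hms_context_init() decodes sysfs entry names of the form v<version>-<id>-<type>. As an illustration only, that decoding could be isolated in a small helper so it can be exercised without /sys/bus/hms being present. The sketch below is not part of the patch: hms_parse_name() and enum hms_kind are hypothetical names, and the 15-byte limit on the type string is an assumption chosen to fit the four types the test recognizes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the patch's enum hms_type. */
enum hms_kind {
    HMS_KIND_LINK,
    HMS_KIND_BRIDGE,
    HMS_KIND_TARGET,
    HMS_KIND_INITIATOR,
};

/*
 * Hypothetical helper: split "v<version>-<id>-<type>" into its parts.
 * Returns 0 on success, -1 when the name does not match.
 */
static int hms_parse_name(const char *name, unsigned *version,
                          unsigned *id, enum hms_kind *kind)
{
    /* Same order as enum hms_kind, so the index doubles as the kind. */
    static const char *const types[] = {
        "link", "bridge", "target", "initiator"
    };
    char ctype[16];
    int used = 0;
    size_t i;

    assert(name && version && id && kind);

    /* %15s bounds the copy into ctype; %n stores how many bytes were read. */
    if (sscanf(name, "v%u-%u-%15s%n", version, id, ctype, &used) != 3)
        return -1;
    /* Reject anything left over, such as a type longer than 15 bytes. */
    if (name[used] != '\0')
        return -1;

    for (i = 0; i < sizeof(types) / sizeof(types[0]); i++) {
        if (!strcmp(ctype, types[i])) {
            *kind = (enum hms_kind)i;
            return 0;
        }
    }
    return -1;
}
```

With such a helper the readdir() loop would only need to skip entries for which it returns -1. Bounding the %s conversion and checking for leftover characters means an unexpected directory name is ignored rather than overflowing the buffer or being half-parsed.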