From patchwork Mon Dec 3 23:35:04 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Jerome Glisse
X-Patchwork-Id: 10710905
From: jglisse@redhat.com
To: linux-mm@kvack.org
Cc: Andrew Morton, linux-kernel@vger.kernel.org, Jérôme Glisse,
 "Rafael J . Wysocki", Ross Zwisler, Haggai Eran, Balbir Singh, "Aneesh Kumar K . 
V", Felix Kuehling, Philip Yang, Christian König, Paul Blinzer,
 John Hubbard, Jonathan Cameron, Mark Hairgrove, Vivek Kini
Subject: [RFC PATCH 09/14] mm/hms: hbind() for heterogeneous memory system (aka mbind() for HMS)
Date: Mon, 3 Dec 2018 18:35:04 -0500
Message-Id: <20181203233509.20671-10-jglisse@redhat.com>
In-Reply-To: <20181203233509.20671-1-jglisse@redhat.com>
References: <20181203233509.20671-1-jglisse@redhat.com>

From: Jérôme Glisse

With the advance of heterogeneous computing and the new kinds of memory
topology that are becoming widespread (CPU HBM, persistent memory, ...),
we no longer have just a flat memory topology inside a NUMA node. Instead
there is a hierarchy of memory, for instance HBM for the CPU versus main
memory. There is also device memory; a good example is the GPU, which has
a large amount of memory (several gigabytes, and it keeps growing).

In the face of this, the mbind() API is too limited to allow precise
selection of which memory to use inside a node. This is why this patchset
introduces a new API, hbind() (for heterogeneous bind), that allows binding
to any kind of memory, whether it is some specific memory inside a node,
like the CPU's HBM, or some device memory.

Instead of using a bitmap, hbind() takes an array of uids, where each uid
identifies a unique memory target inside the new HMS topology description.

Signed-off-by: Jérôme Glisse
Cc: Rafael J. Wysocki
Cc: Ross Zwisler
Cc: Haggai Eran
Cc: Balbir Singh
Cc: Aneesh Kumar K.V
Cc: Felix Kuehling
Cc: Philip Yang
Cc: Christian König
Cc: Paul Blinzer
Cc: John Hubbard
Cc: Jonathan Cameron
Cc: Mark Hairgrove
Cc: Vivek Kini
Cc: linux-mm@kvack.org
---
 include/uapi/linux/hbind.h |  46 +++++++++++
 mm/Makefile                |   1 +
 mm/hms.c                   | 158 +++++++++++++++++++++++++++++++++++++
 3 files changed, 205 insertions(+)
 create mode 100644 include/uapi/linux/hbind.h
 create mode 100644 mm/hms.c

diff --git a/include/uapi/linux/hbind.h b/include/uapi/linux/hbind.h
new file mode 100644
index 000000000000..a9aba17ab142
--- /dev/null
+++ b/include/uapi/linux/hbind.h
@@ -0,0 +1,46 @@
+/*
+ * Copyright 2018 Red Hat Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * Authors:
+ * Jérôme Glisse
+ */
+/* Heterogeneous memory system (HMS) see Documentation/vm/hms.rst */
+#ifndef LINUX_UAPI_HBIND
+#define LINUX_UAPI_HBIND
+
+
+/* For now just freak out if it is bigger than a page. */
+#define HBIND_MAX_TARGETS (4096 / 4)
+#define HBIND_MAX_ATOMS (4096 / 4)
+
+
+struct hbind_params {
+	uint64_t start;
+	uint64_t end;
+	uint32_t ntargets;
+	uint32_t natoms;
+	uint64_t targets;
+	uint64_t atoms;
+};
+
+
+#define HBIND_ATOM_GET_DWORDS(v) (((v) >> 20) & 0xfff)
+#define HBIND_ATOM_SET_DWORDS(v) (((v) & 0xfff) << 20)
+#define HBIND_ATOM_GET_CMD(v) ((v) & 0xfffff)
+#define HBIND_ATOM_SET_CMD(v) ((v) & 0xfffff)
+
+
+#define HBIND_IOCTL _IOWR('H', 0x00, struct hbind_params)
+
+
+#endif /* LINUX_UAPI_HBIND */
diff --git a/mm/Makefile b/mm/Makefile
index d210cc9d6f80..0537a95f6cbd 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -99,3 +99,4 @@ obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
 obj-$(CONFIG_PERCPU_STATS) += percpu-stats.o
 obj-$(CONFIG_HMM) += hmm.o
 obj-$(CONFIG_MEMFD_CREATE) += memfd.o
+obj-$(CONFIG_HMS) += hms.o
diff --git a/mm/hms.c b/mm/hms.c
new file mode 100644
index 000000000000..bf328bd577dc
--- /dev/null
+++ b/mm/hms.c
@@ -0,0 +1,158 @@
+/*
+ * Copyright 2018 Red Hat Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * Authors:
+ * Jérôme Glisse
+ */
+/* Heterogeneous memory system (HMS) see Documentation/vm/hms.rst */
+#define pr_fmt(fmt) "hms: " fmt
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+
+
+#define HBIND_FIX_ARRAY 64
+
+
+static ssize_t hbind_read(struct file *file, char __user *buf,
+			  size_t count, loff_t *ppos)
+{
+	return -EINVAL;
+}
+
+static ssize_t hbind_write(struct file *file, const char __user *buf,
+			   size_t count, loff_t *ppos)
+{
+	return -EINVAL;
+}
+
+static long hbind_ioctl(struct file *file, unsigned cmd, unsigned long arg)
+{
+	uint32_t *targets, *_dtargets = NULL, _ftargets[HBIND_FIX_ARRAY];
+	uint32_t *atoms, *_datoms = NULL, _fatoms[HBIND_FIX_ARRAY];
+	void __user *uarg = (void __user *)arg;
+	struct hbind_params params;
+	uint32_t i, ndwords;
+	int ret;
+
+	switch(cmd) {
+	case HBIND_IOCTL:
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	ret = copy_from_user(&params, uarg, sizeof(params));
+	if (ret)
+		return ret;
+
+	/* Some sanity checks */
+	params.start &= PAGE_MASK;
+	params.end = PAGE_ALIGN(params.end);
+	if (params.end <= params.start)
+		return -EINVAL;
+
+	/* More sanity checks */
+	if (params.ntargets > HBIND_MAX_TARGETS)
+		return -EINVAL;
+
+	/* We need at least one atoms. */
+	if (!params.natoms || params.natoms > HBIND_MAX_ATOMS)
+		return -EINVAL;
+
+	/* Let's allocate memory for parameters. */
+	if (params.ntargets > HBIND_FIX_ARRAY) {
+		_dtargets = kzalloc(4 * params.ntargets, GFP_KERNEL);
+		if (_dtargets == NULL)
+			return -ENOMEM;
+		targets = _dtargets;
+	} else {
+		targets = _ftargets;
+	}
+	if (params.natoms > HBIND_FIX_ARRAY) {
+		_datoms = kzalloc(4 * params.natoms, GFP_KERNEL);
+		if (_datoms == NULL) {
+			ret = -ENOMEM;
+			goto out;
+		}
+		atoms = _datoms;
+	} else {
+		atoms = _fatoms;
+	}
+
+	/* Let's fetch hbind() parameters. */
+	ret = copy_from_user(atoms, (void __user *)params.atoms,
+			     4 * params.natoms);
+	if (ret)
+		goto out;
+	ret = copy_from_user(targets, (void __user *)params.targets,
+			     4 * params.ntargets);
+	if (ret)
+		goto out;
+
+	mmget(current->mm);
+
+	/* Sanity checks atoms and execute them. */
+	for (i = 0, ndwords = 1; i < params.natoms; i += ndwords) {
+		ndwords = 1 + HBIND_ATOM_GET_DWORDS(atoms[i]);
+		switch (HBIND_ATOM_GET_CMD(atoms[i])) {
+		default:
+			ret = -EINVAL;
+			goto out_mm;
+		}
+	}
+
+out_mm:
+	copy_to_user((void __user *)params.atoms, atoms, 4 * params.natoms);
+	mmput(current->mm);
+out:
+	kfree(_dtargets);
+	kfree(_datoms);
+	return ret;
+}
+
+const struct file_operations hbind_fops = {
+	.llseek = no_llseek,
+	.read = hbind_read,
+	.write = hbind_write,
+	.unlocked_ioctl = hbind_ioctl,
+	.owner = THIS_MODULE,
+};
+
+static struct miscdevice hbind_device = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.fops = &hbind_fops,
+	.name = "hbind",
+};
+
+int __init hbind_init(void)
+{
+	pr_info("Heterogeneous memory system (HMS) hbind() driver\n");
+	return misc_register(&hbind_device);
+}
+
+void __exit hbind_fini(void)
+{
+	misc_deregister(&hbind_device);
+}
+
+module_init(hbind_init);
+module_exit(hbind_fini);
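
Illustration (not part of the patch): the atom layout implied by the HBIND_ATOM_* macros can be tried out in user space. The sketch below copies the four macros from include/uapi/linux/hbind.h and walks an atom array with the same stride as the loop in hbind_ioctl(). The names EXAMPLE_CMD, atom_header() and atom_count() are hypothetical; the patch defines no command yet, so the kernel side rejects every atom with -EINVAL. The overrun check in atom_count() is an addition of this sketch and does not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copied from include/uapi/linux/hbind.h as posted in this patch. */
#define HBIND_ATOM_GET_DWORDS(v) (((v) >> 20) & 0xfff)
#define HBIND_ATOM_SET_DWORDS(v) (((v) & 0xfff) << 20)
#define HBIND_ATOM_GET_CMD(v) ((v) & 0xfffff)
#define HBIND_ATOM_SET_CMD(v) ((v) & 0xfffff)

/* Hypothetical command number: the patch does not define any command. */
#define EXAMPLE_CMD 0x1u

/*
 * Build an atom header dword: payload length (in dwords) goes in
 * bits 20..31, the command goes in bits 0..19.
 */
static uint32_t atom_header(uint32_t cmd, uint32_t payload_dwords)
{
	assert(cmd <= 0xfffff && payload_dwords <= 0xfff);
	return HBIND_ATOM_SET_DWORDS(payload_dwords) | HBIND_ATOM_SET_CMD(cmd);
}

/*
 * Count the atoms in an array of total_dwords dwords, stepping by
 * 1 + payload length as hbind_ioctl() does. Returns -1 when an atom's
 * payload would run past the end of the array (a check added here,
 * not taken from the patch).
 */
static int atom_count(const uint32_t *atoms, uint32_t total_dwords)
{
	uint32_t i, stride;
	int n = 0;

	for (i = 0; i < total_dwords; i += stride) {
		stride = 1 + HBIND_ATOM_GET_DWORDS(atoms[i]);
		if (stride > total_dwords - i)
			return -1;
		n++;
	}
	return n;
}
```

With this layout, atom_header(EXAMPLE_CMD, 2) followed by two payload dwords occupies three entries of the array, so the natoms field of struct hbind_params appears to count dwords rather than atoms; that reading is inferred from the loop and is not stated in the patch.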