From patchwork Wed May 1 14:04:34 2019
X-Patchwork-Submitter: "Welty, Brian"
X-Patchwork-Id: 10925099
From: Brian Welty
To: cgroups@vger.kernel.org, Tejun Heo, Li Zefan, Johannes Weiner,
    linux-mm@kvack.org, Michal Hocko, Vladimir Davydov,
    dri-devel@lists.freedesktop.org, David Airlie, Daniel Vetter,
    intel-gfx@lists.freedesktop.org, Jani Nikula, Joonas Lahtinen,
    Rodrigo Vivi, Christian König, Alex Deucher, ChunMing Zhou,
    Jérôme Glisse
Subject: [RFC PATCH 1/5] cgroup: Add cgroup_subsys per-device registration framework
Date: Wed, 1 May 2019 10:04:34 -0400
Message-Id: <20190501140438.9506-2-brian.welty@intel.com>
In-Reply-To: <20190501140438.9506-1-brian.welty@intel.com>
References: <20190501140438.9506-1-brian.welty@intel.com>

In containerized or virtualized environments, there is a desire to have
controls in place for resources that can be consumed by users of a GPU
device. For this purpose, we extend control groups with a mechanism for
device drivers to register with cgroup subsystems. Device drivers (GPU or
other) are then able to reuse the existing cgroup controls, instead of
inventing similar ones.
A new framework is proposed to allow devices to register with existing cgroup controllers, which creates per-device cgroup_subsys_state within the cgroup. This gives device drivers their own private cgroup controls (such as memory limits or other parameters) to be applied to device resources instead of host system resources. It is exposed in cgroup filesystem as: mount//.devices// such as (for example): mount//memory.devices//memory.max mount//memory.devices//memory.current mount//cpu.devices//cpu.stat The creation of above files is implemented in css_populate_dir() for cgroup subsystems that have enabled per-device support. Above files are created either at time of cgroup creation (for known registered devices) or at the time of device driver registration of the device, during cgroup_register_device. cgroup_device_unregister will remove files from all current cgroups. Cc: cgroups@vger.kernel.org Cc: linux-mm@kvack.org Cc: dri-devel@lists.freedesktop.org Cc: Matt Roper Signed-off-by: Brian Welty --- include/linux/cgroup-defs.h | 28 ++++ include/linux/cgroup.h | 3 + kernel/cgroup/cgroup.c | 270 ++++++++++++++++++++++++++++++++++-- 3 files changed, 289 insertions(+), 12 deletions(-) diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h index 1c70803e9f77..aeaab420e349 100644 --- a/include/linux/cgroup-defs.h +++ b/include/linux/cgroup-defs.h @@ -162,6 +162,17 @@ struct cgroup_subsys_state { struct work_struct destroy_work; struct rcu_work destroy_rwork; + /* + * Per-device state for devices registered with our subsys. + * @device_css_idr stores pointer to per-device cgroup_subsys_state, + * created when devices are associated with this css. + * @device_kn is for creating .devices sub-directory within this cgroup + * or for the per-device sub-directory (subsys.devices/). + */ + struct device *device; + struct idr device_css_idr; + struct kernfs_node *device_kn; + /* * PI: the parent css. Placed here for cache proximity to following * fields of the containing structure. @@ -589,6 +600,9 @@ struct cftype { */ struct cgroup_subsys { struct cgroup_subsys_state *(*css_alloc)(struct cgroup_subsys_state *parent_css); + struct cgroup_subsys_state *(*device_css_alloc)(struct device *device, + struct cgroup_subsys_state *cgroup_css, + struct cgroup_subsys_state *parent_device_css); int (*css_online)(struct cgroup_subsys_state *css); void (*css_offline)(struct cgroup_subsys_state *css); void (*css_released)(struct cgroup_subsys_state *css); @@ -636,6 +650,13 @@ struct cgroup_subsys { */ bool threaded:1; + /* + * If %true, the controller supports device drivers to register + * with this controller for cloning the cgroup functionality + * into per-device cgroup state under .dev//. + */ + bool allow_devices:1; + /* * If %false, this subsystem is properly hierarchical - * configuration, resource accounting and restriction on a parent @@ -664,6 +685,13 @@ struct cgroup_subsys { /* idr for css->id */ struct idr css_idr; + /* + * IDR of registered devices, allows subsys_state to have state + * for each device. Exposed as per-device entries in filesystem, + * under .device//. + */ + struct idr device_idr; + /* * List of cftypes. Each entry is the first entry of an array * terminated by zero length name. 
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 81f58b4a5418..3531bf948703 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -116,6 +116,9 @@ int cgroupstats_build(struct cgroupstats *stats, struct dentry *dentry); int proc_cgroup_show(struct seq_file *m, struct pid_namespace *ns, struct pid *pid, struct task_struct *tsk); +int cgroup_device_register(struct cgroup_subsys *ss, struct device *dev, + unsigned long *dev_id); +void cgroup_device_unregister(struct cgroup_subsys *ss, unsigned long dev_id); void cgroup_fork(struct task_struct *p); extern int cgroup_can_fork(struct task_struct *p); extern void cgroup_cancel_fork(struct task_struct *p); diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index 3f2b4bde0f9c..9b035e728941 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -598,6 +598,8 @@ struct cgroup_subsys_state *of_css(struct kernfs_open_file *of) struct cgroup *cgrp = of->kn->parent->priv; struct cftype *cft = of_cft(of); + /* FIXME this needs updating to lookup device-specific CSS */ + /* * This is open and unprotected implementation of cgroup_css(). * seq_css() is only called from a kernfs file operation which has @@ -1583,14 +1585,15 @@ struct cgroup *cgroup_kn_lock_live(struct kernfs_node *kn, bool drain_offline) return NULL; } -static void cgroup_rm_file(struct cgroup *cgrp, const struct cftype *cft) +static void cgroup_rm_file(struct cgroup_subsys_state *css, struct cgroup *cgrp, + const struct cftype *cft) { char name[CGROUP_FILE_NAME_MAX]; + struct kernfs_node *dest_kn; lockdep_assert_held(&cgroup_mutex); if (cft->file_offset) { - struct cgroup_subsys_state *css = cgroup_css(cgrp, cft->ss); struct cgroup_file *cfile = (void *)css + cft->file_offset; spin_lock_irq(&cgroup_file_kn_lock); @@ -1600,6 +1603,7 @@ static void cgroup_rm_file(struct cgroup *cgrp, const struct cftype *cft) del_timer_sync(&cfile->notify_timer); } + dest_kn = (css->device) ? css->device_kn : cgrp->kn; kernfs_remove_by_name(cgrp->kn, cgroup_file_name(cgrp, cft, name)); } @@ -1630,10 +1634,49 @@ static void css_clear_dir(struct cgroup_subsys_state *css) } } +static int cgroup_device_mkdir(struct cgroup_subsys_state *css) +{ + struct cgroup_subsys_state *device_css; + struct cgroup *cgrp = css->cgroup; + char name[CGROUP_FILE_NAME_MAX]; + struct kernfs_node *kn; + int ret, dev_id; + + /* create subsys.device only if enabled in subsys and non-root cgroup */ + if (!css->ss->allow_devices || !cgroup_parent(cgrp)) + return 0; + + ret = strlcpy(name, css->ss->name, CGROUP_FILE_NAME_MAX); + ret += strlcat(name, ".device", CGROUP_FILE_NAME_MAX); + /* treat as non-error if truncation due to subsys name */ + if (WARN_ON_ONCE(ret >= CGROUP_FILE_NAME_MAX)) + return 0; + + kn = kernfs_create_dir(cgrp->kn, name, cgrp->kn->mode, cgrp); + if (IS_ERR(kn)) + return PTR_ERR(kn); + css->device_kn = kn; + + /* create subdirectory per each registered device */ + idr_for_each_entry(&css->device_css_idr, device_css, dev_id) { + /* FIXME: prefix dev_name with bus_name for uniqueness? */ + kn = kernfs_create_dir(css->device_kn, + dev_name(device_css->device), + cgrp->kn->mode, cgrp); + if (IS_ERR(kn)) + return PTR_ERR(kn); + /* FIXME: kernfs_get needed here? */ + device_css->device_kn = kn; + } + + return 0; +} + /** * css_populate_dir - create subsys files in a cgroup directory * @css: target css * + * Creates per-device directories if enabled in subsys. * On failure, no file is added. 
*/ static int css_populate_dir(struct cgroup_subsys_state *css) @@ -1655,6 +1698,10 @@ static int css_populate_dir(struct cgroup_subsys_state *css) if (ret < 0) return ret; } else { + ret = cgroup_device_mkdir(css); + if (ret < 0) + return ret; + list_for_each_entry(cfts, &css->ss->cfts, node) { ret = cgroup_addrm_files(css, cgrp, cfts, true); if (ret < 0) { @@ -1673,6 +1720,7 @@ static int css_populate_dir(struct cgroup_subsys_state *css) break; cgroup_addrm_files(css, cgrp, cfts, false); } + /* FIXME: per-device files will be removed by kernfs_destroy_root? */ return ret; } @@ -3665,14 +3713,15 @@ static int cgroup_add_file(struct cgroup_subsys_state *css, struct cgroup *cgrp, struct cftype *cft) { char name[CGROUP_FILE_NAME_MAX]; - struct kernfs_node *kn; + struct kernfs_node *kn, *dest_kn; struct lock_class_key *key = NULL; int ret; #ifdef CONFIG_DEBUG_LOCK_ALLOC key = &cft->lockdep_key; #endif - kn = __kernfs_create_file(cgrp->kn, cgroup_file_name(cgrp, cft, name), + dest_kn = (css->device) ? css->device_kn : cgrp->kn; + kn = __kernfs_create_file(dest_kn, cgroup_file_name(cgrp, cft, name), cgroup_file_mode(cft), GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, 0, cft->kf_ops, cft, @@ -3709,15 +3758,13 @@ static int cgroup_add_file(struct cgroup_subsys_state *css, struct cgroup *cgrp, * Depending on @is_add, add or remove files defined by @cfts on @cgrp. * For removals, this function never fails. */ -static int cgroup_addrm_files(struct cgroup_subsys_state *css, - struct cgroup *cgrp, struct cftype cfts[], - bool is_add) +static int __cgroup_addrm_files(struct cgroup_subsys_state *css, + struct cgroup *cgrp, struct cftype cfts[], + bool is_add) { struct cftype *cft, *cft_end = NULL; int ret = 0; - lockdep_assert_held(&cgroup_mutex); - restart: for (cft = cfts; cft != cft_end && cft->name[0] != '\0'; cft++) { /* does cft->flags tell us to skip this file on @cgrp? 
*/ @@ -3741,12 +3788,43 @@ static int cgroup_addrm_files(struct cgroup_subsys_state *css, goto restart; } } else { - cgroup_rm_file(cgrp, cft); + cgroup_rm_file(css, cgrp, cft); } } return ret; } +static int cgroup_addrm_files(struct cgroup_subsys_state *css, + struct cgroup *cgrp, struct cftype cfts[], + bool is_add) +{ + struct cgroup_subsys_state *device_css, *device_css_end = NULL; + int dev_id, ret, err = 0; + + lockdep_assert_held(&cgroup_mutex); +restart: + ret = __cgroup_addrm_files(css, cgrp, cfts, is_add); + if (ret) + return ret; + + /* repeat addrm for each device */ + idr_for_each_entry(&css->device_css_idr, device_css, dev_id) { + if (device_css == device_css_end) + break; + ret = __cgroup_addrm_files(device_css, cgrp, cfts, is_add); + if (ret && !is_add) { + return ret; + } else if (ret) { + is_add = false; + device_css_end = device_css; + err = ret; + goto restart; + } + } + + return err; +} + static int cgroup_apply_cftypes(struct cftype *cfts, bool is_add) { struct cgroup_subsys *ss = cfts[0].ss; @@ -4711,9 +4789,14 @@ static void css_free_rwork_fn(struct work_struct *work) if (ss) { /* css free path */ - struct cgroup_subsys_state *parent = css->parent; - int id = css->id; + struct cgroup_subsys_state *device_css, *parent = css->parent; + int dev_id, id = css->id; + idr_for_each_entry(&css->device_css_idr, device_css, dev_id) { + css_put(device_css->parent); + ss->css_free(device_css); + } + idr_destroy(&css->device_css_idr); ss->css_free(css); cgroup_idr_remove(&ss->css_idr, id); cgroup_put(cgrp); @@ -4833,6 +4916,7 @@ static void init_and_link_css(struct cgroup_subsys_state *css, INIT_LIST_HEAD(&css->rstat_css_node); css->serial_nr = css_serial_nr_next++; atomic_set(&css->online_cnt, 0); + idr_init(&css->device_css_idr); if (cgroup_parent(cgrp)) { css->parent = cgroup_css(cgroup_parent(cgrp), ss); @@ -4885,6 +4969,79 @@ static void offline_css(struct cgroup_subsys_state *css) wake_up_all(&css->cgroup->offline_waitq); } +/* + * Associates a device with a css. + * Create a new device-specific css and insert into @css->device_css_idr. + * Acquires a references on @css, which is released when the device is + * dissociated with this css. + */ +static int cgroup_add_device(struct cgroup_subsys_state *css, + struct device *dev, int dev_id) +{ + struct cgroup_subsys *ss = css->ss; + struct cgroup_subsys_state *dev_css, *dev_parent_css; + int err; + + lockdep_assert_held(&cgroup_mutex); + + /* don't add devices at root cgroup level */ + if (!css->parent) + return -EINVAL; + + dev_parent_css = idr_find(&css->parent->device_css_idr, dev_id); + dev_css = ss->device_css_alloc(dev, css, dev_parent_css); + if (IS_ERR_OR_NULL(dev_css)) { + if (!dev_css) + return -ENOMEM; + if (IS_ERR(dev_css)) + return PTR_ERR(dev_css); + } + + /* store per-device css pointer in the cgroup's css */ + err = idr_alloc(&css->device_css_idr, dev_css, dev_id, + dev_id + 1, GFP_KERNEL); + if (err < 0) { + ss->css_free(dev_css); + return err; + } + + dev_css->device = dev; + dev_css->parent = dev_parent_css; + /* + * subsys per-device support is allowed to access cgroup subsys_state + * using cgroup.self. Increment reference on css so it remains valid + * as long as device is associated with it. + */ + dev_css->cgroup = css->cgroup; + dev_css->ss = css->ss; + css_get(css); + + return 0; +} + +/* + * For a new cgroup css, create device-specific css for each device which + * which has registered itself with the subsys. 
+ */ +static int cgroup_add_devices(struct cgroup_subsys_state *css) +{ + struct device *dev; + int dev_id, err = 0; + + /* ignore adding devices for root cgroups */ + if (!css->parent) + return 0; + + /* create per-device css for each associated device */ + idr_for_each_entry(&css->ss->device_idr, dev, dev_id) { + err = cgroup_add_device(css, dev, dev_id); + if (err) + break; + } + + return err; +} + /** * css_create - create a cgroup_subsys_state * @cgrp: the cgroup new css will be associated with @@ -4921,6 +5078,10 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp, goto err_free_css; css->id = err; + err = cgroup_add_devices(css); + if (err) + goto err_free_css; + /* @css is ready to be brought online now, make it visible */ list_add_tail_rcu(&css->sibling, &parent_css->children); cgroup_idr_replace(&ss->css_idr, css, css->id); @@ -5337,6 +5498,7 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early) mutex_lock(&cgroup_mutex); idr_init(&ss->css_idr); + idr_init(&ss->device_idr); INIT_LIST_HEAD(&ss->cfts); /* Create the root cgroup state for this subsystem */ @@ -5637,6 +5799,90 @@ int proc_cgroup_show(struct seq_file *m, struct pid_namespace *ns, return retval; } +void cgroup_device_unregister(struct cgroup_subsys *ss, unsigned long dev_id) +{ + struct cgroup_subsys_state *css, *device_css; + int css_id; + + if (!ss->allow_devices) + return; + + mutex_lock(&cgroup_mutex); + idr_for_each_entry(&ss->css_idr, css, css_id) { + WARN_ON(css->device); + if (!css->parent || css->device) + continue; + device_css = idr_remove(&css->device_css_idr, dev_id); + if (device_css) { + /* FIXME kernfs_get/put needed to make safe? */ + if (device_css->device_kn) + kernfs_remove(device_css->device_kn); + css_put(device_css->parent); + ss->css_free(device_css); + } + } + idr_remove(&ss->device_idr, dev_id); + mutex_unlock(&cgroup_mutex); +} + +/** + * cgroup_device_register - associate a struct device with @ss + * @ss: the subsystem of interest + * @dev: the device of interest + * @dev_id: index into @ss idr returned + * + * Insert @dev into set of devices to be associated with this subsystem. + * As cgroups are created, subdirectories ./allow_devices) + return -EACCES; + + mutex_lock(&cgroup_mutex); + + id = idr_alloc_cyclic(&ss->device_idr, dev, 0, 0, GFP_KERNEL); + if (id < 0) { + mutex_unlock(&cgroup_mutex); + return id; + } + + idr_for_each_entry(&ss->css_idr, css, css_id) { + WARN_ON(css->device); + if (!css->parent || css->device) + continue; + err = cgroup_add_device(css, dev, id); + if (err) + break; + + if (css_visible(css)) { + /* FIXME - something more lightweight can be done? */ + css_clear_dir(css); + /* FIXME kernfs_get/put needed to make safe? */ + kernfs_remove(css->device_kn); + err = css_populate_dir(css); + if (err) + /* FIXME handle error case */ + err = 0; + else + kernfs_activate(css->cgroup->kn); + } + } + + if (!err) + *dev_id = id; + mutex_unlock(&cgroup_mutex); + + return err; +} + /** * cgroup_fork - initialize cgroup related fields during copy_process() * @child: pointer to task_struct of forking parent process. 
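To make the registration flow concrete, here is a minimal sketch of a
controller opting into the per-device framework and a driver associating its
device with it. The "foo" controller, foo_css, foo_probe() and foo_remove()
are hypothetical names invented for illustration; only device_css_alloc,
allow_devices, cgroup_device_register() and cgroup_device_unregister() come
from this patch, and the sketch omits the usual cgroup_subsys.h wiring.

#include <linux/cgroup.h>
#include <linux/device.h>
#include <linux/err.h>
#include <linux/slab.h>

/* Hypothetical "foo" controller state, one instance per (cgroup, device). */
struct foo_css {
        struct cgroup_subsys_state css;
        /* limits / usage counters would live here */
};

static struct cgroup_subsys_state *
foo_css_alloc(struct cgroup_subsys_state *parent_css)
{
        struct foo_css *fc = kzalloc(sizeof(*fc), GFP_KERNEL);

        return fc ? &fc->css : ERR_PTR(-ENOMEM);
}

static void foo_css_free(struct cgroup_subsys_state *css)
{
        kfree(container_of(css, struct foo_css, css));
}

/* Called by the framework for each (cgroup, registered device) pair. */
static struct cgroup_subsys_state *
foo_device_css_alloc(struct device *device,
                     struct cgroup_subsys_state *cgroup_css,
                     struct cgroup_subsys_state *parent_device_css)
{
        /* core fills in .device, .parent, .cgroup and .ss after return */
        return foo_css_alloc(parent_device_css);
}

struct cgroup_subsys foo_cgrp_subsys = {
        .css_alloc              = foo_css_alloc,
        .css_free               = foo_css_free,
        .device_css_alloc       = foo_device_css_alloc,
        .allow_devices          = true,
};

/* Driver side: expose per-device entries of "foo" in existing cgroups. */
static unsigned long foo_dev_id;

static int foo_probe(struct device *dev)
{
        return cgroup_device_register(&foo_cgrp_subsys, dev, &foo_dev_id);
}

static void foo_remove(struct device *dev)
{
        cgroup_device_unregister(&foo_cgrp_subsys, foo_dev_id);
}

Note the directory created by cgroup_device_mkdir() above uses a ".device"
suffix while the commit message describes ".devices"; whichever spelling is
kept, the per-device files are populated through the same css_populate_dir()
path.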
From patchwork Wed May 1 14:04:35 2019
X-Patchwork-Submitter: "Welty, Brian"
X-Patchwork-Id: 10925105
From: Brian Welty
To: cgroups@vger.kernel.org, Tejun Heo, Li Zefan, Johannes Weiner,
    linux-mm@kvack.org, Michal Hocko, Vladimir Davydov,
    dri-devel@lists.freedesktop.org, David Airlie, Daniel Vetter,
    intel-gfx@lists.freedesktop.org, Jani Nikula, Joonas Lahtinen,
    Rodrigo Vivi, Christian König, Alex Deucher, ChunMing Zhou,
    Jérôme Glisse
Subject: [RFC PATCH 2/5] cgroup: Change kernfs_node for directories to store cgroup_subsys_state
Date: Wed, 1 May 2019 10:04:35 -0400
Message-Id: <20190501140438.9506-3-brian.welty@intel.com>
In-Reply-To: <20190501140438.9506-1-brian.welty@intel.com>
References: <20190501140438.9506-1-brian.welty@intel.com>

Change kernfs_node.priv to store the cgroup_subsys_state (CSS) pointer for
directories, instead of storing the cgroup pointer. This is done in order to
support files within the cgroup that are associated with devices: we require
of_css() to return the device-specific CSS pointer for these files.
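Concretely, the convention change for directory nodes amounts to the
following (a simplified sketch for illustration; dir_to_cgroup_old() and
dir_to_cgroup_new() are made-up helpers, not functions in this patch):

#include <linux/cgroup-defs.h>
#include <linux/kernfs.h>

/* Before: a cgroup directory's kn->priv was the cgroup itself. */
static struct cgroup *dir_to_cgroup_old(struct kernfs_node *kn)
{
        return kn->priv;
}

/*
 * After: kn->priv is a cgroup_subsys_state -- &cgrp->self for plain
 * cgroup directories, or a device-specific css for the per-device
 * directories added in the previous patch -- so callers recover the
 * cgroup via css->cgroup.
 */
static struct cgroup *dir_to_cgroup_new(struct kernfs_node *kn)
{
        struct cgroup_subsys_state *css = kn->priv;

        return css->cgroup;
}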
Cc: cgroups@vger.kernel.org Signed-off-by: Brian Welty --- kernel/cgroup/cgroup-v1.c | 10 ++++---- kernel/cgroup/cgroup.c | 48 +++++++++++++++++---------------------- 2 files changed, 27 insertions(+), 31 deletions(-) diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c index c126b34fd4ff..4fa56cc2b99c 100644 --- a/kernel/cgroup/cgroup-v1.c +++ b/kernel/cgroup/cgroup-v1.c @@ -723,6 +723,7 @@ int proc_cgroupstats_show(struct seq_file *m, void *v) int cgroupstats_build(struct cgroupstats *stats, struct dentry *dentry) { struct kernfs_node *kn = kernfs_node_from_dentry(dentry); + struct cgroup_subsys_state *css; struct cgroup *cgrp; struct css_task_iter it; struct task_struct *tsk; @@ -740,12 +741,13 @@ int cgroupstats_build(struct cgroupstats *stats, struct dentry *dentry) * @kn->priv is RCU safe. Let's do the RCU dancing. */ rcu_read_lock(); - cgrp = rcu_dereference(*(void __rcu __force **)&kn->priv); - if (!cgrp || cgroup_is_dead(cgrp)) { + css = rcu_dereference(*(void __rcu __force **)&kn->priv); + if (!css || cgroup_is_dead(css->cgroup)) { rcu_read_unlock(); mutex_unlock(&cgroup_mutex); return -ENOENT; } + cgrp = css->cgroup; rcu_read_unlock(); css_task_iter_start(&cgrp->self, 0, &it); @@ -851,7 +853,7 @@ void cgroup1_release_agent(struct work_struct *work) static int cgroup1_rename(struct kernfs_node *kn, struct kernfs_node *new_parent, const char *new_name_str) { - struct cgroup *cgrp = kn->priv; + struct cgroup_subsys_state *css = kn->priv; int ret; if (kernfs_type(kn) != KERNFS_DIR) @@ -871,7 +873,7 @@ static int cgroup1_rename(struct kernfs_node *kn, struct kernfs_node *new_parent ret = kernfs_rename(kn, new_parent, new_name_str); if (!ret) - TRACE_CGROUP_PATH(rename, cgrp); + TRACE_CGROUP_PATH(rename, css->cgroup); mutex_unlock(&cgroup_mutex); diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index 9b035e728941..1fe4fee502ea 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -595,12 +595,13 @@ static void cgroup_get_live(struct cgroup *cgrp) struct cgroup_subsys_state *of_css(struct kernfs_open_file *of) { - struct cgroup *cgrp = of->kn->parent->priv; + struct cgroup_subsys_state *css = of->kn->parent->priv; struct cftype *cft = of_cft(of); - /* FIXME this needs updating to lookup device-specific CSS */ - /* + * If the cft specifies a subsys and this is not a device file, + * then lookup the css, otherwise it is already correct. + * * This is open and unprotected implementation of cgroup_css(). * seq_css() is only called from a kernfs file operation which has * an active reference on the file. Because all the subsystem @@ -608,10 +609,9 @@ struct cgroup_subsys_state *of_css(struct kernfs_open_file *of) * the matching css from the cgroup's subsys table is guaranteed to * be and stay valid until the enclosing operation is complete. 
*/ - if (cft->ss) - return rcu_dereference_raw(cgrp->subsys[cft->ss->id]); - else - return &cgrp->self; + if (cft->ss && !css->device) + css = rcu_dereference_raw(css->cgroup->subsys[cft->ss->id]); + return css; } EXPORT_SYMBOL_GPL(of_css); @@ -1524,12 +1524,14 @@ static u16 cgroup_calc_subtree_ss_mask(u16 subtree_control, u16 this_ss_mask) */ void cgroup_kn_unlock(struct kernfs_node *kn) { + struct cgroup_subsys_state *css; struct cgroup *cgrp; if (kernfs_type(kn) == KERNFS_DIR) - cgrp = kn->priv; + css = kn->priv; else - cgrp = kn->parent->priv; + css = kn->parent->priv; + cgrp = css->cgroup; mutex_unlock(&cgroup_mutex); @@ -1556,12 +1558,14 @@ void cgroup_kn_unlock(struct kernfs_node *kn) */ struct cgroup *cgroup_kn_lock_live(struct kernfs_node *kn, bool drain_offline) { + struct cgroup_subsys_state *css; struct cgroup *cgrp; if (kernfs_type(kn) == KERNFS_DIR) - cgrp = kn->priv; + css = kn->priv; else - cgrp = kn->parent->priv; + css = kn->parent->priv; + cgrp = css->cgroup; /* * We're gonna grab cgroup_mutex which nests outside kernfs @@ -1652,7 +1656,7 @@ static int cgroup_device_mkdir(struct cgroup_subsys_state *css) if (WARN_ON_ONCE(ret >= CGROUP_FILE_NAME_MAX)) return 0; - kn = kernfs_create_dir(cgrp->kn, name, cgrp->kn->mode, cgrp); + kn = kernfs_create_dir(cgrp->kn, name, cgrp->kn->mode, css); if (IS_ERR(kn)) return PTR_ERR(kn); css->device_kn = kn; @@ -1662,7 +1666,7 @@ static int cgroup_device_mkdir(struct cgroup_subsys_state *css) /* FIXME: prefix dev_name with bus_name for uniqueness? */ kn = kernfs_create_dir(css->device_kn, dev_name(device_css->device), - cgrp->kn->mode, cgrp); + cgrp->kn->mode, device_css); if (IS_ERR(kn)) return PTR_ERR(kn); /* FIXME: kernfs_get needed here? */ @@ -2025,7 +2029,7 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask) root->kf_root = kernfs_create_root(kf_sops, KERNFS_ROOT_CREATE_DEACTIVATED | KERNFS_ROOT_SUPPORT_EXPORTOP, - root_cgrp); + &root_cgrp->self); if (IS_ERR(root->kf_root)) { ret = PTR_ERR(root->kf_root); goto exit_root_id; @@ -3579,9 +3583,9 @@ static ssize_t cgroup_file_write(struct kernfs_open_file *of, char *buf, size_t nbytes, loff_t off) { struct cgroup_namespace *ns = current->nsproxy->cgroup_ns; - struct cgroup *cgrp = of->kn->parent->priv; + struct cgroup_subsys_state *css = of_css(of); struct cftype *cft = of->kn->priv; - struct cgroup_subsys_state *css; + struct cgroup *cgrp = css->cgroup; int ret; /* @@ -3598,16 +3602,6 @@ static ssize_t cgroup_file_write(struct kernfs_open_file *of, char *buf, if (cft->write) return cft->write(of, buf, nbytes, off); - /* - * kernfs guarantees that a file isn't deleted with operations in - * flight, which means that the matching css is and stays alive and - * doesn't need to be pinned. The RCU locking is not necessary - * either. It's just for the convenience of using cgroup_css(). 
- */ - rcu_read_lock(); - css = cgroup_css(cgrp, cft->ss); - rcu_read_unlock(); - if (cft->write_u64) { unsigned long long v; ret = kstrtoull(buf, 0, &v); @@ -5262,7 +5256,7 @@ int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name, umode_t mode) } /* create the directory */ - kn = kernfs_create_dir(parent->kn, name, mode, cgrp); + kn = kernfs_create_dir(parent->kn, name, mode, &cgrp->self); if (IS_ERR(kn)) { ret = PTR_ERR(kn); goto out_destroy; From patchwork Wed May 1 14:04:36 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Welty, Brian" X-Patchwork-Id: 10925111 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C91A81390 for ; Wed, 1 May 2019 14:03:14 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B498D28E42 for ; Wed, 1 May 2019 14:03:14 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A7E9428D34; Wed, 1 May 2019 14:03:14 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9D67928BC1 for ; Wed, 1 May 2019 14:03:13 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 68A666B000C; Wed, 1 May 2019 10:03:07 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 63E486B000D; Wed, 1 May 2019 10:03:07 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4DBD16B000E; Wed, 1 May 2019 10:03:07 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pg1-f198.google.com (mail-pg1-f198.google.com [209.85.215.198]) by kanga.kvack.org (Postfix) with ESMTP id 13CE06B000C for ; Wed, 1 May 2019 10:03:07 -0400 (EDT) Received: by mail-pg1-f198.google.com with SMTP id d1so10929590pgk.21 for ; Wed, 01 May 2019 07:03:07 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to :subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; bh=A7SFuJbJ0Uz+80lz/JGgS9jWr30zBIqu8J3bqFaW1VQ=; b=Rp6itI97yvFtc5xayfXv7n4nOPJsutx4U+rVu52slqnpE5FYUPBvPedChhRNUn5GlQ jyy2Z+yawIjKNlFXiD3ENyNdAlu2cKhGvDptqiE9lNMwLmKDry89DbB3PsMUreIJQnV5 pFQJxqTj1g9cuwo1Ep/sxdAn7qxyMVvYQbmFt5gGCGtyEWnakBAeYsfEdZGeqCHB4qMK h3NHHnae1s7RFs7vSj5R8ZKQPddlcLA9vLSo2do4G61UHr+22UY2XTkLwfzbPu4B303i vuw7HPUJCzA8rEWQvJi7GCw/LjBfZdxrbNijY/CSwZjAzKQNRqwHOSOUWbA3HcCy5IPC GoAg== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of brian.welty@intel.com designates 192.55.52.43 as permitted sender) smtp.mailfrom=brian.welty@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Gm-Message-State: APjAAAV5Ykiac6QsJ/81YsK32YClkkixg39kiHa5KS1f0DHsvjgN9oXG 0VBFW0RklDzMHi/B9YqvwOYuBc4gH3NQbHEJqJ7/Ab/7UAISJndTUdZPDdcRnt1oeWUZ+vflfMr Ozygzicj+KtpdogWBXFUC9f+o0TDj5tlF4UHH5afkGa6NFD5j91cxgKhL0JMWXldpbg== X-Received: by 2002:a17:902:4101:: with SMTP id 
From: Brian Welty
To: cgroups@vger.kernel.org, Tejun Heo, Li Zefan, Johannes Weiner,
    linux-mm@kvack.org, Michal Hocko, Vladimir Davydov,
    dri-devel@lists.freedesktop.org, David Airlie, Daniel Vetter,
    intel-gfx@lists.freedesktop.org, Jani Nikula, Joonas Lahtinen,
    Rodrigo Vivi, Christian König, Alex Deucher, ChunMing Zhou,
    Jérôme Glisse
Subject: [RFC PATCH 3/5] memcg: Add per-device support to memory cgroup subsystem
Date: Wed, 1 May 2019 10:04:36 -0400
Message-Id: <20190501140438.9506-4-brian.welty@intel.com>
In-Reply-To: <20190501140438.9506-1-brian.welty@intel.com>
References: <20190501140438.9506-1-brian.welty@intel.com>

Here we update
memory cgroup to enable the newly introduced per-device framework. As mentioned in the prior patch, the intent here is to allow drivers to have their own private cgroup controls (such as memory limit) to be applied to device resources instead of host system resources. In summary, to enable device registration for memory cgroup subsystem: * set .allow_devices to true * add new exported device register and device unregister functions to register a device with the cgroup subsystem * implement the .device_css_alloc callback to create device specific cgroups_subsys_state within a cgroup As cgroup is created and for current registered devices, one will see in the cgroup filesystem these additional files: mount//memory.devices// Registration of a new device is performed in device drivers using new mem_cgroup_device_register(). This will create above files in existing cgroups. And for runtime charging to the cgroup, we add the following: * add new routine to lookup the device-specific cgroup_subsys_state which is within the task's cgroup (mem_cgroup_device_from_task) * add new functions for device specific 'direct' charging The last point above involves adding new mem_cgroup_try_charge_direct and mem_cgroup_uncharge_direct functions. The 'direct' name is to say that we are charging the specified cgroup state directly and not using any associated page or mm_struct. We are called within device specific memory management routines, where the device driver will track which cgroup to charge within its own private data structures. With this initial submission, support for memory accounting and charging is functional. Nested cgroups will correctly maintain the parent for device-specific state as well, such that hierarchial charging to device files is supported. Cc: cgroups@vger.kernel.org Cc: linux-mm@kvack.org Cc: dri-devel@lists.freedesktop.org Cc: Matt Roper Signed-off-by: Brian Welty --- include/linux/memcontrol.h | 10 ++ mm/memcontrol.c | 183 ++++++++++++++++++++++++++++++++++--- 2 files changed, 178 insertions(+), 15 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index dbb6118370c1..711669b613dc 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -348,6 +348,11 @@ void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg, bool compound); void mem_cgroup_uncharge(struct page *page); void mem_cgroup_uncharge_list(struct list_head *page_list); +/* direct charging to mem_cgroup is primarily for device driver usage */ +int mem_cgroup_try_charge_direct(struct mem_cgroup *memcg, + unsigned long nr_pages); +void mem_cgroup_uncharge_direct(struct mem_cgroup *memcg, + unsigned long nr_pages); void mem_cgroup_migrate(struct page *oldpage, struct page *newpage); @@ -395,6 +400,11 @@ struct lruvec *mem_cgroup_page_lruvec(struct page *, struct pglist_data *); bool task_in_mem_cgroup(struct task_struct *task, struct mem_cgroup *memcg); struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p); +struct mem_cgroup *mem_cgroup_device_from_task(unsigned long id, + struct task_struct *p); +int mem_cgroup_device_register(struct device *dev, unsigned long *dev_id); +void mem_cgroup_device_unregister(unsigned long dev_id); + struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm); struct mem_cgroup *get_mem_cgroup_from_page(struct page *page); diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 81a0d3914ec9..2c8407aed0f5 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -823,6 +823,47 @@ struct mem_cgroup 
*mem_cgroup_from_task(struct task_struct *p) } EXPORT_SYMBOL(mem_cgroup_from_task); +int mem_cgroup_device_register(struct device *dev, unsigned long *dev_id) +{ + return cgroup_device_register(&memory_cgrp_subsys, dev, dev_id); +} +EXPORT_SYMBOL(mem_cgroup_device_register); + +void mem_cgroup_device_unregister(unsigned long dev_id) +{ + cgroup_device_unregister(&memory_cgrp_subsys, dev_id); +} +EXPORT_SYMBOL(mem_cgroup_device_unregister); + +/** + * mem_cgroup_device_from_task: Lookup device-specific memcg + * @id: device-specific id returned from mem_cgroup_device_register + * @p: task to lookup the memcg + * + * First use mem_cgroup_from_task to lookup and obtain a reference on + * the memcg associated with this task @p. Within this memcg, find the + * device-specific one associated with @id. + * However if mem_cgroup is disabled, NULL is returned. + */ +struct mem_cgroup *mem_cgroup_device_from_task(unsigned long id, + struct task_struct *p) +{ + struct mem_cgroup *memcg; + struct mem_cgroup *dev_memcg = NULL; + + if (mem_cgroup_disabled()) + return NULL; + + rcu_read_lock(); + memcg = mem_cgroup_from_task(p); + if (memcg) + dev_memcg = idr_find(&memcg->css.device_css_idr, id); + rcu_read_unlock(); + + return dev_memcg; +} +EXPORT_SYMBOL(mem_cgroup_device_from_task); + /** * get_mem_cgroup_from_mm: Obtain a reference on given mm_struct's memcg. * @mm: mm from which memcg should be extracted. It can be NULL. @@ -2179,13 +2220,31 @@ void mem_cgroup_handle_over_high(void) current->memcg_nr_pages_over_high = 0; } +static bool __try_charge(struct mem_cgroup *memcg, unsigned int nr_pages, + struct mem_cgroup **mem_over_limit) +{ + struct page_counter *counter; + + if (!do_memsw_account() || + page_counter_try_charge(&memcg->memsw, nr_pages, &counter)) { + if (page_counter_try_charge(&memcg->memory, nr_pages, &counter)) + return true; + if (do_memsw_account()) + page_counter_uncharge(&memcg->memsw, nr_pages); + *mem_over_limit = mem_cgroup_from_counter(counter, memory); + } else { + *mem_over_limit = mem_cgroup_from_counter(counter, memsw); + } + + return false; +} + static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, unsigned int nr_pages) { unsigned int batch = max(MEMCG_CHARGE_BATCH, nr_pages); int nr_retries = MEM_CGROUP_RECLAIM_RETRIES; struct mem_cgroup *mem_over_limit; - struct page_counter *counter; unsigned long nr_reclaimed; bool may_swap = true; bool drained = false; @@ -2198,17 +2257,10 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, if (consume_stock(memcg, nr_pages)) return 0; - if (!do_memsw_account() || - page_counter_try_charge(&memcg->memsw, batch, &counter)) { - if (page_counter_try_charge(&memcg->memory, batch, &counter)) - goto done_restock; - if (do_memsw_account()) - page_counter_uncharge(&memcg->memsw, batch); - mem_over_limit = mem_cgroup_from_counter(counter, memory); - } else { - mem_over_limit = mem_cgroup_from_counter(counter, memsw); - may_swap = false; - } + if (__try_charge(memcg, batch, &mem_over_limit)) + goto done_restock; + else + may_swap = !do_memsw_account(); if (batch > nr_pages) { batch = nr_pages; @@ -2892,6 +2944,9 @@ static int mem_cgroup_force_empty(struct mem_cgroup *memcg) { int nr_retries = MEM_CGROUP_RECLAIM_RETRIES; + if (memcg->css.device) + return 0; + /* we call try-to-free pages for make this cgroup empty */ lru_add_drain_all(); @@ -4496,7 +4551,7 @@ static struct mem_cgroup *mem_cgroup_alloc(void) } static struct cgroup_subsys_state * __ref -mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css) 
+__mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css, bool is_device) { struct mem_cgroup *parent = mem_cgroup_from_css(parent_css); struct mem_cgroup *memcg; @@ -4530,11 +4585,13 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css) * much sense so let cgroup subsystem know about this * unfortunate state in our controller. */ - if (parent != root_mem_cgroup) + if (!is_device && parent != root_mem_cgroup) memory_cgrp_subsys.broken_hierarchy = true; } - /* The following stuff does not apply to the root */ + /* The following stuff does not apply to devices or the root */ + if (is_device) + return &memcg->css; if (!parent) { root_mem_cgroup = memcg; return &memcg->css; @@ -4554,6 +4611,34 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css) return ERR_PTR(-ENOMEM); } +static struct cgroup_subsys_state * __ref +mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css) +{ + return __mem_cgroup_css_alloc(parent_css, false); +} + +/* + * For given @cgroup_css, we create and return new device-specific css. + * + * @device and @cgroup_css are unused here, but they are provided as other + * cgroup subsystems might require them. + */ +static struct cgroup_subsys_state * __ref +mem_cgroup_device_css_alloc(struct device *device, + struct cgroup_subsys_state *cgroup_css, + struct cgroup_subsys_state *parent_device_css) +{ + /* + * For hierarchial page counters to work correctly, we specify + * parent here as the device-specific css from our parent css + * (@parent_device_css). In other words, for nested cgroups, + * the device-specific charging structures are also nested. + * Note, caller will itself set .device and .parent in returned + * structure. + */ + return __mem_cgroup_css_alloc(parent_device_css, true); +} + static int mem_cgroup_css_online(struct cgroup_subsys_state *css) { struct mem_cgroup *memcg = mem_cgroup_from_css(css); @@ -4613,6 +4698,9 @@ static void mem_cgroup_css_free(struct cgroup_subsys_state *css) { struct mem_cgroup *memcg = mem_cgroup_from_css(css); + if (css->device) + goto free_cgrp; + if (cgroup_subsys_on_dfl(memory_cgrp_subsys) && !cgroup_memory_nosocket) static_branch_dec(&memcg_sockets_enabled_key); @@ -4624,6 +4712,7 @@ static void mem_cgroup_css_free(struct cgroup_subsys_state *css) mem_cgroup_remove_from_trees(memcg); memcg_free_shrinker_maps(memcg); memcg_free_kmem(memcg); +free_cgrp: mem_cgroup_free(memcg); } @@ -5720,6 +5809,7 @@ static struct cftype memory_files[] = { struct cgroup_subsys memory_cgrp_subsys = { .css_alloc = mem_cgroup_css_alloc, + .device_css_alloc = mem_cgroup_device_css_alloc, .css_online = mem_cgroup_css_online, .css_offline = mem_cgroup_css_offline, .css_released = mem_cgroup_css_released, @@ -5732,6 +5822,7 @@ struct cgroup_subsys memory_cgrp_subsys = { .dfl_cftypes = memory_files, .legacy_cftypes = mem_cgroup_legacy_files, .early_init = 0, + .allow_devices = true, }; /** @@ -6031,6 +6122,68 @@ void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg, cancel_charge(memcg, nr_pages); } +/** + * mem_cgroup_try_charge_direct - try charging nr_pages to memcg + * @memcg: memcgto charge + * @nr_pages: number of pages to charge + * + * Try to charge @nr_pages to specified @memcg. This variant is intended + * where the memcg is known and can be directly charged, with the primary + * use case being in device drivers that have registered with the subsys. 
+ * Device drivers that implement their own device-specific memory manager + * will use these direct charging functions to make charges against their + * device-private state (@memcg) within the cgroup. + * + * There is no separate mem_cgroup_commit_charge() in this use case, as the + * device driver is not using page structs. Reclaim is not needed internally + * here, as the caller can decide to attempt memory reclaim on error. + * + * Returns 0 on success. Otherwise, an error code is returned. + * + * To uncharge (or cancel charge), call mem_cgroup_uncharge_direct(). + */ +int mem_cgroup_try_charge_direct(struct mem_cgroup *memcg, + unsigned long nr_pages) +{ + struct mem_cgroup *mem_over_limit; + int ret = 0; + + if (!memcg || mem_cgroup_disabled() || mem_cgroup_is_root(memcg)) + return 0; + + if (__try_charge(memcg, nr_pages, &mem_over_limit)) { + css_get_many(&memcg->css, nr_pages); + } else { + memcg_memory_event(mem_over_limit, MEMCG_MAX); + ret = -ENOMEM; + } + return ret; +} +EXPORT_SYMBOL(mem_cgroup_try_charge_direct); + +/** + * mem_cgroup_uncharge_direct - uncharge nr_pages to memcg + * @memcg: memcg to charge + * @nr_pages: number of pages to charge + * + * Uncharge @nr_pages to specified @memcg. This variant is intended + * where the memcg is known and can directly uncharge, with the primary + * use case being in device drivers that have registered with the subsys. + * Device drivers use these direct charging functions to make charges + * against their device-private state (@memcg) within the cgroup. + * + * Returns 0 on success. Otherwise, an error code is returned. + */ +void mem_cgroup_uncharge_direct(struct mem_cgroup *memcg, + unsigned long nr_pages) +{ + if (!memcg || mem_cgroup_disabled()) + return; + + cancel_charge(memcg, nr_pages); +} +EXPORT_SYMBOL(mem_cgroup_uncharge_direct); + struct uncharge_gather { struct mem_cgroup *memcg; unsigned long pgpgout; From patchwork Wed May 1 14:04:37 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Welty, Brian" X-Patchwork-Id: 10925121 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F00CB15A6 for ; Wed, 1 May 2019 14:03:17 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DD72728E1D for ; Wed, 1 May 2019 14:03:17 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D1B6728D34; Wed, 1 May 2019 14:03:17 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 17DF828CFE for ; Wed, 1 May 2019 14:03:16 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 799CF6B000D; Wed, 1 May 2019 10:03:08 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 74E136B000E; Wed, 1 May 2019 10:03:08 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5EFC56B0010; Wed, 1 May 2019 10:03:08 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from 
From patchwork Wed May 1 14:04:37 2019
X-Patchwork-Submitter: "Welty, Brian"
X-Patchwork-Id: 10925121
From: Brian Welty
To: cgroups@vger.kernel.org, Tejun Heo, Li Zefan, Johannes Weiner, linux-mm@kvack.org, Michal Hocko, Vladimir Davydov, dri-devel@lists.freedesktop.org, David Airlie, Daniel Vetter, intel-gfx@lists.freedesktop.org, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Christian König, Alex Deucher, ChunMing Zhou, Jérôme Glisse
Subject: [RFC PATCH 4/5] drm: Add memory cgroup registration and DRIVER_CGROUPS feature bit
Date: Wed, 1 May 2019 10:04:37 -0400
Message-Id: <20190501140438.9506-5-brian.welty@intel.com>
In-Reply-To: <20190501140438.9506-1-brian.welty@intel.com>
References: <20190501140438.9506-1-brian.welty@intel.com>

With the new per-device cgroups framework, registration with the memory
cgroup subsystem allows us to enforce limits on allocation of device
memory against process cgroups.

This patch adds a new driver feature bit, DRIVER_CGROUPS, so that DRM
will register the device with cgroups. Doing so allows device drivers to
charge memory allocations to device-specific state within the cgroup.
Note, this is only for GEM objects allocated from device memory. Memory
charging for GEM objects backed by system memory is already handled by
the mm subsystem charging the normal (non-device) memory cgroup.

To charge device memory allocations, we need to (1) identify the
appropriate cgroup to charge (currently decided at object creation time),
and (2) make the charging call at the time that memory pages are being
allocated. This is one possible policy; whether it is the right choice
is open for debate.

For (1), we associate the current task's cgroup with GEM objects as they
are created. That cgroup will be charged/uncharged for all paging
activity against the GEM object. Note, if the process is not part of a
memory cgroup, then this returns NULL and no charging will occur. For
shared objects, this may make the charge against a cgroup that is
potentially not the same cgroup as the process using the memory. Based
on the memory cgroup's discussion of "memory ownership", this seems
acceptable [1].

For (2), this is for device drivers to implement within their
appropriate page allocation logic.
[1] https://www.kernel.org/doc/Documentation/cgroup-v2.txt, "Memory Ownership"

Cc: cgroups@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: dri-devel@lists.freedesktop.org
Cc: Matt Roper
Signed-off-by: Brian Welty
---
 drivers/gpu/drm/drm_drv.c | 12 ++++++++++++
 drivers/gpu/drm/drm_gem.c |  7 +++++++
 include/drm/drm_device.h  |  3 +++
 include/drm/drm_drv.h     |  8 ++++++++
 include/drm/drm_gem.h     | 11 +++++++++++
 5 files changed, 41 insertions(+)

diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index 862621494a93..890bd3c0e63e 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -28,6 +28,7 @@
 #include
 #include
+#include
 #include
 #include
 #include
@@ -987,6 +988,12 @@ int drm_dev_register(struct drm_device *dev, unsigned long flags)
 	if (ret)
 		goto err_minors;
 
+	if (dev->dev && drm_core_check_feature(dev, DRIVER_CGROUPS)) {
+		ret = mem_cgroup_device_register(dev->dev, &dev->memcg_id);
+		if (ret)
+			goto err_minors;
+	}
+
 	dev->registered = true;
 
 	if (dev->driver->load) {
@@ -1009,6 +1016,8 @@ int drm_dev_register(struct drm_device *dev, unsigned long flags)
 	goto out_unlock;
 
 err_minors:
+	if (dev->memcg_id)
+		mem_cgroup_device_unregister(dev->memcg_id);
 	remove_compat_control_link(dev);
 	drm_minor_unregister(dev, DRM_MINOR_PRIMARY);
 	drm_minor_unregister(dev, DRM_MINOR_RENDER);
@@ -1052,6 +1061,9 @@ void drm_dev_unregister(struct drm_device *dev)
 		drm_legacy_rmmaps(dev);
 
+	if (dev->memcg_id)
+		mem_cgroup_device_unregister(dev->memcg_id);
+
 	remove_compat_control_link(dev);
 	drm_minor_unregister(dev, DRM_MINOR_PRIMARY);
 	drm_minor_unregister(dev, DRM_MINOR_RENDER);
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 50de138c89e0..966fbd701deb 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -38,6 +38,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -281,6 +282,9 @@ drm_gem_handle_delete(struct drm_file *filp, u32 handle)
 	if (IS_ERR_OR_NULL(obj))
 		return -EINVAL;
 
+	/* Release reference on cgroup used with GEM object charging */
+	mem_cgroup_put(obj->memcg);
+
 	/* Release driver's reference and decrement refcount. */
 	drm_gem_object_release_handle(handle, obj, filp);
 
@@ -410,6 +414,9 @@ drm_gem_handle_create_tail(struct drm_file *file_priv,
 		goto err_revoke;
 	}
 
+	/* Acquire reference on cgroup for charging GEM memory allocations */
+	obj->memcg = mem_cgroup_device_from_task(dev->memcg_id, current);
+
 	*handlep = handle;
 	return 0;
 
diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
index 7f9ef709b2b6..9859f2289066 100644
--- a/include/drm/drm_device.h
+++ b/include/drm/drm_device.h
@@ -190,6 +190,9 @@ struct drm_device {
 	 */
 	int irq;
 
+	/* @memcg_id: cgroup subsys (memcg) index for our device state */
+	unsigned long memcg_id;
+
 	/**
 	 * @vblank_disable_immediate:
 	 *
diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
index 5cc7f728ec73..13b0e0b9527f 100644
--- a/include/drm/drm_drv.h
+++ b/include/drm/drm_drv.h
@@ -92,6 +92,14 @@ enum drm_driver_feature {
 	 */
 	DRIVER_SYNCOBJ = BIT(5),
 
+	/**
+	 * @DRIVER_CGROUPS:
+	 *
+	 * Driver supports and requests DRM to register with cgroups during
+	 * drm_dev_register().
+	 */
+	DRIVER_CGROUPS = BIT(6),
+
 	/* IMPORTANT: Below are all the legacy flags, add new ones above. */
 
 	/**
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 5047c7ee25f5..ca90ea512e45 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -34,6 +34,7 @@
  * OTHER DEALINGS IN THE SOFTWARE.
  */
 
+#include
 #include
 #include
@@ -202,6 +203,16 @@ struct drm_gem_object {
 	 */
 	struct file *filp;
 
+	/**
+	 * @memcg:
+	 *
+	 * cgroup to charge GEM object page allocations against. This is
+	 * set to the current task's cgroup during GEM object creation.
+	 * Charging policy is up to each DRM driver to decide, but the intent
+	 * is to charge during page allocation and for device memory only.
+	 */
+	struct mem_cgroup *memcg;
+
 	/**
 	 * @vma_node:
 	 *
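For illustration only, a sketch (not from this patch) of how a driver opts in: setting DRIVER_CGROUPS in its feature mask is enough for drm_dev_register() to register the device with the memcg subsys and for GEM handle creation to populate obj->memcg. "my_driver" and the omitted hooks are hypothetical placeholders.

#include <drm/drm_drv.h>

static struct drm_driver my_driver = {		/* hypothetical driver */
	/* DRIVER_CGROUPS asks DRM to call mem_cgroup_device_register() */
	.driver_features = DRIVER_GEM | DRIVER_CGROUPS,
	/* ... remaining driver hooks omitted ... */
};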
From patchwork Wed May 1 14:04:38 2019
X-Patchwork-Submitter: "Welty, Brian"
X-Patchwork-Id: 10925127
From: Brian Welty
To: cgroups@vger.kernel.org, Tejun Heo, Li Zefan, Johannes Weiner, linux-mm@kvack.org, Michal Hocko, Vladimir Davydov, dri-devel@lists.freedesktop.org, David Airlie, Daniel Vetter, intel-gfx@lists.freedesktop.org, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Christian König, Alex Deucher, ChunMing Zhou, Jérôme Glisse
Subject: [RFC PATCH 5/5] drm/i915: Use memory cgroup for enforcing device memory limit
Date: Wed, 1 May 2019 10:04:38 -0400
Message-Id: <20190501140438.9506-6-brian.welty@intel.com>
In-Reply-To: <20190501140438.9506-1-brian.welty@intel.com>
References: <20190501140438.9506-1-brian.welty@intel.com>
The i915 driver now includes DRIVER_CGROUPS in its feature bits.

To charge device memory allocations, we need to (1) identify the
appropriate cgroup to charge (currently decided at object creation time),
and (2) make the charging call at the time that memory pages are being
allocated.

For (1), see the prior DRM patch, which associates the current task's
cgroup with GEM objects as they are created. That cgroup will be
charged/uncharged for all paging activity against the GEM object.

For (2), we call mem_cgroup_try_charge_direct() in the .get_pages
callback for the GEM object type. Uncharging is done in .put_pages when
the memory is marked such that it can be evicted. The try_charge() call
fails with -ENOMEM if the current allocation would exceed the cgroup's
device memory maximum, allowing the driver to perform memory reclaim.

Cc: cgroups@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: dri-devel@lists.freedesktop.org
Cc: Matt Roper
Signed-off-by: Brian Welty
---
 drivers/gpu/drm/i915/i915_drv.c            |  2 +-
 drivers/gpu/drm/i915/intel_memory_region.c | 24 ++++++++++++++++++----
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 5a0a59922cb4..4d496c3c3681 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -3469,7 +3469,7 @@ static struct drm_driver driver = {
	 * deal with them for Intel hardware.
	 */
	.driver_features =
-	    DRIVER_GEM | DRIVER_PRIME |
+	    DRIVER_GEM | DRIVER_PRIME | DRIVER_CGROUPS |
	    DRIVER_RENDER | DRIVER_MODESET | DRIVER_ATOMIC | DRIVER_SYNCOBJ,
	.release = i915_driver_release,
	.open = i915_driver_open,
diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c
index 813ff83c132b..e4ac5e4d4857 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/intel_memory_region.c
@@ -53,6 +53,8 @@ i915_memory_region_put_pages_buddy(struct drm_i915_gem_object *obj,
 	mutex_unlock(&obj->memory_region->mm_lock);
 
 	obj->mm.dirty = false;
+	mem_cgroup_uncharge_direct(obj->base.memcg,
+				   obj->base.size >> PAGE_SHIFT);
 }
 
 int
@@ -65,19 +67,29 @@ i915_memory_region_get_pages_buddy(struct drm_i915_gem_object *obj)
 	struct scatterlist *sg;
 	unsigned int sg_page_sizes;
 	unsigned long n_pages;
+	int err;
 
 	GEM_BUG_ON(!IS_ALIGNED(size, mem->mm.min_size));
 	GEM_BUG_ON(!list_empty(&obj->blocks));
 
+	err = mem_cgroup_try_charge_direct(obj->base.memcg, size >> PAGE_SHIFT);
+	if (err) {
+		DRM_DEBUG("MEMCG: try_charge failed for %lld\n", size);
+		return err;
+	}
+
 	st = kmalloc(sizeof(*st), GFP_KERNEL);
-	if (!st)
-		return -ENOMEM;
+	if (!st) {
+		err = -ENOMEM;
+		goto err_uncharge;
+	}
 
 	n_pages = div64_u64(size, mem->mm.min_size);
 
 	if (sg_alloc_table(st, n_pages, GFP_KERNEL)) {
 		kfree(st);
-		return -ENOMEM;
+		err = -ENOMEM;
+		goto err_uncharge;
 	}
 
 	sg = st->sgl;
@@ -161,7 +173,11 @@ i915_memory_region_get_pages_buddy(struct drm_i915_gem_object *obj)
 err_free_blocks:
 	memory_region_free_pages(obj, st);
 	mutex_unlock(&mem->mm_lock);
-	return -ENXIO;
+	err = -ENXIO;
+err_uncharge:
+	mem_cgroup_uncharge_direct(obj->base.memcg,
+				   obj->base.size >> PAGE_SHIFT);
+	return err;
 }
 
 int i915_memory_region_init_buddy(struct intel_memory_region *mem)
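For illustration only, a sketch (not part of this patch) of the reclaim-and-retry behaviour the -ENOMEM return is meant to enable; my_evict_device_memory() is a hypothetical stand-in for whatever eviction path a driver actually has, and only mem_cgroup_try_charge_direct() comes from this series.

#include <linux/memcontrol.h>

int my_evict_device_memory(unsigned long nr_pages);	/* hypothetical */

static int my_charge_with_reclaim(struct mem_cgroup *memcg,
				  unsigned long nr_pages)
{
	int err = mem_cgroup_try_charge_direct(memcg, nr_pages);

	if (err == -ENOMEM) {
		/* Try to free some device memory, then charge once more */
		if (my_evict_device_memory(nr_pages))
			return -ENOMEM;
		err = mem_cgroup_try_charge_direct(memcg, nr_pages);
	}
	return err;
}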