diff mbox

[v4.5,1/8] cgroup: Allow registration and lookup of cgroup private data (v3)

Message ID 20180321232344.20727-2-matthew.d.roper@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Matt Roper March 21, 2018, 11:23 p.m. UTC
There are cases where other parts of the kernel may wish to store data
associated with individual cgroups without building a full cgroup
controller.  Let's add interfaces to allow them to register and lookup
this private data for individual cgroups.

A kernel system (e.g., a driver) that wishes to register private data
for a cgroup should start by obtaining a unique private data key by
calling cgroup_priv_getkey().  It may then associate private data
with a cgroup by calling cgroup_priv_install(cgrp, key, ref) where 'ref'
is a pointer to a kref field inside the private data structure.  The
private data may later be looked up by calling cgroup_priv_get(cgrp,
key) to obtain a new reference to the private data.  Private data may be
unregistered via cgroup_priv_release(cgrp, key).

If a cgroup is removed, the reference count for all private data objects
will be decremented.

v2:  Significant overhaul suggested by Tejun, Alexei, and Roman
 - Rework interface to make consumers obtain an ida-based key rather
   than supplying their own arbitrary void*
 - Internal implementation now uses per-cgroup radixtrees which should
   allow much faster lookup than the previous hashtable approach
 - Private data is registered via kref, allowing a single private data
   structure to potentially be assigned to multiple cgroups.

v3:
 - Spare warning fixes (kbuild test robot)

Cc: Tejun Heo <tj@kernel.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: cgroups@vger.kernel.org
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>

fixup! cgroup: Allow registration and lookup of cgroup private data (v2)
---
 include/linux/cgroup-defs.h |  10 +++
 include/linux/cgroup.h      |   7 ++
 kernel/cgroup/cgroup.c      | 183 +++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 198 insertions(+), 2 deletions(-)

Comments

Chris Wilson March 22, 2018, 6:04 p.m. UTC | #1
Quoting Matt Roper (2018-03-21 23:23:37)
> There are cases where other parts of the kernel may wish to store data
> associated with individual cgroups without building a full cgroup
> controller.  Let's add interfaces to allow them to register and lookup
> this private data for individual cgroups.
> 
> A kernel system (e.g., a driver) that wishes to register private data
> for a cgroup should start by obtaining a unique private data key by
> calling cgroup_priv_getkey().  It may then associate private data
> with a cgroup by calling cgroup_priv_install(cgrp, key, ref) where 'ref'
> is a pointer to a kref field inside the private data structure.  The
> private data may later be looked up by calling cgroup_priv_get(cgrp,
> key) to obtain a new reference to the private data.  Private data may be
> unregistered via cgroup_priv_release(cgrp, key).
> 
> If a cgroup is removed, the reference count for all private data objects
> will be decremented.
> 
> v2:  Significant overhaul suggested by Tejun, Alexei, and Roman
>  - Rework interface to make consumers obtain an ida-based key rather
>    than supplying their own arbitrary void*
>  - Internal implementation now uses per-cgroup radixtrees which should
>    allow much faster lookup than the previous hashtable approach
>  - Private data is registered via kref, allowing a single private data
>    structure to potentially be assigned to multiple cgroups.
> 
> v3:
>  - Spare warning fixes (kbuild test robot)
> 
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Alexei Starovoitov <ast@fb.com>
> Cc: Roman Gushchin <guro@fb.com>
> Cc: cgroups@vger.kernel.org
> Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
> 
> fixup! cgroup: Allow registration and lookup of cgroup private data (v2)
> ---
>  include/linux/cgroup-defs.h |  10 +++
>  include/linux/cgroup.h      |   7 ++
>  kernel/cgroup/cgroup.c      | 183 +++++++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 198 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
> index 9f242b876fde..9086d963cc0a 100644
> --- a/include/linux/cgroup-defs.h
> +++ b/include/linux/cgroup-defs.h
> @@ -427,6 +427,16 @@ struct cgroup {
>         /* used to store eBPF programs */
>         struct cgroup_bpf bpf;
>  
> +       /*
> +        * cgroup private data registered by other non-controller parts of the
> +        * kernel.  Insertions are protected by privdata_lock, lookups by
> +        * rcu_read_lock().
> +        */
> +       struct radix_tree_root privdata;
> +
> +       /* Protect against concurrent insertions/deletions to privdata */
> +       spinlock_t privdata_lock;
> +
>         /* ids of the ancestors at each level including self */
>         int ancestor_ids[];
>  };
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 473e0c0abb86..63d22dfa00bd 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -833,4 +833,11 @@ static inline void put_cgroup_ns(struct cgroup_namespace *ns)
>                 free_cgroup_ns(ns);
>  }
>  
> +/* cgroup private data handling */
> +int cgroup_priv_getkey(void (*free)(struct kref *));
> +void cgroup_priv_destroykey(int key);
> +int cgroup_priv_install(struct cgroup *cgrp, int key, struct kref *ref);
> +struct kref *cgroup_priv_get(struct cgroup *cgrp, int key);
> +void cgroup_priv_release(struct cgroup *cgrp, int key);
> +
>  #endif /* _LINUX_CGROUP_H */
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index 8cda3bc3ae22..3268a21e8158 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -81,8 +81,15 @@ EXPORT_SYMBOL_GPL(css_set_lock);
>  #endif
>  
>  /*
> - * Protects cgroup_idr and css_idr so that IDs can be released without
> - * grabbing cgroup_mutex.
> + * ID allocator for cgroup private data keys; the ID's allocated here will
> + * be used to index all per-cgroup radix trees.  The radix tree built into
> + * the IDR itself will store a key-specific function to be passed to kref_put.
> + */
> +static DEFINE_IDR(cgroup_priv_idr);

Fun two radixtrees, one to hold the (*free)() and the other the void*.

Would not just a
struct cgroup_privdata {
	struct rcu_head rcu;
	void (*free)(struct kref *);
}
suffice for subclassing? (Also this is a prime candidate for switching
to XArray.)

> +struct kref *
> +cgroup_priv_get(struct cgroup *cgrp, int key)
> +{
> +       struct kref *ref;
> +
> +       WARN_ON(cgrp->root != &cgrp_dfl_root);
> +       WARN_ON(key == 0);
> +
> +       rcu_read_lock();
> +
> +       ref = radix_tree_lookup(&cgrp->privdata, key);
> +       if (ref)
> +               kref_get(ref);

This is not safe, you need

if (ref && !kref_get_unless_zero(ref))
	ref = NULL;

otherwise the cgroup_priv_release() may drop the last reference prior to
you obtaining yours. Also requires the privdata to be RCU protected.
-Chris
diff mbox

Patch

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 9f242b876fde..9086d963cc0a 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -427,6 +427,16 @@  struct cgroup {
 	/* used to store eBPF programs */
 	struct cgroup_bpf bpf;
 
+	/*
+	 * cgroup private data registered by other non-controller parts of the
+	 * kernel.  Insertions are protected by privdata_lock, lookups by
+	 * rcu_read_lock().
+	 */
+	struct radix_tree_root privdata;
+
+	/* Protect against concurrent insertions/deletions to privdata */
+	spinlock_t privdata_lock;
+
 	/* ids of the ancestors at each level including self */
 	int ancestor_ids[];
 };
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 473e0c0abb86..63d22dfa00bd 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -833,4 +833,11 @@  static inline void put_cgroup_ns(struct cgroup_namespace *ns)
 		free_cgroup_ns(ns);
 }
 
+/* cgroup private data handling */
+int cgroup_priv_getkey(void (*free)(struct kref *));
+void cgroup_priv_destroykey(int key);
+int cgroup_priv_install(struct cgroup *cgrp, int key, struct kref *ref);
+struct kref *cgroup_priv_get(struct cgroup *cgrp, int key);
+void cgroup_priv_release(struct cgroup *cgrp, int key);
+
 #endif /* _LINUX_CGROUP_H */
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 8cda3bc3ae22..3268a21e8158 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -81,8 +81,15 @@  EXPORT_SYMBOL_GPL(css_set_lock);
 #endif
 
 /*
- * Protects cgroup_idr and css_idr so that IDs can be released without
- * grabbing cgroup_mutex.
+ * ID allocator for cgroup private data keys; the ID's allocated here will
+ * be used to index all per-cgroup radix trees.  The radix tree built into
+ * the IDR itself will store a key-specific function to be passed to kref_put.
+ */
+static DEFINE_IDR(cgroup_priv_idr);
+
+/*
+ * Protects cgroup_idr, css_idr, and cgroup_priv_idr so that IDs can be
+ * released without grabbing cgroup_mutex.
  */
 static DEFINE_SPINLOCK(cgroup_idr_lock);
 
@@ -1839,6 +1846,8 @@  static void init_cgroup_housekeeping(struct cgroup *cgrp)
 	INIT_LIST_HEAD(&cgrp->cset_links);
 	INIT_LIST_HEAD(&cgrp->pidlists);
 	mutex_init(&cgrp->pidlist_mutex);
+	INIT_RADIX_TREE(&cgrp->privdata, GFP_NOWAIT);
+	spin_lock_init(&cgrp->privdata_lock);
 	cgrp->self.cgroup = cgrp;
 	cgrp->self.flags |= CSS_ONLINE;
 	cgrp->dom_cgrp = cgrp;
@@ -4578,6 +4587,8 @@  static void css_release_work_fn(struct work_struct *work)
 		container_of(work, struct cgroup_subsys_state, destroy_work);
 	struct cgroup_subsys *ss = css->ss;
 	struct cgroup *cgrp = css->cgroup;
+	struct radix_tree_iter iter;
+	void __rcu **slot;
 
 	mutex_lock(&cgroup_mutex);
 
@@ -4617,6 +4628,12 @@  static void css_release_work_fn(struct work_struct *work)
 					 NULL);
 
 		cgroup_bpf_put(cgrp);
+
+		/* Drop reference on any private data */
+		rcu_read_lock();
+		radix_tree_for_each_slot(slot, &cgrp->privdata, &iter, 0)
+			cgroup_priv_release(cgrp, iter.index);
+		rcu_read_unlock();
 	}
 
 	mutex_unlock(&cgroup_mutex);
@@ -5932,3 +5949,165 @@  static int __init cgroup_sysfs_init(void)
 }
 subsys_initcall(cgroup_sysfs_init);
 #endif /* CONFIG_SYSFS */
+
+/**
+ * cgroup_priv_getkey - obtain a new cgroup_priv lookup key
+ * @free: Function to release private data associated with this key
+ *
+ * Allows non-controller kernel subsystems to register a new key that will
+ * be used to insert/lookup private data associated with individual cgroups.
+ * Private data lookup tables are implemented as per-cgroup radix trees.
+ *
+ * Returns:
+ * A positive integer lookup key if successful, or a negative error code
+ * on failure (e.g., if ID allocation fails).
+ */
+int
+cgroup_priv_getkey(void (*free)(struct kref *r))
+{
+	int ret;
+
+	WARN_ON(!free);
+
+	idr_preload(GFP_KERNEL);
+	spin_lock_bh(&cgroup_idr_lock);
+	ret = idr_alloc(&cgroup_priv_idr, free, 1, 0, GFP_NOWAIT);
+	spin_unlock_bh(&cgroup_idr_lock);
+	idr_preload_end();
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(cgroup_priv_getkey);
+
+/**
+ * cgroup_priv_destroykey - release a cgroup_priv key
+ * @key: Private data key to be released
+ *
+ * Removes a cgroup private data key and all private data associated with this
+ * key in any cgroup.  This is a heavy operation that will take cgroup_mutex.
+ */
+void
+cgroup_priv_destroykey(int key)
+{
+	struct cgroup *cgrp;
+
+	WARN_ON(key == 0);
+
+	mutex_lock(&cgroup_mutex);
+	cgroup_for_each_live_child(cgrp, &cgrp_dfl_root.cgrp)
+		cgroup_priv_release(cgrp, key);
+	idr_remove(&cgroup_priv_idr, key);
+	mutex_unlock(&cgroup_mutex);
+}
+EXPORT_SYMBOL_GPL(cgroup_priv_destroykey);
+
+/**
+ * cgroup_priv_install - install new cgroup private data
+ * @cgrp: cgroup to store private data for
+ * @key: key uniquely identifying kernel owner of private data
+ * @data: pointer to kref field of private data structure
+ *
+ * Allows non-controller kernel subsystems to register their own private data
+ * associated with a cgroup.  This will often be used by drivers which wish to
+ * track their own per-cgroup data without building a full cgroup controller.
+ *
+ * The caller is responsible for ensuring that no private data already exists
+ * for the given key.
+ *
+ * Registering cgroup private data with this function will increment the
+ * reference count for the private data structure.  If the cgroup is removed,
+ * the reference count will be decremented, allowing the private data to
+ * be freed if there are no other outstanding references.
+ *
+ * Returns:
+ * 0 on success, otherwise a negative error code on failure.
+ */
+int
+cgroup_priv_install(struct cgroup *cgrp, int key, struct kref *ref)
+{
+	int ret;
+
+	WARN_ON(cgrp->root != &cgrp_dfl_root);
+	WARN_ON(key == 0);
+
+	kref_get(ref);
+
+	ret = radix_tree_preload(GFP_KERNEL);
+	if (ret)
+		return ret;
+
+	spin_lock_bh(&cgrp->privdata_lock);
+	ret = radix_tree_insert(&cgrp->privdata, key, ref);
+	spin_unlock_bh(&cgrp->privdata_lock);
+	radix_tree_preload_end();
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(cgroup_priv_install);
+
+/**
+ * cgroup_priv_get - looks up cgroup private data
+ * @cgrp: cgroup to lookup private data for
+ * @key: key uniquely identifying owner of private data to lookup
+ *
+ * Looks up the cgroup private data associated with a key.  The private
+ * data's reference count is incremented and a pointer to its kref field
+ * is returned to the caller (which can use container_of()) to obtain
+ * the private data itself.
+ *
+ * Returns:
+ * A pointer to the private data's kref field, or NULL if no private data has
+ * been registered.
+ */
+struct kref *
+cgroup_priv_get(struct cgroup *cgrp, int key)
+{
+	struct kref *ref;
+
+	WARN_ON(cgrp->root != &cgrp_dfl_root);
+	WARN_ON(key == 0);
+
+	rcu_read_lock();
+
+	ref = radix_tree_lookup(&cgrp->privdata, key);
+	if (ref)
+		kref_get(ref);
+
+	rcu_read_unlock();
+
+	return ref;
+}
+EXPORT_SYMBOL_GPL(cgroup_priv_get);
+
+/**
+ * cgroup_priv_free - free cgroup private data
+ * @cgrp: cgroup to release private data for
+ * @key: key uniquely identifying owner of private data to free
+ *
+ * Removes private data associated with the given key from a cgroup's internal
+ * table and decrements the reference count for the private data removed,
+ * allowing it to freed if no other references exist.
+ */
+void
+cgroup_priv_release(struct cgroup *cgrp, int key)
+{
+	struct kref *ref;
+	void (*free)(struct kref *r);
+
+	WARN_ON(cgrp->root != &cgrp_dfl_root);
+	WARN_ON(key == 0);
+
+	rcu_read_lock();
+
+	free = idr_find(&cgroup_priv_idr, key);
+	WARN_ON(!free);
+
+	spin_lock_bh(&cgrp->privdata_lock);
+	ref = radix_tree_delete(&cgrp->privdata, key);
+	spin_unlock_bh(&cgrp->privdata_lock);
+	if (ref)
+		kref_put(ref, free);
+
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL_GPL(cgroup_priv_release);