cgroup-v1: freezer: optionally killable freezer

Message ID 20200219183231.50985-1-balejs@google.com
State Not Applicable, archived
Series
  • cgroup-v1: freezer: optionally killable freezer

Commit Message

Marco Ballesio Feb. 19, 2020, 6:32 p.m. UTC
The cgroup v2 freezer allows killing frozen processes without the need
to unfreeze them first. This is not possible with the v1 freezer, where
processes must be unfrozen before any pending kill signals can take effect.

Add a configurable option to allow killing frozen tasks in a way similar to
cgroup v2. Change the state of frozen tasks to TASK_INTERRUPTIBLE and clear
their PF_FROZEN flag when a fatal signal is pending.

Expose this through the run-time configurable file freezer.killable.
Killability is disabled by default, so the pre-existing behavior is preserved.

Signed-off-by: Marco Ballesio <balejs@google.com>
---
 .../cgroup-v1/freezer-subsystem.rst           | 12 ++++
 include/linux/freezer.h                       |  1 +
 kernel/cgroup/legacy_freezer.c                | 69 ++++++++++++++++++-
 kernel/freezer.c                              | 20 +++++-
 4 files changed, 98 insertions(+), 4 deletions(-)
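The following minimal userspace sketch shows how a manager process might use the proposed knob on a v1 freezer cgroup. It assumes that the patch is applied and that the cgroup directory already exists, for example a hypothetical /sys/fs/cgroup/freezer/app. The helpers write_cgroup_file() and freeze_killable() are hypothetical and are not part of any existing library.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Write a value to <cgroup_dir>/<file>, e.g. "freezer.killable" or
 * "freezer.state". Returns 0 on success or a negative errno value.
 */
static int write_cgroup_file(const char *cgroup_dir, const char *file,
                             const char *value)
{
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/%s", cgroup_dir, file);

    if (n < 0 || (size_t)n >= sizeof(path))
        return -ENAMETOOLONG;

    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;

    size_t len = strlen(value);
    ssize_t written = write(fd, value, len);
    int err = 0;

    if (written < 0)
        err = -errno;           /* capture errno before close() */
    else if ((size_t)written != len)
        err = -EIO;

    close(fd);
    return err;
}

/*
 * Opt the cgroup into killable freezing, then freeze it. Setting the flag
 * first means tasks should enter __refrigerator() already in
 * TASK_INTERRUPTIBLE instead of being woken once to switch state.
 */
static int freeze_killable(const char *cgroup_dir)
{
    int err = write_cgroup_file(cgroup_dir, "freezer.killable", "1");

    if (err)
        return err;
    return write_cgroup_file(cgroup_dir, "freezer.state", "FROZEN");
}
```

According to the commit message, kill(pid, SIGKILL) on a task in such a cgroup should then take effect without first writing THAWED to freezer.state.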

Comments

Marco Ballesio Feb. 29, 2020, 12:51 a.m. UTC | #1
Hi all,

did anyone have time to look into my proposal and, if so, are there
any suggestions, ideas or comments about it?

Marco
Roman Gushchin Feb. 29, 2020, 6:43 p.m. UTC | #2
On Fri, Feb 28, 2020 at 04:51:31PM -0800, Marco Ballesio wrote:
> Hi all,
> 
> did anyone have time to look into my proposal and, if so, are there
> any suggestions, ideas or comments about it?

Hello, Marco!

I'm sorry, somehow I missed the original letter.

In general the cgroup v1 interface is considered frozen. Are there any particular
reasons why you want to extend the v1 freezer rather than use the v2 version of it?

You don't even need to fully convert to cgroup v2 in order to do it, some v1
controllers can still be used.

Thanks!

Roman
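Roman's suggestion refers to the cgroup v2 freezer. You drive it by writing "1" (freeze) or "0" (thaw) to a cgroup's cgroup.freeze file. The kernel reports completion asynchronously through the "frozen" key in cgroup.events, and processes in a frozen v2 cgroup can still be killed by a fatal signal. The sketch below only parses cgroup.events content. It assumes the caller has already read the file from a cgroup2 mount, and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Return true if cgroup.events content reports "frozen 1". The file holds
 * one "key value" pair per line, e.g. "populated 1\nfrozen 1\n".
 */
static bool cgroup_events_frozen(const char *events)
{
    const char *line = events;

    while (line != NULL && *line != '\0') {
        int value;

        /* Only match the "frozen" key at the start of a line. */
        if (sscanf(line, "frozen %d", &value) == 1)
            return value == 1;

        /* Advance to the next line, if any. */
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return false;
}
```

A manager would write "1" to cgroup.freeze, then re-read (or poll) cgroup.events until this returns true.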
Marco Ballesio March 1, 2020, 4:20 p.m. UTC | #3
On Sat, Feb 29, 2020 at 10:43:00AM -0800, Roman Gushchin wrote:
> Hello, Marco!
> 
> I'm sorry, somehow I missed the original letter.
> 
> In general the cgroup v1 interface is considered frozen. Are there any particular
> reasons why you want to extend the v1 freezer rather than use the v2 version of it?
> 
> You don't even need to fully convert to cgroup v2 in order to do it, some v1
> controllers can still be used.
> 
> Thanks!
> 
> Roman

Hi Roman,

Compared with backporting v2 features and their dependency chains, this
patch would be easier to carry in Android common. It would make frozen
processes killable on hardware currently in use.

Marco
Roman Gushchin March 2, 2020, 4:53 p.m. UTC | #4
On Sun, Mar 01, 2020 at 08:20:03AM -0800, Marco Ballesio wrote:
> Hi Roman,
> 
> Compared with backporting v2 features and their dependency chains, this
> patch would be easier to carry in Android common. It would make frozen
> processes killable on hardware currently in use.

I see...

The implementation looks good to me, but I'm really not sure if adding new control files
to cgroup v1 is a good idea at this point. Are there any plans in the Android world
to move forward to cgroup v2? If not, why not?
If there are any specific issues/dependencies, let's discuss and resolve them.

Thanks!

Roman
Suren Baghdasaryan March 2, 2020, 5:46 p.m. UTC | #5
On Mon, Mar 2, 2020 at 8:53 AM Roman Gushchin <guro@fb.com> wrote:
>

Hi Roman,

> I see...
>
> The implementation looks good to me, but I'm really not sure if adding new control files
> to cgroup v1 is a good idea at this point. Are there any plans in the Android world
> to move forward to cgroup v2? If not, why not?

There are plans to prototype that and gradually move from cgroups v1
to v2 at least for some cgroup controllers (the ones that can use
unified hierarchy). Creating an additional per-process cgroup v2
hierarchy only for freezer would be a high price to pay today. In the
future when we migrate some controllers to v2 the price will be
amortized and we will probably be able to do that.

> If there are any specific issues/dependencies, let's discuss and resolve them.
>
> Thanks!
>
> Roman

Thanks,
Suren.
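A per-process v2 hierarchy means one cgroup directory per process on the v2 mount. Each process must be moved into its directory through cgroup.procs before it can be frozen individually. The sketch below shows that setup step. It assumes a hypothetical <root>/pid_<pid> layout and a hypothetical helper name; the layout is only an illustration and does not describe any particular Android release.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Give one process its own cgroup under a cgroup v2 mount at <root>, using
 * a hypothetical <root>/pid_<pid> layout. Returns 0 or a negative errno.
 */
static int move_to_own_cgroup(const char *root, pid_t pid)
{
    char dir[4096];
    char procs[4200];
    int n = snprintf(dir, sizeof(dir), "%s/pid_%d", root, (int)pid);

    if (n < 0 || (size_t)n >= sizeof(dir))
        return -ENAMETOOLONG;

    /* One directory (one kernel cgroup object) per process. */
    if (mkdir(dir, 0755) < 0 && errno != EEXIST)
        return -errno;

    snprintf(procs, sizeof(procs), "%s/cgroup.procs", dir);
    FILE *f = fopen(procs, "w");
    if (f == NULL)
        return -errno;

    /* Errors from the kernel surface when the buffered write is flushed. */
    int ok = fprintf(f, "%d\n", (int)pid) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -EIO;
}
```

To freeze that process, the manager would then write "1" to <root>/pid_<pid>/cgroup.freeze. The price is one such cgroup per process, which the manager must also remove after the process exits.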
Roman Gushchin March 2, 2020, 6:27 p.m. UTC | #6
On Mon, Mar 02, 2020 at 09:46:36AM -0800, Suren Baghdasaryan wrote:
> There are plans to prototype that and gradually move from cgroups v1
> to v2 at least for some cgroup controllers (the ones that can use
> unified hierarchy). Creating an additional per-process cgroup v2
> hierarchy only for freezer would be a high price to pay today. In the
> future when we migrate some controllers to v2 the price will be
> amortized and we will probably be able to do that.

I see... Thanks for the explanation, Suren!

Overall, the idea of extending the frozen v1 interface looks dubious to me,
especially if it's only required during the transition to v2.

But of course the decision is up to the maintainers.

Thanks!
Tejun Heo March 3, 2020, 1:48 p.m. UTC | #7
Hello,

On Wed, Feb 19, 2020 at 10:32:31AM -0800, Marco Ballesio wrote:
> @@ -94,6 +94,18 @@ The following cgroupfs files are created by cgroup freezer.
>    Shows the parent-state.  0 if none of the cgroup's ancestors is
>    frozen; otherwise, 1.
>  
> +* freezer.killable: Read-write
> +
> +  When read, returns the killable state of a cgroup - "1" if frozen
> +  tasks will respond to fatal signals, or "0" if they won't.
> +
> +  When written, this property sets the killable state of the cgroup.
> +  A value equal to "1" will switch the state of all frozen tasks in
> +  the cgroup to TASK_INTERRUPTIBLE (similarly to cgroup v2) and will
> +  make them react to fatal signals. A value of "0" will switch the
> +  state of frozen tasks to TASK_UNINTERRUPTIBLE and they won't respond
> +  to signals unless thawed or unfrozen.

As Roman said, I'm not too sure about adding a new cgroup1 freezer
interface at this point. If we do this, *maybe* a mount option would
be more minimal?

> diff --git a/kernel/freezer.c b/kernel/freezer.c
> index dc520f01f99d..92de1bfe62cf 100644
> --- a/kernel/freezer.c
> +++ b/kernel/freezer.c
> @@ -42,6 +42,9 @@ bool freezing_slow_path(struct task_struct *p)
>  	if (test_tsk_thread_flag(p, TIF_MEMDIE))
>  		return false;
>  
> +	if (cgroup_freezer_killable(p) && fatal_signal_pending(p))
> +		return false;
> +
>  	if (pm_nosig_freezing || cgroup_freezing(p))
>  		return true;
>  
> @@ -63,7 +66,12 @@ bool __refrigerator(bool check_kthr_stop)
>  	pr_debug("%s entered refrigerator\n", current->comm);
>  
>  	for (;;) {
> -		set_current_state(TASK_UNINTERRUPTIBLE);
> +		bool killable = cgroup_freezer_killable(current);
> +
> +		if (killable)
> +			set_current_state(TASK_INTERRUPTIBLE);
> +		else
> +			set_current_state(TASK_UNINTERRUPTIBLE);
>  
>  		spin_lock_irq(&freezer_lock);
>  		current->flags |= PF_FROZEN;
> @@ -75,6 +83,16 @@ bool __refrigerator(bool check_kthr_stop)
>  		if (!(current->flags & PF_FROZEN))
>  			break;
>  		was_frozen = true;
> +
> +		/*
> +		 * Now we're sure that there is no pending fatal signal.
> +		 * Clear TIF_SIGPENDING to not get out of schedule()
> +		 * immediately (if there is a non-fatal signal pending), and
> +		 * put the task to sleep.
> +		 */

and this looks really racy to me. What happens if this task gets a
fatal signal here? We clear TIF_SIGPENDING and go to sleep?

> +		if (killable)
> +			clear_thread_flag(TIF_SIGPENDING);
> +
>  		schedule();
>  	}

Thanks.
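The interleaving Tejun describes can be replayed with a small userspace model; this is not kernel code. Here sigpending stands in for TIF_SIGPENDING and fatal for a queued SIGKILL. As with the kernel's fatal_signal_pending(), a fatal signal only counts when both are set.

As we read the patch, a SIGKILL delivered after the check but before clear_thread_flag(TIF_SIGPENDING) leaves the summary flag clear while SIGKILL stays queued. Later freezing() checks then keep treating the task as freezable. The recalc variant is a hypothetical alternative shape only; in the kernel it would have to run under the task's siglock to be atomic with signal delivery.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: two flags standing in for per-task kernel state. */
struct model_task {
    bool sigpending; /* stands in for TIF_SIGPENDING */
    bool fatal;      /* stands in for SIGKILL in the pending set */
};

/* Like fatal_signal_pending(): needs the summary flag AND a queued SIGKILL. */
static bool model_fatal_signal_pending(const struct model_task *t)
{
    return t->sigpending && t->fatal;
}

/* Delivery queues the signal and sets the summary flag. */
static void model_send_sigkill(struct model_task *t)
{
    t->fatal = true;
    t->sigpending = true;
}

/* What the patch does before schedule(): clear the flag unconditionally. */
static void racy_clear(struct model_task *t)
{
    t->sigpending = false;
}

/* Hypothetical alternative: drop the flag only if no SIGKILL is queued. */
static void recalc_clear(struct model_task *t)
{
    t->sigpending = t->fatal;
}

/*
 * Replay the window: the task finds no fatal signal, a SIGKILL is
 * delivered, then the task clears the flag. Returns whether the kill is
 * still visible to the next fatal-signal check.
 */
static bool kill_survives(bool use_recalc)
{
    struct model_task t = { .sigpending = false, .fatal = false };

    bool saw_fatal = model_fatal_signal_pending(&t); /* false: nothing queued yet */
    model_send_sigkill(&t);                          /* the kill lands in the window */
    if (!saw_fatal) {
        if (use_recalc)
            recalc_clear(&t);
        else
            racy_clear(&t);
    }
    return model_fatal_signal_pending(&t);
}
```

In this model, kill_survives(false) returns false because the kill is lost, and kill_survives(true) returns true.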
Daniel Colascione March 11, 2020, 5:46 p.m. UTC | #8
On Tue, Mar 3, 2020 at 5:48 AM Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> As Roman said, I'm not too sure about adding a new cgroup1 freezer
> interface at this point. If we do this, *maybe* a mount option would
> be more minimal?

I'd still prefer a cgroup flag. A mount option is a bigger
compatibility risk and isn't really any simpler than another cgroup
flag. A mount option will affect anything using the cgroup mount
point, potentially turning non-killable frozen processes into killable
ones unexpectedly. (Sure, you could mount multiple times, but only one
location is canonical, and that's the one that's going to get the flag
flipped.) A per-cgroup flag allows people to opt into the new behavior
only in specific contexts, so it's safer.
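The following small model makes the per-cgroup semantics concrete. The flag values are copied from the patch below, model_online() mirrors the patched freezer_css_online(), and model_set_killable() mirrors only the flag update in freezer_killable_write(). The model_* names are hypothetical.

As we read the patch, a child inherits CGROUP_FREEZER_KILLABLE only when it is created. Writing freezer.killable later does not propagate to existing children, and cgroup_freezer_killable() looks only at the task's own cgroup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values as in the patched enum freezer_state_flags. */
enum {
    CGROUP_FREEZER_ONLINE   = (1 << 0),
    CGROUP_FREEZING_SELF    = (1 << 1),
    CGROUP_FREEZING_PARENT  = (1 << 2),
    CGROUP_FROZEN           = (1 << 3),
    CGROUP_FREEZER_KILLABLE = (1 << 4),
    CGROUP_FREEZING = CGROUP_FREEZING_SELF | CGROUP_FREEZING_PARENT,
};

struct model_freezer {
    struct model_freezer *parent;
    unsigned int state;
};

/* Mirrors the patched freezer_css_online(): inherit flags at creation. */
static void model_online(struct model_freezer *f, struct model_freezer *parent)
{
    f->parent = parent;
    f->state = CGROUP_FREEZER_ONLINE;
    if (parent) {
        if (parent->state & CGROUP_FREEZER_KILLABLE)
            f->state |= CGROUP_FREEZER_KILLABLE;
        if (parent->state & CGROUP_FREEZING)
            f->state |= CGROUP_FREEZING_PARENT | CGROUP_FROZEN;
    }
}

/* Mirrors the flag update in freezer_killable_write(); the thaw pass is omitted. */
static void model_set_killable(struct model_freezer *f, bool killable)
{
    if (killable)
        f->state |= CGROUP_FREEZER_KILLABLE;
    else
        f->state &= ~CGROUP_FREEZER_KILLABLE;
}

static bool model_killable(const struct model_freezer *f)
{
    return f->state & CGROUP_FREEZER_KILLABLE;
}
```

Flipping the flag on one cgroup leaves its siblings untouched, which is the opt-in property Daniel describes. A mount option would instead apply to every cgroup on that mount.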
Marco Ballesio March 20, 2020, 8:10 p.m. UTC | #9
On Wed, Mar 11, 2020 at 10:46:15AM -0700, Daniel Colascione wrote:
> I'd still prefer a cgroup flag. A mount option is a bigger
> compatibility risk and isn't really any simpler than another cgroup
> flag. A mount option will affect anything using the cgroup mount
> point, potentially turning non-killable frozen processes into killable
> ones unexpectedly. (Sure, you could mount multiple times, but only one
> location is canonical, and that's the one that's going to get the flag
> flipped.) A per-cgroup flag allows people to opt into the new behavior
> only in specific contexts, so it's safer.

It might also be desirable for userland to have a way to modify the behavior of
an already mounted v1 freezer.

Tejun, would it be acceptable to have a flag but disable it by default, hiding
it behind a kernel configuration option?
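One possible way to hide the feature behind a kernel configuration option is a static inline fallback in include/linux/freezer.h. The sketch assumes a hypothetical CONFIG_CGROUP_FREEZER_KILLABLE symbol.

A fallback of this kind also appears to be needed for the patch as posted. The patch declares cgroup_freezer_killable() only in the CONFIG_CGROUP_FREEZER branch, while kernel/freezer.c calls it unconditionally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct task_struct; /* opaque in this sketch */

/*
 * Sketch of include/linux/freezer.h: the real helper (added to
 * kernel/cgroup/legacy_freezer.c by the patch) is declared only when both
 * the cgroup freezer and the hypothetical CONFIG_CGROUP_FREEZER_KILLABLE
 * option are enabled; every other build gets a constant-false stub.
 */
#if defined(CONFIG_CGROUP_FREEZER) && defined(CONFIG_CGROUP_FREEZER_KILLABLE)
extern bool cgroup_freezer_killable(struct task_struct *task);
#else
static inline bool cgroup_freezer_killable(struct task_struct *task)
{
    return false;
}
#endif
```

With the option disabled, the checks added to freezing_slow_path() and __refrigerator() reduce to a constant false. The pre-existing TASK_UNINTERRUPTIBLE behavior is then kept.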
Tejun Heo March 24, 2020, 6:26 p.m. UTC | #10
Hello,

On Fri, Mar 20, 2020 at 01:10:38PM -0700, Marco Ballesio wrote:
> It might also be desirable for userland to have a way to modify the behavior of
> an already mounted v1 freezer.
> 
> Tejun, would it be acceptable to have a flag but disable it by default, hiding
> it behind a kernel configuration option?

Given how dead-end this is, I'm not sure this needs to be upstream. Can you give
me some rationales?

Thanks.

Patch

diff --git a/Documentation/admin-guide/cgroup-v1/freezer-subsystem.rst b/Documentation/admin-guide/cgroup-v1/freezer-subsystem.rst
index 582d3427de3f..06485ae9dccd 100644
--- a/Documentation/admin-guide/cgroup-v1/freezer-subsystem.rst
+++ b/Documentation/admin-guide/cgroup-v1/freezer-subsystem.rst
@@ -94,6 +94,18 @@  The following cgroupfs files are created by cgroup freezer.
   Shows the parent-state.  0 if none of the cgroup's ancestors is
   frozen; otherwise, 1.
 
+* freezer.killable: Read-write
+
+  When read, returns the killable state of a cgroup - "1" if frozen
+  tasks will respond to fatal signals, or "0" if they won't.
+
+  When written, this property sets the killable state of the cgroup.
+  A value equal to "1" will switch the state of all frozen tasks in
+  the cgroup to TASK_INTERRUPTIBLE (similarly to cgroup v2) and will
+  make them react to fatal signals. A value of "0" will switch the
+  state of frozen tasks to TASK_UNINTERRUPTIBLE and they won't respond
+  to signals unless thawed or unfrozen.
+
 The root cgroup is non-freezable and the above interface files don't
 exist.
 
diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index 21f5aa0b217f..1443810ac2bf 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -72,6 +72,7 @@  extern bool set_freezable(void);
 
 #ifdef CONFIG_CGROUP_FREEZER
 extern bool cgroup_freezing(struct task_struct *task);
+extern bool cgroup_freezer_killable(struct task_struct *task);
 #else /* !CONFIG_CGROUP_FREEZER */
 static inline bool cgroup_freezing(struct task_struct *task)
 {
diff --git a/kernel/cgroup/legacy_freezer.c b/kernel/cgroup/legacy_freezer.c
index 08236798d173..5bbc26c4b822 100644
--- a/kernel/cgroup/legacy_freezer.c
+++ b/kernel/cgroup/legacy_freezer.c
@@ -35,6 +35,7 @@  enum freezer_state_flags {
 	CGROUP_FREEZING_SELF	= (1 << 1), /* this freezer is freezing */
 	CGROUP_FREEZING_PARENT	= (1 << 2), /* the parent freezer is freezing */
 	CGROUP_FROZEN		= (1 << 3), /* this and its descendants frozen */
+	CGROUP_FREEZER_KILLABLE = (1 << 4), /* frozen processes can be killed */
 
 	/* mask for all FREEZING flags */
 	CGROUP_FREEZING		= CGROUP_FREEZING_SELF | CGROUP_FREEZING_PARENT,
@@ -73,6 +74,17 @@  bool cgroup_freezing(struct task_struct *task)
 	return ret;
 }
 
+bool cgroup_freezer_killable(struct task_struct *task)
+{
+	bool ret;
+
+	rcu_read_lock();
+	ret = task_freezer(task)->state & CGROUP_FREEZER_KILLABLE;
+	rcu_read_unlock();
+
+	return ret;
+}
+
 static const char *freezer_state_strs(unsigned int state)
 {
 	if (state & CGROUP_FROZEN)
@@ -111,9 +123,15 @@  static int freezer_css_online(struct cgroup_subsys_state *css)
 
 	freezer->state |= CGROUP_FREEZER_ONLINE;
 
-	if (parent && (parent->state & CGROUP_FREEZING)) {
-		freezer->state |= CGROUP_FREEZING_PARENT | CGROUP_FROZEN;
-		atomic_inc(&system_freezing_cnt);
+	if (parent) {
+		if (parent->state & CGROUP_FREEZER_KILLABLE)
+			freezer->state |= CGROUP_FREEZER_KILLABLE;
+
+		if (parent->state & CGROUP_FREEZING) {
+			freezer->state |= CGROUP_FREEZING_PARENT |
+					CGROUP_FROZEN;
+			atomic_inc(&system_freezing_cnt);
+		}
 	}
 
 	mutex_unlock(&freezer_mutex);
@@ -450,6 +468,45 @@  static u64 freezer_parent_freezing_read(struct cgroup_subsys_state *css,
 	return (bool)(freezer->state & CGROUP_FREEZING_PARENT);
 }
 
+static u64 freezer_killable_read(struct cgroup_subsys_state *css,
+				     struct cftype *cft)
+{
+	struct freezer *freezer = css_freezer(css);
+
+	return (bool)(freezer->state & CGROUP_FREEZER_KILLABLE);
+}
+
+static int freezer_killable_write(struct cgroup_subsys_state *css,
+				      struct cftype *cft, u64 val)
+{
+	struct freezer *freezer = css_freezer(css);
+
+	if (val > 1)
+		return -EINVAL;
+
+	mutex_lock(&freezer_mutex);
+
+	if (val == !!(freezer->state & CGROUP_FREEZER_KILLABLE))
+		goto out;
+
+	if (val)
+		freezer->state |= CGROUP_FREEZER_KILLABLE;
+	else
+		freezer->state &= ~CGROUP_FREEZER_KILLABLE;
+
+
+	/*
+	 * Let __refrigerator spin once for each task to set it into the
+	 * appropriate state.
+	 */
+	unfreeze_cgroup(freezer);
+
+out:
+	mutex_unlock(&freezer_mutex);
+
+	return 0;
+}
+
 static struct cftype files[] = {
 	{
 		.name = "state",
@@ -467,6 +524,12 @@  static struct cftype files[] = {
 		.flags = CFTYPE_NOT_ON_ROOT,
 		.read_u64 = freezer_parent_freezing_read,
 	},
+	{
+		.name = "killable",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.write_u64 = freezer_killable_write,
+		.read_u64 = freezer_killable_read,
+	},
 	{ }	/* terminate */
 };
 
diff --git a/kernel/freezer.c b/kernel/freezer.c
index dc520f01f99d..92de1bfe62cf 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -42,6 +42,9 @@  bool freezing_slow_path(struct task_struct *p)
 	if (test_tsk_thread_flag(p, TIF_MEMDIE))
 		return false;
 
+	if (cgroup_freezer_killable(p) && fatal_signal_pending(p))
+		return false;
+
 	if (pm_nosig_freezing || cgroup_freezing(p))
 		return true;
 
@@ -63,7 +66,12 @@  bool __refrigerator(bool check_kthr_stop)
 	pr_debug("%s entered refrigerator\n", current->comm);
 
 	for (;;) {
-		set_current_state(TASK_UNINTERRUPTIBLE);
+		bool killable = cgroup_freezer_killable(current);
+
+		if (killable)
+			set_current_state(TASK_INTERRUPTIBLE);
+		else
+			set_current_state(TASK_UNINTERRUPTIBLE);
 
 		spin_lock_irq(&freezer_lock);
 		current->flags |= PF_FROZEN;
@@ -75,6 +83,16 @@  bool __refrigerator(bool check_kthr_stop)
 		if (!(current->flags & PF_FROZEN))
 			break;
 		was_frozen = true;
+
+		/*
+		 * Now we're sure that there is no pending fatal signal.
+		 * Clear TIF_SIGPENDING to not get out of schedule()
+		 * immediately (if there is a non-fatal signal pending), and
+		 * put the task to sleep.
+		 */
+		if (killable)
+			clear_thread_flag(TIF_SIGPENDING);
+
 		schedule();
 	}