Date: Thu, 21 Jan 2016 22:24:16 +0100
From: Peter Zijlstra
To: Tejun Heo
Cc: Christian Borntraeger, linux-kernel@vger.kernel.org, linux-s390,
 KVM list, Oleg Nesterov, "Paul E. McKenney", Li Zefan, Johannes Weiner,
 cgroups@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 1/2] cgroup: make sure a parent css isn't offlined before its children
Message-ID: <20160121212416.GL6357@twins.programming.kicks-ass.net>
References: <56978452.6010606@de.ibm.com> <20160114195630.GA3520@mtj.duckdns.org>
 <5698A023.9070703@de.ibm.com> <20160121203111.GF5157@mtj.duckdns.org>
In-Reply-To: <20160121203111.GF5157@mtj.duckdns.org>

On Thu, Jan 21, 2016 at 03:31:11PM -0500, Tejun Heo wrote:
> There are three subsystem callbacks in the css shutdown path:
> css_offline(), css_released() and css_free(). Except for
> css_released(), cgroup core didn't use to guarantee the order of
> invocation. css_offline() or css_free() could be called on a parent
> css before its children. This behavior is unexpected and led to a
> use-after-free in the cpu controller.
>
> This patch updates the offline path so that a parent css is never
> offlined before its children. Each css keeps an online_cnt which
> reaches zero iff it and all its children are offline, and
> offline_css() is invoked only after online_cnt reaches zero.
>
> This fixes the reported cpu controller malfunction. The next patch
> will update css_free() handling.

No, I need to fix the cpu controller too: the offending code sits off of
css_free() (which your next patch addresses), but it also does a
call_rcu() in between, and that doesn't guarantee order either.

So your patch and the one below would both be required to fix this, I
think.
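(For reference, the online_cnt scheme described in the quoted changelog
amounts to each css holding a count for itself plus its online children,
with the offline callback deferred until that count drains. A minimal
sketch of the idea -- illustrative only, not the actual cgroup core code;
names and types are simplified:

/*
 * Sketch: each css counts itself plus its online children, and the
 * subsystem's offline callback only runs once that count drops to
 * zero, so a parent is never offlined before any of its children.
 */
struct css {
	struct css *parent;
	atomic_t online_cnt;		/* self + number of online children */
};

static void offline_css(struct css *css);	/* subsystem callback, elided */

static void sketch_online(struct css *css)
{
	atomic_set(&css->online_cnt, 1);		/* count self */
	if (css->parent)
		atomic_inc(&css->parent->online_cnt);	/* count as child */
}

static void sketch_kill(struct css *css)
{
	/*
	 * Drop the self count and walk towards the root; each ancestor
	 * whose count hits zero has no online children left and can be
	 * offlined safely.
	 */
	while (css && atomic_dec_and_test(&css->online_cnt)) {
		offline_css(css);
		css = css->parent;
	}
}

end of sketch.)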
And then I should look at removing the call_rcu() from the css_free() path
at a later date; I think it's superfluous, but I need to double-check that.

Tested-by: Christian Borntraeger
---
Subject: sched: Fix cgroup entity load tracking tear-down

When a cgroup's cpu runqueue is destroyed, it should remove its remaining
load accounting from its parent cgroup.

The current site for doing so is unsuited because it is far too late and
unordered against other cgroup removal (css_free() will be ordered by the
next patch, but we're also in an RCU callback there, which again gives no
ordering guarantee).

Put it in the css_offline callback, which is the start of cgroup
destruction, right after the group has been made unavailable to
userspace. The css_offline callbacks are called in hierarchical order.

Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/sched/core.c  |  4 +---
 kernel/sched/fair.c  | 35 ++++++++++++++++++++---------------
 kernel/sched/sched.h |  2 +-
 3 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b8bd352dc63f..d589a140fe0e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7865,11 +7865,9 @@ void sched_destroy_group(struct task_group *tg)
 void sched_offline_group(struct task_group *tg)
 {
 	unsigned long flags;
-	int i;
 
 	/* end participation in shares distribution */
-	for_each_possible_cpu(i)
-		unregister_fair_sched_group(tg, i);
+	unregister_fair_sched_group(tg);
 
 	spin_lock_irqsave(&task_group_lock, flags);
 	list_del_rcu(&tg->list);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7f60da0f0fd7..aff660b70bf5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8244,11 +8244,8 @@ void free_fair_sched_group(struct task_group *tg)
 	for_each_possible_cpu(i) {
 		if (tg->cfs_rq)
 			kfree(tg->cfs_rq[i]);
-		if (tg->se) {
-			if (tg->se[i])
-				remove_entity_load_avg(tg->se[i]);
+		if (tg->se)
 			kfree(tg->se[i]);
-		}
 	}
 
 	kfree(tg->cfs_rq);
@@ -8296,21 +8293,29 @@ int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
 	return 0;
 }
 
-void unregister_fair_sched_group(struct task_group *tg, int cpu)
+void unregister_fair_sched_group(struct task_group *tg)
 {
-	struct rq *rq = cpu_rq(cpu);
 	unsigned long flags;
+	struct rq *rq;
+	int cpu;
 
-	/*
-	 * Only empty task groups can be destroyed; so we can speculatively
-	 * check on_list without danger of it being re-added.
-	 */
-	if (!tg->cfs_rq[cpu]->on_list)
-		return;
+	for_each_possible_cpu(cpu) {
+		if (tg->se[cpu])
+			remove_entity_load_avg(tg->se[cpu]);
 
-	raw_spin_lock_irqsave(&rq->lock, flags);
-	list_del_leaf_cfs_rq(tg->cfs_rq[cpu]);
-	raw_spin_unlock_irqrestore(&rq->lock, flags);
+		/*
+		 * Only empty task groups can be destroyed; so we can speculatively
+		 * check on_list without danger of it being re-added.
+		 */
+		if (!tg->cfs_rq[cpu]->on_list)
+			continue;
+
+		rq = cpu_rq(cpu);
+
+		raw_spin_lock_irqsave(&rq->lock, flags);
+		list_del_leaf_cfs_rq(tg->cfs_rq[cpu]);
+		raw_spin_unlock_irqrestore(&rq->lock, flags);
+	}
 }
 
 void init_tg_cfs_entry(struct task_group *tg, struct cfs_rq *cfs_rq,
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 837bcd383cda..492478bb717c 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -313,7 +313,7 @@ extern int tg_nop(struct task_group *tg, void *data);
 
 extern void free_fair_sched_group(struct task_group *tg);
 extern int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent);
-extern void unregister_fair_sched_group(struct task_group *tg, int cpu);
+extern void unregister_fair_sched_group(struct task_group *tg);
 extern void init_tg_cfs_entry(struct task_group *tg, struct cfs_rq *cfs_rq,
 			struct sched_entity *se, int cpu,
 			struct sched_entity *parent);
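
(For context on why css_offline() is early enough: the cpu controller's
cgroup callbacks at the time looked roughly like the sketch below. This
is simplified and not verbatim from kernel/sched/core.c; helper bodies
are abbreviated and the comments are explanatory, not from the source.

static void cpu_cgroup_css_offline(struct cgroup_subsys_state *css)
{
	struct task_group *tg = css_tg(css);

	/*
	 * Runs early in destruction, in hierarchical order (with the
	 * parent patch applied), so the load-average removal in
	 * unregister_fair_sched_group() now happens here.
	 */
	sched_offline_group(tg);
}

static void cpu_cgroup_css_free(struct cgroup_subsys_state *css)
{
	struct task_group *tg = css_tg(css);

	/*
	 * Defers the actual freeing through call_rcu(), so by the time
	 * free_fair_sched_group() runs, ordering against other groups
	 * is lost -- hence it no longer does the load-avg removal.
	 */
	sched_destroy_group(tg);
}

end of sketch.)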