From patchwork Thu Jan 3 23:44:02 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 10747857
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 433B317D2 for ; Thu, 3 Jan 2019 23:44:07 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 32A4E24603 for ; Thu, 3 Jan 2019 23:44:07 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 264CF2621B; Thu, 3 Jan 2019 23:44:07 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C82C324603 for ; Thu, 3 Jan 2019 23:44:06 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728940AbfACXoF (ORCPT ); Thu, 3 Jan 2019 18:44:05 -0500
Received: from mx1.redhat.com ([209.132.183.28]:57166 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728937AbfACXoE (ORCPT ); Thu, 3 Jan 2019 18:44:04 -0500
Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id E3C675D61B; Thu, 3 Jan 2019 23:44:03 +0000 (UTC)
Received: from warthog.procyon.org.uk (ovpn-120-244.rdu2.redhat.com [10.10.120.244]) by smtp.corp.redhat.com (Postfix) with ESMTP id DD5C719742; Thu, 3 Jan 2019 23:44:02 +0000 (UTC)
Organization: Red Hat UK Ltd.
	Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom.
	Registered in England and Wales under Company Registration No. 3798903
Subject: [PATCH 1/5] cgroup2: Always apply root flags on successful superblock obtainance
From: David Howells
To: viro@zeniv.linux.org.uk
Cc: Andrei Vagin , avagin@gmail.com, dhowells@redhat.com, linux-fsdevel@vger.kernel.org
Date: Thu, 03 Jan 2019 23:44:02 +0000
Message-ID: <154655904205.3032.4248425887082198759.stgit@warthog.procyon.org.uk>
User-Agent: StGit/unknown-version
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.39]); Thu, 03 Jan 2019 23:44:04 +0000 (UTC)
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

Always apply the v2 cgroup root flag settings after successfully obtaining a
superblock, both when the superblock is new and when an already extant
superblock is being shared.

Currently there's a bug in commit b3678086951a5 whereby the flags are only
changed if the superblock wasn't new.

The intention was originally to effect the change for a new superblock by
having kernfs_fill_super() call back into the subclassing filesystem, but
that never got completed. On further reflection, it's possibly better not
to do that so that we don't have to revert the flag change if there's a
later failure.

This patch is on top of Andrei Vagin's v6 cgroup refcount patch.
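The behavioural change described above can be sketched in user space. This
is only an illustration under stated assumptions: struct demo_ctx,
demo_root_flags and the demo_* functions are hypothetical stand-ins and are
not kernel code. The two "tail" functions mirror the before and after shape
of the end of cgroup_do_get_tree().

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel objects; not the real definitions. */
struct demo_ctx {
	int version;		/* cgroup version, 1 or 2 */
	unsigned int flags;	/* requested root flags */
	bool new_sb_created;	/* models ctx->kfc.new_sb_created */
};

static unsigned int demo_root_flags;	/* models the live root's flags */

static void demo_apply_root_flags(unsigned int flags)
{
	demo_root_flags = flags;
}

/* Old tail: the flags are skipped whenever the superblock is new. */
static int demo_get_tree_old(struct demo_ctx *ctx)
{
	if (ctx->new_sb_created)
		return 0;
	demo_apply_root_flags(ctx->flags);
	return 0;
}

/* New tail: the flags are applied for any v2 mount, new superblock or not. */
static int demo_get_tree_new(struct demo_ctx *ctx)
{
	if (ctx->version == 2)
		demo_apply_root_flags(ctx->flags);
	return 0;
}
```

With a v2 context whose new_sb_created is true, the old tail leaves
demo_root_flags untouched while the new tail sets it.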
Fixes: b3678086951a ("kernfs, sysfs, cgroup, intel_rdt: Support fs_context")
Reported-by: Al Viro
Signed-off-by: David Howells
cc: Andrei Vagin
---

 kernel/cgroup/cgroup.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index a19f0fec9d82..2e5150412ae0 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -2044,12 +2044,9 @@ int cgroup_do_get_tree(struct fs_context *fc)
 		fc->root = nsdentry;
 	}
 
+	if (ctx->version == 2)
+		apply_cgroup_root_flags(ctx->flags);
 	ret = 0;
-	if (ctx->kfc.new_sb_created)
-		goto out_cgrp;
-	apply_cgroup_root_flags(ctx->flags);
-	return 0;
-
 out_cgrp:
 	return ret;
 }

From patchwork Thu Jan 3 23:44:09 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 10747859
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F0D1617D2 for ; Thu, 3 Jan 2019 23:44:13 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DFF7824603 for ; Thu, 3 Jan 2019 23:44:13 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D43FD2621B; Thu, 3 Jan 2019 23:44:13 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 75A3024603 for ; Thu, 3 Jan 2019 23:44:13 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728937AbfACXoM (ORCPT ); Thu, 3 Jan 2019 18:44:12 -0500
Received: from mx1.redhat.com ([209.132.183.28]:45852 "EHLO
	mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728829AbfACXoL (ORCPT ); Thu, 3 Jan 2019 18:44:11 -0500
Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id D214583F3C; Thu, 3 Jan 2019 23:44:10 +0000 (UTC)
Received: from warthog.procyon.org.uk (ovpn-120-244.rdu2.redhat.com [10.10.120.244]) by smtp.corp.redhat.com (Postfix) with ESMTP id E575D600C9; Thu, 3 Jan 2019 23:44:09 +0000 (UTC)
Organization: Red Hat UK Ltd.
	Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom.
	Registered in England and Wales under Company Registration No. 3798903
Subject: [PATCH 2/5] kernfs: Provide a fill_super hook for the subclass filesystem
From: David Howells
To: viro@zeniv.linux.org.uk
Cc: avagin@gmail.com, dhowells@redhat.com, linux-fsdevel@vger.kernel.org
Date: Thu, 03 Jan 2019 23:44:09 +0000
Message-ID: <154655904912.3032.14601659851271963173.stgit@warthog.procyon.org.uk>
In-Reply-To: <154655904205.3032.4248425887082198759.stgit@warthog.procyon.org.uk>
References: <154655904205.3032.4248425887082198759.stgit@warthog.procyon.org.uk>
User-Agent: StGit/unknown-version
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.27]); Thu, 03 Jan 2019 23:44:10 +0000 (UTC)
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

Provide a fill_super hook for a subclass filesystem to use to get a
reference on the resource it attaches to struct kernfs_super_info::root.
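As a rough user-space model of why the reference is taken from inside the
fill step: if the hook takes the superblock's reference before anything
later can fail, a destructor that always drops one reference stays balanced.
Everything named demo_* here is hypothetical and only loosely mirrors the
kernfs structures. The model also ignores the teardown that would follow a
failing hook.

```c
/* Hypothetical user-space model; names are illustrative, not kernel API. */
struct demo_root {
	int refs;
};

struct demo_sb {
	struct demo_root *root;
};

struct demo_fc {
	struct demo_root *root;
	/* models kernfs_fs_context::fill_super */
	int (*fill_super)(struct demo_sb *sb, struct demo_fc *fc);
};

/* Subclass hook: take the superblock's own reference on the root. */
static int demo_subclass_fill_super(struct demo_sb *sb, struct demo_fc *fc)
{
	(void)sb;
	fc->root->refs++;
	return 0;
}

/* Superblock destructor: always drops the superblock's reference. */
static void demo_kill_sb(struct demo_sb *sb)
{
	sb->root->refs--;
}

/*
 * Models filling a new superblock. If @fail_inode is set, the root inode
 * step fails and the destructor runs before the caller regains control.
 */
static int demo_get_tree(struct demo_fc *fc, int fail_inode)
{
	struct demo_sb sb = { .root = fc->root };
	int ret;

	if (fc->fill_super) {
		ret = fc->fill_super(&sb, fc);
		if (ret < 0)
			return ret;
	}
	if (fail_inode) {
		demo_kill_sb(&sb);
		return -1;
	}
	return 0;
}
```

Starting from one reference held by the context, a failed call leaves the
count at one and a successful call leaves it at two.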
Without this, error handling becomes tricky in the event that, say,
kernfs_get_inode() fails in kernfs_fill_super() as the superblock
destructor will be invoked before kernfs_get_tree() returns.

Fixes: b3678086951a ("kernfs, sysfs, cgroup, intel_rdt: Support fs_context")
Signed-off-by: David Howells
---

 fs/kernfs/mount.c | 13 +++++++++++--
 include/linux/kernfs.h | 1 +
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index 3b61a4bb02c4..f04173d29845 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -217,11 +217,13 @@ struct dentry *kernfs_node_dentry(struct kernfs_node *kn,
 	} while (true);
 }
 
-static int kernfs_fill_super(struct super_block *sb, struct kernfs_fs_context *kfc)
+static int kernfs_fill_super(struct super_block *sb, struct fs_context *fc)
 {
+	struct kernfs_fs_context *kfc = fc->fs_private;
 	struct kernfs_super_info *info = kernfs_info(sb);
 	struct inode *inode;
 	struct dentry *root;
+	int ret;
 
 	info->sb = sb;
 	/* Userspace would break if executables or devices appear on sysfs */
@@ -238,6 +240,12 @@ static int kernfs_fill_super(struct super_block *sb, struct kernfs_fs_context *k
 	/* sysfs dentries and inodes don't require IO to create */
 	sb->s_shrink.seeks = 0;
 
+	if (kfc->fill_super) {
+		ret = kfc->fill_super(sb, fc);
+		if (ret < 0)
+			return ret;
+	}
+
 	/* get root inode, initialize and unlock it */
 	mutex_lock(&kernfs_mutex);
 	inode = kernfs_get_inode(sb, info->root->kn);
@@ -290,6 +298,7 @@ const void *kernfs_super_ns(struct super_block *sb)
 /**
  * kernfs_get_tree - kernfs filesystem access/retrieval helper
  * @fc: The filesystem context.
+ * @fill_super: The subclass's superblock initialiser function
  *
  * This is to be called from each kernfs user's fs_context->ops->get_tree()
  * implementation, which should set the specified ->@fs_type and ->@flags, and
@@ -321,7 +330,7 @@ int kernfs_get_tree(struct fs_context *fc)
 
 		kfc->new_sb_created = true;
 
-		error = kernfs_fill_super(sb, kfc);
+		error = kernfs_fill_super(sb, fc);
 		if (error) {
 			deactivate_locked_super(sb);
 			return error;
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 60ae0f862a79..a68e3fdd428c 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -277,6 +277,7 @@ struct kernfs_fs_context {
 	struct kernfs_root *root; /* Root of the hierarchy being mounted */
 	void *ns_tag; /* Namespace tag of the mount (or NULL) */
 	unsigned long magic; /* File system specific magic number */
+	int (*fill_super)(struct super_block *sb, struct fs_context *fc);
 
 	/* The following are set/used by kernfs_mount() */
 	bool new_sb_created; /* Set to T if we allocated a new sb */

From patchwork Thu Jan 3 23:44:16 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 10747861
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5C006746 for ; Thu, 3 Jan 2019 23:44:20 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4AF7D260CD for ; Thu, 3 Jan 2019 23:44:20 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 3F4D52623C; Thu, 3 Jan 2019 23:44:20 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1
Received: from vger.kernel.org
	(vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8CC2F260CD for ; Thu, 3 Jan 2019 23:44:19 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728942AbfACXoT (ORCPT ); Thu, 3 Jan 2019 18:44:19 -0500
Received: from mx1.redhat.com ([209.132.183.28]:57860 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728941AbfACXoS (ORCPT ); Thu, 3 Jan 2019 18:44:18 -0500
Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id EA1C37F6C8; Thu, 3 Jan 2019 23:44:17 +0000 (UTC)
Received: from warthog.procyon.org.uk (ovpn-120-244.rdu2.redhat.com [10.10.120.244]) by smtp.corp.redhat.com (Postfix) with ESMTP id D9597600C9; Thu, 3 Jan 2019 23:44:16 +0000 (UTC)
Organization: Red Hat UK Ltd.
	Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom.
	Registered in England and Wales under Company Registration No.
	3798903
Subject: [PATCH 3/5] cgroup: Fix refcounting
From: David Howells
To: viro@zeniv.linux.org.uk
Cc: avagin@gmail.com, dhowells@redhat.com, linux-fsdevel@vger.kernel.org
Date: Thu, 03 Jan 2019 23:44:16 +0000
Message-ID: <154655905608.3032.5762393419161551596.stgit@warthog.procyon.org.uk>
In-Reply-To: <154655904205.3032.4248425887082198759.stgit@warthog.procyon.org.uk>
References: <154655904205.3032.4248425887082198759.stgit@warthog.procyon.org.uk>
User-Agent: StGit/unknown-version
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.25]); Thu, 03 Jan 2019 23:44:18 +0000 (UTC)
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

Fix cgroup refcounting by the following means:

 (1) Don't use PERCPU_REF_INIT_DEAD and percpu_ref_reinit(). Using this
     causes a problem should kernfs_get_tree() create a superblock and then
     fail to create a root inode. The superblock destructor will be
     invoked before kernfs_get_tree() returns - and the refcount isn't
     reinit'd till after that.

     To this end, cgroup_setup_root() no longer needs a ref_flags argument.

 (2) Provide a flag, CSS_CREATING, that is used to prevent concurrent access
     to a cgroup root that is being set up and for which the superblock is
     still being set up. This appears to be necessary to hide the fact
     that the cgroup is accessible before the superblock is fully
     initialised.

 (3) Set CSS_CREATING in cgroup1_get_tree() on a new cgroup_root. This is
     then cleared in cgroup_do_get_tree().

 (4) cgroup_get_tree() is made to call cgroup_get() on the root it sets for
     a v2 cgroup. Admittedly, this doesn't do anything because CSS_NO_REF
     is set, but it future proofs it in case this is changed in future.

 (5) cgroup_fill_super() is created and passed to kernfs_get_tree() in the
     kernfs_fs_context struct.
     This takes an extra ref on the root for the superblock in the event
     that a superblock is created.

struct cgroup_fs_context::root then holds a single ref on the root right
through till it is freed.

Note that new_root is transferred into the cgroup_fs_context as
is_new_root, though this is probably unnecessary as it's only used to clear
CSS_CREATING - and no one else can gain access to the root until we've
cleared the flag.

Fixes: b3678086951a ("kernfs, sysfs, cgroup, intel_rdt: Support fs_context")
Signed-off-by: David Howells
Tested-by: Andrei Vagin
Reported-by: Andrei Vagin
---

 include/linux/cgroup-defs.h | 1 +
 include/linux/cgroup.h | 4 ++++
 kernel/cgroup/cgroup-internal.h | 3 ++-
 kernel/cgroup/cgroup-v1.c | 25 +++++++++----------------
 kernel/cgroup/cgroup.c | 24 +++++++++++++++++++++---
 5 files changed, 37 insertions(+), 20 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 5e1694fe035b..f5da2396a809 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -52,6 +52,7 @@ enum {
 	CSS_RELEASED = (1 << 2), /* refcnt reached zero, released */
 	CSS_VISIBLE = (1 << 3), /* css is visible to userland */
 	CSS_DYING = (1 << 4), /* css is dying */
+	CSS_CREATING = (1 << 5), /* Root css is being constructed */
 };
 
 /* bits in struct cgroup flags field */
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index bb0c7da50ed2..5708ad663572 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -333,6 +333,8 @@ static inline void css_get_many(struct cgroup_subsys_state *css, unsigned int n)
  */
 static inline bool css_tryget(struct cgroup_subsys_state *css)
 {
+	if (css->flags & CSS_CREATING)
+		return false;
 	if (!(css->flags & CSS_NO_REF))
 		return percpu_ref_tryget(&css->refcnt);
 	return true;
@@ -350,6 +352,8 @@ static inline bool css_tryget(struct cgroup_subsys_state *css)
  */
 static inline bool css_tryget_online(struct cgroup_subsys_state *css)
 {
+	if (css->flags & CSS_CREATING)
+		return false;
 	if (!(css->flags & CSS_NO_REF))
 		return percpu_ref_tryget_live(&css->refcnt);
 	return true;
diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h
index ff86943ea1c8..b22c3d95d8eb 100644
--- a/kernel/cgroup/cgroup-internal.h
+++ b/kernel/cgroup/cgroup-internal.h
@@ -46,6 +46,7 @@ struct cgroup_fs_context {
 	unsigned int flags; /* CGRP_ROOT_* flags */
 
 	/* cgroup1 bits */
+	bool is_new_root; /* ->root is new and needs refcnt init */
 	bool cpuset_clone_children;
 	bool none; /* User explicitly requested empty subsystem */
 	bool all_ss; /* Seen 'all' option */
@@ -214,7 +215,7 @@ int cgroup_path_ns_locked(struct cgroup *cgrp, char *buf, size_t buflen,
 
 void cgroup_free_root(struct cgroup_root *root);
 void init_cgroup_root(struct cgroup_fs_context *ctx);
-int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask, int ref_flags);
+int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask);
 int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask);
 int cgroup_do_get_tree(struct fs_context *fc);
 
diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index 4b189e821cad..0fbbde86a64d 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -1156,7 +1156,6 @@ int cgroup1_get_tree(struct fs_context *fc)
 	struct cgroup_root *root;
 	struct cgroup_subsys *ss;
 	int i, ret = cgroup1_validate(fc);
-	bool new_root = false;
 
 	if (ret)
 		return ret;
@@ -1261,12 +1260,19 @@ int cgroup1_get_tree(struct fs_context *fc)
 		ret = -ENOMEM;
 		goto err_unlock;
 	}
-	new_root = true;
 	ctx->root = root;
+	ctx->is_new_root = true;
 
 	init_cgroup_root(ctx);
 
-	ret = cgroup_setup_root(root, ctx->subsys_mask, PERCPU_REF_INIT_DEAD);
+	/*
+	 * There's a race window after we release cgroup_mutex and before
+	 * allocating a superblock. Make sure a concurrent process won't be
+	 * able to re-use the root during this window by setting CSS_CREATING.
+	 */
+	root->cgrp.self.flags |= CSS_CREATING;
+
+	ret = cgroup_setup_root(root, ctx->subsys_mask);
 	if (ret)
 		goto err_unlock;
 
@@ -1275,19 +1281,6 @@ int cgroup1_get_tree(struct fs_context *fc)
 
 	ret = cgroup_do_get_tree(fc);
 
-	/*
-	 * There's a race window after we release cgroup_mutex and before
-	 * allocating a superblock. Make sure a concurrent process won't
-	 * be able to re-use the root during this window by delaying the
-	 * initialization of root refcnt.
-	 */
-	if (new_root) {
-		mutex_lock(&cgroup_mutex);
-		percpu_ref_reinit(&root->cgrp.self.refcnt);
-		mutex_unlock(&cgroup_mutex);
-		cgroup_get(&root->cgrp);
-	}
-
 	/*
 	 * If @pinned_sb, we're reusing an existing root and holding an
 	 * extra ref on its sb. Mount is complete. Put the extra ref.
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 2e5150412ae0..091e7eca3661 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1909,7 +1909,7 @@ void init_cgroup_root(struct cgroup_fs_context *ctx)
 		set_bit(CGRP_CPUSET_CLONE_CHILDREN, &root->cgrp.flags);
 }
 
-int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask, int ref_flags)
+int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
 {
 	LIST_HEAD(tmp_links);
 	struct cgroup *root_cgrp = &root->cgrp;
@@ -1926,7 +1926,7 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask, int ref_flags)
 	root_cgrp->ancestor_ids[0] = ret;
 
 	ret = percpu_ref_init(&root_cgrp->self.refcnt, css_release,
-			      ref_flags, GFP_KERNEL);
+			      0, GFP_KERNEL);
 	if (ret)
 		goto out;
 
@@ -2010,12 +2010,23 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask, int ref_flags)
 	return ret;
 }
 
+static int cgroup_fill_super(struct super_block *sb, struct fs_context *fc)
+{
+	struct cgroup_fs_context *ctx = cgroup_fc2context(fc);
+
+	mutex_lock(&cgroup_mutex);
+	cgroup_get(&ctx->root->cgrp);
+	mutex_unlock(&cgroup_mutex);
+	return 0;
+}
+
 int cgroup_do_get_tree(struct fs_context *fc)
 {
 	struct cgroup_fs_context *ctx = cgroup_fc2context(fc);
 	int ret;
 
 	ctx->kfc.root = ctx->root->kf_root;
+	ctx->kfc.fill_super = cgroup_fill_super;
 
 	ret = kernfs_get_tree(fc);
 	if (ret < 0)
@@ -2046,6 +2057,12 @@ int cgroup_do_get_tree(struct fs_context *fc)
 
 	if (ctx->version == 2)
 		apply_cgroup_root_flags(ctx->flags);
+
+	if (ctx->is_new_root) {
+		mutex_lock(&cgroup_mutex);
+		ctx->root->cgrp.self.flags &= ~CSS_CREATING;
+		mutex_unlock(&cgroup_mutex);
+	}
 	ret = 0;
 out_cgrp:
 	return ret;
@@ -2071,6 +2088,7 @@ static int cgroup_get_tree(struct fs_context *fc)
 
 		cgroup_get_live(&cgrp_dfl_root.cgrp);
 		ctx->root = &cgrp_dfl_root;
+		cgroup_get(&ctx->root->cgrp);
 		return cgroup_do_get_tree(fc);
 
 	default:
@@ -5420,7 +5438,7 @@ int __init cgroup_init(void)
 	hash_add(css_set_table, &init_css_set.hlist,
 		 css_set_hash(init_css_set.subsys));
 
-	BUG_ON(cgroup_setup_root(&cgrp_dfl_root, 0, 0));
+	BUG_ON(cgroup_setup_root(&cgrp_dfl_root, 0));
 
 	mutex_unlock(&cgroup_mutex);
 

From patchwork Thu Jan 3 23:44:23 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 10747863
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 25F6B746 for ; Thu, 3 Jan 2019 23:44:27 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1442925EA6 for ; Thu, 3 Jan 2019 23:44:27 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 083322621B; Thu, 3 Jan 2019 23:44:27 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 588F2260CD for ; Thu, 3 Jan 2019 23:44:26 +0000 (UTC)
Received:
	(majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728945AbfACXoZ (ORCPT ); Thu, 3 Jan 2019 18:44:25 -0500
Received: from mx1.redhat.com ([209.132.183.28]:35828 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726039AbfACXoZ (ORCPT ); Thu, 3 Jan 2019 18:44:25 -0500
Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id E9DEBCD4CA; Thu, 3 Jan 2019 23:44:24 +0000 (UTC)
Received: from warthog.procyon.org.uk (ovpn-120-244.rdu2.redhat.com [10.10.120.244]) by smtp.corp.redhat.com (Postfix) with ESMTP id E668717556; Thu, 3 Jan 2019 23:44:23 +0000 (UTC)
Organization: Red Hat UK Ltd.
	Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom.
	Registered in England and Wales under Company Registration No. 3798903
Subject: [PATCH 4/5] cgroup: refcount fixes
From: David Howells
To: viro@zeniv.linux.org.uk
Cc: avagin@gmail.com, dhowells@redhat.com, linux-fsdevel@vger.kernel.org
Date: Thu, 03 Jan 2019 23:44:23 +0000
Message-ID: <154655906318.3032.11515432049404623139.stgit@warthog.procyon.org.uk>
In-Reply-To: <154655904205.3032.4248425887082198759.stgit@warthog.procyon.org.uk>
References: <154655904205.3032.4248425887082198759.stgit@warthog.procyon.org.uk>
User-Agent: StGit/unknown-version
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.38]); Thu, 03 Jan 2019 23:44:25 +0000 (UTC)
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

---

 include/linux/cgroup-defs.h | 2 ++
 include/linux/cgroup.h | 32 ++++++++++++++++++++++++--------
 kernel/cgroup/cgroup-internal.h | 1 +
 kernel/cgroup/cgroup-v1.c | 6 ++++--
 kernel/cgroup/cgroup.c | 26 ++++++++++++++++++++++----
 5 files changed, 53 insertions(+), 14 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index f5da2396a809..34a284ef3a78 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -166,6 +166,8 @@ struct cgroup_subsys_state {
 	 * fields of the containing structure.
 	 */
 	struct cgroup_subsys_state *parent;
+
+	bool debug;
 };
 
 /*
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 5708ad663572..72c076920072 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -295,6 +295,8 @@ void css_task_iter_end(struct css_task_iter *it);
  * Inline functions.
  */
 
+extern void css_refcount(struct cgroup_subsys_state *css, const char *op);
+
 /**
  * css_get - obtain a reference on the specified css
  * @css: target css
@@ -303,8 +305,10 @@ void css_task_iter_end(struct css_task_iter *it);
  */
 static inline void css_get(struct cgroup_subsys_state *css)
 {
-	if (!(css->flags & CSS_NO_REF))
+	if (!(css->flags & CSS_NO_REF)) {
 		percpu_ref_get(&css->refcnt);
+		css_refcount(css, "GET");
+	}
 }
 
 /**
@@ -316,8 +320,10 @@ static inline void css_get(struct cgroup_subsys_state *css)
  */
 static inline void css_get_many(struct cgroup_subsys_state *css, unsigned int n)
 {
-	if (!(css->flags & CSS_NO_REF))
+	if (!(css->flags & CSS_NO_REF)) {
 		percpu_ref_get_many(&css->refcnt, n);
+		css_refcount(css, "GETM");
+	}
 }
 
 /**
@@ -335,8 +341,11 @@ static inline bool css_tryget(struct cgroup_subsys_state *css)
 {
 	if (css->flags & CSS_CREATING)
 		return false;
-	if (!(css->flags & CSS_NO_REF))
-		return percpu_ref_tryget(&css->refcnt);
+	if (css->flags & CSS_NO_REF)
+		return true;
+	if (!percpu_ref_tryget(&css->refcnt))
+		return false;
+	css_refcount(css, "TRY");
 	return true;
 }
 
@@ -354,8 +363,11 @@ static inline bool css_tryget_online(struct cgroup_subsys_state *css)
 {
 	if (css->flags & CSS_CREATING)
 		return false;
-	if (!(css->flags & CSS_NO_REF))
-		return percpu_ref_tryget_live(&css->refcnt);
+	if (css->flags & CSS_NO_REF)
+		return true;
+	if (!percpu_ref_tryget_live(&css->refcnt))
+		return false;
+	css_refcount(css, "TRYO");
 	return true;
 }
 
@@ -387,8 +399,10 @@ static inline bool css_is_dying(struct cgroup_subsys_state *css)
  */
 static inline void css_put(struct cgroup_subsys_state *css)
 {
-	if (!(css->flags & CSS_NO_REF))
+	if (!(css->flags & CSS_NO_REF)) {
 		percpu_ref_put(&css->refcnt);
+		css_refcount(css, "PUT");
+	}
 }
 
 /**
@@ -400,8 +414,10 @@ static inline void css_put(struct cgroup_subsys_state *css)
  */
 static inline void css_put_many(struct cgroup_subsys_state *css, unsigned int n)
 {
-	if (!(css->flags & CSS_NO_REF))
+	if (!(css->flags & CSS_NO_REF)) {
 		percpu_ref_put_many(&css->refcnt, n);
+		css_refcount(css, "PUTM");
+	}
 }
 
 static inline void cgroup_get(struct cgroup *cgrp)
diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h
index b22c3d95d8eb..afe417c44602 100644
--- a/kernel/cgroup/cgroup-internal.h
+++ b/kernel/cgroup/cgroup-internal.h
@@ -46,6 +46,7 @@ struct cgroup_fs_context {
 	unsigned int flags; /* CGRP_ROOT_* flags */
 
 	/* cgroup1 bits */
+	bool debug;
 	bool is_new_root; /* ->root is new and needs refcnt init */
 	bool cpuset_clone_children;
 	bool none; /* User explicitly requested empty subsystem */
diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index 0fbbde86a64d..281930390c6f 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -1033,6 +1033,8 @@ int cgroup1_parse_param(struct fs_context *fc, struct fs_parameter *param)
 			return cg_invalf(fc, "cgroup1: name respecified");
 		ctx->name = param->string;
 		param->string = NULL;
+		if (strcmp(ctx->name, "xxxxy") == 0)
+			ctx->debug = true;
 		return 0;
 	}
 
@@ -1174,7 +1176,7 @@ int cgroup1_get_tree(struct fs_context *fc)
 		    ss->root == &cgrp_dfl_root)
 			continue;
 
-		if (!percpu_ref_tryget_live(&ss->root->cgrp.self.refcnt)) {
+		if (!css_tryget_online(&ss->root->cgrp.self)) {
 			mutex_unlock(&cgroup_mutex);
 			goto err_restart;
 		}
@@ -1227,7 +1229,7 @@ int cgroup1_get_tree(struct fs_context *fc)
 		 */
 		pinned_sb = kernfs_pin_sb(root->kf_root, NULL);
 		if (IS_ERR(pinned_sb) ||
-		    !percpu_ref_tryget_live(&root->cgrp.self.refcnt)) {
+		    !css_tryget_online(&root->cgrp.self)) {
 			mutex_unlock(&cgroup_mutex);
 			if (!IS_ERR_OR_NULL(pinned_sb))
 				deactivate_super(pinned_sb);
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 091e7eca3661..a25c6dee8515 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1907,6 +1907,8 @@ void init_cgroup_root(struct cgroup_fs_context *ctx)
 		strscpy(root->name, ctx->name, MAX_CGROUP_ROOT_NAMELEN);
 	if (ctx->cpuset_clone_children)
 		set_bit(CGRP_CPUSET_CLONE_CHILDREN, &root->cgrp.flags);
+	if (ctx->debug)
+		cgrp->self.debug = true;
 }
 
 int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
@@ -1926,7 +1928,7 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
 	root_cgrp->ancestor_ids[0] = ret;
 
 	ret = percpu_ref_init(&root_cgrp->self.refcnt, css_release,
-			      0, GFP_KERNEL);
+			      PERCPU_REF_INIT_ATOMIC, GFP_KERNEL);
 	if (ret)
 		goto out;
 
@@ -2025,6 +2027,8 @@ int cgroup_do_get_tree(struct fs_context *fc)
 	struct cgroup_fs_context *ctx = cgroup_fc2context(fc);
 	int ret;
 
+	printk("*** cgroup_do_get_tree()\n");
+
 	ctx->kfc.root = ctx->root->kf_root;
 	ctx->kfc.fill_super = cgroup_fill_super;
 
@@ -2072,6 +2076,8 @@ static int cgroup_get_tree(struct fs_context *fc)
 {
 	struct cgroup_fs_context *ctx = cgroup_fc2context(fc);
 
+	printk("*** cgroup_get_tree()\n");
+
 	/*
 	 * The first time anyone tries to mount a cgroup, enable the list
 	 * linking each css_set to its tasks and fix up all existing tasks.
@@ -2169,8 +2175,10 @@ static void cgroup_kill_sb(struct super_block *sb)
 	if (!list_empty(&root->cgrp.self.children) ||
 	    root == &cgrp_dfl_root)
 		cgroup_put(&root->cgrp);
-	else
+	else {
+		css_refcount(&root->cgrp.self, "KIL");
 		percpu_ref_kill(&root->cgrp.self.refcnt);
+	}
 
 	kernfs_kill_sb(sb);
 }
@@ -4894,7 +4902,7 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
 
 	init_and_link_css(css, ss, cgrp);
 
-	err = percpu_ref_init(&css->refcnt, css_release, 0, GFP_KERNEL);
+	err = percpu_ref_init(&css->refcnt, css_release, PERCPU_REF_INIT_ATOMIC, GFP_KERNEL);
 	if (err)
 		goto err_free_css;
 
@@ -4949,7 +4957,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent)
 	if (!cgrp)
 		return ERR_PTR(-ENOMEM);
 
-	ret = percpu_ref_init(&cgrp->self.refcnt, css_release, 0, GFP_KERNEL);
+	ret = percpu_ref_init(&cgrp->self.refcnt, css_release, PERCPU_REF_INIT_ATOMIC, GFP_KERNEL);
 	if (ret)
 		goto out_free_cgrp;
 
@@ -5197,6 +5205,7 @@ static void kill_css(struct cgroup_subsys_state *css)
 	 * Use percpu_ref_kill_and_confirm() to get notifications as each
 	 * css is confirmed to be seen as killed on all CPUs.
 	 */
+	css_refcount(css, "KLC");
 	percpu_ref_kill_and_confirm(&css->refcnt, css_killed_ref_fn);
 }
 
@@ -5281,6 +5290,7 @@ static int cgroup_destroy_locked(struct cgroup *cgrp)
 	cgroup1_check_for_release(parent);
 
 	/* put the base reference */
+	css_refcount(&cgrp->self, "KIL");
 	percpu_ref_kill(&cgrp->self.refcnt);
 
 	return 0;
@@ -6109,3 +6119,11 @@ static int __init cgroup_sysfs_init(void)
 }
 subsys_initcall(cgroup_sysfs_init);
 #endif /* CONFIG_SYSFS */
+
+noinline void css_refcount(struct cgroup_subsys_state *css, const char *op)
+{
+	if (css->debug)
+		printk("*** %p %s %2ld: %pSR\n",
+		       css, op, atomic_long_read(&css->refcnt.count),
+		       __builtin_return_address(0));
+}

From patchwork Thu Jan 3 23:44:30 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 10747865
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B015717D2 for ; Thu, 3 Jan 2019 23:44:35 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9EF4D25EA6 for ; Thu, 3 Jan 2019 23:44:35 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9397D2623C; Thu, 3 Jan 2019 23:44:35 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1F8F225EA6 for ; Thu, 3 Jan 2019 23:44:35 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728973AbfACXod (ORCPT ); Thu, 3 Jan 2019 18:44:33 -0500
Received: from mx1.redhat.com ([209.132.183.28]:51476 "EHLO mx1.redhat.com"
rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728956AbfACXoc (ORCPT ); Thu, 3 Jan 2019 18:44:32 -0500 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id ED730BDF7; Thu, 3 Jan 2019 23:44:31 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-120-244.rdu2.redhat.com [10.10.120.244]) by smtp.corp.redhat.com (Postfix) with ESMTP id E8835608DD; Thu, 3 Jan 2019 23:44:30 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH 5/5] percpu: Kill off PERCPU_REF_INIT_DEAD and percpu_ref_reinit() From: David Howells To: viro@zeniv.linux.org.uk Cc: avagin@gmail.com, dhowells@redhat.com, linux-fsdevel@vger.kernel.org Date: Thu, 03 Jan 2019 23:44:30 +0000 Message-ID: <154655907017.3032.3397037242037017828.stgit@warthog.procyon.org.uk> In-Reply-To: <154655904205.3032.4248425887082198759.stgit@warthog.procyon.org.uk> References: <154655904205.3032.4248425887082198759.stgit@warthog.procyon.org.uk> User-Agent: StGit/unknown-version MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.29]); Thu, 03 Jan 2019 23:44:32 +0000 (UTC) Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Kill off PERCPU_REF_INIT_DEAD and percpu_ref_reinit() for lack of users. They were being used in cgroups, but this was causing problems with the refcounting in the event that kernfs_get_tree() created a superblock and then failed with the refcount not fully initialised. 
Signed-off-by: David Howells
---

 include/linux/percpu-refcount.h |  7 -------
 lib/percpu-refcount.c           | 31 +++----------------------------
 2 files changed, 3 insertions(+), 35 deletions(-)

diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
index b297cd1cd4f1..fdd69b99f331 100644
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -77,12 +77,6 @@ enum {
 	 * percpu_ref_switch_to_percpu() is invoked on it.
 	 */
 	PERCPU_REF_INIT_ATOMIC = 1 << 0,
-
-	/*
-	 * Start dead w/ ref == 0 in atomic mode. Must be revived with
-	 * percpu_ref_reinit() before used. Implies INIT_ATOMIC.
-	 */
-	PERCPU_REF_INIT_DEAD = 1 << 1,
 };
 
 struct percpu_ref {
@@ -109,7 +103,6 @@ void percpu_ref_switch_to_percpu(struct percpu_ref *ref);
 void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
 				 percpu_ref_func_t *confirm_kill);
 void percpu_ref_resurrect(struct percpu_ref *ref);
-void percpu_ref_reinit(struct percpu_ref *ref);
 
 /**
  * percpu_ref_kill - drop the initial ref
diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
index de10b8c0bff6..7a56c36aec5d 100644
--- a/lib/percpu-refcount.c
+++ b/lib/percpu-refcount.c
@@ -70,15 +70,12 @@ int percpu_ref_init(struct percpu_ref *ref, percpu_ref_func_t *release,
 
 	ref->force_atomic = flags & PERCPU_REF_INIT_ATOMIC;
 
-	if (flags & (PERCPU_REF_INIT_ATOMIC | PERCPU_REF_INIT_DEAD))
+	if (flags & PERCPU_REF_INIT_ATOMIC)
 		ref->percpu_count_ptr |= __PERCPU_REF_ATOMIC;
 	else
 		start_count += PERCPU_COUNT_BIAS;
 
-	if (flags & PERCPU_REF_INIT_DEAD)
-		ref->percpu_count_ptr |= __PERCPU_REF_DEAD;
-	else
-		start_count++;
+	start_count++;
 
 	atomic_long_set(&ref->count, start_count);
 
@@ -282,14 +279,11 @@ EXPORT_SYMBOL_GPL(percpu_ref_switch_to_atomic_sync);
  * @ref: percpu_ref to switch to percpu mode
  *
  * There's no reason to use this function for the usual reference counting.
- * To re-use an expired ref, use percpu_ref_reinit().
  *
  * Switch @ref to percpu mode. This function may be invoked concurrently
  * with all the get/put operations and can safely be mixed with kill and
  * reinit operations. This function reverses the sticky atomic state set
- * by PERCPU_REF_INIT_ATOMIC or percpu_ref_switch_to_atomic(). If @ref is
- * dying or dead, the actual switching takes place on the following
- * percpu_ref_reinit().
+ * by PERCPU_REF_INIT_ATOMIC or percpu_ref_switch_to_atomic().
  *
  * This function may block if @ref is in the process of switching to atomic
  * mode. If the caller ensures that @ref is not in the process of
@@ -343,25 +337,6 @@ void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
 }
 EXPORT_SYMBOL_GPL(percpu_ref_kill_and_confirm);
 
-/**
- * percpu_ref_reinit - re-initialize a percpu refcount
- * @ref: perpcu_ref to re-initialize
- *
- * Re-initialize @ref so that it's in the same state as when it finished
- * percpu_ref_init() ignoring %PERCPU_REF_INIT_DEAD. @ref must have been
- * initialized successfully and reached 0 but not exited.
- *
- * Note that percpu_ref_tryget[_live]() are safe to perform on @ref while
- * this function is in progress.
- */
-void percpu_ref_reinit(struct percpu_ref *ref)
-{
-	WARN_ON_ONCE(!percpu_ref_is_zero(ref));
-
-	percpu_ref_resurrect(ref);
-}
-EXPORT_SYMBOL_GPL(percpu_ref_reinit);
-
 /**
  * percpu_ref_resurrect - modify a percpu refcount from dead to live
  * @ref: perpcu_ref to resurrect