From patchwork Wed Jun 5 13:36:46 2019
X-Patchwork-Submitter: Daniel Jordan
X-Patchwork-Id: 10976983
From: Daniel Jordan
To: hannes@cmpxchg.org, jiangshanlai@gmail.com, lizefan@huawei.com, tj@kernel.org
Cc: bsd@redhat.com, dan.j.williams@intel.com, daniel.m.jordan@oracle.com, dave.hansen@intel.com, juri.lelli@redhat.com, mhocko@kernel.org, peterz@infradead.org, steven.sistare@oracle.com, tglx@linutronix.de, tom.hromatka@oracle.com, vdavydov.dev@gmail.com, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [RFC v2 1/5] cgroup: add cgroup v2 interfaces to migrate kernel threads
Date: Wed, 5 Jun 2019 09:36:46 -0400
Message-Id: <20190605133650.28545-2-daniel.m.jordan@oracle.com>
In-Reply-To: <20190605133650.28545-1-daniel.m.jordan@oracle.com>
References: <20190605133650.28545-1-daniel.m.jordan@oracle.com>

Prepare for cgroup-aware workqueues by introducing cgroup_attach_kthread() and a helper, cgroup_attach_kthread_to_dfl_root(). A workqueue worker always migrates itself, so for now the interfaces operate on @current to avoid having to handle task references.
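As an illustration of how the new interfaces are meant to be called, here is a minimal, hypothetical sketch of a kernel thread that attaches itself to a target cgroup for a stretch of work and then returns to the default root. The kthread function, the way @data carries the cgroup, and do_accounted_work() are invented for the example and are not part of this patch:

	/*
	 * Hypothetical example: a kernel thread charges a stretch of work to
	 * @cgrp, then moves back to the root of the default hierarchy.
	 */
	static int example_kthread_fn(void *data)
	{
		struct cgroup *cgrp = data;	/* caller keeps @cgrp valid */
		int ret;

		/* Only kthreads may call this; it migrates current. */
		ret = cgroup_attach_kthread(cgrp);
		if (ret)
			return ret;

		do_accounted_work();		/* placeholder for real work */

		/* Detach again so the cgroup can later be removed. */
		return cgroup_attach_kthread_to_dfl_root();
	}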
Signed-off-by: Daniel Jordan
---
 include/linux/cgroup.h |  6 ++++++
 kernel/cgroup/cgroup.c | 48 +++++++++++++++++++++++++++++++++++++-----
 2 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 81f58b4a5418..ad78784e3692 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -103,6 +103,7 @@ struct cgroup_subsys_state *css_tryget_online_from_dir(struct dentry *dentry,
 struct cgroup *cgroup_get_from_path(const char *path);
 struct cgroup *cgroup_get_from_fd(int fd);
 
+int cgroup_attach_kthread(struct cgroup *dst_cgrp);
 int cgroup_attach_task_all(struct task_struct *from, struct task_struct *);
 int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from);
 
@@ -530,6 +531,11 @@ static inline struct cgroup *task_dfl_cgroup(struct task_struct *task)
 	return task_css_set(task)->dfl_cgrp;
 }
 
+static inline int cgroup_attach_kthread_to_dfl_root(void)
+{
+	return cgroup_attach_kthread(&cgrp_dfl_root.cgrp);
+}
+
 static inline struct cgroup *cgroup_parent(struct cgroup *cgrp)
 {
 	struct cgroup_subsys_state *parent_css = cgrp->self.parent;
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 3f2b4bde0f9c..bc8d6a2e529f 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -2771,21 +2771,59 @@ struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup)
 	return tsk;
 }
 
-void cgroup_procs_write_finish(struct task_struct *task)
-	__releases(&cgroup_threadgroup_rwsem)
+static void __cgroup_procs_write_finish(struct task_struct *task)
 {
 	struct cgroup_subsys *ss;
 	int ssid;
 
-	/* release reference from cgroup_procs_write_start() */
-	put_task_struct(task);
+	lockdep_assert_held(&cgroup_mutex);
 
-	percpu_up_write(&cgroup_threadgroup_rwsem);
 	for_each_subsys(ss, ssid)
 		if (ss->post_attach)
 			ss->post_attach();
 }
 
+void cgroup_procs_write_finish(struct task_struct *task)
+	__releases(&cgroup_threadgroup_rwsem)
+{
+	lockdep_assert_held(&cgroup_mutex);
+
+	/* release reference from cgroup_procs_write_start() */
+	put_task_struct(task);
+
+	percpu_up_write(&cgroup_threadgroup_rwsem);
+	__cgroup_procs_write_finish(task);
+}
+
+/**
+ * cgroup_attach_kthread - attach the current kernel thread to a cgroup
+ * @dst_cgrp: the cgroup to attach to
+ *
+ * The caller is responsible for ensuring @dst_cgrp is valid until this
+ * function returns.
+ *
+ * Return: 0 on success or negative error code.
+ */
+int cgroup_attach_kthread(struct cgroup *dst_cgrp)
+{
+	int ret;
+
+	if (WARN_ON_ONCE(!(current->flags & PF_KTHREAD)))
+		return -EINVAL;
+
+	mutex_lock(&cgroup_mutex);
+
+	percpu_down_write(&cgroup_threadgroup_rwsem);
+	ret = cgroup_attach_task(dst_cgrp, current, false);
+	percpu_up_write(&cgroup_threadgroup_rwsem);
+
+	__cgroup_procs_write_finish(current);
+
+	mutex_unlock(&cgroup_mutex);
+
+	return ret;
+}
+
 static void cgroup_print_ss_mask(struct seq_file *seq, u16 ss_mask)
 {
 	struct cgroup_subsys *ss;

From patchwork Wed Jun 5 13:36:47 2019
X-Patchwork-Submitter: Daniel Jordan
X-Patchwork-Id: 10976993
From: Daniel Jordan
To: hannes@cmpxchg.org, jiangshanlai@gmail.com, lizefan@huawei.com, tj@kernel.org
Cc: bsd@redhat.com, dan.j.williams@intel.com, daniel.m.jordan@oracle.com, dave.hansen@intel.com, juri.lelli@redhat.com, mhocko@kernel.org, peterz@infradead.org, steven.sistare@oracle.com, tglx@linutronix.de, tom.hromatka@oracle.com, vdavydov.dev@gmail.com, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [RFC v2 2/5] workqueue, cgroup: add cgroup-aware workqueues
Date: Wed, 5 Jun 2019 09:36:47 -0400
Message-Id: <20190605133650.28545-3-daniel.m.jordan@oracle.com>
In-Reply-To: <20190605133650.28545-1-daniel.m.jordan@oracle.com>
References: <20190605133650.28545-1-daniel.m.jordan@oracle.com>

Workqueue workers ignore the cgroup of the queueing task, so a worker's resource usage normally goes unaccounted, with exceptions such as cgroup writeback, and can therefore arbitrarily exceed controller limits.

Add cgroup awareness to workqueue workers. Do it only for unbound workqueues, since their work items tend to be the most resource-intensive. There's a design overview in the cover letter.
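For orientation, here is a minimal, hypothetical sketch of how a caller might use the interfaces this patch introduces (WQ_CGROUP, struct cgroup_work, INIT_CGROUP_WORK() and queue_cgroup_work()). The workqueue name, the work function, and the choice of the queueing task's own default-hierarchy cgroup as the target are illustrative only:

	static struct workqueue_struct *example_wq;
	static struct cgroup_work example_cwork;

	static void example_work_fn(struct work_struct *work)
	{
		/* Runs in a worker attached to the cgroup given at queue time. */
	}

	static int example_setup(void)
	{
		/* Only WQ_UNBOUND workqueues may be cgroup-aware. */
		example_wq = alloc_workqueue("example_cgroup_wq",
					     WQ_UNBOUND | WQ_CGROUP, 0);
		if (!example_wq)
			return -ENOMEM;

		INIT_CGROUP_WORK(&example_cwork, example_work_fn);

		/* Charge the work to the queueing task's cgroup (v2 only). */
		queue_cgroup_work(example_wq, &example_cwork,
				  task_dfl_cgroup(current));
		return 0;
	}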
Signed-off-by: Daniel Jordan --- include/linux/cgroup.h | 11 ++ include/linux/workqueue.h | 85 ++++++++++++ kernel/cgroup/cgroup-internal.h | 1 - kernel/workqueue.c | 236 +++++++++++++++++++++++++++++--- kernel/workqueue_internal.h | 45 ++++++ 5 files changed, 360 insertions(+), 18 deletions(-) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index ad78784e3692..de578e29077b 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -91,6 +91,7 @@ extern struct css_set init_css_set; #define cgroup_subsys_on_dfl(ss) \ static_branch_likely(&ss ## _on_dfl_key) +bool cgroup_on_dfl(const struct cgroup *cgrp); bool css_has_online_children(struct cgroup_subsys_state *css); struct cgroup_subsys_state *css_from_id(int id, struct cgroup_subsys *ss); struct cgroup_subsys_state *cgroup_e_css(struct cgroup *cgroup, @@ -531,6 +532,11 @@ static inline struct cgroup *task_dfl_cgroup(struct task_struct *task) return task_css_set(task)->dfl_cgrp; } +static inline struct cgroup *cgroup_dfl_root(void) +{ + return &cgrp_dfl_root.cgrp; +} + static inline int cgroup_attach_kthread_to_dfl_root(void) { return cgroup_attach_kthread(&cgrp_dfl_root.cgrp); @@ -694,6 +700,11 @@ struct cgroup_subsys_state; struct cgroup; static inline void css_put(struct cgroup_subsys_state *css) {} +static inline void cgroup_put(struct cgroup *cgrp) {} +static inline struct cgroup *task_dfl_cgroup(struct task_struct *task) +{ + return NULL; +} static inline int cgroup_attach_task_all(struct task_struct *from, struct task_struct *t) { return 0; } static inline int cgroupstats_build(struct cgroupstats *stats, diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h index b5bc12cc1dde..c200ab5268df 100644 --- a/include/linux/workqueue.h +++ b/include/linux/workqueue.h @@ -14,7 +14,9 @@ #include #include #include +#include +struct cgroup; struct workqueue_struct; struct work_struct; @@ -133,6 +135,13 @@ struct rcu_work { struct workqueue_struct *wq; }; +struct cgroup_work { + struct work_struct work; +#ifdef CONFIG_CGROUPS + struct cgroup *cgroup; +#endif +}; + /** * struct workqueue_attrs - A struct for workqueue attributes. * @@ -157,6 +166,12 @@ struct workqueue_attrs { * doesn't participate in pool hash calculations or equality comparisons. */ bool no_numa; + + /** + * Workers run work items while attached to the work's corresponding + * cgroup. This is a property of both workqueues and worker pools. + */ + bool cgroup_aware; }; static inline struct delayed_work *to_delayed_work(struct work_struct *work) @@ -169,6 +184,11 @@ static inline struct rcu_work *to_rcu_work(struct work_struct *work) return container_of(work, struct rcu_work, work); } +static inline struct cgroup_work *to_cgroup_work(struct work_struct *work) +{ + return container_of(work, struct cgroup_work, work); +} + struct execute_work { struct work_struct work; }; @@ -290,6 +310,12 @@ static inline unsigned int work_static(struct work_struct *work) { return 0; } #define INIT_RCU_WORK_ONSTACK(_work, _func) \ INIT_WORK_ONSTACK(&(_work)->work, (_func)) +#define INIT_CGROUP_WORK(_work, _func) \ + INIT_WORK(&(_work)->work, (_func)) + +#define INIT_CGROUP_WORK_ONSTACK(_work, _func) \ + INIT_WORK_ONSTACK(&(_work)->work, (_func)) + /** * work_pending - Find out whether a work item is currently pending * @work: The work item in question @@ -344,6 +370,14 @@ enum { */ WQ_POWER_EFFICIENT = 1 << 7, + /* + * Workqueue is cgroup-aware. 
Valid only for WQ_UNBOUND workqueues + * since these work items tend to be the most resource-intensive and + * thus worth the accounting overhead. Only cgroup_work's may be + * queued. + */ + WQ_CGROUP = 1 << 8, + __WQ_DRAINING = 1 << 16, /* internal: workqueue is draining */ __WQ_ORDERED = 1 << 17, /* internal: workqueue is ordered */ __WQ_LEGACY = 1 << 18, /* internal: create*_workqueue() */ @@ -514,6 +548,57 @@ static inline bool queue_delayed_work(struct workqueue_struct *wq, return queue_delayed_work_on(WORK_CPU_UNBOUND, wq, dwork, delay); } +#ifdef CONFIG_CGROUPS + +extern bool queue_cgroup_work_node(int node, struct workqueue_struct *wq, + struct cgroup_work *cwork, + struct cgroup *cgroup); + +/** + * queue_cgroup_work - queue work to be run in a cgroup + * @wq: workqueue to use + * @cwork: cgroup_work to queue + * @cgroup: cgroup that the worker assigned to @cwork will attach to + * + * A worker serving @wq will run @cwork while attached to @cgroup. + * + * Return: %false if @work was already on a queue, %true otherwise. + */ +static inline bool queue_cgroup_work(struct workqueue_struct *wq, + struct cgroup_work *cwork, + struct cgroup *cgroup) +{ + return queue_cgroup_work_node(NUMA_NO_NODE, wq, cwork, cgroup); +} + +static inline struct cgroup *work_to_cgroup(struct work_struct *work) +{ + return to_cgroup_work(work)->cgroup; +} + +#else /* CONFIG_CGROUPS */ + +static inline bool queue_cgroup_work_node(int node, struct workqueue_struct *wq, + struct cgroup_work *cwork, + struct cgroup *cgroup) +{ + return queue_work_node(node, wq, &cwork->work); +} + +static inline bool queue_cgroup_work(struct workqueue_struct *wq, + struct cgroup_work *cwork, + struct cgroup *cgroup) +{ + return queue_work_node(NUMA_NO_NODE, wq, &cwork->work); +} + +static inline struct cgroup *work_to_cgroup(struct work_struct *work) +{ + return NULL; +} + +#endif /* CONFIG_CGROUPS */ + /** * mod_delayed_work - modify delay of or queue a delayed work * @wq: workqueue to use diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h index 30e39f3932ad..575ca2d0a7bc 100644 --- a/kernel/cgroup/cgroup-internal.h +++ b/kernel/cgroup/cgroup-internal.h @@ -200,7 +200,6 @@ static inline void get_css_set(struct css_set *cset) } bool cgroup_ssid_enabled(int ssid); -bool cgroup_on_dfl(const struct cgroup *cgrp); bool cgroup_is_thread_root(struct cgroup *cgrp); bool cgroup_is_threaded(struct cgroup *cgrp); diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 51aa010d728e..89b90899bc09 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -49,6 +49,7 @@ #include #include #include +#include #include "workqueue_internal.h" @@ -80,6 +81,11 @@ enum { WORKER_UNBOUND = 1 << 7, /* worker is unbound */ WORKER_REBOUND = 1 << 8, /* worker was rebound */ WORKER_NICED = 1 << 9, /* worker's nice was adjusted */ +#ifdef CONFIG_CGROUPS + WORKER_CGROUP = 1 << 10, /* worker is cgroup-aware */ +#else + WORKER_CGROUP = 0, /* eliminate branches */ +#endif WORKER_NOT_RUNNING = WORKER_PREP | WORKER_CPU_INTENSIVE | WORKER_UNBOUND | WORKER_REBOUND, @@ -106,6 +112,9 @@ enum { HIGHPRI_NICE_LEVEL = MIN_NICE, WQ_NAME_LEN = 24, + + /* flags for __queue_work */ + QUEUE_WORK_CGROUP = 1, }; /* @@ -1214,6 +1223,8 @@ static void pwq_dec_nr_in_flight(struct pool_workqueue *pwq, int color) * @work: work item to steal * @is_dwork: @work is a delayed_work * @flags: place to store irq state + * @is_cwork: set to %true if @work is a cgroup_work and PENDING is stolen + * (ret == 1) * * Try to grab PENDING bit of @work. 
This function can handle @work in any * stable state - idle, on timer or on worklist. @@ -1237,7 +1248,7 @@ static void pwq_dec_nr_in_flight(struct pool_workqueue *pwq, int color) * This function is safe to call from any context including IRQ handler. */ static int try_to_grab_pending(struct work_struct *work, bool is_dwork, - unsigned long *flags) + unsigned long *flags, bool *is_cwork) { struct worker_pool *pool; struct pool_workqueue *pwq; @@ -1297,6 +1308,8 @@ static int try_to_grab_pending(struct work_struct *work, bool is_dwork, /* work->data points to pwq iff queued, point to pool */ set_work_pool_and_keep_pending(work, pool->id); + if (unlikely(is_cwork && (pwq->wq->flags & WQ_CGROUP))) + *is_cwork = true; spin_unlock(&pool->lock); return 1; @@ -1394,7 +1407,7 @@ static int wq_select_unbound_cpu(int cpu) } static void __queue_work(int cpu, struct workqueue_struct *wq, - struct work_struct *work) + struct work_struct *work, int flags) { struct pool_workqueue *pwq; struct worker_pool *last_pool; @@ -1416,6 +1429,12 @@ static void __queue_work(int cpu, struct workqueue_struct *wq, if (unlikely(wq->flags & __WQ_DRAINING) && WARN_ON_ONCE(!is_chained_work(wq))) return; + + /* not allowed to queue regular works on a cgroup-aware workqueue */ + if (unlikely(wq->flags & WQ_CGROUP) && + WARN_ON_ONCE(!(flags & QUEUE_WORK_CGROUP))) + return; + retry: if (req_cpu == WORK_CPU_UNBOUND) cpu = wq_select_unbound_cpu(raw_smp_processor_id()); @@ -1516,7 +1535,7 @@ bool queue_work_on(int cpu, struct workqueue_struct *wq, local_irq_save(flags); if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) { - __queue_work(cpu, wq, work); + __queue_work(cpu, wq, work, 0); ret = true; } @@ -1600,7 +1619,7 @@ bool queue_work_node(int node, struct workqueue_struct *wq, if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) { int cpu = workqueue_select_cpu_near(node); - __queue_work(cpu, wq, work); + __queue_work(cpu, wq, work, 0); ret = true; } @@ -1614,7 +1633,7 @@ void delayed_work_timer_fn(struct timer_list *t) struct delayed_work *dwork = from_timer(dwork, t, timer); /* should have been called from irqsafe timer with irq already off */ - __queue_work(dwork->cpu, dwork->wq, &dwork->work); + __queue_work(dwork->cpu, dwork->wq, &dwork->work, 0); } EXPORT_SYMBOL(delayed_work_timer_fn); @@ -1636,7 +1655,7 @@ static void __queue_delayed_work(int cpu, struct workqueue_struct *wq, * on that there's no such delay when @delay is 0. 
*/ if (!delay) { - __queue_work(cpu, wq, &dwork->work); + __queue_work(cpu, wq, &dwork->work, 0); return; } @@ -1706,7 +1725,7 @@ bool mod_delayed_work_on(int cpu, struct workqueue_struct *wq, int ret; do { - ret = try_to_grab_pending(&dwork->work, true, &flags); + ret = try_to_grab_pending(&dwork->work, true, &flags, NULL); } while (unlikely(ret == -EAGAIN)); if (likely(ret >= 0)) { @@ -1725,7 +1744,7 @@ static void rcu_work_rcufn(struct rcu_head *rcu) /* read the comment in __queue_work() */ local_irq_disable(); - __queue_work(WORK_CPU_UNBOUND, rwork->wq, &rwork->work); + __queue_work(WORK_CPU_UNBOUND, rwork->wq, &rwork->work, 0); local_irq_enable(); } @@ -1753,6 +1772,129 @@ bool queue_rcu_work(struct workqueue_struct *wq, struct rcu_work *rwork) } EXPORT_SYMBOL(queue_rcu_work); +#ifdef CONFIG_CGROUPS + +/** + * queue_cgroup_work_node - queue work to be run in a cgroup on a specific node + * @node: node to execute work on + * @wq: workqueue to use + * @cwork: work to queue + * @cgroup: cgroup that the assigned worker should attach to + * + * Queue @cwork to be run by a worker attached to @cgroup. + * + * It is the caller's responsibility to ensure @cgroup is valid until this + * function returns. + * + * Supports cgroup v2 only. If @cgroup is on a v1 hierarchy, the assigned + * worker runs in the root of the default hierarchy. + * + * Return: %false if @work was already on a queue, %true otherwise. + */ +bool queue_cgroup_work_node(int node, struct workqueue_struct *wq, + struct cgroup_work *cwork, struct cgroup *cgroup) +{ + bool ret = false; + unsigned long flags; + + if (WARN_ON_ONCE(!(wq->flags & WQ_CGROUP))) + return ret; + + local_irq_save(flags); + + if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, + work_data_bits(&cwork->work))) { + int cpu = workqueue_select_cpu_near(node); + + if (cgroup_on_dfl(cgroup)) + cwork->cgroup = cgroup; + else + cwork->cgroup = cgroup_dfl_root(); + + /* + * cgroup_put happens after a worker is assigned to @work and + * migrated into @cgroup, or @work is cancelled. + */ + cgroup_get(cwork->cgroup); + __queue_work(cpu, wq, &cwork->work, QUEUE_WORK_CGROUP); + ret = true; + } + + local_irq_restore(flags); + return ret; +} + +static inline bool worker_in_child_cgroup(struct worker *worker) +{ + return (worker->flags & WORKER_CGROUP) && cgroup_parent(worker->cgroup); +} + +static void attach_worker_to_dfl_root(struct worker *worker) +{ + int ret; + + if (!worker_in_child_cgroup(worker)) + return; + + ret = cgroup_attach_kthread_to_dfl_root(); + if (ret == 0) { + rcu_read_lock(); + worker->cgroup = task_dfl_cgroup(worker->task); + rcu_read_unlock(); + } else { + /* + * TODO Modify the cgroup migration path to guarantee that a + * kernel thread can successfully migrate to the default root + * cgroup. + */ + WARN_ONCE(1, "can't migrate %s to dfl root (%d)\n", + current->comm, ret); + } +} + +/** + * attach_worker_to_cgroup - attach worker to work's corresponding cgroup + * @worker: worker thread to attach + * @work: work used to decide which cgroup to attach to + * + * Attach a cgroup-aware worker to work's corresponding cgroup. 
+ */ +static void attach_worker_to_cgroup(struct worker *worker, + struct work_struct *work) +{ + struct cgroup_work *cwork; + struct cgroup *cgroup; + + if (!(worker->flags & WORKER_CGROUP)) + return; + + cwork = to_cgroup_work(work); + + if (unlikely(is_wq_barrier_cgroup(cwork))) + return; + + cgroup = cwork->cgroup; + + if (cgroup == worker->cgroup) + goto out; + + if (cgroup_attach_kthread(cgroup) == 0) { + worker->cgroup = cgroup; + } else { + /* + * Attach failed, so attach to the default root so the + * work isn't accounted to an unrelated cgroup. + */ + attach_worker_to_dfl_root(worker); + } + +out: + /* Pairs with cgroup_get in queue_cgroup_work_node. */ + cgroup_put(cgroup); +} + +#endif /* CONFIG_CGROUPS */ + /** * worker_enter_idle - enter idle state * @worker: worker which is entering idle state @@ -1934,6 +2076,12 @@ static struct worker *create_worker(struct worker_pool *pool) set_user_nice(worker->task, pool->attrs->nice); kthread_bind_mask(worker->task, pool->attrs->cpumask); + if (pool->attrs->cgroup_aware) { + rcu_read_lock(); + worker->cgroup = task_dfl_cgroup(worker->task); + rcu_read_unlock(); + worker->flags |= WORKER_CGROUP; + } /* successful, attach the worker to the pool */ worker_attach_to_pool(worker, pool); @@ -2242,6 +2390,8 @@ __acquires(&pool->lock) spin_unlock_irq(&pool->lock); + attach_worker_to_cgroup(worker, work); + lock_map_acquire(&pwq->wq->lockdep_map); lock_map_acquire(&lockdep_map); /* @@ -2434,6 +2584,21 @@ static int worker_thread(void *__worker) } } while (keep_working(pool)); + /* + * Migrate a worker attached to a non-root cgroup to the root so a + * sleeping worker won't cause cgroup_rmdir to fail indefinitely. + * + * XXX Should probably also modify cgroup core so that cgroup_rmdir + * fails only if there are user (i.e. non-kthread) tasks in a cgroup; + * otherwise, long-running workers can still cause cgroup_rmdir to fail + * and userspace can't do anything other than wait. + */ + if (worker_in_child_cgroup(worker)) { + spin_unlock_irq(&pool->lock); + attach_worker_to_dfl_root(worker); + spin_lock_irq(&pool->lock); + } + worker_set_flags(worker, WORKER_PREP); sleep: /* @@ -2619,7 +2784,10 @@ static void check_flush_dependency(struct workqueue_struct *target_wq, } struct wq_barrier { - struct work_struct work; + union { + struct work_struct work; + struct cgroup_work cwork; + }; struct completion done; struct task_struct *task; /* purely informational */ }; @@ -2660,6 +2828,7 @@ static void insert_wq_barrier(struct pool_workqueue *pwq, { struct list_head *head; unsigned int linked = 0; + struct work_struct *barr_work; /* * debugobject calls are safe here even with pool->lock locked @@ -2667,8 +2836,17 @@ static void insert_wq_barrier(struct pool_workqueue *pwq, * checks and call back into the fixup functions where we * might deadlock. 
*/ - INIT_WORK_ONSTACK(&barr->work, wq_barrier_func); - __set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&barr->work)); + + if (unlikely(pwq->wq->flags & WQ_CGROUP)) { + barr_work = &barr->cwork.work; + INIT_CGROUP_WORK_ONSTACK(&barr->cwork, wq_barrier_func); + set_wq_barrier_cgroup(&barr->cwork); + } else { + barr_work = &barr->work; + INIT_WORK_ONSTACK(barr_work, wq_barrier_func); + } + + __set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(barr_work)); init_completion_map(&barr->done, &target->lockdep_map); @@ -2689,8 +2867,8 @@ static void insert_wq_barrier(struct pool_workqueue *pwq, __set_bit(WORK_STRUCT_LINKED_BIT, bits); } - debug_work_activate(&barr->work); - insert_work(pwq, &barr->work, head, + debug_work_activate(barr_work); + insert_work(pwq, barr_work, head, work_color_to_flags(WORK_NO_COLOR) | linked); } @@ -3171,10 +3349,11 @@ static bool __cancel_work_timer(struct work_struct *work, bool is_dwork) { static DECLARE_WAIT_QUEUE_HEAD(cancel_waitq); unsigned long flags; + bool is_cwork = false; int ret; do { - ret = try_to_grab_pending(work, is_dwork, &flags); + ret = try_to_grab_pending(work, is_dwork, &flags, &is_cwork); /* * If someone else is already canceling, wait for it to * finish. flush_work() doesn't work for PREEMPT_NONE @@ -3210,6 +3389,10 @@ static bool __cancel_work_timer(struct work_struct *work, bool is_dwork) mark_work_canceling(work); local_irq_restore(flags); + /* PENDING stolen, so drop the cgroup ref from queueing @work. */ + if (ret == 1 && is_cwork) + cgroup_put(work_to_cgroup(work)); + /* * This allows canceling during early boot. We know that @work * isn't executing. @@ -3271,7 +3454,7 @@ bool flush_delayed_work(struct delayed_work *dwork) { local_irq_disable(); if (del_timer_sync(&dwork->timer)) - __queue_work(dwork->cpu, dwork->wq, &dwork->work); + __queue_work(dwork->cpu, dwork->wq, &dwork->work, 0); local_irq_enable(); return flush_work(&dwork->work); } @@ -3300,15 +3483,20 @@ EXPORT_SYMBOL(flush_rcu_work); static bool __cancel_work(struct work_struct *work, bool is_dwork) { unsigned long flags; + bool is_cwork = false; int ret; do { - ret = try_to_grab_pending(work, is_dwork, &flags); + ret = try_to_grab_pending(work, is_dwork, &flags, &is_cwork); } while (unlikely(ret == -EAGAIN)); if (unlikely(ret < 0)) return false; + /* PENDING stolen, so drop the cgroup ref from queueing @work. */ + if (ret == 1 && is_cwork) + cgroup_put(work_to_cgroup(work)); + set_work_pool_and_clear_pending(work, get_work_pool_id(work)); local_irq_restore(flags); return ret; @@ -3465,12 +3653,13 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to, * get_unbound_pool() explicitly clears ->no_numa after copying. */ to->no_numa = from->no_numa; + to->cgroup_aware = from->cgroup_aware; } /* hash value of the content of @attr */ static u32 wqattrs_hash(const struct workqueue_attrs *attrs) { - u32 hash = 0; + u32 hash = attrs->cgroup_aware; hash = jhash_1word(attrs->nice, hash); hash = jhash(cpumask_bits(attrs->cpumask), @@ -3486,6 +3675,8 @@ static bool wqattrs_equal(const struct workqueue_attrs *a, return false; if (!cpumask_equal(a->cpumask, b->cpumask)) return false; + if (a->cgroup_aware != b->cgroup_aware) + return false; return true; } @@ -4002,6 +4193,8 @@ apply_wqattrs_prepare(struct workqueue_struct *wq, if (unlikely(cpumask_empty(new_attrs->cpumask))) cpumask_copy(new_attrs->cpumask, wq_unbound_cpumask); + new_attrs->cgroup_aware = !!(wq->flags & WQ_CGROUP); + /* * We may create multiple pwqs with differing cpumasks. 
Make a * copy of @new_attrs which will be modified and used to obtain @@ -4323,6 +4516,13 @@ struct workqueue_struct *alloc_workqueue(const char *fmt, if ((flags & WQ_POWER_EFFICIENT) && wq_power_efficient) flags |= WQ_UNBOUND; + /* + * cgroup awareness supported only in unbound workqueues since those + * tend to be the most resource-intensive. + */ + if (WARN_ON_ONCE((flags & WQ_CGROUP) && !(flags & WQ_UNBOUND))) + flags &= ~WQ_CGROUP; + /* allocate wq and format name */ if (flags & WQ_UNBOUND) tbl_size = nr_node_ids * sizeof(wq->numa_pwq_tbl[0]); @@ -5980,6 +6180,7 @@ int __init workqueue_init_early(void) BUG_ON(!(attrs = alloc_workqueue_attrs(GFP_KERNEL))); attrs->nice = std_nice[i]; + attrs->cgroup_aware = true; unbound_std_wq_attrs[i] = attrs; /* @@ -5990,6 +6191,7 @@ int __init workqueue_init_early(void) BUG_ON(!(attrs = alloc_workqueue_attrs(GFP_KERNEL))); attrs->nice = std_nice[i]; attrs->no_numa = true; + attrs->cgroup_aware = true; ordered_wq_attrs[i] = attrs; } diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h index cb68b03ca89a..3ad5861258ca 100644 --- a/kernel/workqueue_internal.h +++ b/kernel/workqueue_internal.h @@ -32,6 +32,7 @@ struct worker { work_func_t current_func; /* L: current_work's fn */ struct pool_workqueue *current_pwq; /* L: current_work's pwq */ struct list_head scheduled; /* L: scheduled works */ + struct cgroup *cgroup; /* private to worker->task */ /* 64 bytes boundary on 64bit, 32 on 32bit */ @@ -76,4 +77,48 @@ void wq_worker_waking_up(struct task_struct *task, int cpu); struct task_struct *wq_worker_sleeping(struct task_struct *task); work_func_t wq_worker_last_func(struct task_struct *task); +#ifdef CONFIG_CGROUPS + +/* + * A barrier work running in a cgroup-aware worker pool needs to specify a + * cgroup. For simplicity, WQ_BARRIER_CGROUP makes the worker stay in its + * current cgroup, which correctly accounts the barrier work to the cgroup of + * the work being flushed in most cases. The only exception is when the + * flushed work is in progress and a worker collision has caused a work from a + * different cgroup to be scheduled before the barrier work, but that seems + * acceptable since the barrier work isn't resource-intensive anyway. 
+ */
+#define WQ_BARRIER_CGROUP	((struct cgroup *)1)
+
+static inline void set_wq_barrier_cgroup(struct cgroup_work *cwork)
+{
+	cwork->cgroup = WQ_BARRIER_CGROUP;
+}
+
+static inline bool is_wq_barrier_cgroup(struct cgroup_work *cwork)
+{
+	return cwork->cgroup == WQ_BARRIER_CGROUP;
+}
+
+#else
+
+static inline void set_wq_barrier_cgroup(struct cgroup_work *cwork) {}
+
+static inline bool is_wq_barrier_cgroup(struct cgroup_work *cwork)
+{
+	return false;
+}
+
+static inline bool worker_in_child_cgroup(struct worker *worker)
+{
+	return false;
+}
+
+static inline void attach_worker_to_cgroup(struct worker *worker,
+					   struct work_struct *work) {}
+
+static inline void attach_worker_to_dfl_root(struct worker *worker) {}
+
+#endif /* CONFIG_CGROUPS */
+
 #endif /* _KERNEL_WORKQUEUE_INTERNAL_H */

From patchwork Wed Jun 5 13:36:48 2019
X-Patchwork-Submitter: Daniel Jordan
X-Patchwork-Id: 10976985
From: Daniel Jordan
To: hannes@cmpxchg.org, jiangshanlai@gmail.com, lizefan@huawei.com, tj@kernel.org
Cc: bsd@redhat.com, dan.j.williams@intel.com, daniel.m.jordan@oracle.com, dave.hansen@intel.com, juri.lelli@redhat.com, mhocko@kernel.org, peterz@infradead.org, steven.sistare@oracle.com, tglx@linutronix.de, tom.hromatka@oracle.com, vdavydov.dev@gmail.com, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [RFC v2 3/5] workqueue, memcontrol: make memcg throttle workqueue workers
Date: Wed, 5 Jun 2019 09:36:48 -0400
Message-Id: <20190605133650.28545-4-daniel.m.jordan@oracle.com>
In-Reply-To: <20190605133650.28545-1-daniel.m.jordan@oracle.com>
References: <20190605133650.28545-1-daniel.m.jordan@oracle.com>
Attaching a worker to a css_set isn't enough for all controllers to throttle it. In particular, the memory controller currently bypasses accounting for kernel threads. Support memcg accounting for cgroup-aware workqueue workers so that they're throttled appropriately.

Another, probably better, way to do this would be to have kernel threads, or specifically cgroup-aware workqueue workers, call memalloc_use_memcg() and memalloc_unuse_memcg() during cgroup migration, perhaps from the memcg attach callback.

Signed-off-by: Daniel Jordan
---
 kernel/workqueue.c          | 26 ++++++++++++++++++++++++++
 kernel/workqueue_internal.h |  5 +++++
 mm/memcontrol.c             | 26 ++++++++++++++++++++++++--
 3 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 89b90899bc09..c8cc69e296c0 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -50,6 +50,8 @@
 #include
 #include
 #include
+#include
+#include
 
 #include "workqueue_internal.h"
 
@@ -1829,6 +1831,28 @@ static inline bool worker_in_child_cgroup(struct worker *worker)
 	return (worker->flags & WORKER_CGROUP) && cgroup_parent(worker->cgroup);
 }
 
+/* XXX Put this in the memory controller's attach callback. */
+#ifdef CONFIG_MEMCG
+static void worker_unuse_memcg(struct worker *worker)
+{
+	if (worker->task->active_memcg) {
+		struct mem_cgroup *memcg = worker->task->active_memcg;
+
+		memalloc_unuse_memcg();
+		css_put(&memcg->css);
+	}
+}
+
+static void worker_use_memcg(struct worker *worker)
+{
+	struct mem_cgroup *memcg;
+
+	worker_unuse_memcg(worker);
+	memcg = mem_cgroup_from_css(task_get_css(worker->task, memory_cgrp_id));
+	memalloc_use_memcg(memcg);
+}
+#endif /* CONFIG_MEMCG */
+
 static void attach_worker_to_dfl_root(struct worker *worker)
 {
 	int ret;
@@ -1841,6 +1865,7 @@ static void attach_worker_to_dfl_root(struct worker *worker)
 		rcu_read_lock();
 		worker->cgroup = task_dfl_cgroup(worker->task);
 		rcu_read_unlock();
+		worker_unuse_memcg(worker);
 	} else {
 		/*
 		 * TODO Modify the cgroup migration path to guarantee that a
@@ -1880,6 +1905,7 @@ static void attach_worker_to_cgroup(struct worker *worker,
 
 	if (cgroup_attach_kthread(cgroup) == 0) {
 		worker->cgroup = cgroup;
+		worker_use_memcg(worker);
 	} else {
 		/*
 		 * Attach failed, so attach to the default root so the
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index 3ad5861258ca..f254b93edc2c 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -79,6 +79,11 @@ work_func_t wq_worker_last_func(struct task_struct *task);
 
 #ifdef CONFIG_CGROUPS
 
+#ifndef CONFIG_MEMCG
+static inline void worker_use_memcg(struct worker *worker) {}
+static inline void worker_unuse_memcg(struct worker *worker) {}
+#endif /* CONFIG_MEMCG */
+
 /*
  * A barrier work running in a cgroup-aware worker pool needs to specify a
  * cgroup. For simplicity, WQ_BARRIER_CGROUP makes the worker stay in its
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 81a0d3914ec9..1a80931b124a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2513,9 +2513,31 @@ static void memcg_schedule_kmem_cache_create(struct mem_cgroup *memcg,
 
 static inline bool memcg_kmem_bypass(void)
 {
-	if (in_interrupt() || !current->mm || (current->flags & PF_KTHREAD))
+	if (in_interrupt())
 		return true;
-	return false;
+
+	if (unlikely(current->flags & PF_WQ_WORKER)) {
+		struct cgroup *parent;
+
+		/*
+		 * memcg should throttle cgroup-aware workers.  Infer the
+		 * worker is cgroup-aware by its presence in a non-root cgroup.
+		 *
+		 * This test won't detect a cgroup-aware worker attached to the
+		 * default root, but in that case memcg doesn't need to
+		 * throttle it anyway.
+		 *
+		 * XXX One alternative to this awkward block is adding a
+		 * cgroup-aware-worker bit to task_struct.
+		 */
+		rcu_read_lock();
+		parent = cgroup_parent(task_dfl_cgroup(current));
+		rcu_read_unlock();
+
+		return !parent;
+	}
+
+	return !current->mm || (current->flags & PF_KTHREAD);
 }
 
 /**
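The commit message above suggests memalloc_use_memcg()/memalloc_unuse_memcg() at migration time as a possibly better hook point. For reference, a minimal, hypothetical sketch of that remote-charging pattern, which is also what the worker_use_memcg()/worker_unuse_memcg() helpers in this patch rely on; the helper name and the allocation are illustrative, not part of the patch:

	/*
	 * Hypothetical sketch: charge a kernel thread's allocations to a
	 * target memcg for a bounded section, similar to what the patch
	 * arranges for cgroup-aware workers.
	 */
	static void *example_alloc_on_behalf_of(struct mem_cgroup *memcg,
						size_t size)
	{
		void *p;

		memalloc_use_memcg(memcg);	/* allocations charged to @memcg */
		p = kmalloc(size, GFP_KERNEL_ACCOUNT);
		memalloc_unuse_memcg();		/* back to default charging */

		return p;
	}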
From: Daniel Jordan
To: hannes@cmpxchg.org, jiangshanlai@gmail.com, lizefan@huawei.com, tj@kernel.org
Cc: bsd@redhat.com, dan.j.williams@intel.com, daniel.m.jordan@oracle.com, dave.hansen@intel.com, juri.lelli@redhat.com, mhocko@kernel.org, peterz@infradead.org, steven.sistare@oracle.com, tglx@linutronix.de, tom.hromatka@oracle.com, vdavydov.dev@gmail.com, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [RFC v2 4/5] workqueue, cgroup: add test module
Date: Wed, 5 Jun 2019 09:36:49 -0400
Message-Id: <20190605133650.28545-5-daniel.m.jordan@oracle.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20190605133650.28545-1-daniel.m.jordan@oracle.com>
References: <20190605133650.28545-1-daniel.m.jordan@oracle.com>
MIME-Version: 1.0
definitions=9278 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1906050087 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP Test the basic functionality of cgroup aware workqueues. Inspired by Matthew Wilcox's test_xarray.c Signed-off-by: Daniel Jordan --- kernel/workqueue.c | 1 + lib/Kconfig.debug | 12 + lib/Makefile | 1 + lib/test_cgroup_workqueue.c | 325 ++++++++++++++++++ .../selftests/cgroup_workqueue/Makefile | 9 + .../testing/selftests/cgroup_workqueue/config | 1 + .../cgroup_workqueue/test_cgroup_workqueue.sh | 104 ++++++ 7 files changed, 453 insertions(+) create mode 100644 lib/test_cgroup_workqueue.c create mode 100644 tools/testing/selftests/cgroup_workqueue/Makefile create mode 100644 tools/testing/selftests/cgroup_workqueue/config create mode 100755 tools/testing/selftests/cgroup_workqueue/test_cgroup_workqueue.sh diff --git a/kernel/workqueue.c b/kernel/workqueue.c index c8cc69e296c0..15459b5bb0bf 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -1825,6 +1825,7 @@ bool queue_cgroup_work_node(int node, struct workqueue_struct *wq, local_irq_restore(flags); return ret; } +EXPORT_SYMBOL_GPL(queue_cgroup_work_node); static inline bool worker_in_child_cgroup(struct worker *worker) { diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index d5a4a4036d2f..9909a306c142 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -2010,6 +2010,18 @@ config TEST_STACKINIT If unsure, say N. +config TEST_CGROUP_WQ + tristate "Test cgroup-aware workqueues at runtime" + depends on CGROUPS + help + Test cgroup-aware workqueues, in which workers attach to the + cgroup specified by the queueing task. Basic test coverage + for whether workers attach to the expected cgroup, both for + cgroup-aware and unaware works, and whether workers are + throttled by the memory controller. + + If unsure, say N. + endif # RUNTIME_TESTING_MENU config MEMTEST diff --git a/lib/Makefile b/lib/Makefile index 18c2be516ab4..d08b4a50bfd1 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -92,6 +92,7 @@ obj-$(CONFIG_TEST_OBJAGG) += test_objagg.o obj-$(CONFIG_TEST_STACKINIT) += test_stackinit.o obj-$(CONFIG_TEST_LIVEPATCH) += livepatch/ +obj-$(CONFIG_TEST_CGROUP_WQ) += test_cgroup_workqueue.o ifeq ($(CONFIG_DEBUG_KOBJECT),y) CFLAGS_kobject.o += -DDEBUG diff --git a/lib/test_cgroup_workqueue.c b/lib/test_cgroup_workqueue.c new file mode 100644 index 000000000000..466ec4e6e55b --- /dev/null +++ b/lib/test_cgroup_workqueue.c @@ -0,0 +1,325 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * test_cgroup_workqueue.c: Test cgroup-aware workqueues + * Copyright (c) 2019 Oracle and/or its affiliates. All rights reserved. + * Author: Daniel Jordan + * + * Inspired by Matthew Wilcox's test_xarray.c. 
+ */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +static atomic_long_t cwq_tests_run = ATOMIC_LONG_INIT(0); +static atomic_long_t cwq_tests_passed = ATOMIC_LONG_INIT(0); + +static struct workqueue_struct *cwq_cgroup_aware_wq; +static struct workqueue_struct *cwq_cgroup_unaware_wq; + +/* cgroup v2 hierarchy mountpoint */ +static char *cwq_cgrp_root_path = "/test_cgroup_workqueue"; +module_param(cwq_cgrp_root_path, charp, 0444); +static struct cgroup *cwq_cgrp_root; + +static char *cwq_cgrp_1_path = "/cwq_1"; +module_param(cwq_cgrp_1_path, charp, 0444); +static struct cgroup *cwq_cgrp_1; + +static char *cwq_cgrp_2_path = "/cwq_2"; +module_param(cwq_cgrp_2_path, charp, 0444); +static struct cgroup *cwq_cgrp_2; + +static char *cwq_cgrp_3_path = "/cwq_3"; +module_param(cwq_cgrp_3_path, charp, 0444); +static struct cgroup *cwq_cgrp_3; + +static size_t cwq_memcg_max = 1ul << 20; +module_param(cwq_memcg_max, ulong, 0444); + +#define CWQ_BUG(x, test_name) ({ \ + int __ret_cwq_bug = !!(x); \ + atomic_long_inc(&cwq_tests_run); \ + if (__ret_cwq_bug) { \ + pr_warn("BUG at %s:%d\n", __func__, __LINE__); \ + pr_warn("%s\n", (test_name)); \ + dump_stack(); \ + } else { \ + atomic_long_inc(&cwq_tests_passed); \ + } \ + __ret_cwq_bug; \ +}) + +#define CWQ_BUG_ON(x) CWQ_BUG((x), "") + +struct cwq_data { + struct cgroup_work cwork; + struct cgroup *expected_cgroup; + const char *test_name; +}; + +#define CWQ_INIT_DATA(data, func, expected_cgrp) do { \ + INIT_WORK_ONSTACK(&(data)->work, (func)); \ + (data)->expected_cgroup = (expected_cgrp); \ +} while (0) + +static void cwq_verify_worker_cgroup(struct work_struct *work) +{ + struct cgroup_work *cwork = container_of(work, struct cgroup_work, + work); + struct cwq_data *data = container_of(cwork, struct cwq_data, cwork); + struct cgroup *worker_cgroup; + + CWQ_BUG(!(current->flags & PF_WQ_WORKER), data->test_name); + + rcu_read_lock(); + worker_cgroup = task_dfl_cgroup(current); + rcu_read_unlock(); + + CWQ_BUG(worker_cgroup != data->expected_cgroup, data->test_name); +} + +static noinline void cwq_test_reg_work_on_cgrp_unaware_wq(void) +{ + struct cwq_data data; + + data.expected_cgroup = cwq_cgrp_root; + data.test_name = __func__; + INIT_WORK_ONSTACK(&data.cwork.work, cwq_verify_worker_cgroup); + + CWQ_BUG_ON(!queue_work(cwq_cgroup_unaware_wq, &data.cwork.work)); + flush_work(&data.cwork.work); +} + +static noinline void cwq_test_cgrp_work_on_cgrp_aware_wq(void) +{ + struct cwq_data data; + + data.expected_cgroup = cwq_cgrp_1; + data.test_name = __func__; + INIT_CGROUP_WORK_ONSTACK(&data.cwork, cwq_verify_worker_cgroup); + + CWQ_BUG_ON(!queue_cgroup_work(cwq_cgroup_aware_wq, &data.cwork, + cwq_cgrp_1)); + flush_work(&data.cwork.work); +} + +static struct cgroup *cwq_get_random_cgroup(void) +{ + switch (prandom_u32_max(4)) { + case 1: return cwq_cgrp_1; + case 2: return cwq_cgrp_2; + case 3: return cwq_cgrp_3; + default: return cwq_cgrp_root; + } +} + +#define CWQ_NWORK 256 +static noinline void cwq_test_many_cgrp_works_on_cgrp_aware_wq(void) +{ + int i; + struct cwq_data *data_array = kmalloc_array(CWQ_NWORK, + sizeof(struct cwq_data), + GFP_KERNEL); + if (CWQ_BUG_ON(!data_array)) + return; + + for (i = 0; i < CWQ_NWORK; ++i) { + struct cgroup *cgrp = cwq_get_random_cgroup(); + + data_array[i].expected_cgroup = cgrp; + data_array[i].test_name = __func__; + INIT_CGROUP_WORK(&data_array[i].cwork, + cwq_verify_worker_cgroup); + CWQ_BUG_ON(!queue_cgroup_work(cwq_cgroup_aware_wq, + &data_array[i].cwork, 
+ cgrp)); + } + + for (i = 0; i < CWQ_NWORK; ++i) + flush_work(&data_array[i].cwork.work); + + kfree(data_array); +} + +static void cwq_verify_worker_obeys_memcg(struct work_struct *work) +{ + struct cgroup_work *cwork = container_of(work, struct cgroup_work, + work); + struct cwq_data *data = container_of(cwork, struct cwq_data, cwork); + struct cgroup *worker_cgroup; + void *mem; + + CWQ_BUG(!(current->flags & PF_WQ_WORKER), data->test_name); + + rcu_read_lock(); + worker_cgroup = task_dfl_cgroup(current); + rcu_read_unlock(); + + CWQ_BUG(worker_cgroup != data->expected_cgroup, data->test_name); + + mem = __vmalloc(cwq_memcg_max * 2, __GFP_ACCOUNT | __GFP_NOWARN, + PAGE_KERNEL); + if (data->expected_cgroup == cwq_cgrp_2) { + /* + * cwq_cgrp_2 has its memory.max set to cwq_memcg_max, so the + * allocation should fail. + */ + CWQ_BUG(mem, data->test_name); + } else { + /* + * Other cgroups don't have a memory.max limit, so the + * allocation should succeed. + */ + CWQ_BUG(!mem, data->test_name); + } + vfree(mem); +} + +static noinline void cwq_test_reg_work_is_not_throttled_by_memcg(void) +{ + struct cwq_data data; + + if (!IS_ENABLED(CONFIG_MEMCG_KMEM) || !cwq_memcg_max) + return; + + data.expected_cgroup = cwq_cgrp_root; + data.test_name = __func__; + INIT_WORK_ONSTACK(&data.cwork.work, cwq_verify_worker_obeys_memcg); + CWQ_BUG_ON(!queue_work(cwq_cgroup_unaware_wq, &data.cwork.work)); + flush_work(&data.cwork.work); +} + +static noinline void cwq_test_cgrp_work_is_throttled_by_memcg(void) +{ + struct cwq_data data; + + if (!IS_ENABLED(CONFIG_MEMCG_KMEM) || !cwq_memcg_max) + return; + + /* + * The kselftest shell script enables the memory controller in + * cwq_cgrp_2 and sets memory.max to cwq_memcg_max. + */ + data.expected_cgroup = cwq_cgrp_2; + data.test_name = __func__; + INIT_CGROUP_WORK_ONSTACK(&data.cwork, cwq_verify_worker_obeys_memcg); + + CWQ_BUG_ON(!queue_cgroup_work(cwq_cgroup_aware_wq, &data.cwork, + cwq_cgrp_2)); + flush_work(&data.cwork.work); +} + +static noinline void cwq_test_cgrp_work_is_not_throttled_by_memcg(void) +{ + struct cwq_data data; + + if (!IS_ENABLED(CONFIG_MEMCG_KMEM) || !cwq_memcg_max) + return; + + /* + * The kselftest shell script doesn't set a memory limit for cwq_cgrp_1 + * or _3, so a cgroup work should still be able to allocate memory + * above that limit. + */ + data.expected_cgroup = cwq_cgrp_1; + data.test_name = __func__; + INIT_CGROUP_WORK_ONSTACK(&data.cwork, cwq_verify_worker_obeys_memcg); + + CWQ_BUG_ON(!queue_cgroup_work(cwq_cgroup_aware_wq, &data.cwork, + cwq_cgrp_1)); + flush_work(&data.cwork.work); + + /* + * And cgroup workqueues shouldn't choke on a cgroup that's disabled + * the memory controller, such as cwq_cgrp_3. 
+ */ + data.expected_cgroup = cwq_cgrp_3; + data.test_name = __func__; + INIT_CGROUP_WORK_ONSTACK(&data.cwork, cwq_verify_worker_obeys_memcg); + + CWQ_BUG_ON(!queue_cgroup_work(cwq_cgroup_aware_wq, &data.cwork, + cwq_cgrp_3)); + flush_work(&data.cwork.work); +} + +static int cwq_init(void) +{ + s64 passed, run; + + pr_warn("cgroup workqueue test module\n"); + + cwq_cgroup_aware_wq = alloc_workqueue("cwq_cgroup_aware_wq", + WQ_UNBOUND | WQ_CGROUP, 0); + if (!cwq_cgroup_aware_wq) { + pr_warn("cwq_cgroup_aware_wq allocation failed\n"); + return -EAGAIN; + } + + cwq_cgroup_unaware_wq = alloc_workqueue("cwq_cgroup_unaware_wq", + WQ_UNBOUND, 0); + if (!cwq_cgroup_unaware_wq) { + pr_warn("cwq_cgroup_unaware_wq allocation failed\n"); + goto alloc_wq_fail; + } + + cwq_cgrp_root = cgroup_get_from_path("/"); + if (IS_ERR(cwq_cgrp_root)) { + pr_warn("can't get root cgroup\n"); + goto cgroup_get_fail; + } + + cwq_cgrp_1 = cgroup_get_from_path(cwq_cgrp_1_path); + if (IS_ERR(cwq_cgrp_1)) { + pr_warn("can't get child cgroup 1\n"); + goto cgroup_get_fail; + } + + cwq_cgrp_2 = cgroup_get_from_path(cwq_cgrp_2_path); + if (IS_ERR(cwq_cgrp_2)) { + pr_warn("can't get child cgroup 2\n"); + goto cgroup_get_fail; + } + + cwq_cgrp_3 = cgroup_get_from_path(cwq_cgrp_3_path); + if (IS_ERR(cwq_cgrp_3)) { + pr_warn("can't get child cgroup 3\n"); + goto cgroup_get_fail; + } + + cwq_test_reg_work_on_cgrp_unaware_wq(); + cwq_test_cgrp_work_on_cgrp_aware_wq(); + cwq_test_many_cgrp_works_on_cgrp_aware_wq(); + cwq_test_reg_work_is_not_throttled_by_memcg(); + cwq_test_cgrp_work_is_throttled_by_memcg(); + cwq_test_cgrp_work_is_not_throttled_by_memcg(); + + passed = atomic_long_read(&cwq_tests_passed); + run = atomic_long_read(&cwq_tests_run); + pr_warn("cgroup workqueues: %lld of %lld tests passed\n", passed, run); + return (run == passed) ? 0 : -EINVAL; + +cgroup_get_fail: + destroy_workqueue(cwq_cgroup_unaware_wq); +alloc_wq_fail: + destroy_workqueue(cwq_cgroup_aware_wq); + return -EAGAIN; /* better ideas? */ +} + +static void cwq_exit(void) +{ + destroy_workqueue(cwq_cgroup_aware_wq); + destroy_workqueue(cwq_cgroup_unaware_wq); +} + +module_init(cwq_init); +module_exit(cwq_exit); +MODULE_AUTHOR("Daniel Jordan "); +MODULE_LICENSE("GPL"); diff --git a/tools/testing/selftests/cgroup_workqueue/Makefile b/tools/testing/selftests/cgroup_workqueue/Makefile new file mode 100644 index 000000000000..2ba1cd922670 --- /dev/null +++ b/tools/testing/selftests/cgroup_workqueue/Makefile @@ -0,0 +1,9 @@ +# SPDX-License-Identifier: GPL-2.0 + +all: + +TEST_PROGS := test_cgroup_workqueue.sh + +include ../lib.mk + +clean: diff --git a/tools/testing/selftests/cgroup_workqueue/config b/tools/testing/selftests/cgroup_workqueue/config new file mode 100644 index 000000000000..ae38b8f3c3db --- /dev/null +++ b/tools/testing/selftests/cgroup_workqueue/config @@ -0,0 +1 @@ +CONFIG_TEST_CGROUP_WQ=m diff --git a/tools/testing/selftests/cgroup_workqueue/test_cgroup_workqueue.sh b/tools/testing/selftests/cgroup_workqueue/test_cgroup_workqueue.sh new file mode 100755 index 000000000000..33251276d2cf --- /dev/null +++ b/tools/testing/selftests/cgroup_workqueue/test_cgroup_workqueue.sh @@ -0,0 +1,104 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 +# Runs cgroup workqueue kernel module tests + +# hierarchy: root +# / \ +# 1 2 +# / +# 3 +CG_ROOT='/test_cgroup_workqueue' +CG_1='/cwq_1' +CG_2='/cwq_2' +CG_3='/cwq_3' +MEMCG_MAX="$((2**16))" # small but arbitrary amount + +# Kselftest framework requirement - SKIP code is 4. 
+ksft_skip=4 + +cg_mnt='' +cg_mnt_mounted='' +cg_1_created='' +cg_2_created='' +cg_3_created='' + +cleanup() +{ + [ -n "$cg_3_created" ] && rmdir "$CG_ROOT/$CG_1/$CG_3" || exit 1 + [ -n "$cg_2_created" ] && rmdir "$CG_ROOT/$CG_2" || exit 1 + [ -n "$cg_1_created" ] && rmdir "$CG_ROOT/$CG_1" || exit 1 + [ -n "$cg_mnt_mounted" ] && umount "$CG_ROOT" || exit 1 + [ -n "$cg_mnt_created" ] && rmdir "$CG_ROOT" || exit 1 + exit "$1" +} + +if ! /sbin/modprobe -q -n test_cgroup_workqueue; then + echo "cgroup_workqueue: module test_cgroup_workqueue not found [SKIP]" + exit $ksft_skip +fi + +# Setup cgroup v2 hierarchy +if mkdir "$CG_ROOT"; then + cg_mnt_created=1 +else + echo "cgroup_workqueue: can't create cgroup mountpoint at $CG_ROOT" + cleanup 1 +fi + +if mount -t cgroup2 none "$CG_ROOT"; then + cg_mnt_mounted=1 +else + echo "cgroup_workqueue: couldn't mount cgroup hierarchy at $CG_ROOT" + cleanup 1 +fi + +if grep -q memory "$CG_ROOT/cgroup.controllers"; then + /bin/echo +memory > "$CG_ROOT/cgroup.subtree_control" + cwq_memcg_max="$MEMCG_MAX" +else + # Tell test module not to run memory.max tests. + cwq_memcg_max=0 +fi + +if mkdir "$CG_ROOT/$CG_1"; then + cg_1_created=1 +else + echo "cgroup_workqueue: can't mkdir $CG_ROOT/$CG_1" + cleanup 1 +fi + +if mkdir "$CG_ROOT/$CG_2"; then + cg_2_created=1 +else + echo "cgroup_workqueue: can't mkdir $CG_ROOT/$CG_2" + cleanup 1 +fi + +if mkdir "$CG_ROOT/$CG_1/$CG_3"; then + cg_3_created=1 + # Ensure the memory controller is disabled as expected in $CG_3's + # parent, $CG_1, for testing. + if grep -q memory "$CG_ROOT/$CG_1/cgroup.subtree_control"; then + /bin/echo -memory > "$CG_ROOT/$CG_1/cgroup.subtree_control" + fi +else + echo "cgroup_workqueue: can't mkdir $CG_ROOT/$CG_1/$CG_3" + cleanup 1 +fi + +if (( cwq_memcg_max != 0 )); then + /bin/echo "$MEMCG_MAX" > "$CG_ROOT/$CG_2/memory.max" +fi + +if /sbin/modprobe -q test_cgroup_workqueue cwq_cgrp_root_path="$CG_ROOT" \ + cwq_cgrp_1_path="$CG_1" \ + cwq_cgrp_2_path="$CG_2" \ + cwq_cgrp_3_path="$CG_1/$CG_3" \ + cwq_memcg_max="$cwq_memcg_max"; then + echo "cgroup_workqueue: ok" + /sbin/modprobe -q -r test_cgroup_workqueue + cleanup 0 +else + echo "cgroup_workqueue: [FAIL]" + cleanup 1 +fi From patchwork Wed Jun 5 13:36:50 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Jordan X-Patchwork-Id: 10976987 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 335DA15E6 for ; Wed, 5 Jun 2019 13:37:33 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 22FB9286D4 for ; Wed, 5 Jun 2019 13:37:33 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 150C62892C; Wed, 5 Jun 2019 13:37:33 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_NONE, UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4D615268AE for ; Wed, 5 Jun 2019 13:37:32 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 855936B0010; Wed, 5 Jun 2019 09:37:28 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, 
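[ Aside, not from the series: besides queue_cgroup_work(), the export of queue_cgroup_work_node() above lets modular, NUMA-aware callers choose both a queueing node and a cgroup, which is the combination the ktask patch below relies on. A rough sketch of that call pattern, with invented example_* names and the interfaces from this series assumed: ]

#include <linux/nodemask.h>	/* for_each_online_node() */
#include <linux/workqueue.h>	/* cgroup_work, queue_cgroup_work_node(): added by this series */

/*
 * Queue one cgroup work per online node on @wq (allocated with WQ_CGROUP),
 * all attached to and throttled by @cgrp.  @cworks must have at least
 * nr_node_ids entries.
 */
static void example_queue_per_node(struct workqueue_struct *wq,
				   struct cgroup_work *cworks,
				   struct cgroup *cgrp, work_func_t fn)
{
	int nid;

	for_each_online_node(nid) {
		INIT_CGROUP_WORK(&cworks[nid], fn);
		/* Presumably false only if the work was already pending. */
		queue_cgroup_work_node(nid, wq, &cworks[nid], cgrp);
	}
}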
From: Daniel Jordan
To: hannes@cmpxchg.org, jiangshanlai@gmail.com, lizefan@huawei.com, tj@kernel.org
Cc: bsd@redhat.com, dan.j.williams@intel.com, daniel.m.jordan@oracle.com, dave.hansen@intel.com, juri.lelli@redhat.com, mhocko@kernel.org, peterz@infradead.org, steven.sistare@oracle.com, tglx@linutronix.de, tom.hromatka@oracle.com, vdavydov.dev@gmail.com, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [RFC v2 5/5] ktask, cgroup: attach helper threads to the master thread's cgroup
Date: Wed, 5 Jun 2019 09:36:50 -0400
Message-Id: <20190605133650.28545-6-daniel.m.jordan@oracle.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20190605133650.28545-1-daniel.m.jordan@oracle.com>
References: <20190605133650.28545-1-daniel.m.jordan@oracle.com>
MIME-Version: 1.0
X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9278 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1906050087 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP ktask tasks are expensive, and helper threads are not currently throttled by the master's cgroup, so helpers' resource usage is unbounded. Attach helper threads to the master thread's cgroup to ensure helpers get this throttling. It's possible for the master to be migrated to a new cgroup before the task is finished. In that case, to keep it simple, the helpers continue executing in the original cgroup. Signed-off-by: Daniel Jordan --- include/linux/cgroup.h | 26 ++++++++++++++++++++++++++ kernel/ktask.c | 32 ++++++++++++++++++++------------ 2 files changed, 46 insertions(+), 12 deletions(-) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index de578e29077b..67b2c469f17f 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -532,6 +532,28 @@ static inline struct cgroup *task_dfl_cgroup(struct task_struct *task) return task_css_set(task)->dfl_cgrp; } +/** + * task_get_dfl_cgroup - find and get the default hierarchy cgroup for task + * @task: the target task + * + * Find the default hierarchy cgroup for @task, take a reference on it, and + * return it. Guaranteed to return a valid cgroup. + */ +static inline struct cgroup *task_get_dfl_cgroup(struct task_struct *task) +{ + struct cgroup *cgroup; + + rcu_read_lock(); + while (true) { + cgroup = task_dfl_cgroup(task); + if (likely(css_tryget_online(&cgroup->self))) + break; + cpu_relax(); + } + rcu_read_unlock(); + return cgroup; +} + static inline struct cgroup *cgroup_dfl_root(void) { return &cgrp_dfl_root.cgrp; @@ -705,6 +727,10 @@ static inline struct cgroup *task_dfl_cgroup(struct task_struct *task) { return NULL; } +static inline struct cgroup *task_get_dfl_cgroup(struct task_struct *task) +{ + return NULL; +} static inline int cgroup_attach_task_all(struct task_struct *from, struct task_struct *t) { return 0; } static inline int cgroupstats_build(struct cgroupstats *stats, diff --git a/kernel/ktask.c b/kernel/ktask.c index 15d62ed7c67e..b047f30f77fa 100644 --- a/kernel/ktask.c +++ b/kernel/ktask.c @@ -14,6 +14,7 @@ #ifdef CONFIG_KTASK +#include #include #include #include @@ -49,7 +50,7 @@ enum ktask_work_flags { /* Used to pass ktask data to the workqueue API. */ struct ktask_work { - struct work_struct kw_work; + struct cgroup_work kw_work; struct ktask_task *kw_task; int kw_ktask_node_i; int kw_queue_nid; @@ -76,6 +77,7 @@ struct ktask_task { size_t kt_nr_nodes; size_t kt_nr_nodes_left; int kt_error; /* first error from thread_func */ + struct cgroup *kt_cgroup; #ifdef CONFIG_LOCKDEP struct lockdep_map kt_lockdep_map; #endif @@ -103,16 +105,16 @@ static void ktask_init_work(struct ktask_work *kw, struct ktask_task *kt, { /* The master's work is always on the stack--in __ktask_run_numa. 
*/ if (flags & KTASK_WORK_MASTER) - INIT_WORK_ONSTACK(&kw->kw_work, ktask_thread); + INIT_CGROUP_WORK_ONSTACK(&kw->kw_work, ktask_thread); else - INIT_WORK(&kw->kw_work, ktask_thread); + INIT_CGROUP_WORK(&kw->kw_work, ktask_thread); kw->kw_task = kt; kw->kw_ktask_node_i = ktask_node_i; kw->kw_queue_nid = queue_nid; kw->kw_flags = flags; } -static void ktask_queue_work(struct ktask_work *kw) +static void ktask_queue_work(struct ktask_work *kw, struct cgroup *cgroup) { struct workqueue_struct *wq; @@ -128,7 +130,8 @@ static void ktask_queue_work(struct ktask_work *kw) } WARN_ON(!wq); - WARN_ON(!queue_work_node(kw->kw_queue_nid, wq, &kw->kw_work)); + WARN_ON(!queue_cgroup_work_node(kw->kw_queue_nid, wq, &kw->kw_work, + cgroup)); } /* Returns true if we're migrating this part of the task to another node. */ @@ -163,14 +166,15 @@ static bool ktask_node_migrate(struct ktask_node *old_kn, struct ktask_node *kn, WARN_ON(kw->kw_flags & (KTASK_WORK_FINISHED | KTASK_WORK_UNDO)); ktask_init_work(kw, kt, ktask_node_i, new_queue_nid, kw->kw_flags); - ktask_queue_work(kw); + ktask_queue_work(kw, kt->kt_cgroup); return true; } static void ktask_thread(struct work_struct *work) { - struct ktask_work *kw = container_of(work, struct ktask_work, kw_work); + struct cgroup_work *cw = container_of(work, struct cgroup_work, work); + struct ktask_work *kw = container_of(cw, struct ktask_work, kw_work); struct ktask_task *kt = kw->kw_task; struct ktask_ctl *kc = &kt->kt_ctl; struct ktask_node *kn = &kt->kt_nodes[kw->kw_ktask_node_i]; @@ -455,7 +459,7 @@ static void __ktask_wait_for_completion(struct ktask_task *kt, while (!(READ_ONCE(work->kw_flags) & KTASK_WORK_FINISHED)) cpu_relax(); } else { - flush_work_at_nice(&work->kw_work, task_nice(current)); + flush_work_at_nice(&work->kw_work.work, task_nice(current)); } } @@ -530,15 +534,18 @@ int __ktask_run_numa(struct ktask_node *nodes, size_t nr_nodes, kt.kt_chunk_size = ktask_chunk_size(kt.kt_total_size, ctl->kc_min_chunk_size, nr_works); + /* Ensure the master's cgroup throttles helper threads. */ + kt.kt_cgroup = task_get_dfl_cgroup(current); list_for_each_entry(work, &unfinished_works, kw_list) - ktask_queue_work(work); + ktask_queue_work(work, kt.kt_cgroup); /* Use the current thread, which saves starting a workqueue worker. */ ktask_init_work(&kw, &kt, 0, nodes[0].kn_nid, KTASK_WORK_MASTER); INIT_LIST_HEAD(&kw.kw_list); - ktask_thread(&kw.kw_work); + ktask_thread(&kw.kw_work.work); ktask_wait_for_completion(&kt, &unfinished_works, &finished_works); + cgroup_put(kt.kt_cgroup); if (kt.kt_error != KTASK_RETURN_SUCCESS && ctl->kc_undo_func) ktask_undo(nodes, nr_nodes, ctl, &finished_works, &kw); @@ -611,13 +618,14 @@ void __init ktask_init(void) if (!ktask_rlim_init()) goto out; - ktask_wq = alloc_workqueue("ktask_wq", WQ_UNBOUND, 0); + ktask_wq = alloc_workqueue("ktask_wq", WQ_UNBOUND | WQ_CGROUP, 0); if (!ktask_wq) { pr_warn("disabled (failed to alloc ktask_wq)"); goto out; } - ktask_nonuma_wq = alloc_workqueue("ktask_nonuma_wq", WQ_UNBOUND, 0); + ktask_nonuma_wq = alloc_workqueue("ktask_nonuma_wq", + WQ_UNBOUND | WQ_CGROUP, 0); if (!ktask_nonuma_wq) { pr_warn("disabled (failed to alloc ktask_nonuma_wq)"); goto alloc_fail;