From patchwork Wed May 26 22:25:56 2021
X-Patchwork-Submitter: Roman Gushchin
X-Patchwork-Id: 12282841
Sender: owner-linux-mm@kvack.org
From: Roman Gushchin
To: Jan Kara, Tejun Heo
CC: Alexander Viro, Dennis Zhou, Dave Chinner, Roman Gushchin
Subject: [PATCH v5 1/2] writeback, cgroup: keep list of inodes attached to bdi_writeback
Date: Wed, 26 May 2021 15:25:56 -0700
Message-ID: <20210526222557.3118114-2-guro@fb.com>
In-Reply-To: <20210526222557.3118114-1-guro@fb.com>
References: <20210526222557.3118114-1-guro@fb.com>

Currently there is no way to iterate over inodes attached to a
specific cgwb structure. This limits the ability to efficiently
reclaim the writeback structure itself and the associated memory and
block cgroup structures without scanning all inodes belonging to a sb,
which can be prohibitively expensive.

While dirty or under active writeback, an inode belongs to one of the
bdi_writeback's io lists: b_dirty, b_io, b_more_io and b_dirty_time.
Once cleaned up, it's removed from all io lists. So inode->i_io_list
can be reused to maintain the list of inodes attached to a
bdi_writeback structure.

This patch introduces a new wb->b_attached list, which contains all
inodes which were dirty at least once and are attached to the given
cgwb. Inodes attached to the root bdi_writeback structures are never
placed on such a list. The following patch will use this list to try
to release cgwb structures more efficiently.

Suggested-by: Jan Kara
Signed-off-by: Roman Gushchin
---
 fs/fs-writeback.c                | 66 ++++++++++++++++++++++++--------
 include/linux/backing-dev-defs.h |  1 +
 include/linux/backing-dev.h      |  7 ++++
 include/linux/writeback.h        |  1 +
 mm/backing-dev.c                 |  2 +
 5 files changed, 60 insertions(+), 17 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index e91980f49388..631ef6366293 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -135,18 +135,23 @@ static bool inode_io_list_move_locked(struct inode *inode,
  * inode_io_list_del_locked - remove an inode from its bdi_writeback IO list
  * @inode: inode to be removed
  * @wb: bdi_writeback @inode is being removed from
+ * @final: inode is going to be freed and can't reappear on any IO list
  *
  * Remove @inode which may be on one of @wb->b_{dirty|io|more_io} lists and
  * clear %WB_has_dirty_io if all are empty afterwards.
  */
 static void inode_io_list_del_locked(struct inode *inode,
-				     struct bdi_writeback *wb)
+				     struct bdi_writeback *wb,
+				     bool final)
 {
 	assert_spin_locked(&wb->list_lock);
 	assert_spin_locked(&inode->i_lock);
 
 	inode->i_state &= ~I_SYNC_QUEUED;
-	list_del_init(&inode->i_io_list);
+	if (final)
+		list_del_init(&inode->i_io_list);
+	else
+		inode_cgwb_move_to_attached(inode, wb);
 	wb_io_lists_depopulated(wb);
 }
 
@@ -278,6 +283,25 @@ void __inode_attach_wb(struct inode *inode, struct page *page)
 }
 EXPORT_SYMBOL_GPL(__inode_attach_wb);
 
+/**
+ * inode_cgwb_move_to_attached - put the inode onto wb->b_attached list
+ * @inode: inode of interest with i_lock held
+ * @wb: target bdi_writeback
+ *
+ * Remove the inode from wb's io lists and if necessary put it onto b_attached
+ * list. Only inodes attached to cgwb's are kept on this list.
+ */
+void inode_cgwb_move_to_attached(struct inode *inode, struct bdi_writeback *wb)
+{
+	assert_spin_locked(&wb->list_lock);
+	assert_spin_locked(&inode->i_lock);
+
+	if (wb != &wb->bdi->wb)
+		list_move(&inode->i_io_list, &wb->b_attached);
+	else
+		list_del_init(&inode->i_io_list);
+}
+
 /**
  * locked_inode_to_wb_and_lock_list - determine a locked inode's wb and lock it
  * @inode: inode of interest with i_lock held
@@ -419,21 +443,29 @@ static void inode_switch_wbs_work_fn(struct work_struct *work)
 	wb_get(new_wb);
 
 	/*
-	 * Transfer to @new_wb's IO list if necessary. The specific list
-	 * @inode was on is ignored and the inode is put on ->b_dirty which
-	 * is always correct including from ->b_dirty_time. The transfer
-	 * preserves @inode->dirtied_when ordering.
+	 * Transfer to @new_wb's IO list if necessary. If the @inode is dirty,
+	 * the specific list @inode was on is ignored and the @inode is put on
+	 * ->b_dirty which is always correct including from ->b_dirty_time.
+	 * The transfer preserves @inode->dirtied_when ordering. If the @inode
+	 * was clean, it means it was on the b_attached list, so move it onto
+	 * the b_attached list of @new_wb.
 	 */
 	if (!list_empty(&inode->i_io_list)) {
-		struct inode *pos;
-
-		inode_io_list_del_locked(inode, old_wb);
+		inode_io_list_del_locked(inode, old_wb, true);
 		inode->i_wb = new_wb;
-		list_for_each_entry(pos, &new_wb->b_dirty, i_io_list)
-			if (time_after_eq(inode->dirtied_when,
-					  pos->dirtied_when))
-				break;
-		inode_io_list_move_locked(inode, new_wb, pos->i_io_list.prev);
+
+		if (inode->i_state & I_DIRTY_ALL) {
+			struct inode *pos;
+
+			list_for_each_entry(pos, &new_wb->b_dirty, i_io_list)
+				if (time_after_eq(inode->dirtied_when,
+						  pos->dirtied_when))
+					break;
+			inode_io_list_move_locked(inode, new_wb,
+						  pos->i_io_list.prev);
+		} else {
+			inode_cgwb_move_to_attached(inode, new_wb);
+		}
 	} else {
 		inode->i_wb = new_wb;
 	}
@@ -1124,7 +1156,7 @@ void inode_io_list_del(struct inode *inode)
 
 	wb = inode_to_wb_and_lock_list(inode);
 	spin_lock(&inode->i_lock);
-	inode_io_list_del_locked(inode, wb);
+	inode_io_list_del_locked(inode, wb, true);
 	spin_unlock(&inode->i_lock);
 	spin_unlock(&wb->list_lock);
 }
@@ -1437,7 +1469,7 @@ static void requeue_inode(struct inode *inode, struct bdi_writeback *wb,
 		inode->i_state &= ~I_SYNC_QUEUED;
 	} else {
 		/* The inode is clean. Remove from writeback lists. */
-		inode_io_list_del_locked(inode, wb);
+		inode_io_list_del_locked(inode, wb, false);
 	}
 }
 
@@ -1589,7 +1621,7 @@ static int writeback_single_inode(struct inode *inode,
 	 * responsible for the writeback lists.
 	 */
 	if (!(inode->i_state & I_DIRTY_ALL))
-		inode_io_list_del_locked(inode, wb);
+		inode_io_list_del_locked(inode, wb, false);
 	spin_unlock(&wb->list_lock);
 	inode_sync_complete(inode);
 out:
diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index fff9367a6348..e5dc238ebe4f 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -154,6 +154,7 @@ struct bdi_writeback {
 	struct cgroup_subsys_state *blkcg_css; /* and blkcg */
 	struct list_head memcg_node; /* anchored at memcg->cgwb_list */
 	struct list_head blkcg_node; /* anchored at blkcg->cgwb_list */
+	struct list_head b_attached; /* attached inodes, protected by list_lock */
 
 	union {
 		struct work_struct release_work;
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 44df4fcef65c..4256e66802e6 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -177,6 +177,7 @@ struct bdi_writeback *wb_get_create(struct backing_dev_info *bdi,
 void wb_memcg_offline(struct mem_cgroup *memcg);
 void wb_blkcg_offline(struct blkcg *blkcg);
 int inode_congested(struct inode *inode, int cong_bits);
+void inode_cgwb_move_to_attached(struct inode *inode, struct bdi_writeback *wb);
 
 /**
  * inode_cgwb_enabled - test whether cgroup writeback is enabled on an inode
@@ -345,6 +346,12 @@ static inline bool inode_cgwb_enabled(struct inode *inode)
 	return false;
 }
 
+static inline void inode_cgwb_move_to_attached(struct inode *inode,
+					       struct bdi_writeback *wb)
+{
+	list_del_init(&inode->i_io_list);
+}
+
 static inline struct bdi_writeback *wb_find_current(struct backing_dev_info *bdi)
 {
 	return &bdi->wb;
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 8e5c5bb16e2d..572a13c40c90 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -212,6 +212,7 @@ static inline void wait_on_inode(struct inode *inode)
 #include
 
 void __inode_attach_wb(struct inode *inode, struct page *page);
+void inode_cgwb_move_to_attached(struct inode *inode, struct bdi_writeback *wb);
 void wbc_attach_and_unlock_inode(struct writeback_control *wbc,
 				 struct inode *inode)
 	__releases(&inode->i_lock);
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 576220acd686..54c5dc4b8c24 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -396,6 +396,7 @@ static void cgwb_release_workfn(struct work_struct *work)
 	fprop_local_destroy_percpu(&wb->memcg_completions);
 	percpu_ref_exit(&wb->refcnt);
 	wb_exit(wb);
+	WARN_ON_ONCE(!list_empty(&wb->b_attached));
 	kfree_rcu(wb, rcu);
 }
 
@@ -472,6 +473,7 @@ static int cgwb_create(struct backing_dev_info *bdi,
 
 	wb->memcg_css = memcg_css;
 	wb->blkcg_css = blkcg_css;
+	INIT_LIST_HEAD(&wb->b_attached);
 	INIT_WORK(&wb->release_work, cgwb_release_workfn);
 	set_bit(WB_registered, &wb->state);
 

From patchwork Wed May 26 22:25:57 2021
X-Patchwork-Submitter: Roman Gushchin
X-Patchwork-Id: 12282837
Sender: owner-linux-mm@kvack.org
From: Roman Gushchin
To: Jan Kara, Tejun Heo
CC: Alexander Viro, Dennis Zhou, Dave Chinner, Roman Gushchin
Subject: [PATCH v5 2/2] writeback, cgroup: release dying cgwbs by switching attached inodes
Date: Wed, 26 May 2021 15:25:57 -0700
Message-ID: <20210526222557.3118114-3-guro@fb.com>
In-Reply-To: <20210526222557.3118114-1-guro@fb.com>
References: <20210526222557.3118114-1-guro@fb.com>

Asynchronously try to release dying cgwbs by switching clean attached
inodes to the bdi's wb. This helps to get rid of the per-cgroup
writeback structures themselves and of the pinned memory and block
cgroups, which are much larger structures (mostly due to large per-cpu
statistics data). It helps to prevent memory waste and various
scalability problems caused by large piles of dying cgroups.

A cgwb cleanup operation can fail for different reasons (e.g. the
cgwb has in-flight/pending io, an attached inode is locked or isn't
clean, etc). In this case the next scheduled cleanup will make a new
attempt. An attempt is made each time a new cgwb is offlined (in other
words a memcg and/or a blkcg is deleted by a user). In the future an
additional attempt scheduled by a timer can be implemented.

Signed-off-by: Roman Gushchin
---
 fs/fs-writeback.c                | 35 ++++++++++++++++++
 include/linux/backing-dev-defs.h |  1 +
 include/linux/writeback.h        |  1 +
 mm/backing-dev.c                 | 61 ++++++++++++++++++++++++++++++--
 4 files changed, 96 insertions(+), 2 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 631ef6366293..8fbcd50844f0 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -577,6 +577,41 @@ static void inode_switch_wbs(struct inode *inode, int new_wb_id)
 	kfree(isw);
 }
 
+/**
+ * cleanup_offline_wb - detach associated clean inodes
+ * @wb: target wb
+ *
+ * Switch the inode->i_wb pointer of the attached inodes to the bdi's wb and
+ * drop the corresponding per-cgroup wb's reference. Skip inodes which are
+ * dirty, freeing, in the active writeback process or are in any way busy.
+ */
+void cleanup_offline_wb(struct bdi_writeback *wb)
+{
+	struct inode *inode, *tmp;
+
+	spin_lock(&wb->list_lock);
+restart:
+	list_for_each_entry_safe(inode, tmp, &wb->b_attached, i_io_list) {
+		if (!spin_trylock(&inode->i_lock))
+			continue;
+		xa_lock_irq(&inode->i_mapping->i_pages);
+		if ((inode->i_state & I_REFERENCED) != I_REFERENCED) {
+			struct bdi_writeback *bdi_wb = &inode_to_bdi(inode)->wb;
+
+			WARN_ON_ONCE(inode->i_wb != wb);
+
+			inode->i_wb = bdi_wb;
+			list_del_init(&inode->i_io_list);
+			wb_put(wb);
+		}
+		xa_unlock_irq(&inode->i_mapping->i_pages);
+		spin_unlock(&inode->i_lock);
+		if (cond_resched_lock(&wb->list_lock))
+			goto restart;
+	}
+	spin_unlock(&wb->list_lock);
+}
+
 /**
  * wbc_attach_and_unlock_inode - associate wbc with target inode and unlock it
  * @wbc: writeback_control of interest
diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index e5dc238ebe4f..07d6b6d6dbdf 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -155,6 +155,7 @@ struct bdi_writeback {
 	struct list_head memcg_node; /* anchored at memcg->cgwb_list */
 	struct list_head blkcg_node; /* anchored at blkcg->cgwb_list */
 	struct list_head b_attached; /* attached inodes, protected by list_lock */
+	struct list_head offline_node; /* anchored at offline_cgwbs */
 
 	union {
 		struct work_struct release_work;
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 572a13c40c90..922f15fe6ad4 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -222,6 +222,7 @@ void wbc_account_cgroup_owner(struct writeback_control *wbc, struct page *page,
 int cgroup_writeback_by_id(u64 bdi_id, int memcg_id, unsigned long nr_pages,
 			   enum wb_reason reason, struct wb_completion *done);
 void cgroup_writeback_umount(void);
+void cleanup_offline_wb(struct bdi_writeback *wb);
 
 /**
  * inode_attach_wb - associate an inode with its wb
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 54c5dc4b8c24..92a00bcaa504 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -371,12 +371,16 @@ static void wb_exit(struct bdi_writeback *wb)
 #include
 
 /*
- * cgwb_lock protects bdi->cgwb_tree, blkcg->cgwb_list, and memcg->cgwb_list.
- * bdi->cgwb_tree is also RCU protected.
+ * cgwb_lock protects bdi->cgwb_tree, blkcg->cgwb_list, offline_cgwbs and
+ * memcg->cgwb_list. bdi->cgwb_tree is also RCU protected.
  */
 static DEFINE_SPINLOCK(cgwb_lock);
 static struct workqueue_struct *cgwb_release_wq;
 
+static LIST_HEAD(offline_cgwbs);
+static void cleanup_offline_cgwbs_workfn(struct work_struct *work);
+static DECLARE_WORK(cleanup_offline_cgwbs_work, cleanup_offline_cgwbs_workfn);
+
 static void cgwb_release_workfn(struct work_struct *work)
 {
 	struct bdi_writeback *wb = container_of(work, struct bdi_writeback,
@@ -395,6 +399,7 @@ static void cgwb_release_workfn(struct work_struct *work)
 
 	fprop_local_destroy_percpu(&wb->memcg_completions);
 	percpu_ref_exit(&wb->refcnt);
+	WARN_ON(!list_empty(&wb->offline_node));
 	wb_exit(wb);
 	WARN_ON_ONCE(!list_empty(&wb->b_attached));
 	kfree_rcu(wb, rcu);
@@ -414,6 +419,10 @@ static void cgwb_kill(struct bdi_writeback *wb)
 	WARN_ON(!radix_tree_delete(&wb->bdi->cgwb_tree, wb->memcg_css->id));
 	list_del(&wb->memcg_node);
 	list_del(&wb->blkcg_node);
+	if (!list_empty(&wb->b_attached))
+		list_add(&wb->offline_node, &offline_cgwbs);
+	else
+		INIT_LIST_HEAD(&wb->offline_node);
 	percpu_ref_kill(&wb->refcnt);
 }
 
@@ -635,6 +644,50 @@ static void cgwb_bdi_unregister(struct backing_dev_info *bdi)
 	mutex_unlock(&bdi->cgwb_release_mutex);
 }
 
+/**
+ * cleanup_offline_cgwbs_workfn - try to release dying cgwbs
+ *
+ * Try to release dying cgwbs by switching attached inodes to the wb
+ * belonging to the root memory cgroup. Processed wbs are placed at the
+ * end of the list to guarantee forward progress.
+ *
+ * Takes cgwb_lock itself; the lock is released and re-acquired around
+ * each cleanup_offline_wb() call.
+ */
+static void cleanup_offline_cgwbs_workfn(struct work_struct *work)
+{
+	struct bdi_writeback *wb;
+	LIST_HEAD(processed);
+
+	spin_lock_irq(&cgwb_lock);
+
+	while (!list_empty(&offline_cgwbs)) {
+		wb = list_first_entry(&offline_cgwbs, struct bdi_writeback,
+				      offline_node);
+		list_move(&wb->offline_node, &processed);
+
+		if (wb_has_dirty_io(wb))
+			continue;
+
+		if (!percpu_ref_tryget(&wb->refcnt))
+			continue;
+
+		spin_unlock_irq(&cgwb_lock);
+		cleanup_offline_wb(wb);
+		spin_lock_irq(&cgwb_lock);
+
+		if (list_empty(&wb->b_attached))
+			list_del_init(&wb->offline_node);
+
+		wb_put(wb);
+	}
+
+	if (!list_empty(&processed))
+		list_splice_tail(&processed, &offline_cgwbs);
+
+	spin_unlock_irq(&cgwb_lock);
+}
+
 /**
  * wb_memcg_offline - kill all wb's associated with a memcg being offlined
  * @memcg: memcg being offlined
@@ -650,6 +703,10 @@ void wb_memcg_offline(struct mem_cgroup *memcg)
 	list_for_each_entry_safe(wb, next, memcg_cgwb_list, memcg_node)
 		cgwb_kill(wb);
 	memcg_cgwb_list->next = NULL; /* prevent new wb's */
+
+	if (!list_empty(&offline_cgwbs))
+		schedule_work(&cleanup_offline_cgwbs_work);
+
 	spin_unlock_irq(&cgwb_lock);
 }
 
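
Illustration only, not part of the series: the list lifecycle described in
the two commit messages can be modelled in plain userspace C. This is a
minimal sketch under stated assumptions. All names here (model_wb,
model_inode, the list helpers, model_demo) are hypothetical, there is no
locking, and the reference count is a plain int. A dirty inode sits on
b_dirty, a cleaned inode of a non-root wb moves to b_attached, and cleanup
switches every attached inode to the root wb and drops the cgwb reference
it pinned.

```c
/* Userspace model of the b_attached idea; hypothetical names, not kernel code. */
#include <assert.h>
#include <stdbool.h>

struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static bool list_is_empty(const struct list_head *h) { return h->next == h; }

/* Unlink an entry and leave it self-linked (safe on an unlinked entry). */
static void list_unlink(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	list_init(e);
}

/* Unlink an entry from wherever it is and add it at the front of head. */
static void list_move_to(struct list_head *e, struct list_head *head)
{
	list_unlink(e);
	e->next = head->next;
	e->prev = head;
	head->next->prev = e;
	head->next = e;
}

struct model_wb {
	struct list_head b_dirty;    /* dirty inodes */
	struct list_head b_attached; /* clean inodes, non-root wbs only */
	bool is_root;
	int refs;                    /* one reference per attached inode */
};

struct model_inode {
	struct list_head i_io_list;  /* first member, so a cast recovers the inode */
	struct model_wb *i_wb;
};

static void wb_init(struct model_wb *wb, bool is_root)
{
	list_init(&wb->b_dirty);
	list_init(&wb->b_attached);
	wb->is_root = is_root;
	wb->refs = 0;
}

static void inode_attach(struct model_inode *inode, struct model_wb *wb)
{
	list_init(&inode->i_io_list);
	inode->i_wb = wb;
	wb->refs++;
}

static void inode_mark_dirty(struct model_inode *inode)
{
	list_move_to(&inode->i_io_list, &inode->i_wb->b_dirty);
}

/* Writeback finished: keep the inode findable unless it is on the root wb. */
static void inode_writeback_done(struct model_inode *inode)
{
	if (!inode->i_wb->is_root)
		list_move_to(&inode->i_io_list, &inode->i_wb->b_attached);
	else
		list_unlink(&inode->i_io_list);
}

/* Switch every attached (clean) inode to the root wb; return how many moved. */
static int cleanup_offline(struct model_wb *wb, struct model_wb *root)
{
	int switched = 0;

	while (!list_is_empty(&wb->b_attached)) {
		struct list_head *pos = wb->b_attached.next;
		struct model_inode *inode = (struct model_inode *)pos;

		list_unlink(pos);
		inode->i_wb = root;
		wb->refs--;
		switched++;
	}
	return switched;
}

/* Two inodes on one cgwb: one gets cleaned, the other stays dirty. */
static int model_demo(void)
{
	struct model_wb root, cg;
	struct model_inode a, b;
	int switched;

	wb_init(&root, true);
	wb_init(&cg, false);
	inode_attach(&a, &cg);
	inode_attach(&b, &cg);
	inode_mark_dirty(&a);
	inode_mark_dirty(&b);
	inode_writeback_done(&a);

	switched = cleanup_offline(&cg, &root);
	assert(a.i_wb == &root && b.i_wb == &cg);
	assert(list_is_empty(&cg.b_attached) && !list_is_empty(&cg.b_dirty));
	assert(cg.refs == 1);
	return switched;
}
```

Calling model_demo() from a main() switches only the cleaned inode. The
still-dirty inode keeps its reference on the cgwb, which is the case where
the real cleanup has to be retried later.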