From patchwork Sat Aug 3 14:01:52 2019
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 11074449
From: Tejun Heo
To: axboe@kernel.dk, jack@suse.cz, hannes@cmpxchg.org, mhocko@kernel.org,
    vdavydov.dev@gmail.com
Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, linux-block@vger.kernel.org,
    linux-kernel@vger.kernel.org, kernel-team@fb.com, guro@fb.com,
    akpm@linux-foundation.org, Tejun Heo
Subject: [PATCH 1/4] writeback: Generalize and expose wb_completion
Date: Sat, 3 Aug 2019 07:01:52 -0700
Message-Id: <20190803140155.181190-2-tj@kernel.org>
In-Reply-To: <20190803140155.181190-1-tj@kernel.org>
References: <20190803140155.181190-1-tj@kernel.org>

wb_completion is used to track writeback completions.  We want to use it
from the memcg side for foreign inode flushes.  This patch updates it to
remember the target waitq instead of assuming bdi->wb_waitq, and exposes
it outside of fs-writeback.c.

Signed-off-by: Tejun Heo
Reviewed-by: Jan Kara
---
 fs/fs-writeback.c                | 47 ++++++++++----------------------
 include/linux/backing-dev-defs.h | 20 ++++++++++++++
 include/linux/backing-dev.h      |  2 ++
 3 files changed, 36 insertions(+), 33 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 542b02d170f8..6129debdc938 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -36,10 +36,6 @@
  */
 #define MIN_WRITEBACK_PAGES	(4096UL >> (PAGE_SHIFT - 10))
 
-struct wb_completion {
-	atomic_t		cnt;
-};
-
 /*
  * Passed into wb_writeback(), essentially a subset of writeback_control
  */
@@ -60,19 +56,6 @@ struct wb_writeback_work {
 	struct wb_completion *done;	/* set if the caller waits */
 };
 
-/*
- * If one wants to wait for one or more wb_writeback_works, each work's
- * ->done should be set to a wb_completion defined using the following
- * macro.  Once all work items are issued with wb_queue_work(), the caller
- * can wait for the completion of all using wb_wait_for_completion().  Work
- * items which are waited upon aren't freed automatically on completion.
- */
-#define DEFINE_WB_COMPLETION_ONSTACK(cmpl)				\
-	struct wb_completion cmpl = {					\
-		.cnt		= ATOMIC_INIT(1),			\
-	}
-
-
 /*
  * If an inode is constantly having its pages dirtied, but then the
  * updates stop dirtytime_expire_interval seconds in the past, it's
@@ -182,7 +165,7 @@ static void finish_writeback_work(struct bdi_writeback *wb,
 	if (work->auto_free)
 		kfree(work);
 	if (done && atomic_dec_and_test(&done->cnt))
-		wake_up_all(&wb->bdi->wb_waitq);
+		wake_up_all(done->waitq);
 }
 
 static void wb_queue_work(struct bdi_writeback *wb,
@@ -206,20 +189,18 @@ static void wb_queue_work(struct bdi_writeback *wb,
 
 /**
  * wb_wait_for_completion - wait for completion of bdi_writeback_works
- * @bdi: bdi work items were issued to
  * @done: target wb_completion
  *
  * Wait for one or more work items issued to @bdi with their ->done field
- * set to @done, which should have been defined with
- * DEFINE_WB_COMPLETION_ONSTACK().  This function returns after all such
- * work items are completed.  Work items which are waited upon aren't freed
+ * set to @done, which should have been initialized with
+ * DEFINE_WB_COMPLETION().  This function returns after all such work items
+ * are completed.  Work items which are waited upon aren't freed
  * automatically on completion.
  */
-static void wb_wait_for_completion(struct backing_dev_info *bdi,
-				   struct wb_completion *done)
+void wb_wait_for_completion(struct wb_completion *done)
 {
 	atomic_dec(&done->cnt);		/* put down the initial count */
-	wait_event(bdi->wb_waitq, !atomic_read(&done->cnt));
+	wait_event(*done->waitq, !atomic_read(&done->cnt));
 }
 
 #ifdef CONFIG_CGROUP_WRITEBACK
@@ -843,7 +824,7 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
 restart:
 	rcu_read_lock();
 	list_for_each_entry_continue_rcu(wb, &bdi->wb_list, bdi_node) {
-		DEFINE_WB_COMPLETION_ONSTACK(fallback_work_done);
+		DEFINE_WB_COMPLETION(fallback_work_done, bdi);
 		struct wb_writeback_work fallback_work;
 		struct wb_writeback_work *work;
 		long nr_pages;
@@ -890,7 +871,7 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
 		last_wb = wb;
 
 		rcu_read_unlock();
-		wb_wait_for_completion(bdi, &fallback_work_done);
+		wb_wait_for_completion(&fallback_work_done);
 		goto restart;
 	}
 	rcu_read_unlock();
@@ -2362,7 +2343,8 @@ static void wait_sb_inodes(struct super_block *sb)
 static void __writeback_inodes_sb_nr(struct super_block *sb, unsigned long nr,
 				     enum wb_reason reason, bool skip_if_busy)
 {
-	DEFINE_WB_COMPLETION_ONSTACK(done);
+	struct backing_dev_info *bdi = sb->s_bdi;
+	DEFINE_WB_COMPLETION(done, bdi);
 	struct wb_writeback_work work = {
 		.sb			= sb,
 		.sync_mode		= WB_SYNC_NONE,
@@ -2371,14 +2353,13 @@ static void __writeback_inodes_sb_nr(struct super_block *sb, unsigned long nr,
 		.nr_pages		= nr,
 		.reason			= reason,
 	};
-	struct backing_dev_info *bdi = sb->s_bdi;
 
 	if (!bdi_has_dirty_io(bdi) || bdi == &noop_backing_dev_info)
 		return;
 	WARN_ON(!rwsem_is_locked(&sb->s_umount));
 
 	bdi_split_work_to_wbs(sb->s_bdi, &work, skip_if_busy);
-	wb_wait_for_completion(bdi, &done);
+	wb_wait_for_completion(&done);
 }
 
 /**
@@ -2440,7 +2421,8 @@ EXPORT_SYMBOL(try_to_writeback_inodes_sb);
  */
 void sync_inodes_sb(struct super_block *sb)
 {
-	DEFINE_WB_COMPLETION_ONSTACK(done);
+	struct backing_dev_info *bdi = sb->s_bdi;
+	DEFINE_WB_COMPLETION(done, bdi);
 	struct wb_writeback_work work = {
 		.sb		= sb,
 		.sync_mode	= WB_SYNC_ALL,
@@ -2450,7 +2432,6 @@ void sync_inodes_sb(struct super_block *sb)
 		.reason		= WB_REASON_SYNC,
 		.for_sync	= 1,
 	};
-	struct backing_dev_info *bdi = sb->s_bdi;
 
 	/*
 	 * Can't skip on !bdi_has_dirty() because we should wait for !dirty
@@ -2464,7 +2445,7 @@
 	/* protect against inode wb switch, see inode_switch_wbs_work_fn() */
 	bdi_down_write_wb_switch_rwsem(bdi);
 	bdi_split_work_to_wbs(bdi, &work, false);
-	wb_wait_for_completion(bdi, &done);
+	wb_wait_for_completion(&done);
 	bdi_up_write_wb_switch_rwsem(bdi);
 
 	wait_sb_inodes(sb);
diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index 6a1a8a314d85..8fb740178d5d 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -67,6 +67,26 @@ enum wb_reason {
 	WB_REASON_MAX,
 };
 
+struct wb_completion {
+	atomic_t		cnt;
+	wait_queue_head_t	*waitq;
+};
+
+#define __WB_COMPLETION_INIT(_waitq)	\
+	(struct wb_completion){ .cnt = ATOMIC_INIT(1), .waitq = (_waitq) }
+
+/*
+ * If one wants to wait for one or more wb_writeback_works, each work's
+ * ->done should be set to a wb_completion defined using the following
+ * macro.  Once all work items are issued with wb_queue_work(), the caller
+ * can wait for the completion of all using wb_wait_for_completion().  Work
+ * items which are waited upon aren't freed automatically on completion.
+ */
+#define WB_COMPLETION_INIT(bdi)		__WB_COMPLETION_INIT(&(bdi)->wb_waitq)
+
+#define DEFINE_WB_COMPLETION(cmpl, bdi)	\
+	struct wb_completion cmpl = WB_COMPLETION_INIT(bdi)
+
 /*
  * For cgroup writeback, multiple wb's may map to the same blkcg.  Those
Those * wb's can operate mostly independently but should share the congested diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h index 35b31d176f74..02650b1253a2 100644 --- a/include/linux/backing-dev.h +++ b/include/linux/backing-dev.h @@ -44,6 +44,8 @@ void wb_start_background_writeback(struct bdi_writeback *wb); void wb_workfn(struct work_struct *work); void wb_wakeup_delayed(struct bdi_writeback *wb); +void wb_wait_for_completion(struct wb_completion *done); + extern spinlock_t bdi_lock; extern struct list_head bdi_list; From patchwork Sat Aug 3 14:01:53 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tejun Heo X-Patchwork-Id: 11074453 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B638517E0 for ; Sat, 3 Aug 2019 14:02:11 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A5F102883C for ; Sat, 3 Aug 2019 14:02:11 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 99ACB288C7; Sat, 3 Aug 2019 14:02:11 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.7 required=2.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,MAILING_LIST_MULTI,RCVD_IN_DNSWL_NONE autolearn=unavailable version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1B05F28895 for ; Sat, 3 Aug 2019 14:02:11 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B1B146B000D; Sat, 3 Aug 2019 10:02:08 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id A560D6B000E; Sat, 3 Aug 2019 10:02:08 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: 
From: Tejun Heo
Subject: [PATCH 2/4] bdi: Add bdi->id
Date: Sat, 3 Aug 2019 07:01:53 -0700
Message-Id: <20190803140155.181190-3-tj@kernel.org>
In-Reply-To: <20190803140155.181190-1-tj@kernel.org>
References: <20190803140155.181190-1-tj@kernel.org>

There currently is no way to universally identify and look up a bdi
without holding a reference and pointer to it.  This patch adds a
non-recycling bdi->id and implements bdi_get_by_id(), which looks up
bdis by their ids.  This will be used by memcg foreign inode flushing.

I left bdi_list alone for simplicity: while rb_tree does support RCU
assignment, it doesn't seem to guarantee a lossless walk when the walk
races against tree rebalance operations.
Signed-off-by: Tejun Heo
Reviewed-by: Jan Kara
---
 include/linux/backing-dev-defs.h |  2 +
 include/linux/backing-dev.h      |  1 +
 mm/backing-dev.c                 | 65 +++++++++++++++++++++++++++++++-
 3 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index 8fb740178d5d..1075f2552cfc 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -185,6 +185,8 @@ struct bdi_writeback {
 };
 
 struct backing_dev_info {
+	u64 id;
+	struct rb_node rb_node; /* keyed by ->id */
 	struct list_head bdi_list;
 	unsigned long ra_pages;	/* max readahead in PAGE_SIZE units */
 	unsigned long io_pages;	/* max allowed IO size */
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 02650b1253a2..84cdcfbc763f 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -24,6 +24,7 @@ static inline struct backing_dev_info *bdi_get(struct backing_dev_info *bdi)
 	return bdi;
 }
 
+struct backing_dev_info *bdi_get_by_id(u64 id);
 void bdi_put(struct backing_dev_info *bdi);
 
 __printf(2, 3)
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index e8e89158adec..4a8816e0b8d4 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0-only
 
 #include
+#include
 #include
 #include
 #include
@@ -22,10 +23,12 @@ EXPORT_SYMBOL_GPL(noop_backing_dev_info);
 static struct class *bdi_class;
 
 /*
- * bdi_lock protects updates to bdi_list.  bdi_list has RCU reader side
- * locking.
+ * bdi_lock protects bdi_tree and updates to bdi_list.  bdi_list has RCU
+ * reader side locking.
  */
 DEFINE_SPINLOCK(bdi_lock);
+static u64 bdi_id_cursor;
+static struct rb_root bdi_tree = RB_ROOT;
 LIST_HEAD(bdi_list);
 
 /* bdi_wq serves all asynchronous writeback tasks */
@@ -859,9 +862,58 @@ struct backing_dev_info *bdi_alloc_node(gfp_t gfp_mask, int node_id)
 }
 EXPORT_SYMBOL(bdi_alloc_node);
 
+struct rb_node **bdi_lookup_rb_node(u64 id, struct rb_node **parentp)
+{
+	struct rb_node **p = &bdi_tree.rb_node;
+	struct rb_node *parent = NULL;
+	struct backing_dev_info *bdi;
+
+	lockdep_assert_held(&bdi_lock);
+
+	while (*p) {
+		parent = *p;
+		bdi = rb_entry(parent, struct backing_dev_info, rb_node);
+
+		if (bdi->id > id)
+			p = &(*p)->rb_left;
+		else if (bdi->id < id)
+			p = &(*p)->rb_right;
+		else
+			break;
+	}
+
+	if (parentp)
+		*parentp = parent;
+	return p;
+}
+
+/**
+ * bdi_get_by_id - lookup and get bdi from its id
+ * @id: bdi id to lookup
+ *
+ * Find bdi matching @id and get it.  Returns NULL if the matching bdi
+ * doesn't exist or is already unregistered.
+ */
+struct backing_dev_info *bdi_get_by_id(u64 id)
+{
+	struct backing_dev_info *bdi = NULL;
+	struct rb_node **p;
+
+	spin_lock_irq(&bdi_lock);
+	p = bdi_lookup_rb_node(id, NULL);
+	if (*p) {
+		bdi = rb_entry(*p, struct backing_dev_info, rb_node);
+		bdi_get(bdi);
+	}
+	spin_unlock_irq(&bdi_lock);
+
+	return bdi;
+}
+
 int bdi_register_va(struct backing_dev_info *bdi, const char *fmt, va_list args)
 {
 	struct device *dev;
+	struct rb_node *parent, **p;
 
 	if (bdi->dev)	/* The driver needs to use separate queues per device */
 		return 0;
@@ -877,7 +929,15 @@ int bdi_register_va(struct backing_dev_info *bdi, const char *fmt, va_list args)
 	set_bit(WB_registered, &bdi->wb.state);
 	spin_lock_bh(&bdi_lock);
+
+	bdi->id = ++bdi_id_cursor;
+
+	p = bdi_lookup_rb_node(bdi->id, &parent);
+	rb_link_node(&bdi->rb_node, parent, p);
+	rb_insert_color(&bdi->rb_node, &bdi_tree);
+
 	list_add_tail_rcu(&bdi->bdi_list, &bdi_list);
+
 	spin_unlock_bh(&bdi_lock);
 
 	trace_writeback_bdi_register(bdi);
@@ -918,6 +978,7 @@ EXPORT_SYMBOL(bdi_register_owner);
 
 static void bdi_remove_from_list(struct backing_dev_info *bdi)
 {
 	spin_lock_bh(&bdi_lock);
+	rb_erase(&bdi->rb_node, &bdi_tree);
 	list_del_rcu(&bdi->bdi_list);
 	spin_unlock_bh(&bdi_lock);

From patchwork Sat Aug 3 14:01:54 2019
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 11074455
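The scheme in patch 2 above has two parts: a monotonically increasing id cursor, so ids are never recycled even after a bdi is unregistered, and a tree keyed by id whose descend logic mirrors bdi_lookup_rb_node(). A userspace sketch with a plain unbalanced BST standing in for the kernel rbtree (names illustrative; removal via rb_erase() is omitted):

```c
#include <stdint.h>
#include <stdlib.h>

struct node {
	uint64_t id;
	struct node *left, *right;
};

static uint64_t id_cursor;	/* analogue of bdi_id_cursor */
static struct node *tree;	/* analogue of bdi_tree */

/* mirror of bdi_lookup_rb_node(): find the link slot for @id */
static struct node **lookup_slot(uint64_t id)
{
	struct node **p = &tree;

	while (*p) {
		if ((*p)->id > id)
			p = &(*p)->left;
		else if ((*p)->id < id)
			p = &(*p)->right;
		else
			break;		/* found the matching node */
	}
	return p;
}

/* bdi_register_va() analogue: take the next id and link into the tree */
static struct node *node_register(void)
{
	struct node *n = calloc(1, sizeof(*n));

	n->id = ++id_cursor;	/* never reused, even after unregister */
	*lookup_slot(n->id) = n;
	return n;
}

/* bdi_get_by_id() analogue: NULL when absent or already unregistered */
static struct node *get_by_id(uint64_t id)
{
	return *lookup_slot(id);
}
```

Because the cursor only moves forward, a stale id held by a caller can never accidentally resolve to a newer, unrelated bdi; the lookup simply returns NULL.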
From: Tejun Heo
Subject: [PATCH 3/4] writeback, memcg: Implement cgroup_writeback_by_id()
Date: Sat, 3 Aug 2019 07:01:54 -0700
Message-Id: <20190803140155.181190-4-tj@kernel.org>
In-Reply-To: <20190803140155.181190-1-tj@kernel.org>
References: <20190803140155.181190-1-tj@kernel.org>

Implement cgroup_writeback_by_id(), which initiates cgroup writeback
from bdi and memcg IDs.  This will be used by memcg foreign inode
flushing.

Signed-off-by: Tejun Heo
---
 fs/fs-writeback.c         | 64 +++++++++++++++++++++++++++++++++++++++
 include/linux/writeback.h |  4 +++
 2 files changed, 68 insertions(+)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 6129debdc938..5c79d7acefdb 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -880,6 +880,70 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
 	wb_put(last_wb);
 }
 
+/**
+ * cgroup_writeback_by_id - initiate cgroup writeback from bdi and memcg IDs
+ * @bdi_id: target bdi id
+ * @memcg_id: target memcg css id
+ * @nr: number of pages to write
+ * @reason: reason why some writeback work was initiated
+ * @done: target wb_completion
+ *
+ * Initiate flush of the bdi_writeback identified by @bdi_id and @memcg_id
+ * with the specified parameters.
+ */
+int cgroup_writeback_by_id(u64 bdi_id, int memcg_id, unsigned long nr_pages,
+			   enum wb_reason reason, struct wb_completion *done)
+{
+	struct backing_dev_info *bdi;
+	struct cgroup_subsys_state *memcg_css;
+	struct bdi_writeback *wb;
+	struct wb_writeback_work *work;
+	int ret;
+
+	/* lookup bdi and memcg */
+	bdi = bdi_get_by_id(bdi_id);
+	if (!bdi)
+		return -ENOENT;
+
+	rcu_read_lock();
+	memcg_css = css_from_id(memcg_id, &memory_cgrp_subsys);
+	if (memcg_css && !css_tryget(memcg_css))
+		memcg_css = NULL;
+	rcu_read_unlock();
+	if (!memcg_css) {
+		ret = -ENOENT;
+		goto out_bdi_put;
+	}
+
+	/* and find the associated wb */
+	wb = wb_get_create(bdi, memcg_css, GFP_NOWAIT | __GFP_NOWARN);
+	if (!wb) {
+		ret = -ENOMEM;
+		goto out_css_put;
+	}
+
+	/* issue the writeback work */
+	work = kzalloc(sizeof(*work), GFP_NOWAIT | __GFP_NOWARN);
+	if (work) {
+		work->nr_pages = nr_pages;
+		work->sync_mode = WB_SYNC_NONE;
+		work->reason = reason;
+		work->done = done;
+		work->auto_free = 1;
+		wb_queue_work(wb, work);
+		ret = 0;
+	} else {
+		ret = -ENOMEM;
+	}
+
+	wb_put(wb);
+out_css_put:
+	css_put(memcg_css);
+out_bdi_put:
+	bdi_put(bdi);
+	return ret;
+}
+
 /**
  * cgroup_writeback_umount - flush inode wb switches for umount
  *

diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 8945aac31392..ad794f2a7d42 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -217,6 +217,10 @@ void wbc_attach_and_unlock_inode(struct writeback_control *wbc,
 void wbc_detach_inode(struct writeback_control *wbc);
 void wbc_account_cgroup_owner(struct writeback_control *wbc, struct page *page,
 			      size_t bytes);
+int cgroup_writeback_by_id(u64 bdi_id, int memcg_id, unsigned long nr_pages,
+			   enum wb_reason reason, struct wb_completion *done);
+int writeback_by_id(int id, unsigned long nr, enum wb_reason reason,
+		    struct wb_completion *done);
 void cgroup_writeback_umount(void);
 
 /**

From patchwork Sat Aug 3 14:01:55 2019
From: Tejun Heo
To: axboe@kernel.dk, jack@suse.cz, hannes@cmpxchg.org,
mhocko@kernel.org, vdavydov.dev@gmail.com
Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com, guro@fb.com, akpm@linux-foundation.org, Tejun Heo
Subject: [PATCH 4/4] writeback, memcg: Implement foreign dirty flushing
Date: Sat, 3 Aug 2019 07:01:55 -0700
Message-Id: <20190803140155.181190-5-tj@kernel.org>
In-Reply-To: <20190803140155.181190-1-tj@kernel.org>
References: <20190803140155.181190-1-tj@kernel.org>

There's an inherent mismatch between memcg and writeback. The former tracks ownership per-page while the latter does so per-inode. This was a deliberate design decision because honoring per-page ownership in the writeback path is complicated, may lead to higher CPU and IO overheads, and is deemed unnecessary given that write-sharing an inode across different cgroups isn't a common use-case.

Combined with inode majority-writer ownership switching, this works well enough in most cases but there are some pathological cases. For example, let's say there are two cgroups A and B which keep writing to different but confined parts of the same inode. B owns the inode and A's memory is limited far below B's. A's dirty ratio can rise enough to trigger balance_dirty_pages() sleeps but B's can be low enough to avoid triggering background writeback. A will be slowed down without a way to make writeback of the dirty pages happen.

This patch implements foreign dirty recording and a foreign flushing mechanism so that when a memcg encounters a condition as above it can trigger flushes on the bdi_writebacks which can clean its pages. Please see the comment on top of mem_cgroup_track_foreign_dirty_slowpath() for details.

A reproducer follows.
write-range.c::

  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>
  #include <fcntl.h>
  #include <sys/types.h>

  static const char *usage = "write-range FILE START SIZE\n";

  int main(int argc, char **argv)
  {
  	int fd;
  	unsigned long start, size, end, pos;
  	char *endp;
  	char buf[4096];

  	if (argc < 4) {
  		fprintf(stderr, usage);
  		return 1;
  	}

  	fd = open(argv[1], O_WRONLY);
  	if (fd < 0) {
  		perror("open");
  		return 1;
  	}

  	start = strtoul(argv[2], &endp, 0);
  	if (*endp != '\0') {
  		fprintf(stderr, usage);
  		return 1;
  	}

  	size = strtoul(argv[3], &endp, 0);
  	if (*endp != '\0') {
  		fprintf(stderr, usage);
  		return 1;
  	}

  	end = start + size;

  	while (1) {
  		for (pos = start; pos < end; ) {
  			long bread, bwritten = 0;

  			if (lseek(fd, pos, SEEK_SET) < 0) {
  				perror("lseek");
  				return 1;
  			}

  			bread = read(0, buf, sizeof(buf) < end - pos ?
  				     sizeof(buf) : end - pos);
  			if (bread < 0) {
  				perror("read");
  				return 1;
  			}
  			if (bread == 0)
  				return 0;

  			while (bwritten < bread) {
  				long this;

  				this = write(fd, buf + bwritten,
  					     bread - bwritten);
  				if (this < 0) {
  					perror("write");
  					return 1;
  				}

  				bwritten += this;
  				pos += this;	/* advance by what was actually written */
  			}
  		}
  	}
  }

repro.sh::

  #!/bin/bash

  set -e
  set -x

  sysctl -w vm.dirty_expire_centisecs=300000
  sysctl -w vm.dirty_writeback_centisecs=300000
  sysctl -w vm.dirtytime_expire_seconds=300000

  echo 3 > /proc/sys/vm/drop_caches

  TEST=/sys/fs/cgroup/test
  A=$TEST/A
  B=$TEST/B

  mkdir -p $A $B
  echo "+memory +io" > $TEST/cgroup.subtree_control
  echo $((1<<30)) > $A/memory.high
  echo $((32<<30)) > $B/memory.high

  rm -f testfile
  touch testfile
  fallocate -l 4G testfile

  echo "Starting B"

  (echo $BASHPID > $B/cgroup.procs
   pv -q --rate-limit 70M < /dev/urandom | ./write-range testfile $((2<<30)) $((2<<30))) &

  echo "Waiting 10s to ensure B claims the testfile inode"
  sleep 5
  sync
  sleep 5
  sync

  echo "Starting A"

  (echo $BASHPID > $A/cgroup.procs
   pv < /dev/urandom | ./write-range testfile 0 $((2<<30)))

Signed-off-by: Tejun Heo
---
 include/linux/backing-dev-defs.h |   1 +
 include/linux/memcontrol.h       |  35 +++++++++
 mm/memcontrol.c                  | 125 +++++++++++++++++++++++++++++++
 mm/page-writeback.c              |   4 +
 4 files changed, 165
insertions(+)

diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index 1075f2552cfc..4fc87dee005a 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -63,6 +63,7 @@ enum wb_reason {
 	 * so it has a mismatch name.
 	 */
 	WB_REASON_FORKER_THREAD,
+	WB_REASON_FOREIGN_FLUSH,
 
 	WB_REASON_MAX,
 };

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 44c41462be33..7422e4a4dbeb 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -183,6 +183,19 @@ struct memcg_padding {
 #define MEMCG_PADDING(name)
 #endif
 
+/*
+ * Remember four most recent foreign inodes with dirty pages on this
+ * cgroup.  See mem_cgroup_track_foreign_dirty_slowpath() for details.
+ */
+#define MEMCG_CGWB_FRN_CNT	4
+
+struct memcg_cgwb_frn {
+	u64 bdi_id;			/* bdi->id of the foreign inode */
+	int memcg_id;			/* memcg->css.id of foreign inode */
+	u64 at;				/* jiffies_64 at the time of dirtying */
+	struct wb_completion done;	/* tracks in-flight foreign writebacks */
+};
+
 /*
  * The memory controller data structure. The memory controller controls both
  * page cache and RSS per cgroup.
 * We would eventually like to provide

@@ -307,6 +320,7 @@ struct mem_cgroup {
 #ifdef CONFIG_CGROUP_WRITEBACK
 	struct list_head cgwb_list;
 	struct wb_domain cgwb_domain;
+	struct memcg_cgwb_frn cgwb_frn[MEMCG_CGWB_FRN_CNT];
 #endif
 
 	/* List of events which userspace want to receive */

@@ -1218,6 +1232,18 @@ void mem_cgroup_wb_stats(struct bdi_writeback *wb, unsigned long *pfilepages,
 			 unsigned long *pheadroom, unsigned long *pdirty,
 			 unsigned long *pwriteback);
+void mem_cgroup_track_foreign_dirty_slowpath(struct page *page,
+					     struct bdi_writeback *wb);
+
+static inline void mem_cgroup_track_foreign_dirty(struct page *page,
+						  struct bdi_writeback *wb)
+{
+	if (unlikely(&page->mem_cgroup->css != wb->memcg_css))
+		mem_cgroup_track_foreign_dirty_slowpath(page, wb);
+}
+
+void mem_cgroup_flush_foreign(struct bdi_writeback *wb);
+
 #else	/* CONFIG_CGROUP_WRITEBACK */
 
 static inline struct wb_domain *mem_cgroup_wb_domain(struct bdi_writeback *wb)

@@ -1233,6 +1259,15 @@ static inline void mem_cgroup_wb_stats(struct bdi_writeback *wb,
 {
 }
 
+static inline void mem_cgroup_track_foreign_dirty(struct page *page,
+						  struct bdi_writeback *wb)
+{
+}
+
+static inline void mem_cgroup_flush_foreign(struct bdi_writeback *wb)
+{
+}
+
 #endif	/* CONFIG_CGROUP_WRITEBACK */
 
 struct sock;

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index cdbb7a84cb6e..e642fbacb504 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -87,6 +87,10 @@ int do_swap_account __read_mostly;
 #define do_swap_account		0
 #endif
 
+#ifdef CONFIG_CGROUP_WRITEBACK
+static DECLARE_WAIT_QUEUE_HEAD(memcg_cgwb_frn_waitq);
+#endif
+
 /* Whether legacy memory+swap accounting is active */
 static bool do_memsw_account(void)
 {

@@ -4145,6 +4149,118 @@ void mem_cgroup_wb_stats(struct bdi_writeback *wb, unsigned long *pfilepages,
 	}
 }
 
+/*
+ * Foreign dirty flushing
+ *
+ * There's an inherent mismatch between memcg and writeback.  The former
+ * tracks ownership per-page while the latter per-inode.
+ * This was a deliberate design decision because honoring per-page
+ * ownership in the writeback path is complicated, may lead to higher CPU
+ * and IO overheads and deemed unnecessary given that write-sharing an
+ * inode across different cgroups isn't a common use-case.
+ *
+ * Combined with inode majority-writer ownership switching, this works well
+ * enough in most cases but there are some pathological cases.  For
+ * example, let's say there are two cgroups A and B which keep writing to
+ * different but confined parts of the same inode.  B owns the inode and
+ * A's memory is limited far below B's.  A's dirty ratio can rise enough to
+ * trigger balance_dirty_pages() sleeps but B's can be low enough to avoid
+ * triggering background writeback.  A will be slowed down without a way to
+ * make writeback of the dirty pages happen.
+ *
+ * Conditions like the above can lead to a cgroup getting repeatedly and
+ * severely throttled after making some progress after each
+ * dirty_expire_interval while the underlying IO device is almost
+ * completely idle.
+ *
+ * Solving this problem completely requires matching the ownership tracking
+ * granularities between memcg and writeback in either direction.  However,
+ * the more egregious behaviors can be avoided by simply remembering the
+ * most recent foreign dirtying events and initiating remote flushes on
+ * them when local writeback isn't enough to keep the memory clean enough.
+ *
+ * The following two functions implement such mechanism.  When a foreign
+ * page - a page whose memcg and writeback ownerships don't match - is
+ * dirtied, mem_cgroup_track_foreign_dirty() records the inode owning
+ * bdi_writeback on the page owning memcg.  When balance_dirty_pages()
+ * decides that the memcg needs to sleep due to high dirty ratio, it calls
+ * mem_cgroup_flush_foreign() which queues writeback on the recorded
+ * foreign bdi_writebacks which haven't expired.
+ * Both the numbers of recorded bdi_writebacks and concurrent in-flight
+ * foreign writebacks are limited to MEMCG_CGWB_FRN_CNT.
+ *
+ * The mechanism only remembers IDs and doesn't hold any object references.
+ * As being wrong occasionally doesn't matter, updates and accesses to the
+ * records are lockless and racy.
+ */
+void mem_cgroup_track_foreign_dirty_slowpath(struct page *page,
+					     struct bdi_writeback *wb)
+{
+	struct mem_cgroup *memcg = page->mem_cgroup;
+	struct memcg_cgwb_frn *frn;
+	u64 now = jiffies_64;
+	u64 oldest_at = now;
+	int oldest = -1;
+	int i;
+
+	/*
+	 * Pick the slot to use.  If there is already a slot for @wb, keep
+	 * using it.  If not, replace the oldest one which isn't being
+	 * written out.
+	 */
+	for (i = 0; i < MEMCG_CGWB_FRN_CNT; i++) {
+		frn = &memcg->cgwb_frn[i];
+		if (frn->bdi_id == wb->bdi->id &&
+		    frn->memcg_id == wb->memcg_css->id)
+			break;
+		if (frn->at < oldest_at && atomic_read(&frn->done.cnt) == 1) {
+			oldest = i;
+			oldest_at = frn->at;
+		}
+	}
+
+	if (i < MEMCG_CGWB_FRN_CNT) {
+		unsigned long update_intv =
+			min_t(unsigned long, HZ,
+			      msecs_to_jiffies(dirty_expire_interval * 10) / 8);
+		/*
+		 * Re-using an existing one.  Let's update timestamp lazily
+		 * to avoid making the cacheline hot.
+		 */
+		if (frn->at < now - update_intv)
+			frn->at = now;
+	} else if (oldest >= 0) {
+		/* replace the oldest free one */
+		frn = &memcg->cgwb_frn[oldest];
+		frn->bdi_id = wb->bdi->id;
+		frn->memcg_id = wb->memcg_css->id;
+		frn->at = now;
+	}
+}
+
+/*
+ * Issue foreign writeback flushes for recorded foreign dirtying events
+ * which haven't expired yet and aren't already being written out.
+ */
+void mem_cgroup_flush_foreign(struct bdi_writeback *wb)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(wb->memcg_css);
+	unsigned long intv = msecs_to_jiffies(dirty_expire_interval * 10);
+	u64 now = jiffies_64;
+	int i;
+
+	for (i = 0; i < MEMCG_CGWB_FRN_CNT; i++) {
+		struct memcg_cgwb_frn *frn = &memcg->cgwb_frn[i];
+
+		if (frn->at > now - intv && atomic_read(&frn->done.cnt) == 1) {
+			frn->at = 0;
+			cgroup_writeback_by_id(frn->bdi_id, frn->memcg_id,
+					       LONG_MAX, WB_REASON_FOREIGN_FLUSH,
+					       &frn->done);
+		}
+	}
+}
+
 #else	/* CONFIG_CGROUP_WRITEBACK */
 
 static int memcg_wb_domain_init(struct mem_cgroup *memcg, gfp_t gfp)

@@ -4661,6 +4777,7 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
 	struct mem_cgroup *memcg;
 	unsigned int size;
 	int node;
+	int __maybe_unused i;
 
 	size = sizeof(struct mem_cgroup);
 	size += nr_node_ids * sizeof(struct mem_cgroup_per_node *);

@@ -4704,6 +4821,9 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
 #endif
 #ifdef CONFIG_CGROUP_WRITEBACK
 	INIT_LIST_HEAD(&memcg->cgwb_list);
+	for (i = 0; i < MEMCG_CGWB_FRN_CNT; i++)
+		memcg->cgwb_frn[i].done =
+			__WB_COMPLETION_INIT(&memcg_cgwb_frn_waitq);
 #endif
 	idr_replace(&mem_cgroup_idr, memcg, memcg->id.id);
 	return memcg;

@@ -4833,7 +4953,12 @@ static void mem_cgroup_css_released(struct cgroup_subsys_state *css)
 static void mem_cgroup_css_free(struct cgroup_subsys_state *css)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
+	int __maybe_unused i;
 
+#ifdef CONFIG_CGROUP_WRITEBACK
+	for (i = 0; i < MEMCG_CGWB_FRN_CNT; i++)
+		wb_wait_for_completion(&memcg->cgwb_frn[i].done);
+#endif
 	if (cgroup_subsys_on_dfl(memory_cgrp_subsys) && !cgroup_memory_nosocket)
 		static_branch_dec(&memcg_sockets_enabled_key);

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 1804f64ff43c..50055d2e4ea8 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1667,6 +1667,8 @@ static void balance_dirty_pages(struct bdi_writeback *wb,
 		if (unlikely(!writeback_in_progress(wb)))
 			wb_start_background_writeback(wb);
 
+		mem_cgroup_flush_foreign(wb);
+
 		/*
 		 * Calculate global domain's pos_ratio and select the
 		 * global dtc by default.

@@ -2427,6 +2429,8 @@ void account_page_dirtied(struct page *page, struct address_space *mapping)
 		task_io_account_write(PAGE_SIZE);
 		current->nr_dirtied++;
 		this_cpu_inc(bdp_ratelimits);
+
+		mem_cgroup_track_foreign_dirty(page, wb);
 	}
 }