From patchwork Fri Jun 14 00:33:43 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 10993897
From: Tejun Heo
To: dsterba@suse.com, clm@fb.com, josef@toxicpanda.com, axboe@kernel.dk, jack@suse.cz
Cc: linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, kernel-team@fb.com, Tejun Heo
Subject: [PATCH 1/8] blkcg, writeback: Add wbc->no_wbc_acct
Date: Thu, 13 Jun 2019 17:33:43 -0700
Message-Id: <20190614003350.1178444-2-tj@kernel.org>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20190614003350.1178444-1-tj@kernel.org>
References: <20190614003350.1178444-1-tj@kernel.org>
X-Mailing-List: linux-btrfs@vger.kernel.org

When writeback IOs are bounced through async layers, the
IOs should only be accounted against the wbc from the original bdi
writeback to avoid confusing cgroup inode ownership arbitration. Add
wbc->no_wbc_acct to allow disabling wbc accounting. This will be used
to make btrfs compression work well with cgroup IO control.

Signed-off-by: Tejun Heo
Reviewed-by: Josef Bacik
---
 fs/fs-writeback.c         | 2 +-
 include/linux/writeback.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 36855c1f8daf..6c8061a49ca0 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -721,7 +721,7 @@ void wbc_account_io(struct writeback_control *wbc, struct page *page,
 	 * behind a slow cgroup. Ultimately, we want pageout() to kick off
 	 * regular writeback instead of writing things out itself.
 	 */
-	if (!wbc->wb)
+	if (!wbc->wb || wbc->no_wbc_acct)
 		return;
 
 	id = mem_cgroup_css_from_page(page)->id;
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 738a0c24874f..b8f5f000cde4 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -68,6 +68,7 @@ struct writeback_control {
 	unsigned for_reclaim:1; /* Invoked from the page allocator */
 	unsigned range_cyclic:1; /* range_start is cyclic */
 	unsigned for_sync:1; /* sync(2) WB_SYNC_ALL writeback */
+	unsigned no_wbc_acct:1; /* skip wbc IO accounting */
 #ifdef CONFIG_CGROUP_WRITEBACK
 	struct bdi_writeback *wb; /* wb this writeback is issued under */
 	struct inode *inode; /* inode being written out */

From patchwork Fri Jun 14 00:33:44 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 10993923
From: Tejun Heo
To: dsterba@suse.com, clm@fb.com, josef@toxicpanda.com, axboe@kernel.dk, jack@suse.cz
Cc: linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, kernel-team@fb.com, Tejun Heo
Subject: [PATCH 2/8] blkcg, writeback: Implement wbc_blkcg_css()
Date: Thu, 13 Jun 2019 17:33:44 -0700
Message-Id: <20190614003350.1178444-3-tj@kernel.org>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20190614003350.1178444-1-tj@kernel.org>
References: <20190614003350.1178444-1-tj@kernel.org>
X-Mailing-List: linux-btrfs@vger.kernel.org

Add a helper to determine the target blkcg from wbc.

Signed-off-by: Tejun Heo
Reviewed-by: Josef Bacik
---
 include/linux/writeback.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index b8f5f000cde4..1c85563f035d 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -93,6 +93,16 @@ static inline int wbc_to_write_flags(struct writeback_control *wbc)
 	return 0;
 }
 
+static inline struct cgroup_subsys_state *
+wbc_blkcg_css(struct writeback_control *wbc)
+{
+#ifdef CONFIG_CGROUP_WRITEBACK
+	if (wbc->wb)
+		return wbc->wb->blkcg_css;
+#endif
+	return blkcg_root_css;
+}
+
 /*
  * A wb_domain represents a domain that wb's (bdi_writeback's) belong to
  * and are measured against each other in. There always is one global

From patchwork Fri Jun 14 00:33:45 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 10993899
From: Tejun Heo
To: dsterba@suse.com, clm@fb.com, josef@toxicpanda.com, axboe@kernel.dk, jack@suse.cz
Cc: linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, kernel-team@fb.com, Tejun Heo
Subject: [PATCH 3/8] blkcg: implement REQ_CGROUP_PUNT
Date: Thu, 13 Jun 2019 17:33:45 -0700
Message-Id: <20190614003350.1178444-4-tj@kernel.org>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20190614003350.1178444-1-tj@kernel.org>
References: <20190614003350.1178444-1-tj@kernel.org>
X-Mailing-List: linux-btrfs@vger.kernel.org

When a shared kthread needs to issue a bio for a cgroup, doing so
synchronously can lead to priority inversions as the kthread can be
trapped waiting for that cgroup. This patch implements
REQ_CGROUP_PUNT flag which makes submit_bio() punt the actual issuing
to a dedicated per-blkcg work item to avoid such priority inversions.

This will be used to fix priority inversions in btrfs compression and
should be generally useful as we grow filesystem support for
comprehensive IO control.
Signed-off-by: Tejun Heo Cc: Chris Mason Reviewed-by: Josef Bacik --- block/blk-cgroup.c | 56 +++++++++++++++++++++++++++++++++++-- block/blk-core.c | 3 ++ include/linux/backing-dev.h | 1 + include/linux/blk-cgroup.h | 16 ++++++++++- include/linux/blk_types.h | 10 +++++++ include/linux/writeback.h | 13 +++++++-- 6 files changed, 92 insertions(+), 7 deletions(-) diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index 617a2b3f7582..64d80a661205 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -46,12 +46,10 @@ struct blkcg blkcg_root; EXPORT_SYMBOL_GPL(blkcg_root); struct cgroup_subsys_state * const blkcg_root_css = &blkcg_root.css; - static struct blkcg_policy *blkcg_policy[BLKCG_MAX_POLS]; - static LIST_HEAD(all_blkcgs); /* protected by blkcg_pol_mutex */ - static bool blkcg_debug_stats = false; +static struct workqueue_struct *blkcg_punt_bio_wq; static bool blkcg_policy_enabled(struct request_queue *q, const struct blkcg_policy *pol) @@ -87,6 +85,8 @@ static void __blkg_release(struct rcu_head *rcu) percpu_ref_exit(&blkg->refcnt); + WARN_ON(!bio_list_empty(&blkg->async_bios)); + /* release the blkcg and parent blkg refs this blkg has been holding */ css_put(&blkg->blkcg->css); if (blkg->parent) @@ -112,6 +112,23 @@ static void blkg_release(struct percpu_ref *ref) call_rcu(&blkg->rcu_head, __blkg_release); } +static void blkg_async_bio_workfn(struct work_struct *work) +{ + struct blkcg_gq *blkg = container_of(work, struct blkcg_gq, + async_bio_work); + struct bio_list bios = BIO_EMPTY_LIST; + struct bio *bio; + + /* as long as there are pending bios, @blkg can't go away */ + spin_lock_bh(&blkg->async_bio_lock); + bio_list_merge(&bios, &blkg->async_bios); + bio_list_init(&blkg->async_bios); + spin_unlock_bh(&blkg->async_bio_lock); + + while ((bio = bio_list_pop(&bios))) + submit_bio(bio); +} + /** * blkg_alloc - allocate a blkg * @blkcg: block cgroup the new blkg is associated with @@ -137,6 +154,9 @@ static struct blkcg_gq *blkg_alloc(struct blkcg 
*blkcg, struct request_queue *q, blkg->q = q; INIT_LIST_HEAD(&blkg->q_node); + spin_lock_init(&blkg->async_bio_lock); + bio_list_init(&blkg->async_bios); + INIT_WORK(&blkg->async_bio_work, blkg_async_bio_workfn); blkg->blkcg = blkcg; for (i = 0; i < BLKCG_MAX_POLS; i++) { @@ -1582,6 +1602,25 @@ void blkcg_policy_unregister(struct blkcg_policy *pol) } EXPORT_SYMBOL_GPL(blkcg_policy_unregister); +bool __blkcg_punt_bio_submit(struct bio *bio) +{ + struct blkcg_gq *blkg = bio->bi_blkg; + + /* consume the flag first */ + bio->bi_opf &= ~REQ_CGROUP_PUNT; + + /* never bounce for the root cgroup */ + if (!blkg->parent) + return false; + + spin_lock_bh(&blkg->async_bio_lock); + bio_list_add(&blkg->async_bios, bio); + spin_unlock_bh(&blkg->async_bio_lock); + + queue_work(blkcg_punt_bio_wq, &blkg->async_bio_work); + return true; +} + /* * Scale the accumulated delay based on how long it has been since we updated * the delay. We only call this when we are adding delay, in case it's been a @@ -1782,5 +1821,16 @@ void blkcg_add_delay(struct blkcg_gq *blkg, u64 now, u64 delta) atomic64_add(delta, &blkg->delay_nsec); } +static int __init blkcg_init(void) +{ + blkcg_punt_bio_wq = alloc_workqueue("blkcg_punt_bio", + WQ_MEM_RECLAIM | WQ_FREEZABLE | + WQ_UNBOUND | WQ_SYSFS, 0); + if (!blkcg_punt_bio_wq) + return -ENOMEM; + return 0; +} +subsys_initcall(blkcg_init); + module_param(blkcg_debug_stats, bool, 0644); MODULE_PARM_DESC(blkcg_debug_stats, "True if you want debug stats, false if not"); diff --git a/block/blk-core.c b/block/blk-core.c index a55389ba8779..5879c1ec044d 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1165,6 +1165,9 @@ EXPORT_SYMBOL_GPL(direct_make_request); */ blk_qc_t submit_bio(struct bio *bio) { + if (blkcg_punt_bio_submit(bio)) + return BLK_QC_T_NONE; + /* * If it's a regular read/write or a barrier with data attached, * go through the normal accounting stuff before submission. 
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h index f9b029180241..35b31d176f74 100644 --- a/include/linux/backing-dev.h +++ b/include/linux/backing-dev.h @@ -48,6 +48,7 @@ extern spinlock_t bdi_lock; extern struct list_head bdi_list; extern struct workqueue_struct *bdi_wq; +extern struct workqueue_struct *bdi_async_bio_wq; static inline bool wb_has_dirty_io(struct bdi_writeback *wb) { diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h index 76c61318fda5..ffb2f88e87c6 100644 --- a/include/linux/blk-cgroup.h +++ b/include/linux/blk-cgroup.h @@ -134,13 +134,17 @@ struct blkcg_gq { struct blkg_policy_data *pd[BLKCG_MAX_POLS]; - struct rcu_head rcu_head; + spinlock_t async_bio_lock; + struct bio_list async_bios; + struct work_struct async_bio_work; atomic_t use_delay; atomic64_t delay_nsec; atomic64_t delay_start; u64 last_delay; int last_use; + + struct rcu_head rcu_head; }; typedef struct blkcg_policy_data *(blkcg_pol_alloc_cpd_fn)(gfp_t gfp); @@ -763,6 +767,15 @@ static inline bool blk_throtl_bio(struct request_queue *q, struct blkcg_gq *blkg struct bio *bio) { return false; } #endif +bool __blkcg_punt_bio_submit(struct bio *bio); + +static inline bool blkcg_punt_bio_submit(struct bio *bio) +{ + if (bio->bi_opf & REQ_CGROUP_PUNT) + return __blkcg_punt_bio_submit(bio); + else + return false; +} static inline void blkcg_bio_issue_init(struct bio *bio) { @@ -910,6 +923,7 @@ static inline char *blkg_path(struct blkcg_gq *blkg) { return NULL; } static inline void blkg_get(struct blkcg_gq *blkg) { } static inline void blkg_put(struct blkcg_gq *blkg) { } +static inline bool blkcg_punt_bio_submit(struct bio *bio) { return false; } static inline void blkcg_bio_issue_init(struct bio *bio) { } static inline bool blkcg_bio_issue_check(struct request_queue *q, struct bio *bio) { return true; } diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index 791fee35df88..e8b42a786315 100644 --- a/include/linux/blk_types.h +++ 
b/include/linux/blk_types.h @@ -321,6 +321,14 @@ enum req_flag_bits { __REQ_RAHEAD, /* read ahead, can fail anytime */ __REQ_BACKGROUND, /* background IO */ __REQ_NOWAIT, /* Don't wait if request will block */ + /* + * When a shared kthread needs to issue a bio for a cgroup, doing + * so synchronously can lead to priority inversions as the kthread + * can be trapped waiting for that cgroup. CGROUP_PUNT flag makes + * submit_bio() punt the actual issuing to a dedicated per-blkcg + * work item to avoid such priority inversions. + */ + __REQ_CGROUP_PUNT, /* command specific flags for REQ_OP_WRITE_ZEROES: */ __REQ_NOUNMAP, /* do not free blocks when zeroing */ @@ -347,6 +355,8 @@ enum req_flag_bits { #define REQ_RAHEAD (1ULL << __REQ_RAHEAD) #define REQ_BACKGROUND (1ULL << __REQ_BACKGROUND) #define REQ_NOWAIT (1ULL << __REQ_NOWAIT) +#define REQ_CGROUP_PUNT (1ULL << __REQ_CGROUP_PUNT) + #define REQ_NOUNMAP (1ULL << __REQ_NOUNMAP) #define REQ_HIPRI (1ULL << __REQ_HIPRI) diff --git a/include/linux/writeback.h b/include/linux/writeback.h index 1c85563f035d..be602c42aab8 100644 --- a/include/linux/writeback.h +++ b/include/linux/writeback.h @@ -11,6 +11,7 @@ #include #include #include +#include struct bio; @@ -69,6 +70,7 @@ struct writeback_control { unsigned range_cyclic:1; /* range_start is cyclic */ unsigned for_sync:1; /* sync(2) WB_SYNC_ALL writeback */ unsigned no_wbc_acct:1; /* skip wbc IO accounting */ + unsigned punt_to_cgroup:1; /* cgrp punting, see __REQ_CGROUP_PUNT */ #ifdef CONFIG_CGROUP_WRITEBACK struct bdi_writeback *wb; /* wb this writeback is issued under */ struct inode *inode; /* inode being written out */ @@ -85,12 +87,17 @@ struct writeback_control { static inline int wbc_to_write_flags(struct writeback_control *wbc) { + int flags = 0; + + if (wbc->punt_to_cgroup) + flags = REQ_CGROUP_PUNT; + if (wbc->sync_mode == WB_SYNC_ALL) - return REQ_SYNC; + flags |= REQ_SYNC; else if (wbc->for_kupdate || wbc->for_background) - return REQ_BACKGROUND; + flags |= 
REQ_BACKGROUND; - return 0; + return flags; } static inline struct cgroup_subsys_state * From patchwork Fri Jun 14 00:33:46 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tejun Heo X-Patchwork-Id: 10993901 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 240CE13AF for ; Fri, 14 Jun 2019 00:34:07 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1578A26E78 for ; Fri, 14 Jun 2019 00:34:07 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 09A2D277D9; Fri, 14 Jun 2019 00:34:07 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.7 required=2.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6117426E78 for ; Fri, 14 Jun 2019 00:34:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727185AbfFNAeF (ORCPT ); Thu, 13 Jun 2019 20:34:05 -0400 Received: from mail-pg1-f194.google.com ([209.85.215.194]:44610 "EHLO mail-pg1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727131AbfFNAeE (ORCPT ); Thu, 13 Jun 2019 20:34:04 -0400 Received: by mail-pg1-f194.google.com with SMTP id n2so447870pgp.11; Thu, 13 Jun 2019 17:34:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references; bh=CrY+KbreoYfldBeV5zPNnyWU1ofuXbFZm/3YwXQdBT4=; b=M28Z4fmsM72cbEt5AgFedpHmNIsfMN+JgMEM3X0yRielVNeNhLVVqhbHDQy5vvhIW8 
HomqwqRy5qKYPssOmd6gLUJcV1tpMloUMMHtqQpnNZ9HJCyb9GadIyM5aw19qc8+ZXCe VrTI/ysbSaww7UZZFLnVj+O9kDCo74ue6x+FeOQ3a2zzaQ5QCzU6Q7NEcwNiGJU5eU7S MvjR9Pm2Iy57CHUFwZsP8nbdAy3SEhPkJqQ9tBveeNYkyhJrrrZjGkiNFcC20oJOp63H 4vlxSGB5N2vz5rZrk1HMqzb1MF0LeBnDUNMzWIE4Ijv+IbhX7tb7R3ATCtge81shLEV7 /57g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:from:to:cc:subject:date:message-id :in-reply-to:references; bh=CrY+KbreoYfldBeV5zPNnyWU1ofuXbFZm/3YwXQdBT4=; b=HxeeRzsPU4EHCQPI8Ko9yxK3o+vVDvJN4DNtfN66WPVjEvhmPHmQaaGXmk9S7HbQRO CCmZm64xfY/0R7C7HjU2SqKDs8FiD1F5NYOSMzx7QVepHDvbja+XThVYtLvlw5gV81YH P99YZNil3jgJzmsGxnYol9ffBZEdKw21f6GfEVopkcXgnFW9cWJhUhXIVf+ydZCahmk7 dhh8JhZYm5AtQAq4EhXHw1f4denxqrkizxBv6Rj+g/gqzIl7eqYH5CjsZ0jMKEYJd380 nsGgpJj4yL6Ig6RHrXm5+tNxQyQQYy//5+m586ZA3cYsXOiCp8toTlsG9J8ohaXyjxNx hLtQ== X-Gm-Message-State: APjAAAUNAsUER7dPBIjPYsd5DoBHsUuOSYCPLDY/DCNiciJwO2VpXyk1 vFi3B3/D/5rJ+lFFeA9Bw0k= X-Google-Smtp-Source: APXvYqzgYmsCZbDvDBYklA/y0gChrsQ3rmaEl5P4PAB8IaWAOT96tK7DZChQsR04mJWj8K5aU4nwQg== X-Received: by 2002:a17:90a:d34f:: with SMTP id i15mr8252804pjx.1.1560472443647; Thu, 13 Jun 2019 17:34:03 -0700 (PDT) Received: from localhost ([2620:10d:c091:500::2:9d14]) by smtp.gmail.com with ESMTPSA id l20sm787797pff.102.2019.06.13.17.34.03 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 13 Jun 2019 17:34:03 -0700 (PDT) From: Tejun Heo To: dsterba@suse.com, clm@fb.com, josef@toxicpanda.com, axboe@kernel.dk, jack@suse.cz Cc: linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, kernel-team@fb.com Subject: [PATCH 4/8] Btrfs: stop using btrfs_schedule_bio() Date: Thu, 13 Jun 2019 17:33:46 -0700 Message-Id: <20190614003350.1178444-5-tj@kernel.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190614003350.1178444-1-tj@kernel.org> References: <20190614003350.1178444-1-tj@kernel.org> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: 
X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Chris Mason btrfs_schedule_bio() hands IO off to a helper thread to do the actual submit_bio() call. This has been used to make sure async crc and compression helpers don't get stuck on IO submission. To maintain good performance, over time the IO submission threads duplicated some IO scheduler characteristics such as high and low priority IOs and they also made some ugly assumptions about request allocation batch sizes. All of this cost at least one extra context switch during IO submission, and doesn't fit well with the modern blkmq IO stack. So, this commit stops using btrfs_schedule_bio(). We may need to adjust the number of async helper threads for crcs and compression, but long term it's a better path. Signed-off-by: Chris Mason Reviewed-by: Josef Bacik --- fs/btrfs/compression.c | 8 +++--- fs/btrfs/disk-io.c | 6 ++--- fs/btrfs/inode.c | 6 ++--- fs/btrfs/volumes.c | 55 +++--------------------------------------- fs/btrfs/volumes.h | 2 +- 5 files changed, 15 insertions(+), 62 deletions(-) diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index 4ec1df369e47..873261b932b8 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -355,7 +355,7 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start, BUG_ON(ret); /* -ENOMEM */ } - ret = btrfs_map_bio(fs_info, bio, 0, 1); + ret = btrfs_map_bio(fs_info, bio, 0); if (ret) { bio->bi_status = ret; bio_endio(bio); @@ -385,7 +385,7 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start, BUG_ON(ret); /* -ENOMEM */ } - ret = btrfs_map_bio(fs_info, bio, 0, 1); + ret = btrfs_map_bio(fs_info, bio, 0); if (ret) { bio->bi_status = ret; bio_endio(bio); @@ -638,7 +638,7 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, sums += DIV_ROUND_UP(comp_bio->bi_iter.bi_size, fs_info->sectorsize); - ret = btrfs_map_bio(fs_info, comp_bio, mirror_num, 0); + 
ret = btrfs_map_bio(fs_info, comp_bio, mirror_num); if (ret) { comp_bio->bi_status = ret; bio_endio(comp_bio); @@ -662,7 +662,7 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, BUG_ON(ret); /* -ENOMEM */ } - ret = btrfs_map_bio(fs_info, comp_bio, mirror_num, 0); + ret = btrfs_map_bio(fs_info, comp_bio, mirror_num); if (ret) { comp_bio->bi_status = ret; bio_endio(comp_bio); diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 663efce22d98..b34240406f36 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -800,7 +800,7 @@ static void run_one_async_done(struct btrfs_work *work) } ret = btrfs_map_bio(btrfs_sb(inode->i_sb), async->bio, - async->mirror_num, 1); + async->mirror_num); if (ret) { async->bio->bi_status = ret; bio_endio(async->bio); @@ -901,12 +901,12 @@ static blk_status_t btree_submit_bio_hook(struct inode *inode, struct bio *bio, BTRFS_WQ_ENDIO_METADATA); if (ret) goto out_w_error; - ret = btrfs_map_bio(fs_info, bio, mirror_num, 0); + ret = btrfs_map_bio(fs_info, bio, mirror_num); } else if (!async) { ret = btree_csum_one_bio(bio); if (ret) goto out_w_error; - ret = btrfs_map_bio(fs_info, bio, mirror_num, 0); + ret = btrfs_map_bio(fs_info, bio, mirror_num); } else { /* * kthread helpers are used to submit writes so that diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index d519c3520e87..91b161fb1521 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2032,7 +2032,7 @@ static blk_status_t btrfs_submit_bio_hook(struct inode *inode, struct bio *bio, } mapit: - ret = btrfs_map_bio(fs_info, bio, mirror_num, 0); + ret = btrfs_map_bio(fs_info, bio, mirror_num); out: if (ret) { @@ -7764,7 +7764,7 @@ static inline blk_status_t submit_dio_repair_bio(struct inode *inode, if (ret) return ret; - ret = btrfs_map_bio(fs_info, bio, mirror_num, 0); + ret = btrfs_map_bio(fs_info, bio, mirror_num); return ret; } @@ -8295,7 +8295,7 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio, goto err; } map: - ret = 
btrfs_map_bio(fs_info, bio, 0, 0); + ret = btrfs_map_bio(fs_info, bio, 0); err: return ret; } diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 1c2a6e4b39da..72326cc23985 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -6386,52 +6386,8 @@ static void btrfs_end_bio(struct bio *bio) } } -/* - * see run_scheduled_bios for a description of why bios are collected for - * async submit. - * - * This will add one bio to the pending list for a device and make sure - * the work struct is scheduled. - */ -static noinline void btrfs_schedule_bio(struct btrfs_device *device, - struct bio *bio) -{ - struct btrfs_fs_info *fs_info = device->fs_info; - int should_queue = 1; - struct btrfs_pending_bios *pending_bios; - - /* don't bother with additional async steps for reads, right now */ - if (bio_op(bio) == REQ_OP_READ) { - btrfsic_submit_bio(bio); - return; - } - - WARN_ON(bio->bi_next); - bio->bi_next = NULL; - - spin_lock(&device->io_lock); - if (op_is_sync(bio->bi_opf)) - pending_bios = &device->pending_sync_bios; - else - pending_bios = &device->pending_bios; - - if (pending_bios->tail) - pending_bios->tail->bi_next = bio; - - pending_bios->tail = bio; - if (!pending_bios->head) - pending_bios->head = bio; - if (device->running_pending) - should_queue = 0; - - spin_unlock(&device->io_lock); - - if (should_queue) - btrfs_queue_work(fs_info->submit_workers, &device->work); -} - static void submit_stripe_bio(struct btrfs_bio *bbio, struct bio *bio, - u64 physical, int dev_nr, int async) + u64 physical, int dev_nr) { struct btrfs_device *dev = bbio->stripes[dev_nr].dev; struct btrfs_fs_info *fs_info = bbio->fs_info; @@ -6449,10 +6405,7 @@ static void submit_stripe_bio(struct btrfs_bio *bbio, struct bio *bio, btrfs_bio_counter_inc_noblocked(fs_info); - if (async) - btrfs_schedule_bio(dev, bio); - else - btrfsic_submit_bio(bio); + btrfsic_submit_bio(bio); } static void bbio_error(struct btrfs_bio *bbio, struct bio *bio, u64 logical) @@ -6473,7 +6426,7 @@ static 
void bbio_error(struct btrfs_bio *bbio, struct bio *bio, u64 logical) } blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio, - int mirror_num, int async_submit) + int mirror_num) { struct btrfs_device *dev; struct bio *first_bio = bio; @@ -6542,7 +6495,7 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio, bio = first_bio; submit_stripe_bio(bbio, bio, bbio->stripes[dev_nr].physical, - dev_nr, async_submit); + dev_nr); } btrfs_bio_counter_dec(fs_info); return BLK_STS_OK; diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index b8a0e8d0672d..8c7bd79b234a 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -415,7 +415,7 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle *trans, u64 type); void btrfs_mapping_init(struct btrfs_mapping_tree *tree); void btrfs_mapping_tree_free(struct btrfs_mapping_tree *tree); blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio, - int mirror_num, int async_submit); + int mirror_num); int btrfs_open_devices(struct btrfs_fs_devices *fs_devices, fmode_t flags, void *holder); struct btrfs_device *btrfs_scan_one_device(const char *path, From patchwork Fri Jun 14 00:33:47 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tejun Heo X-Patchwork-Id: 10993913 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7CD2714BB for ; Fri, 14 Jun 2019 00:34:29 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 704B826E78 for ; Fri, 14 Jun 2019 00:34:29 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 64119277D9; Fri, 14 Jun 2019 00:34:29 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, 
score=-7.7 required=2.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1
From: Tejun Heo
To: dsterba@suse.com, clm@fb.com, josef@toxicpanda.com, axboe@kernel.dk, jack@suse.cz
Cc: linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 5/8] Btrfs: delete the entire async bio submission framework
Date: Thu, 13 Jun 2019 17:33:47 -0700
Message-Id: <20190614003350.1178444-6-tj@kernel.org>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20190614003350.1178444-1-tj@kernel.org>
References: <20190614003350.1178444-1-tj@kernel.org>

From: Chris Mason

Now that we're not using btrfs_schedule_bio() anymore, delete all the
code that supported it.
Signed-off-by: Chris Mason Reviewed-by: Josef Bacik --- fs/btrfs/ctree.h | 1 - fs/btrfs/disk-io.c | 13 +-- fs/btrfs/super.c | 1 - fs/btrfs/volumes.c | 209 --------------------------------------------- fs/btrfs/volumes.h | 8 -- 5 files changed, 1 insertion(+), 231 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index b81c331b28fa..2a5ba0f85ed3 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -989,7 +989,6 @@ struct btrfs_fs_info { struct btrfs_workqueue *endio_meta_write_workers; struct btrfs_workqueue *endio_write_workers; struct btrfs_workqueue *endio_freespace_worker; - struct btrfs_workqueue *submit_workers; struct btrfs_workqueue *caching_workers; struct btrfs_workqueue *readahead_workers; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index b34240406f36..9dbe4ba3995d 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2028,7 +2028,6 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info) btrfs_destroy_workqueue(fs_info->rmw_workers); btrfs_destroy_workqueue(fs_info->endio_write_workers); btrfs_destroy_workqueue(fs_info->endio_freespace_worker); - btrfs_destroy_workqueue(fs_info->submit_workers); btrfs_destroy_workqueue(fs_info->delayed_workers); btrfs_destroy_workqueue(fs_info->caching_workers); btrfs_destroy_workqueue(fs_info->readahead_workers); @@ -2194,16 +2193,6 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info, fs_info->caching_workers = btrfs_alloc_workqueue(fs_info, "cache", flags, max_active, 0); - /* - * a higher idle thresh on the submit workers makes it much more - * likely that bios will be send down in a sane order to the - * devices - */ - fs_info->submit_workers = - btrfs_alloc_workqueue(fs_info, "submit", flags, - min_t(u64, fs_devices->num_devices, - max_active), 64); - fs_info->fixup_workers = btrfs_alloc_workqueue(fs_info, "fixup", flags, 1, 0); @@ -2246,7 +2235,7 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info, max_active), 8); if (!(fs_info->workers && 
fs_info->delalloc_workers && - fs_info->submit_workers && fs_info->flush_workers && + fs_info->flush_workers && fs_info->endio_workers && fs_info->endio_meta_workers && fs_info->endio_meta_write_workers && fs_info->endio_repair_workers && diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 2c66d9ea6a3b..3fb86a7bfdf7 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1668,7 +1668,6 @@ static void btrfs_resize_thread_pool(struct btrfs_fs_info *fs_info, btrfs_workqueue_set_max(fs_info->workers, new_pool_size); btrfs_workqueue_set_max(fs_info->delalloc_workers, new_pool_size); - btrfs_workqueue_set_max(fs_info->submit_workers, new_pool_size); btrfs_workqueue_set_max(fs_info->caching_workers, new_pool_size); btrfs_workqueue_set_max(fs_info->endio_workers, new_pool_size); btrfs_workqueue_set_max(fs_info->endio_meta_workers, new_pool_size); diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 72326cc23985..fc3a16d87869 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -509,212 +509,6 @@ btrfs_get_bdev_and_sb(const char *device_path, fmode_t flags, void *holder, return ret; } -static void requeue_list(struct btrfs_pending_bios *pending_bios, - struct bio *head, struct bio *tail) -{ - - struct bio *old_head; - - old_head = pending_bios->head; - pending_bios->head = head; - if (pending_bios->tail) - tail->bi_next = old_head; - else - pending_bios->tail = tail; -} - -/* - * we try to collect pending bios for a device so we don't get a large - * number of procs sending bios down to the same device. This greatly - * improves the schedulers ability to collect and merge the bios. - * - * But, it also turns into a long list of bios to process and that is sure - * to eventually make the worker thread block. The solution here is to - * make some progress and then put this work struct back at the end of - * the list if the block device is congested. This way, multiple devices - * can make progress from a single worker thread. 
- */ -static noinline void run_scheduled_bios(struct btrfs_device *device) -{ - struct btrfs_fs_info *fs_info = device->fs_info; - struct bio *pending; - struct backing_dev_info *bdi; - struct btrfs_pending_bios *pending_bios; - struct bio *tail; - struct bio *cur; - int again = 0; - unsigned long num_run; - unsigned long batch_run = 0; - unsigned long last_waited = 0; - int force_reg = 0; - int sync_pending = 0; - struct blk_plug plug; - - /* - * this function runs all the bios we've collected for - * a particular device. We don't want to wander off to - * another device without first sending all of these down. - * So, setup a plug here and finish it off before we return - */ - blk_start_plug(&plug); - - bdi = device->bdev->bd_bdi; - -loop: - spin_lock(&device->io_lock); - -loop_lock: - num_run = 0; - - /* take all the bios off the list at once and process them - * later on (without the lock held). But, remember the - * tail and other pointers so the bios can be properly reinserted - * into the list if we hit congestion - */ - if (!force_reg && device->pending_sync_bios.head) { - pending_bios = &device->pending_sync_bios; - force_reg = 1; - } else { - pending_bios = &device->pending_bios; - force_reg = 0; - } - - pending = pending_bios->head; - tail = pending_bios->tail; - WARN_ON(pending && !tail); - - /* - * if pending was null this time around, no bios need processing - * at all and we can stop. Otherwise it'll loop back up again - * and do an additional check so no bios are missed. - * - * device->running_pending is used to synchronize with the - * schedule_bio code. 
- */ - if (device->pending_sync_bios.head == NULL && - device->pending_bios.head == NULL) { - again = 0; - device->running_pending = 0; - } else { - again = 1; - device->running_pending = 1; - } - - pending_bios->head = NULL; - pending_bios->tail = NULL; - - spin_unlock(&device->io_lock); - - while (pending) { - - rmb(); - /* we want to work on both lists, but do more bios on the - * sync list than the regular list - */ - if ((num_run > 32 && - pending_bios != &device->pending_sync_bios && - device->pending_sync_bios.head) || - (num_run > 64 && pending_bios == &device->pending_sync_bios && - device->pending_bios.head)) { - spin_lock(&device->io_lock); - requeue_list(pending_bios, pending, tail); - goto loop_lock; - } - - cur = pending; - pending = pending->bi_next; - cur->bi_next = NULL; - - BUG_ON(atomic_read(&cur->__bi_cnt) == 0); - - /* - * if we're doing the sync list, record that our - * plug has some sync requests on it - * - * If we're doing the regular list and there are - * sync requests sitting around, unplug before - * we add more - */ - if (pending_bios == &device->pending_sync_bios) { - sync_pending = 1; - } else if (sync_pending) { - blk_finish_plug(&plug); - blk_start_plug(&plug); - sync_pending = 0; - } - - btrfsic_submit_bio(cur); - num_run++; - batch_run++; - - cond_resched(); - - /* - * we made progress, there is more work to do and the bdi - * is now congested. Back off and let other work structs - * run instead - */ - if (pending && bdi_write_congested(bdi) && batch_run > 8 && - fs_info->fs_devices->open_devices > 1) { - struct io_context *ioc; - - ioc = current->io_context; - - /* - * the main goal here is that we don't want to - * block if we're going to be able to submit - * more requests without blocking. - * - * This code does two great things, it pokes into - * the elevator code from a filesystem _and_ - * it makes assumptions about how batching works. 
- */ - if (ioc && ioc->nr_batch_requests > 0 && - time_before(jiffies, ioc->last_waited + HZ/50UL) && - (last_waited == 0 || - ioc->last_waited == last_waited)) { - /* - * we want to go through our batch of - * requests and stop. So, we copy out - * the ioc->last_waited time and test - * against it before looping - */ - last_waited = ioc->last_waited; - cond_resched(); - continue; - } - spin_lock(&device->io_lock); - requeue_list(pending_bios, pending, tail); - device->running_pending = 1; - - spin_unlock(&device->io_lock); - btrfs_queue_work(fs_info->submit_workers, - &device->work); - goto done; - } - } - - cond_resched(); - if (again) - goto loop; - - spin_lock(&device->io_lock); - if (device->pending_bios.head || device->pending_sync_bios.head) - goto loop_lock; - spin_unlock(&device->io_lock); - -done: - blk_finish_plug(&plug); -} - -static void pending_bios_fn(struct btrfs_work *work) -{ - struct btrfs_device *device; - - device = container_of(work, struct btrfs_device, work); - run_scheduled_bios(device); -} - static bool device_path_matched(const char *path, struct btrfs_device *device) { int found; @@ -6599,9 +6393,6 @@ struct btrfs_device *btrfs_alloc_device(struct btrfs_fs_info *fs_info, else generate_random_uuid(dev->uuid); - btrfs_init_work(&dev->work, btrfs_submit_helper, - pending_bios_fn, NULL, NULL); - return dev; } diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 8c7bd79b234a..231f50dd107d 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -18,10 +18,6 @@ extern struct mutex uuid_mutex; #define BTRFS_STRIPE_LEN SZ_64K struct buffer_head; -struct btrfs_pending_bios { - struct bio *head; - struct bio *tail; -}; /* * Use sequence counter to get consistent device stat data on @@ -55,10 +51,6 @@ struct btrfs_device { spinlock_t io_lock ____cacheline_aligned; int running_pending; - /* regular prio bios */ - struct btrfs_pending_bios pending_bios; - /* sync bios */ - struct btrfs_pending_bios pending_sync_bios; struct block_device 
*bdev;

From patchwork Fri Jun 14 00:33:48 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 10993915
From: Tejun Heo
To: dsterba@suse.com, clm@fb.com, josef@toxicpanda.com, axboe@kernel.dk, jack@suse.cz
Cc: linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 6/8] Btrfs: only associate the locked page with one async_cow struct
Date: Thu, 13 Jun 2019 17:33:48 -0700
Message-Id: <20190614003350.1178444-7-tj@kernel.org>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20190614003350.1178444-1-tj@kernel.org>
References: <20190614003350.1178444-1-tj@kernel.org>
From: Chris Mason

The btrfs writepages function collects a large range of pages flagged
for delayed allocation, and then sends them down through the COW code
for processing. When compression is on, we allocate one async_cow
structure for every 512K, and then run those pages through the
compression code for IO submission.

writepages starts all of this off with a single page, locked by the
original call to extent_write_cache_pages(), and it's important to keep
track of this page because it has already been through
clear_page_dirty_for_io().

The btrfs async_cow struct has a pointer to the locked_page, and when
we're redirtying the page because compression had to fall back to
uncompressed IO, we use page->index to decide if a given async_cow
struct really owns that page.

But this is racy. If a given delalloc range is broken up into two
async_cows (cowA and cowB), we can end up with something like this:

compress_file_range(cowA)
submit_compressed_extents(cowA)
submit compressed bios(cowA)
put_page(locked_page)

				compress_file_range(cowB)
				...

The end result is that cowA is completed and cleaned up before cowB even
starts processing. This means we can free locked_page and reuse it
elsewhere. If we get really lucky, it'll have the same page->index in
its new home as it did before.

While we're processing cowB, we might decide we need to fall back to
uncompressed IO, and so compress_file_range() will call
__set_page_dirty_nobuffers() on cowB->locked_page.

Without cgroups in use, this creates a phantom dirty page, which
isn't great but isn't the end of the world. With cgroups in use, we
might crash in the accounting code because page->mapping->i_wb isn't
set.
[ 8308.523110] BUG: unable to handle kernel NULL pointer dereference at 00000000000000d0
[ 8308.531084] IP: percpu_counter_add_batch+0x11/0x70
[ 8308.538371] PGD 66534e067 P4D 66534e067 PUD 66534f067 PMD 0
[ 8308.541750] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 8308.551948] CPU: 16 PID: 2172 Comm: rm Not tainted
[ 8308.566883] RIP: 0010:percpu_counter_add_batch+0x11/0x70
[ 8308.567891] RSP: 0018:ffffc9000a97bbe0 EFLAGS: 00010286
[ 8308.568986] RAX: 0000000000000005 RBX: 0000000000000090 RCX: 0000000000026115
[ 8308.570734] RDX: 0000000000000030 RSI: ffffffffffffffff RDI: 0000000000000090
[ 8308.572543] RBP: 0000000000000000 R08: fffffffffffffff5 R09: 0000000000000000
[ 8308.573856] R10: 00000000000260c0 R11: ffff881037fc26c0 R12: ffffffffffffffff
[ 8308.580099] R13: ffff880fe4111548 R14: ffffc9000a97bc90 R15: 0000000000000001
[ 8308.582520] FS: 00007f5503ced480(0000) GS:ffff880ff7200000(0000) knlGS:0000000000000000
[ 8308.585440] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8308.587951] CR2: 00000000000000d0 CR3: 00000001e0459005 CR4: 0000000000360ee0
[ 8308.590707] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8308.592865] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 8308.594469] Call Trace:
[ 8308.595149] account_page_cleaned+0x15b/0x1f0
[ 8308.596340] __cancel_dirty_page+0x146/0x200
[ 8308.599395] truncate_cleanup_page+0x92/0xb0
[ 8308.600480] truncate_inode_pages_range+0x202/0x7d0
[ 8308.617392] btrfs_evict_inode+0x92/0x5a0
[ 8308.619108] evict+0xc1/0x190
[ 8308.620023] do_unlinkat+0x176/0x280
[ 8308.621202] do_syscall_64+0x63/0x1a0
[ 8308.623451] entry_SYSCALL_64_after_hwframe+0x42/0xb7

The fix here is to make async_cow->locked_page NULL everywhere but the
one async_cow struct that's allowed to do things to the locked page.
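
The ownership rule can be illustrated outside the kernel. What follows
is only a user-space sketch, not the code in this patch: struct page,
struct chunk, split_range() and count_owners() are hypothetical
stand-ins invented for the illustration, and the kernel's async_chunk
and cow_file_range_async() carry far more state. It shows a range being
split into pieces while the locked page pointer is handed to the first
piece and cleared for every later one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct page. */
struct page {
	unsigned long index;
};

/* Hypothetical stand-in for one async_cow work item. */
struct chunk {
	unsigned long start;
	unsigned long end;
	struct page *locked_page;	/* non-NULL in at most one chunk */
};

/*
 * Split [start, end] into pieces of at most chunk_len bytes and hand
 * locked_page to the first piece only. Returns the number of chunks
 * filled in. Assumes chunk_len > 0 and a range well below ULONG_MAX,
 * so the arithmetic cannot wrap.
 */
static size_t split_range(struct chunk *chunks, size_t max_chunks,
			  unsigned long start, unsigned long end,
			  unsigned long chunk_len, struct page *locked_page)
{
	size_t n = 0;

	while (n < max_chunks && start <= end) {
		unsigned long cur_end = start + chunk_len - 1;

		if (cur_end > end)
			cur_end = end;

		chunks[n].start = start;
		chunks[n].end = cur_end;
		/* The first chunk takes the page; later ones see NULL. */
		chunks[n].locked_page = locked_page;
		locked_page = NULL;

		start = cur_end + 1;
		n++;
	}
	return n;
}

/* Count the chunks that still point at a locked page. */
static size_t count_owners(const struct chunk *chunks, size_t n)
{
	size_t owners = 0;

	for (size_t i = 0; i < n; i++) {
		assert(chunks[i].start <= chunks[i].end);
		if (chunks[i].locked_page)
			owners++;
	}
	return owners;
}
```

Splitting the range 0..1300 with a chunk_len of 512 fills three chunks,
and count_owners() reports exactly one owner, so only one work item can
ever redirty or unlock the page. Every other consumer has to tolerate a
NULL locked_page, which is what the added NULL checks in the diff are
for.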
Signed-off-by: Chris Mason
Fixes: 771ed689d2cd ("Btrfs: Optimize compressed writeback and reads")
Reviewed-by: Josef Bacik
---
 fs/btrfs/extent_io.c |  2 +-
 fs/btrfs/inode.c     | 25 +++++++++++++++++++++----
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 13fca7bfc1f2..9f223d7d78c0 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1838,7 +1838,7 @@ static int __process_pages_contig(struct address_space *mapping,
 			if (page_ops & PAGE_SET_PRIVATE2)
 				SetPagePrivate2(pages[i]);
 
-			if (pages[i] == locked_page) {
+			if (locked_page && pages[i] == locked_page) {
 				put_page(pages[i]);
 				pages_locked++;
 				continue;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 91b161fb1521..df5527cc07b9 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -666,10 +666,12 @@ static noinline void compress_file_range(struct async_chunk *async_chunk,
 	 * to our extent and set things up for the async work queue to run
 	 * cow_file_range to do the normal delalloc dance.
	 */
-	if (page_offset(async_chunk->locked_page) >= start &&
-	    page_offset(async_chunk->locked_page) <= end)
+	if (async_chunk->locked_page &&
+	    (page_offset(async_chunk->locked_page) >= start &&
+	     page_offset(async_chunk->locked_page) <= end)) {
 		__set_page_dirty_nobuffers(async_chunk->locked_page);
 		/* unlocked later on in the async handlers */
+	}
 
 	if (redirty)
 		extent_range_redirty_for_io(inode, start, end);
@@ -759,7 +761,7 @@ static noinline void submit_compressed_extents(struct async_chunk *async_chunk)
 					async_extent->start +
 					async_extent->ram_size - 1,
 					WB_SYNC_ALL);
-			else if (ret)
+			else if (ret && async_chunk->locked_page)
 				unlock_page(async_chunk->locked_page);
 			kfree(async_extent);
 			cond_resched();
@@ -1236,10 +1238,25 @@ static int cow_file_range_async(struct inode *inode, struct page *locked_page,
 		async_chunk[i].inode = inode;
 		async_chunk[i].start = start;
 		async_chunk[i].end = cur_end;
-		async_chunk[i].locked_page = locked_page;
 		async_chunk[i].write_flags = write_flags;
 		INIT_LIST_HEAD(&async_chunk[i].extents);
 
+		/*
+		 * The locked_page comes all the way from writepage and it's
+		 * the original page we were actually given. As we spread
+		 * this large delalloc region across multiple async_cow
+		 * structs, only the first struct needs a pointer to locked_page
+		 *
+		 * This way we don't need racy decisions about who is supposed
+		 * to unlock it.
+		 */
+		if (locked_page) {
+			async_chunk[i].locked_page = locked_page;
+			locked_page = NULL;
+		} else {
+			async_chunk[i].locked_page = NULL;
+		}
+
 		btrfs_init_work(&async_chunk[i].work,
 				btrfs_delalloc_helper,
 				async_cow_start, async_cow_submit,

From patchwork Fri Jun 14 00:33:49 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 10993911
From: Tejun Heo
To: dsterba@suse.com, clm@fb.com, josef@toxicpanda.com, axboe@kernel.dk, jack@suse.cz
Cc: linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 7/8] Btrfs: use REQ_CGROUP_PUNT for worker thread submitted bios
Date: Thu, 13 Jun 2019 17:33:49 -0700
Message-Id: <20190614003350.1178444-8-tj@kernel.org>
X-Mailer: git-send-email 2.17.1
In-Reply-To:
 <20190614003350.1178444-1-tj@kernel.org>
References: <20190614003350.1178444-1-tj@kernel.org>

From: Chris Mason

Async CRCs and compression submit IO through helper threads, which means
they have IO priority inversions when cgroup IO controllers are in use.

This flags all of the writes submitted by btrfs helper threads as
REQ_CGROUP_PUNT. submit_bio() will punt these to dedicated per-blkcg
work items to avoid the priority inversion.

For the compression code, we take a reference on the wbc's blkg css and
pass it down to the async workers.

For the async crcs, the bio already has the correct css; we just need to
tell the block layer to use REQ_CGROUP_PUNT.

Signed-off-by: Chris Mason
Modified-and-reviewed-by: Tejun Heo
Reviewed-by: Josef Bacik
---
 fs/btrfs/compression.c |  8 +++++++-
 fs/btrfs/compression.h |  3 ++-
 fs/btrfs/disk-io.c     |  6 ++++++
 fs/btrfs/extent_io.c   |  3 +++
 fs/btrfs/inode.c       | 30 +++++++++++++++++++++++++++---
 5 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 873261b932b8..138479a9576c 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -289,7 +289,8 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
 				 unsigned long compressed_len,
 				 struct page **compressed_pages,
 				 unsigned long nr_pages,
-				 unsigned int write_flags)
+				 unsigned int write_flags,
+				 struct cgroup_subsys_state *blkcg_css)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct bio *bio = NULL;
@@ -323,6 +324,11 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
 	bio->bi_opf = REQ_OP_WRITE | write_flags;
 	bio->bi_private = cb;
 	bio->bi_end_io = end_compressed_bio_write;
+
+	if (blkcg_css) {
+		bio->bi_opf |= REQ_CGROUP_PUNT;
+		bio_associate_blkg_from_css(bio, blkcg_css);
+	}
 	refcount_set(&cb->pending_bios, 1);
 
 	/*
create and submit bios for the compressed pages */ diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h index 9976fe0f7526..7cbefab96ecf 100644 --- a/fs/btrfs/compression.h +++ b/fs/btrfs/compression.h @@ -93,7 +93,8 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start, unsigned long compressed_len, struct page **compressed_pages, unsigned long nr_pages, - unsigned int write_flags); + unsigned int write_flags, + struct cgroup_subsys_state *blkcg_css); blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, int mirror_num, unsigned long bio_flags); diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 9dbe4ba3995d..a5ebbf3d0833 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -799,6 +799,12 @@ static void run_one_async_done(struct btrfs_work *work) return; } + /* + * All of the bios that pass through here are from async helpers. + * Use REQ_CGROUP_PUNT to issue them from the owning cgroup's + * context. This changes nothing when cgroups aren't in use. 
+ */ + async->bio->bi_opf |= REQ_CGROUP_PUNT; ret = btrfs_map_bio(btrfs_sb(inode->i_sb), async->bio, async->mirror_num); if (ret) { diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 9f223d7d78c0..d7b57341ff1a 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -4175,6 +4175,9 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end, .nr_to_write = nr_pages * 2, .range_start = start, .range_end = end + 1, + /* we're called from an async helper function */ + .punt_to_cgroup = 1, + .no_wbc_acct = 1, }; while (start <= end) { diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index df5527cc07b9..3f9b35bc0455 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -357,6 +357,7 @@ struct async_extent { }; struct async_chunk { + struct cgroup_subsys_state *blkcg_css; struct inode *inode; struct page *locked_page; u64 start; @@ -846,7 +847,8 @@ static noinline void submit_compressed_extents(struct async_chunk *async_chunk) ins.objectid, ins.offset, async_extent->pages, async_extent->nr_pages, - async_chunk->write_flags)) { + async_chunk->write_flags, + async_chunk->blkcg_css)) { struct page *p = async_extent->pages[0]; const u64 start = async_extent->start; const u64 end = start + async_extent->ram_size - 1; @@ -1170,6 +1172,8 @@ static noinline void async_cow_free(struct btrfs_work *work) async_chunk = container_of(work, struct async_chunk, work); if (async_chunk->inode) btrfs_add_delayed_iput(async_chunk->inode); + if (async_chunk->blkcg_css) + css_put(async_chunk->blkcg_css); /* * Since the pointer to 'pending' is at the beginning of the array of * async_chunk's, freeing it ensures the whole array has been freed. 
@@ -1178,12 +1182,15 @@ static noinline void async_cow_free(struct btrfs_work *work)
 	kvfree(async_chunk->pending);
 }
 
-static int cow_file_range_async(struct inode *inode, struct page *locked_page,
+static int cow_file_range_async(struct inode *inode,
+				struct writeback_control *wbc,
+				struct page *locked_page,
 				u64 start, u64 end, int *page_started,
 				unsigned long *nr_written,
 				unsigned int write_flags)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+	struct cgroup_subsys_state *blkcg_css = wbc_blkcg_css(wbc);
 	struct async_cow *ctx;
 	struct async_chunk *async_chunk;
 	unsigned long nr_pages;
@@ -1251,12 +1258,29 @@ static int cow_file_range_async(struct inode *inode, struct page *locked_page,
 		 * to unlock it.
 		 */
 		if (locked_page) {
+			/*
+			 * Depending on the compressibility, the pages
+			 * might or might not go through async. We want
+			 * all of them to be accounted against @wbc once.
+			 * Let's do it here before the paths diverge. wbc
+			 * accounting is used only for foreign writeback
+			 * detection and doesn't need full accuracy. Just
+			 * account the whole thing against the first page.
+			 */
+			wbc_account_io(wbc, locked_page, cur_end - start);
 			async_chunk[i].locked_page = locked_page;
 			locked_page = NULL;
 		} else {
 			async_chunk[i].locked_page = NULL;
 		}
 
+		if (blkcg_css != blkcg_root_css) {
+			css_get(blkcg_css);
+			async_chunk[i].blkcg_css = blkcg_css;
+		} else {
+			async_chunk[i].blkcg_css = NULL;
+		}
+
 		btrfs_init_work(&async_chunk[i].work,
 				btrfs_delalloc_helper,
 				async_cow_start, async_cow_submit,
@@ -1653,7 +1677,7 @@ int btrfs_run_delalloc_range(struct inode *inode, struct page *locked_page,
 	} else {
 		set_bit(BTRFS_INODE_HAS_ASYNC_EXTENT,
 			&BTRFS_I(inode)->runtime_flags);
-		ret = cow_file_range_async(inode, locked_page, start, end,
+		ret = cow_file_range_async(inode, wbc, locked_page, start, end,
 					   page_started, nr_written,
 					   write_flags);
 	}

From patchwork Fri Jun 14 00:33:50 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 10993907
From: Tejun Heo
To: dsterba@suse.com, clm@fb.com, josef@toxicpanda.com, axboe@kernel.dk, jack@suse.cz
Cc: linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 8/8] Btrfs: extent_write_locked_range() should attach inode->i_wb
Date: Thu, 13 Jun 2019 17:33:50 -0700
Message-Id: <20190614003350.1178444-9-tj@kernel.org>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20190614003350.1178444-1-tj@kernel.org>
References: <20190614003350.1178444-1-tj@kernel.org>
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-btrfs@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

From: Chris Mason

extent_write_locked_range() is used when we're falling back to buffered
IO from inside of compression. It allocates its own wbc and should
associate it with the inode's i_wb to make sure the IO goes down from
the correct cgroup.

Signed-off-by: Chris Mason
Reviewed-by: Josef Bacik
---
 fs/btrfs/extent_io.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d7b57341ff1a..afb916a69c30 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4180,6 +4180,7 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end,
 		.no_wbc_acct = 1,
 	};
 
+	wbc_attach_fdatawrite_inode(&wbc_writepages, inode);
 	while (start <= end) {
 		page = find_get_page(mapping, start >> PAGE_SHIFT);
 		if (clear_page_dirty_for_io(page))
@@ -4194,11 +4195,12 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end,
 	}
 
 	ASSERT(ret <= 0);
-	if (ret < 0) {
+	if (ret == 0)
+		ret = flush_write_bio(&epd);
+	else
 		end_write_bio(&epd, ret);
-		return ret;
-	}
-	ret = flush_write_bio(&epd);
+
+	wbc_detach_inode(&wbc_writepages);
 	return ret;
 }