From patchwork Wed Jul 10 19:28:14 2019
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 11038849
From: Tejun Heo
To: josef@toxicpanda.com, clm@fb.com, dsterba@suse.com
Cc: axboe@kernel.dk, jack@suse.cz, linux-kernel@vger.kernel.org,
    linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 1/5] Btrfs: stop using btrfs_schedule_bio()
Date: Wed, 10 Jul 2019 12:28:14 -0700
Message-Id: <20190710192818.1069475-2-tj@kernel.org>
In-Reply-To: <20190710192818.1069475-1-tj@kernel.org>
References: <20190710192818.1069475-1-tj@kernel.org>
X-Mailing-List: linux-btrfs@vger.kernel.org

From: Chris Mason
btrfs_schedule_bio() hands IO off to a helper thread to do the actual
submit_bio() call.  This has been used to make sure async crc and
compression helpers don't get stuck on IO submission.  To maintain good
performance, over time the IO submission threads duplicated some IO
scheduler characteristics such as high and low priority IOs and they
also made some ugly assumptions about request allocation batch sizes.

All of this cost at least one extra context switch during IO
submission, and doesn't fit well with the modern blkmq IO stack.  So,
this commit stops using btrfs_schedule_bio().  We may need to adjust
the number of async helper threads for crcs and compression, but long
term it's a better path.

Signed-off-by: Chris Mason
Reviewed-by: Josef Bacik
Reviewed-by: Nikolay Borisov
---
 fs/btrfs/compression.c |  8 +++---
 fs/btrfs/disk-io.c     |  6 ++---
 fs/btrfs/inode.c       |  6 ++---
 fs/btrfs/volumes.c     | 55 +++---------------------------------------
 fs/btrfs/volumes.h     |  2 +-
 5 files changed, 15 insertions(+), 62 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 84dd4a8980c5..dfc4eb9b7717 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -354,7 +354,7 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
 		BUG_ON(ret); /* -ENOMEM */
 	}
 
-	ret = btrfs_map_bio(fs_info, bio, 0, 1);
+	ret = btrfs_map_bio(fs_info, bio, 0);
 	if (ret) {
 		bio->bi_status = ret;
 		bio_endio(bio);
@@ -384,7 +384,7 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
 		BUG_ON(ret); /* -ENOMEM */
 	}
 
-	ret = btrfs_map_bio(fs_info, bio, 0, 1);
+	ret = btrfs_map_bio(fs_info, bio, 0);
 	if (ret) {
 		bio->bi_status = ret;
 		bio_endio(bio);
@@ -637,7 +637,7 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 			sums += DIV_ROUND_UP(comp_bio->bi_iter.bi_size,
 					     fs_info->sectorsize);
 
-			ret = btrfs_map_bio(fs_info, comp_bio, mirror_num, 0);
+			ret = btrfs_map_bio(fs_info, comp_bio, mirror_num);
 			if (ret) {
 				comp_bio->bi_status = ret;
 				bio_endio(comp_bio);
@@ -661,7 +661,7 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 		BUG_ON(ret); /* -ENOMEM */
 	}
 
-	ret = btrfs_map_bio(fs_info, comp_bio, mirror_num, 0);
+	ret = btrfs_map_bio(fs_info, comp_bio, mirror_num);
 	if (ret) {
 		comp_bio->bi_status = ret;
 		bio_endio(comp_bio);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index deb74a8c191a..6b1ecc27913b 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -800,7 +800,7 @@ static void run_one_async_done(struct btrfs_work *work)
 	}
 
 	ret = btrfs_map_bio(btrfs_sb(inode->i_sb), async->bio,
-			    async->mirror_num, 1);
+			    async->mirror_num);
 	if (ret) {
 		async->bio->bi_status = ret;
 		bio_endio(async->bio);
@@ -901,12 +901,12 @@ static blk_status_t btree_submit_bio_hook(struct inode *inode, struct bio *bio,
 					  BTRFS_WQ_ENDIO_METADATA);
 		if (ret)
 			goto out_w_error;
-		ret = btrfs_map_bio(fs_info, bio, mirror_num, 0);
+		ret = btrfs_map_bio(fs_info, bio, mirror_num);
 	} else if (!async) {
 		ret = btree_csum_one_bio(bio);
 		if (ret)
 			goto out_w_error;
-		ret = btrfs_map_bio(fs_info, bio, mirror_num, 0);
+		ret = btrfs_map_bio(fs_info, bio, mirror_num);
 	} else {
 		/*
 		 * kthread helpers are used to submit writes so that
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a2aabdb85226..6e6df0eab324 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2032,7 +2032,7 @@ static blk_status_t btrfs_submit_bio_hook(struct inode *inode, struct bio *bio,
 	}
 
 mapit:
-	ret = btrfs_map_bio(fs_info, bio, mirror_num, 0);
+	ret = btrfs_map_bio(fs_info, bio, mirror_num);
 
 out:
 	if (ret) {
@@ -7774,7 +7774,7 @@ static inline blk_status_t submit_dio_repair_bio(struct inode *inode,
 	if (ret)
 		return ret;
 
-	ret = btrfs_map_bio(fs_info, bio, mirror_num, 0);
+	ret = btrfs_map_bio(fs_info, bio, mirror_num);
 
 	return ret;
 }
@@ -8305,7 +8305,7 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
 		goto err;
 	}
 map:
-	ret = btrfs_map_bio(fs_info, bio, 0, 0);
+	ret = btrfs_map_bio(fs_info, bio, 0);
err:
 	return ret;
 }
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 1c2a6e4b39da..72326cc23985 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6386,52 +6386,8 @@ static void btrfs_end_bio(struct bio *bio)
 	}
 }
 
-/*
- * see run_scheduled_bios for a description of why bios are collected for
- * async submit.
- *
- * This will add one bio to the pending list for a device and make sure
- * the work struct is scheduled.
- */
-static noinline void btrfs_schedule_bio(struct btrfs_device *device,
-					struct bio *bio)
-{
-	struct btrfs_fs_info *fs_info = device->fs_info;
-	int should_queue = 1;
-	struct btrfs_pending_bios *pending_bios;
-
-	/* don't bother with additional async steps for reads, right now */
-	if (bio_op(bio) == REQ_OP_READ) {
-		btrfsic_submit_bio(bio);
-		return;
-	}
-
-	WARN_ON(bio->bi_next);
-	bio->bi_next = NULL;
-
-	spin_lock(&device->io_lock);
-	if (op_is_sync(bio->bi_opf))
-		pending_bios = &device->pending_sync_bios;
-	else
-		pending_bios = &device->pending_bios;
-
-	if (pending_bios->tail)
-		pending_bios->tail->bi_next = bio;
-
-	pending_bios->tail = bio;
-	if (!pending_bios->head)
-		pending_bios->head = bio;
-	if (device->running_pending)
-		should_queue = 0;
-
-	spin_unlock(&device->io_lock);
-
-	if (should_queue)
-		btrfs_queue_work(fs_info->submit_workers, &device->work);
-}
-
 static void submit_stripe_bio(struct btrfs_bio *bbio, struct bio *bio,
-			      u64 physical, int dev_nr, int async)
+			      u64 physical, int dev_nr)
 {
 	struct btrfs_device *dev = bbio->stripes[dev_nr].dev;
 	struct btrfs_fs_info *fs_info = bbio->fs_info;
@@ -6449,10 +6405,7 @@ static void submit_stripe_bio(struct btrfs_bio *bbio, struct bio *bio,
 
 	btrfs_bio_counter_inc_noblocked(fs_info);
 
-	if (async)
-		btrfs_schedule_bio(dev, bio);
-	else
-		btrfsic_submit_bio(bio);
+	btrfsic_submit_bio(bio);
 }
 
 static void bbio_error(struct btrfs_bio *bbio, struct bio *bio, u64 logical)
@@ -6473,7 +6426,7 @@ static void bbio_error(struct btrfs_bio *bbio, struct bio *bio, u64 logical)
 }
 
 blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
-			   int mirror_num, int async_submit)
+			   int mirror_num)
 {
 	struct btrfs_device *dev;
 	struct bio *first_bio = bio;
@@ -6542,7 +6495,7 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 
 		bio = first_bio;
 		submit_stripe_bio(bbio, bio, bbio->stripes[dev_nr].physical,
-				  dev_nr, async_submit);
+				  dev_nr);
 	}
 	btrfs_bio_counter_dec(fs_info);
 	return BLK_STS_OK;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 136a3eb64604..e532d095c6a4 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -416,7 +416,7 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle *trans, u64 type);
 void btrfs_mapping_init(struct btrfs_mapping_tree *tree);
 void btrfs_mapping_tree_free(struct btrfs_mapping_tree *tree);
 blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
-			   int mirror_num, int async_submit);
+			   int mirror_num);
 int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
 		       fmode_t flags, void *holder);
 struct btrfs_device *btrfs_scan_one_device(const char *path,

From patchwork Wed Jul 10 19:28:15 2019
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 11038851
From: Tejun Heo
To: josef@toxicpanda.com, clm@fb.com, dsterba@suse.com
Cc: axboe@kernel.dk, jack@suse.cz, linux-kernel@vger.kernel.org,
    linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 2/5] Btrfs: delete the entire async bio submission framework
Date: Wed, 10 Jul 2019 12:28:15 -0700
Message-Id: <20190710192818.1069475-3-tj@kernel.org>
In-Reply-To: <20190710192818.1069475-1-tj@kernel.org>
References: <20190710192818.1069475-1-tj@kernel.org>
X-Mailing-List: linux-btrfs@vger.kernel.org

From: Chris Mason

Now that we're not using btrfs_schedule_bio() anymore, delete all the
code that supported it.
Signed-off-by: Chris Mason
Reviewed-by: Josef Bacik
Reviewed-by: Nikolay Borisov
---
 fs/btrfs/ctree.h   |   1 -
 fs/btrfs/disk-io.c |  13 +--
 fs/btrfs/super.c   |   1 -
 fs/btrfs/volumes.c | 209 ---------------------------------------------
 fs/btrfs/volumes.h |   8 --
 5 files changed, 1 insertion(+), 231 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 0a61dff27f57..21618b5b18a4 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -989,7 +989,6 @@ struct btrfs_fs_info {
 	struct btrfs_workqueue *endio_meta_write_workers;
 	struct btrfs_workqueue *endio_write_workers;
 	struct btrfs_workqueue *endio_freespace_worker;
-	struct btrfs_workqueue *submit_workers;
 	struct btrfs_workqueue *caching_workers;
 	struct btrfs_workqueue *readahead_workers;
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 6b1ecc27913b..323cab06f2a9 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2028,7 +2028,6 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info)
 	btrfs_destroy_workqueue(fs_info->rmw_workers);
 	btrfs_destroy_workqueue(fs_info->endio_write_workers);
 	btrfs_destroy_workqueue(fs_info->endio_freespace_worker);
-	btrfs_destroy_workqueue(fs_info->submit_workers);
 	btrfs_destroy_workqueue(fs_info->delayed_workers);
 	btrfs_destroy_workqueue(fs_info->caching_workers);
 	btrfs_destroy_workqueue(fs_info->readahead_workers);
@@ -2194,16 +2193,6 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info,
 	fs_info->caching_workers = btrfs_alloc_workqueue(fs_info, "cache",
 							 flags, max_active, 0);
 
-	/*
-	 * a higher idle thresh on the submit workers makes it much more
-	 * likely that bios will be send down in a sane order to the
-	 * devices
-	 */
-	fs_info->submit_workers =
-		btrfs_alloc_workqueue(fs_info, "submit", flags,
-				      min_t(u64, fs_devices->num_devices,
-					    max_active), 64);
-
 	fs_info->fixup_workers =
 		btrfs_alloc_workqueue(fs_info, "fixup", flags, 1, 0);
 
@@ -2246,7 +2235,7 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info,
 			      max_active), 8);
 
 	if (!(fs_info->workers && fs_info->delalloc_workers &&
-	      fs_info->submit_workers && fs_info->flush_workers &&
+	      fs_info->flush_workers &&
 	      fs_info->endio_workers && fs_info->endio_meta_workers &&
 	      fs_info->endio_meta_write_workers &&
 	      fs_info->endio_repair_workers &&
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 0645ec428b4f..b130dc43b5f1 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1668,7 +1668,6 @@ static void btrfs_resize_thread_pool(struct btrfs_fs_info *fs_info,
 
 	btrfs_workqueue_set_max(fs_info->workers, new_pool_size);
 	btrfs_workqueue_set_max(fs_info->delalloc_workers, new_pool_size);
-	btrfs_workqueue_set_max(fs_info->submit_workers, new_pool_size);
 	btrfs_workqueue_set_max(fs_info->caching_workers, new_pool_size);
 	btrfs_workqueue_set_max(fs_info->endio_workers, new_pool_size);
 	btrfs_workqueue_set_max(fs_info->endio_meta_workers, new_pool_size);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 72326cc23985..fc3a16d87869 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -509,212 +509,6 @@ btrfs_get_bdev_and_sb(const char *device_path, fmode_t flags, void *holder,
 	return ret;
 }
 
-static void requeue_list(struct btrfs_pending_bios *pending_bios,
-			 struct bio *head, struct bio *tail)
-{
-
-	struct bio *old_head;
-
-	old_head = pending_bios->head;
-	pending_bios->head = head;
-	if (pending_bios->tail)
-		tail->bi_next = old_head;
-	else
-		pending_bios->tail = tail;
-}
-
-/*
- * we try to collect pending bios for a device so we don't get a large
- * number of procs sending bios down to the same device.  This greatly
- * improves the schedulers ability to collect and merge the bios.
- *
- * But, it also turns into a long list of bios to process and that is sure
- * to eventually make the worker thread block.  The solution here is to
- * make some progress and then put this work struct back at the end of
- * the list if the block device is congested.  This way, multiple devices
- * can make progress from a single worker thread.
- */
-static noinline void run_scheduled_bios(struct btrfs_device *device)
-{
-	struct btrfs_fs_info *fs_info = device->fs_info;
-	struct bio *pending;
-	struct backing_dev_info *bdi;
-	struct btrfs_pending_bios *pending_bios;
-	struct bio *tail;
-	struct bio *cur;
-	int again = 0;
-	unsigned long num_run;
-	unsigned long batch_run = 0;
-	unsigned long last_waited = 0;
-	int force_reg = 0;
-	int sync_pending = 0;
-	struct blk_plug plug;
-
-	/*
-	 * this function runs all the bios we've collected for
-	 * a particular device.  We don't want to wander off to
-	 * another device without first sending all of these down.
-	 * So, setup a plug here and finish it off before we return
-	 */
-	blk_start_plug(&plug);
-
-	bdi = device->bdev->bd_bdi;
-
-loop:
-	spin_lock(&device->io_lock);
-
-loop_lock:
-	num_run = 0;
-
-	/* take all the bios off the list at once and process them
-	 * later on (without the lock held).  But, remember the
-	 * tail and other pointers so the bios can be properly reinserted
-	 * into the list if we hit congestion
-	 */
-	if (!force_reg && device->pending_sync_bios.head) {
-		pending_bios = &device->pending_sync_bios;
-		force_reg = 1;
-	} else {
-		pending_bios = &device->pending_bios;
-		force_reg = 0;
-	}
-
-	pending = pending_bios->head;
-	tail = pending_bios->tail;
-	WARN_ON(pending && !tail);
-
-	/*
-	 * if pending was null this time around, no bios need processing
-	 * at all and we can stop.  Otherwise it'll loop back up again
-	 * and do an additional check so no bios are missed.
-	 *
-	 * device->running_pending is used to synchronize with the
-	 * schedule_bio code.
-	 */
-	if (device->pending_sync_bios.head == NULL &&
-	    device->pending_bios.head == NULL) {
-		again = 0;
-		device->running_pending = 0;
-	} else {
-		again = 1;
-		device->running_pending = 1;
-	}
-
-	pending_bios->head = NULL;
-	pending_bios->tail = NULL;
-
-	spin_unlock(&device->io_lock);
-
-	while (pending) {
-
-		rmb();
-		/* we want to work on both lists, but do more bios on the
-		 * sync list than the regular list
-		 */
-		if ((num_run > 32 &&
-		     pending_bios != &device->pending_sync_bios &&
-		     device->pending_sync_bios.head) ||
-		    (num_run > 64 && pending_bios == &device->pending_sync_bios &&
-		     device->pending_bios.head)) {
-			spin_lock(&device->io_lock);
-			requeue_list(pending_bios, pending, tail);
-			goto loop_lock;
-		}
-
-		cur = pending;
-		pending = pending->bi_next;
-		cur->bi_next = NULL;
-
-		BUG_ON(atomic_read(&cur->__bi_cnt) == 0);
-
-		/*
-		 * if we're doing the sync list, record that our
-		 * plug has some sync requests on it
-		 *
-		 * If we're doing the regular list and there are
-		 * sync requests sitting around, unplug before
-		 * we add more
-		 */
-		if (pending_bios == &device->pending_sync_bios) {
-			sync_pending = 1;
-		} else if (sync_pending) {
-			blk_finish_plug(&plug);
-			blk_start_plug(&plug);
-			sync_pending = 0;
-		}
-
-		btrfsic_submit_bio(cur);
-		num_run++;
-		batch_run++;
-
-		cond_resched();
-
-		/*
-		 * we made progress, there is more work to do and the bdi
-		 * is now congested.  Back off and let other work structs
-		 * run instead
-		 */
-		if (pending && bdi_write_congested(bdi) && batch_run > 8 &&
-		    fs_info->fs_devices->open_devices > 1) {
-			struct io_context *ioc;
-
-			ioc = current->io_context;
-
-			/*
-			 * the main goal here is that we don't want to
-			 * block if we're going to be able to submit
-			 * more requests without blocking.
-			 *
-			 * This code does two great things, it pokes into
-			 * the elevator code from a filesystem _and_
-			 * it makes assumptions about how batching works.
-			 */
-			if (ioc && ioc->nr_batch_requests > 0 &&
-			    time_before(jiffies, ioc->last_waited + HZ/50UL) &&
-			    (last_waited == 0 ||
-			     ioc->last_waited == last_waited)) {
-				/*
-				 * we want to go through our batch of
-				 * requests and stop.  So, we copy out
-				 * the ioc->last_waited time and test
-				 * against it before looping
-				 */
-				last_waited = ioc->last_waited;
-				cond_resched();
-				continue;
-			}
-			spin_lock(&device->io_lock);
-			requeue_list(pending_bios, pending, tail);
-			device->running_pending = 1;
-
-			spin_unlock(&device->io_lock);
-			btrfs_queue_work(fs_info->submit_workers,
-					 &device->work);
-			goto done;
-		}
-	}
-
-	cond_resched();
-	if (again)
-		goto loop;
-
-	spin_lock(&device->io_lock);
-	if (device->pending_bios.head || device->pending_sync_bios.head)
-		goto loop_lock;
-	spin_unlock(&device->io_lock);
-
-done:
-	blk_finish_plug(&plug);
-}
-
-static void pending_bios_fn(struct btrfs_work *work)
-{
-	struct btrfs_device *device;
-
-	device = container_of(work, struct btrfs_device, work);
-	run_scheduled_bios(device);
-}
-
 static bool device_path_matched(const char *path, struct btrfs_device *device)
 {
 	int found;
@@ -6599,9 +6393,6 @@ struct btrfs_device *btrfs_alloc_device(struct btrfs_fs_info *fs_info,
 	else
 		generate_random_uuid(dev->uuid);
 
-	btrfs_init_work(&dev->work, btrfs_submit_helper,
-			pending_bios_fn, NULL, NULL);
-
 	return dev;
 }
 
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index e532d095c6a4..819047621176 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -18,10 +18,6 @@ extern struct mutex uuid_mutex;
 #define BTRFS_STRIPE_LEN	SZ_64K
 
 struct buffer_head;
-struct btrfs_pending_bios {
-	struct bio *head;
-	struct bio *tail;
-};
 
 /*
  * Use sequence counter to get consistent device stat data on
@@ -55,10 +51,6 @@ struct btrfs_device {
 	spinlock_t io_lock ____cacheline_aligned;
 	int running_pending;
-	/* regular prio bios */
-	struct btrfs_pending_bios pending_bios;
-	/* sync bios */
-	struct btrfs_pending_bios pending_sync_bios;
 
 	struct block_device *bdev;

From patchwork Wed Jul 10 19:28:16 2019
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 11038857
From: Tejun Heo
To: josef@toxicpanda.com, clm@fb.com, dsterba@suse.com
Cc: axboe@kernel.dk, jack@suse.cz, linux-kernel@vger.kernel.org,
    linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 3/5] Btrfs: only associate the locked page with one async_cow struct
Date: Wed, 10 Jul 2019 12:28:16 -0700
Message-Id: <20190710192818.1069475-4-tj@kernel.org>
In-Reply-To: <20190710192818.1069475-1-tj@kernel.org>
References: <20190710192818.1069475-1-tj@kernel.org>
X-Mailing-List: linux-btrfs@vger.kernel.org
From: Chris Mason

The btrfs writepages function collects a large range of pages flagged
for delayed allocation, and then sends them down through the COW code
for processing.  When compression is on, we allocate one async_cow
structure for every 512K, and then run those pages through the
compression code for IO submission.

writepages starts all of this off with a single page, locked by the
original call to extent_write_cache_pages(), and it's important to
keep track of this page because it has already been through
clear_page_dirty_for_io().

The btrfs async_cow struct has a pointer to the locked_page, and when
we're redirtying the page because compression had to fall back to
uncompressed IO, we use page->index to decide if a given async_cow
struct really owns that page.

But, this is racy.  If a given delalloc range is broken up into two
async_cows (cowA and cowB), we can end up with something like this:

 compress_file_range(cowA)
 submit_compressed_extents(cowA)
 submit compressed bios(cowA)
 put_page(locked_page)
				 compress_file_range(cowB)
				 ...

The end result is that cowA is completed and cleaned up before cowB
even starts processing.  This means we can free locked_page and reuse
it elsewhere.  If we get really lucky, it'll have the same page->index
in its new home as it did before.

While we're processing cowB, we might decide we need to fall back to
uncompressed IO, and so compress_file_range() will call
__set_page_dirty_nobuffers() on cowB->locked_page.  Without cgroups in
use, this creates a phantom dirty page, which isn't great but isn't
the end of the world.  With cgroups in use, we might crash in the
accounting code because page->mapping->i_wb isn't set.
[ 8308.523110] BUG: unable to handle kernel NULL pointer dereference at 00000000000000d0
[ 8308.531084] IP: percpu_counter_add_batch+0x11/0x70
[ 8308.538371] PGD 66534e067 P4D 66534e067 PUD 66534f067 PMD 0
[ 8308.541750] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 8308.551948] CPU: 16 PID: 2172 Comm: rm Not tainted
[ 8308.566883] RIP: 0010:percpu_counter_add_batch+0x11/0x70
[ 8308.567891] RSP: 0018:ffffc9000a97bbe0 EFLAGS: 00010286
[ 8308.568986] RAX: 0000000000000005 RBX: 0000000000000090 RCX: 0000000000026115
[ 8308.570734] RDX: 0000000000000030 RSI: ffffffffffffffff RDI: 0000000000000090
[ 8308.572543] RBP: 0000000000000000 R08: fffffffffffffff5 R09: 0000000000000000
[ 8308.573856] R10: 00000000000260c0 R11: ffff881037fc26c0 R12: ffffffffffffffff
[ 8308.580099] R13: ffff880fe4111548 R14: ffffc9000a97bc90 R15: 0000000000000001
[ 8308.582520] FS:  00007f5503ced480(0000) GS:ffff880ff7200000(0000) knlGS:0000000000000000
[ 8308.585440] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8308.587951] CR2: 00000000000000d0 CR3: 00000001e0459005 CR4: 0000000000360ee0
[ 8308.590707] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8308.592865] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 8308.594469] Call Trace:
[ 8308.595149]  account_page_cleaned+0x15b/0x1f0
[ 8308.596340]  __cancel_dirty_page+0x146/0x200
[ 8308.599395]  truncate_cleanup_page+0x92/0xb0
[ 8308.600480]  truncate_inode_pages_range+0x202/0x7d0
[ 8308.617392]  btrfs_evict_inode+0x92/0x5a0
[ 8308.619108]  evict+0xc1/0x190
[ 8308.620023]  do_unlinkat+0x176/0x280
[ 8308.621202]  do_syscall_64+0x63/0x1a0
[ 8308.623451]  entry_SYSCALL_64_after_hwframe+0x42/0xb7

The fix here is to make async_cow->locked_page NULL everywhere but the
one async_cow struct that's allowed to do things to the locked page.
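The ownership hand-off described above can be sketched in plain userspace C. This is a simplified stand-in, not the kernel code: `struct page`, `struct async_chunk`, and `init_chunks()` below are hypothetical minimal versions of the real structures, kept only to show that exactly one chunk ends up owning the locked page.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct page. */
struct page {
	int index;
};

/* Simplified stand-in for btrfs's async_chunk/async_cow. */
struct async_chunk {
	struct page *locked_page;	/* non-NULL only for the owner */
};

/*
 * Sketch of the ownership rule the patch introduces: as a delalloc
 * range is split across several chunks, the locked page is handed to
 * the first chunk only.  Every later chunk sees NULL, so it can never
 * redirty or unlock a page it does not own.
 */
static void init_chunks(struct async_chunk *chunks, int nr,
			struct page *locked_page)
{
	for (int i = 0; i < nr; i++) {
		if (locked_page) {
			chunks[i].locked_page = locked_page;
			locked_page = NULL;	/* transferred exactly once */
		} else {
			chunks[i].locked_page = NULL;
		}
	}
}
```

With ownership made explicit like this, the `page->index` comparison that caused the race is no longer needed: a chunk either holds the pointer or it holds NULL.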
Signed-off-by: Chris Mason Fixes: 771ed689d2cd ("Btrfs: Optimize compressed writeback and reads") Reviewed-by: Josef Bacik --- fs/btrfs/extent_io.c | 2 +- fs/btrfs/inode.c | 25 +++++++++++++++++++++---- 2 files changed, 22 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 5106008f5e28..a31574df06aa 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1838,7 +1838,7 @@ static int __process_pages_contig(struct address_space *mapping, if (page_ops & PAGE_SET_PRIVATE2) SetPagePrivate2(pages[i]); - if (pages[i] == locked_page) { + if (locked_page && pages[i] == locked_page) { put_page(pages[i]); pages_locked++; continue; diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 6e6df0eab324..a81e9860ee1f 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -666,10 +666,12 @@ static noinline void compress_file_range(struct async_chunk *async_chunk, * to our extent and set things up for the async work queue to run * cow_file_range to do the normal delalloc dance. 
	 */
-	if (page_offset(async_chunk->locked_page) >= start &&
-	    page_offset(async_chunk->locked_page) <= end)
+	if (async_chunk->locked_page &&
+	    (page_offset(async_chunk->locked_page) >= start &&
+	     page_offset(async_chunk->locked_page) <= end)) {
		__set_page_dirty_nobuffers(async_chunk->locked_page);
		/* unlocked later on in the async handlers */
+	}

	if (redirty)
		extent_range_redirty_for_io(inode, start, end);
@@ -759,7 +761,7 @@ static noinline void submit_compressed_extents(struct async_chunk *async_chunk)
					async_extent->start +
					async_extent->ram_size - 1,
					WB_SYNC_ALL);
-		else if (ret)
+		else if (ret && async_chunk->locked_page)
			unlock_page(async_chunk->locked_page);
		kfree(async_extent);
		cond_resched();
@@ -1236,10 +1238,25 @@ static int cow_file_range_async(struct inode *inode, struct page *locked_page,
		async_chunk[i].inode = inode;
		async_chunk[i].start = start;
		async_chunk[i].end = cur_end;
-		async_chunk[i].locked_page = locked_page;
		async_chunk[i].write_flags = write_flags;
		INIT_LIST_HEAD(&async_chunk[i].extents);

+		/*
+		 * The locked_page comes all the way from writepage and it's
+		 * the original page we were actually given.  As we spread
+		 * this large delalloc region across multiple async_cow
+		 * structs, only the first struct needs a pointer to
+		 * locked_page.
+		 *
+		 * This way we don't need racy decisions about who is supposed
+		 * to unlock it.
+		 */
+		if (locked_page) {
+			async_chunk[i].locked_page = locked_page;
+			locked_page = NULL;
+		} else {
+			async_chunk[i].locked_page = NULL;
+		}
+
		btrfs_init_work(&async_chunk[i].work,
				btrfs_delalloc_helper,
				async_cow_start, async_cow_submit,

From patchwork Wed Jul 10 19:28:17 2019
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 11038855
From: Tejun Heo
Subject: [PATCH 4/5] Btrfs: use REQ_CGROUP_PUNT for worker thread submitted bios
Date: Wed, 10 Jul 2019 12:28:17 -0700
Message-Id: <20190710192818.1069475-5-tj@kernel.org>
In-Reply-To: <20190710192818.1069475-1-tj@kernel.org>

From: Chris Mason

Async CRCs and compression submit IO through helper threads, which means
they have IO priority inversions when cgroup IO controllers are in use.

This flags all of the writes submitted by btrfs helper threads as
REQ_CGROUP_PUNT.  submit_bio() will punt these to dedicated per-blkcg
work items to avoid the priority inversion.

For the compression code, we take a reference on the wbc's blkg css and
pass it down to the async workers.  For the async CRCs, the bio already
has the correct css; we just need to tell the block layer to use
REQ_CGROUP_PUNT.

Signed-off-by: Chris Mason
Modified-and-reviewed-by: Tejun Heo
Reviewed-by: Josef Bacik
---
 fs/btrfs/compression.c |  8 +++++++-
 fs/btrfs/compression.h |  3 ++-
 fs/btrfs/disk-io.c     |  6 ++++++
 fs/btrfs/extent_io.c   |  3 +++
 fs/btrfs/inode.c       | 31 ++++++++++++++++++++++++++++---
 5 files changed, 46 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index dfc4eb9b7717..5b142d0d0a0b 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -288,7 +288,8 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
				 unsigned long compressed_len,
				 struct page **compressed_pages,
				 unsigned long nr_pages,
-				 unsigned int write_flags)
+				 unsigned int write_flags,
+				 struct cgroup_subsys_state *blkcg_css)
 {
	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
	struct bio *bio = NULL;
@@ -322,6 +323,11 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
	bio->bi_opf = REQ_OP_WRITE | write_flags;
	bio->bi_private = cb;
	bio->bi_end_io = end_compressed_bio_write;
+
+	if (blkcg_css) {
+		bio->bi_opf |= REQ_CGROUP_PUNT;
+		bio_associate_blkg_from_css(bio, blkcg_css);
+	}
	refcount_set(&cb->pending_bios, 1);

	/* create and submit bios for the compressed pages */
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index 9976fe0f7526..7cbefab96ecf 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -93,7 +93,8 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
				 unsigned long compressed_len,
				 struct page **compressed_pages,
				 unsigned long nr_pages,
-				 unsigned int write_flags);
+				 unsigned int write_flags,
+				 struct cgroup_subsys_state *blkcg_css);
 blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
				 int mirror_num, unsigned long bio_flags);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 323cab06f2a9..cc0aa77b8128 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -799,6 +799,12 @@ static void run_one_async_done(struct btrfs_work *work)
		return;
	}

+	/*
+	 * All of the bios that pass through here are from async helpers.
+	 * Use REQ_CGROUP_PUNT to issue them from the owning cgroup's
+	 * context.  This changes nothing when cgroups aren't in use.
+	 */
+	async->bio->bi_opf |= REQ_CGROUP_PUNT;
	ret = btrfs_map_bio(btrfs_sb(inode->i_sb), async->bio,
			    async->mirror_num);
	if (ret) {
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index a31574df06aa..3f3942618e92 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4173,6 +4173,9 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end,
		.nr_to_write	= nr_pages * 2,
		.range_start	= start,
		.range_end	= end + 1,
+		/* we're called from an async helper function */
+		.punt_to_cgroup	= 1,
+		.no_cgroup_owner = 1,
	};

	while (start <= end) {
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a81e9860ee1f..f5515aea6012 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -357,6 +357,7 @@ struct async_extent {
 };

 struct async_chunk {
+	struct cgroup_subsys_state *blkcg_css;
	struct inode *inode;
	struct page *locked_page;
	u64 start;
@@ -846,7 +847,8 @@ static noinline void submit_compressed_extents(struct async_chunk *async_chunk)
				    ins.objectid,
				    ins.offset,
				    async_extent->pages,
				    async_extent->nr_pages,
-				    async_chunk->write_flags)) {
+				    async_chunk->write_flags,
+				    async_chunk->blkcg_css)) {
			struct page *p = async_extent->pages[0];
			const u64 start = async_extent->start;
			const u64 end = start + async_extent->ram_size - 1;
@@ -1170,6 +1172,8 @@ static noinline void async_cow_free(struct btrfs_work *work)
	async_chunk = container_of(work, struct async_chunk, work);
	if (async_chunk->inode)
		btrfs_add_delayed_iput(async_chunk->inode);
+	if (async_chunk->blkcg_css)
+		css_put(async_chunk->blkcg_css);
	/*
	 * Since the pointer to 'pending' is at the beginning of the array of
	 * async_chunk's, freeing it ensures the whole array has been freed.
@@ -1178,12 +1182,15 @@ static noinline void async_cow_free(struct btrfs_work *work)
	kvfree(async_chunk->pending);
 }

-static int cow_file_range_async(struct inode *inode, struct page *locked_page,
+static int cow_file_range_async(struct inode *inode,
+				struct writeback_control *wbc,
+				struct page *locked_page,
				u64 start, u64 end, int *page_started,
				unsigned long *nr_written,
				unsigned int write_flags)
 {
	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+	struct cgroup_subsys_state *blkcg_css = wbc_blkcg_css(wbc);
	struct async_cow *ctx;
	struct async_chunk *async_chunk;
	unsigned long nr_pages;
@@ -1251,12 +1258,30 @@ static int cow_file_range_async(struct inode *inode, struct page *locked_page,
		 * to unlock it.
		 */
		if (locked_page) {
+			/*
+			 * Depending on the compressibility, the pages might or
+			 * might not go through async.  We want all of them to
+			 * be accounted against @wbc once.  Let's do it here
+			 * before the paths diverge.  wbc accounting is used
+			 * only for foreign writeback detection and doesn't
+			 * need full accuracy.  Just account the whole thing
+			 * against the first page.
+			 */
+			wbc_account_cgroup_owner(wbc, locked_page,
+						 cur_end - start);
			async_chunk[i].locked_page = locked_page;
			locked_page = NULL;
		} else {
			async_chunk[i].locked_page = NULL;
		}

+		if (blkcg_css != blkcg_root_css) {
+			css_get(blkcg_css);
+			async_chunk[i].blkcg_css = blkcg_css;
+		} else {
+			async_chunk[i].blkcg_css = NULL;
+		}
+
		btrfs_init_work(&async_chunk[i].work,
				btrfs_delalloc_helper,
				async_cow_start, async_cow_submit,
@@ -1653,7 +1678,7 @@ int btrfs_run_delalloc_range(struct inode *inode, struct page *locked_page,
	} else {
		set_bit(BTRFS_INODE_HAS_ASYNC_EXTENT,
			&BTRFS_I(inode)->runtime_flags);
-		ret = cow_file_range_async(inode, locked_page, start, end,
+		ret = cow_file_range_async(inode, wbc, locked_page, start, end,
					   page_started, nr_written,
					   write_flags);
	}

From patchwork Wed Jul 10 19:28:18 2019
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 11038853
From: Tejun Heo
Subject: [PATCH 5/5] Btrfs: extent_write_locked_range() should attach inode->i_wb
Date: Wed, 10 Jul 2019 12:28:18 -0700
Message-Id: <20190710192818.1069475-6-tj@kernel.org>
In-Reply-To: <20190710192818.1069475-1-tj@kernel.org>

From: Chris Mason

extent_write_locked_range() is used when we're falling back to buffered
IO from inside of compression.  It allocates its own wbc and should
associate it with the inode's i_wb to make sure the IO goes down from
the correct cgroup.

Signed-off-by: Chris Mason
Reviewed-by: Josef Bacik
---
 fs/btrfs/extent_io.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 3f3942618e92..5606a38b64ff 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4178,6 +4178,7 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end,
		.no_cgroup_owner = 1,
	};

+	wbc_attach_fdatawrite_inode(&wbc_writepages, inode);
	while (start <= end) {
		page = find_get_page(mapping, start >> PAGE_SHIFT);
		if (clear_page_dirty_for_io(page))
@@ -4192,11 +4193,12 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end,
	}

	ASSERT(ret <= 0);
-	if (ret < 0) {
+	if (ret == 0)
+		ret = flush_write_bio(&epd);
+	else
		end_write_bio(&epd, ret);
-		return ret;
-	}
-	ret = flush_write_bio(&epd);
+
+	wbc_detach_inode(&wbc_writepages);
	return ret;
 }