From patchwork Tue Feb 9 20:30:35 2021
From: Michal Rostecki
To: Chris Mason, Josef Bacik, David Sterba, linux-btrfs@vger.kernel.org (open list:BTRFS FILE SYSTEM), linux-kernel@vger.kernel.org (open list)
Cc: Michal Rostecki
Subject: [PATCH RFC 1/6] btrfs: Add inflight BIO request counter
Date: Tue, 9 Feb 2021 21:30:35 +0100
Message-Id: <20210209203041.21493-2-mrostecki@suse.de>
In-Reply-To: <20210209203041.21493-1-mrostecki@suse.de>
References: <20210209203041.21493-1-mrostecki@suse.de>

From: Michal Rostecki

Add a per-CPU counter of inflight BIOs to btrfs_device, which tracks the
number of requests currently being processed by the device. This
information will be used by the roundrobin raid1 read policy.

Signed-off-by: Michal Rostecki
---
 fs/btrfs/volumes.c | 11 +++++++++--
 fs/btrfs/volumes.h |  3 +++
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 3948f5b50d11..d4f452dcce95 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -376,6 +376,7 @@ void btrfs_free_device(struct btrfs_device *device)
 	extent_io_tree_release(&device->alloc_state);
 	bio_put(device->flush_bio);
 	btrfs_destroy_dev_zone_info(device);
+	percpu_counter_destroy(&device->inflight);
 	kfree(device);
 }
 
@@ -439,6 +440,11 @@ static struct btrfs_device *__alloc_device(struct btrfs_fs_info *fs_info)
 	extent_io_tree_init(fs_info, &dev->alloc_state,
 			    IO_TREE_DEVICE_ALLOC_STATE, NULL);
 
+	if (percpu_counter_init(&dev->inflight, 0, GFP_KERNEL)) {
+		kfree(dev);
+		return ERR_PTR(-ENOMEM);
+	}
+
 	return dev;
 }
 
@@ -6305,6 +6311,7 @@ static inline void btrfs_end_bbio(struct btrfs_bio *bbio, struct bio *bio)
 
 static void btrfs_end_bio(struct bio *bio)
 {
+	struct btrfs_device *dev = btrfs_io_bio(bio)->device;
 	struct btrfs_bio *bbio = bio->bi_private;
 	int is_orig_bio = 0;
 
@@ -6312,8 +6319,6 @@ static void btrfs_end_bio(struct bio *bio)
 		atomic_inc(&bbio->error);
 		if (bio->bi_status == BLK_STS_IOERR ||
 		    bio->bi_status == BLK_STS_TARGET) {
-			struct btrfs_device *dev = btrfs_io_bio(bio)->device;
-
 			ASSERT(dev->bdev);
 			if (bio_op(bio) == REQ_OP_WRITE)
 				btrfs_dev_stat_inc_and_print(dev,
@@ -6331,6 +6336,7 @@ static void btrfs_end_bio(struct bio *bio)
 		is_orig_bio = 1;
 
 	btrfs_bio_counter_dec(bbio->fs_info);
+	percpu_counter_dec(&dev->inflight);
 
 	if (atomic_dec_and_test(&bbio->stripes_pending)) {
 		if (!is_orig_bio) {
@@ -6375,6 +6381,7 @@ static void submit_stripe_bio(struct btrfs_bio *bbio, struct bio *bio,
 	bio_set_dev(bio, dev->bdev);
 
 	btrfs_bio_counter_inc_noblocked(fs_info);
+	percpu_counter_inc(&dev->inflight);
 
 	btrfsic_submit_bio(bio);
 }
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 04e2b26823c2..938c5292250c 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -143,6 +143,9 @@ struct btrfs_device {
 	struct completion kobj_unregister;
 	/* For sysfs/FSID/devinfo/devid/ */
 	struct kobject devid_kobj;
+
+	/* I/O stats for raid1 mirror selection */
+	struct percpu_counter inflight;
 };
 
 /*

From patchwork Tue Feb 9 20:30:36 2021
From: Michal Rostecki
To: Chris Mason, Josef Bacik, David Sterba, linux-btrfs@vger.kernel.org (open list:BTRFS FILE SYSTEM), linux-kernel@vger.kernel.org (open list)
Cc: Michal Rostecki
Subject: [PATCH RFC 2/6] btrfs: Store the last device I/O offset
Date: Tue, 9 Feb 2021 21:30:36 +0100
Message-Id: <20210209203041.21493-3-mrostecki@suse.de>
In-Reply-To: <20210209203041.21493-1-mrostecki@suse.de>
References: <20210209203041.21493-1-mrostecki@suse.de>

From: Michal Rostecki

Add an atomic field which stores the physical offset at which the last
I/O operation scheduled to the device ends. This information will be
used to measure the locality of I/O requests.

Signed-off-by: Michal Rostecki
---
 fs/btrfs/volumes.c | 4 ++++
 fs/btrfs/volumes.h | 1 +
 2 files changed, 5 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d4f452dcce95..292175206873 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -444,6 +444,7 @@ static struct btrfs_device *__alloc_device(struct btrfs_fs_info *fs_info)
 		kfree(dev);
 		return ERR_PTR(-ENOMEM);
 	}
+	atomic_set(&dev->last_offset, 0);
 
 	return dev;
 }
@@ -6368,11 +6369,13 @@ static void submit_stripe_bio(struct btrfs_bio *bbio, struct bio *bio,
 			      u64 physical, struct btrfs_device *dev)
 {
 	struct btrfs_fs_info *fs_info = bbio->fs_info;
+	u64 length;
 
 	bio->bi_private = bbio;
 	btrfs_io_bio(bio)->device = dev;
 	bio->bi_end_io = btrfs_end_bio;
 	bio->bi_iter.bi_sector = physical >> 9;
+	length = bio->bi_iter.bi_size;
 	btrfs_debug_in_rcu(fs_info,
 	"btrfs_map_bio: rw %d 0x%x, sector=%llu, dev=%lu (%s id %llu), size=%u",
 		bio_op(bio), bio->bi_opf, bio->bi_iter.bi_sector,
@@ -6382,6 +6385,7 @@ static void submit_stripe_bio(struct btrfs_bio *bbio, struct bio *bio,
 
 	btrfs_bio_counter_inc_noblocked(fs_info);
 	percpu_counter_inc(&dev->inflight);
+	atomic_set(&dev->last_offset, physical + length);
 
 	btrfsic_submit_bio(bio);
 }
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 938c5292250c..6e544317a377 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -146,6 +146,7 @@ struct btrfs_device {
 
 	/* I/O stats for raid1 mirror selection */
 	struct percpu_counter inflight;
+	atomic_t last_offset;
 };
 
 /*

From patchwork Tue Feb 9 20:30:37 2021
From: Michal Rostecki
To: Chris Mason, Josef Bacik, David Sterba, linux-btrfs@vger.kernel.org (open list:BTRFS FILE SYSTEM), linux-kernel@vger.kernel.org (open list)
Cc: Michal Rostecki
Subject: [PATCH RFC 3/6] btrfs: Add stripe_physical function
Date: Tue, 9 Feb 2021 21:30:37 +0100
Message-Id: <20210209203041.21493-4-mrostecki@suse.de>
In-Reply-To: <20210209203041.21493-1-mrostecki@suse.de>
References: <20210209203041.21493-1-mrostecki@suse.de>

From: Michal Rostecki

Move the calculation of a stripe's physical address into a new
function, stripe_physical(). Raid1 read policies can use it to
calculate the offset and select mirrors based on I/O locality.

Signed-off-by: Michal Rostecki
---
 fs/btrfs/volumes.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 292175206873..1ac364a2f105 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5498,6 +5498,23 @@ int btrfs_is_parity_mirror(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
 	return ret;
 }
 
+/*
+ * Calculates the physical location for the given stripe and I/O geometry.
+ *
+ * @map: mapping containing the logical extent
+ * @stripe_index: index of the stripe to make a calculation for
+ * @stripe_offset: offset of the block in its stripe
+ * @stripe_nr: index of the stripe where the block falls in
+ *
+ * Returns the physical location.
+ */
+static u64 stripe_physical(struct map_lookup *map, u32 stripe_index,
+			   u64 stripe_offset, u64 stripe_nr)
+{
+	return map->stripes[stripe_index].physical + stripe_offset +
+		stripe_nr * map->stripe_len;
+}
+
 static int find_live_mirror(struct btrfs_fs_info *fs_info,
 			    struct map_lookup *map, int first,
 			    int dev_replace_is_ongoing)
@@ -6216,8 +6233,9 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
 	}
 
 	for (i = 0; i < num_stripes; i++) {
-		bbio->stripes[i].physical = map->stripes[stripe_index].physical +
-			stripe_offset + stripe_nr * map->stripe_len;
+		bbio->stripes[i].physical = stripe_physical(map, stripe_index,
+							    stripe_offset,
+							    stripe_nr);
 		bbio->stripes[i].dev = map->stripes[stripe_index].dev;
 		stripe_index++;
 	}

From patchwork Tue Feb 9 20:30:38 2021
From: Michal Rostecki
To: Chris Mason, Josef Bacik, David Sterba, linux-btrfs@vger.kernel.org (open list:BTRFS FILE SYSTEM), linux-kernel@vger.kernel.org (open list)
Cc: Michal Rostecki
Subject: [PATCH RFC 4/6] btrfs: Check if the filesystem has mixed types of devices
Date: Tue, 9 Feb 2021 21:30:38 +0100
Message-Id: <20210209203041.21493-5-mrostecki@suse.de>
In-Reply-To: <20210209203041.21493-1-mrostecki@suse.de>
References: <20210209203041.21493-1-mrostecki@suse.de>

From: Michal Rostecki

Add the btrfs_check_mixed() function, which checks whether the
filesystem has mixed types of devices (both non-rotational and
rotational). This information will be used by the roundrobin raid1 read
policy.

Signed-off-by: Michal Rostecki
---
 fs/btrfs/volumes.c | 44 ++++++++++++++++++++++++++++++++++++++++++--
 fs/btrfs/volumes.h |  7 +++++++
 2 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 1ac364a2f105..1ad30a595722 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -617,6 +617,35 @@ static int btrfs_free_stale_devices(const char *path,
 	return ret;
 }
 
+/*
+ * Checks whether, after adding the new device, the filesystem is going to
+ * have mixed types of devices (non-rotational and rotational).
+ *
+ * @fs_devices: list of devices
+ * @new_device_rotating: if the new device is rotational
+ *
+ * Returns true if there are mixed types of devices, otherwise returns false.
+ */
+static bool btrfs_check_mixed(struct btrfs_fs_devices *fs_devices,
+			      bool new_device_rotating)
+{
+	struct btrfs_device *device, *prev_device = NULL;
+
+	list_for_each_entry(device, &fs_devices->devices, dev_list) {
+		if (prev_device == NULL &&
+		    device->rotating != new_device_rotating)
+			return true;
+		if (prev_device != NULL &&
+		    (device->rotating != prev_device->rotating ||
+		     device->rotating != new_device_rotating))
+			return true;
+
+		prev_device = device;
+	}
+
+	return false;
+}
+
 /*
  * This is only used on mount, and we are protected from competing things
  * messing with our fs_devices by the uuid_mutex, thus we do not need the
@@ -629,6 +658,7 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
 	struct request_queue *q;
 	struct block_device *bdev;
 	struct btrfs_super_block *disk_super;
+	bool rotating;
 	u64 devid;
 	int ret;
 
@@ -669,8 +699,12 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
 	}
 
 	q = bdev_get_queue(bdev);
-	if (!blk_queue_nonrot(q))
+	rotating = !blk_queue_nonrot(q);
+	device->rotating = rotating;
+	if (rotating)
 		fs_devices->rotating = true;
+	if (!fs_devices->mixed)
+		fs_devices->mixed = btrfs_check_mixed(fs_devices, rotating);
 
 	device->bdev = bdev;
 	clear_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
@@ -2418,6 +2452,7 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info)
 	fs_devices->open_devices = 0;
 	fs_devices->missing_devices = 0;
 	fs_devices->rotating = false;
+	fs_devices->mixed = false;
 	list_add(&seed_devices->seed_list, &fs_devices->seed_list);
 
 	generate_random_uuid(fs_devices->fsid);
@@ -2522,6 +2557,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 	int seeding_dev = 0;
 	int ret = 0;
 	bool locked = false;
+	bool rotating;
 
 	if (sb_rdonly(sb) && !fs_devices->seeding)
 		return -EROFS;
@@ -2621,8 +2657,12 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 
 	atomic64_add(device->total_bytes, &fs_info->free_chunk_space);
 
-	if (!blk_queue_nonrot(q))
+	rotating = !blk_queue_nonrot(q);
+	device->rotating = rotating;
+	if (rotating)
 		fs_devices->rotating = true;
+	if (!fs_devices->mixed)
+		fs_devices->mixed = btrfs_check_mixed(fs_devices, rotating);
 
 	orig_super_total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
 	btrfs_set_super_total_bytes(fs_info->super_copy,
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 6e544317a377..594f1207281c 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -147,6 +147,9 @@ struct btrfs_device {
 	/* I/O stats for raid1 mirror selection */
 	struct percpu_counter inflight;
 	atomic_t last_offset;
+
+	/* If the device is rotational */
+	bool rotating;
 };
 
 /*
@@ -274,6 +277,10 @@ struct btrfs_fs_devices {
 	 * nonrot flag set
 	 */
 	bool rotating;
+	/* Set when we find or add both nonrot and rot disks in the
+	 * filesystem
+	 */
+	bool mixed;
 
 	struct btrfs_fs_info *fs_info;
 	/* sysfs kobjects */

From patchwork Tue Feb 9 20:30:39 2021
From: Michal Rostecki
To: Chris Mason, Josef Bacik, David Sterba, linux-btrfs@vger.kernel.org (open list:BTRFS FILE SYSTEM), linux-kernel@vger.kernel.org (open list)
Cc: Michal Rostecki
Subject: [PATCH RFC 5/6] btrfs: sysfs: Add directory for read policies
Date: Tue, 9 Feb 2021 21:30:39 +0100
Message-Id: <20210209203041.21493-6-mrostecki@suse.de>
In-Reply-To: <20210209203041.21493-1-mrostecki@suse.de>
References: <20210209203041.21493-1-mrostecki@suse.de>

From: Michal Rostecki

Before this change, the raid1 read policy was selected through the
/sys/fs/btrfs/[fsid]/read_policy file.

Move it to /sys/fs/btrfs/[fsid]/read_policies/policy.

The motivation behind creating the read_policies directory is that
upcoming changes and new read policies are going to introduce
policy-specific settings.
Signed-off-by: Michal Rostecki
---
 fs/btrfs/sysfs.c   | 51 +++++++++++++++++++++++++++++++++-------------
 fs/btrfs/volumes.h |  1 +
 2 files changed, 38 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 19b9fffa2c9c..a8f528eb4e50 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -896,6 +896,19 @@ static ssize_t btrfs_generation_show(struct kobject *kobj,
 }
 BTRFS_ATTR(, generation, btrfs_generation_show);
 
+static const struct attribute *btrfs_attrs[] = {
+	BTRFS_ATTR_PTR(, label),
+	BTRFS_ATTR_PTR(, nodesize),
+	BTRFS_ATTR_PTR(, sectorsize),
+	BTRFS_ATTR_PTR(, clone_alignment),
+	BTRFS_ATTR_PTR(, quota_override),
+	BTRFS_ATTR_PTR(, metadata_uuid),
+	BTRFS_ATTR_PTR(, checksum),
+	BTRFS_ATTR_PTR(, exclusive_operation),
+	BTRFS_ATTR_PTR(, generation),
+	NULL,
+};
+
 /*
  * Look for an exact string @string in @buffer with possible leading or
  * trailing whitespace
@@ -920,7 +933,7 @@ static const char * const btrfs_read_policy_name[] = { "pid" };
 static ssize_t btrfs_read_policy_show(struct kobject *kobj,
 				      struct kobj_attribute *a, char *buf)
 {
-	struct btrfs_fs_devices *fs_devices = to_fs_devs(kobj);
+	struct btrfs_fs_devices *fs_devices = to_fs_devs(kobj->parent);
 	ssize_t ret = 0;
 	int i;
 
@@ -944,7 +957,7 @@ static ssize_t btrfs_read_policy_store(struct kobject *kobj,
 				       struct kobj_attribute *a,
 				       const char *buf, size_t len)
 {
-	struct btrfs_fs_devices *fs_devices = to_fs_devs(kobj);
+	struct btrfs_fs_devices *fs_devices = to_fs_devs(kobj->parent);
 	int i;
 
 	for (i = 0; i < BTRFS_NR_READ_POLICY; i++) {
@@ -961,19 +974,10 @@ static ssize_t btrfs_read_policy_store(struct kobject *kobj,
 
 	return -EINVAL;
 }
-BTRFS_ATTR_RW(, read_policy, btrfs_read_policy_show, btrfs_read_policy_store);
+BTRFS_ATTR_RW(read_policies, policy, btrfs_read_policy_show, btrfs_read_policy_store);
 
-static const struct attribute *btrfs_attrs[] = {
-	BTRFS_ATTR_PTR(, label),
-	BTRFS_ATTR_PTR(, nodesize),
-	BTRFS_ATTR_PTR(, sectorsize),
-	BTRFS_ATTR_PTR(, clone_alignment),
-	BTRFS_ATTR_PTR(, quota_override),
-	BTRFS_ATTR_PTR(, metadata_uuid),
-	BTRFS_ATTR_PTR(, checksum),
-	BTRFS_ATTR_PTR(, exclusive_operation),
-	BTRFS_ATTR_PTR(, generation),
-	BTRFS_ATTR_PTR(, read_policy),
+static const struct attribute *read_policies_attrs[] = {
+	BTRFS_ATTR_PTR(read_policies, policy),
 	NULL,
 };
 
@@ -1112,6 +1116,12 @@ void btrfs_sysfs_remove_mounted(struct btrfs_fs_info *fs_info)
 
 	sysfs_remove_link(fsid_kobj, "bdi");
 
+	if (fs_info->fs_devices->read_policies_kobj) {
+		sysfs_remove_files(fs_info->fs_devices->read_policies_kobj,
+				   read_policies_attrs);
+		kobject_del(fs_info->fs_devices->read_policies_kobj);
+		kobject_put(fs_info->fs_devices->read_policies_kobj);
+	}
 	if (fs_info->space_info_kobj) {
 		sysfs_remove_files(fs_info->space_info_kobj, allocation_attrs);
 		kobject_del(fs_info->space_info_kobj);
@@ -1658,6 +1668,19 @@ int btrfs_sysfs_add_mounted(struct btrfs_fs_info *fs_info)
 	if (error)
 		goto failure;
 
+	fs_devs->read_policies_kobj = kobject_create_and_add("read_policies",
+							     fsid_kobj);
+
+	if (!fs_devs->read_policies_kobj) {
+		error = -ENOMEM;
+		goto failure;
+	}
+
+	error = sysfs_create_files(fs_devs->read_policies_kobj,
+				   read_policies_attrs);
+	if (error)
+		goto failure;
+
 	return 0;
 failure:
 	btrfs_sysfs_remove_mounted(fs_info);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 594f1207281c..ee050fd48042 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -287,6 +287,7 @@ struct btrfs_fs_devices {
 	struct kobject fsid_kobj;
 	struct kobject *devices_kobj;
 	struct kobject *devinfo_kobj;
+	struct kobject *read_policies_kobj;
 	struct completion kobj_unregister;
 
 	enum btrfs_chunk_allocation_policy chunk_alloc_policy;

From patchwork Tue Feb 9 20:30:40 2021
From: Michal Rostecki
To: Chris Mason, Josef Bacik, David Sterba, linux-btrfs@vger.kernel.org (open list:BTRFS FILE SYSTEM), linux-kernel@vger.kernel.org (open list)
Cc: Michal Rostecki
Subject: [PATCH RFC 6/6] btrfs: Add roundrobin raid1 read policy
Date: Tue, 9 Feb 2021 21:30:40 +0100
Message-Id: <20210209203041.21493-7-mrostecki@suse.de>
In-Reply-To: <20210209203041.21493-1-mrostecki@suse.de>
References: <20210209203041.21493-1-mrostecki@suse.de>

From: Michal Rostecki

Add a new raid1 read policy, `roundrobin`. For each read request, it
selects a mirror whose load is lower than the queue depth, starting the
iteration from the mirror last used by the current CPU. Load is defined
as the number of inflight requests plus a potential penalty value.
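The selection rule above can be modelled in a few lines of standalone C. This is a simplified sketch, not the patch's implementation: `roundrobin_select()`, its parameters and the single queue depth shared by all mirrors are hypothetical, `last_mirror` stands in for the per-CPU `fs_info->last_mirror` value, and the fallback to the least-loaded mirror when every mirror is at or above the queue depth is an assumption, since that part of the code is not shown in this message.

```c
#include <assert.h>

/*
 * Hypothetical userspace model of the roundrobin mirror selection.
 * loads[i] is the load of mirror i (inflight requests plus any penalty),
 * queue_depth is assumed to be the same for every mirror, and
 * *last_mirror models the per-CPU "last used mirror" value.
 */
static int roundrobin_select(const int *loads, int nr_mirrors,
			     int queue_depth, int *last_mirror)
{
	int start, best, i;

	assert(nr_mirrors > 0 && *last_mirror >= 0);
	start = *last_mirror % nr_mirrors;
	best = start;

	for (i = 0; i < nr_mirrors; i++) {
		/* Start at the last used mirror and wrap around. */
		int mirror = (start + i) % nr_mirrors;

		if (loads[mirror] < queue_depth) {
			*last_mirror = mirror;
			return mirror;
		}
		/* Track the least-loaded mirror for the fallback below. */
		if (loads[mirror] < loads[best])
			best = mirror;
	}

	/* Assumed fallback: every mirror is saturated, pick the least loaded. */
	*last_mirror = best;
	return best;
}
```

Because the scan begins at the last used mirror rather than the one after it, a CPU in this model keeps reading from the same mirror until that mirror's load reaches the queue depth, and only then moves on to the next one.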
The policy can be enabled through sysfs:

  # echo roundrobin > /sys/fs/btrfs/[fsid]/read_policies/policy

This policy was tested with fio and compared with the default `pid`
policy.

The single-threaded test has the following parameters:

  [global]
  name=btrfs-raid1-seqread
  filename=btrfs-raid1-seqread
  rw=read
  bs=64k
  direct=0
  numjobs=1
  time_based=0

  [file1]
  size=10G
  ioengine=libaio

and shows the following results:

- raid1c3 with 3 HDDs:
  3 x Seagate Barracuda ST2000DM008 (2TB)
  * pid policy
    READ: bw=217MiB/s (228MB/s), 217MiB/s-217MiB/s (228MB/s-228MB/s),
    io=10.0GiB (10.7GB), run=47082-47082msec
  * roundrobin policy
    READ: bw=409MiB/s (429MB/s), 409MiB/s-409MiB/s (429MB/s-429MB/s),
    io=10.0GiB (10.7GB), run=25028-25028msec
- raid1c3 with 2 HDDs and 1 SSD:
  2 x Seagate Barracuda ST2000DM008 (2TB)
  1 x Crucial CT256M550SSD1 (256GB)
  * pid policy (the worst case, when only HDDs were chosen)
    READ: bw=220MiB/s (231MB/s), 220MiB/s-220MiB/s (231MB/s-231MB/s),
    io=10.0GiB (10.7GB), run=46577-46577msec
  * pid policy (the best case, when the SSD was used as well)
    READ: bw=513MiB/s (538MB/s), 513MiB/s-513MiB/s (538MB/s-538MB/s),
    io=10.0GiB (10.7GB), run=19954-19954msec
  * roundrobin policy (no noticeable differences between repeated runs)
    READ: bw=541MiB/s (567MB/s), 541MiB/s-541MiB/s (567MB/s-567MB/s),
    io=10.0GiB (10.7GB), run=18933-18933msec

The multi-threaded test has the following parameters:

  [global]
  name=btrfs-raid1-seqread
  filename=btrfs-raid1-seqread
  rw=read
  bs=64k
  direct=0
  numjobs=8
  time_based=0

  [file1]
  size=10G
  ioengine=libaio

and shows the following results:

- raid1c3 with 3 HDDs:
  3 x Seagate Barracuda ST2000DM008 (2TB)
  * pid policy
    READ: bw=1569MiB/s (1645MB/s), 196MiB/s-196MiB/s (206MB/s-206MB/s),
    io=80.0GiB (85.9GB), run=52210-52211msec
  * roundrobin policy
    READ: bw=1733MiB/s (1817MB/s), 217MiB/s-217MiB/s (227MB/s-227MB/s),
    io=80.0GiB (85.9GB), run=47269-47271msec
- raid1c3 with 2 HDDs and 1 SSD:
  2 x Seagate Barracuda ST2000DM008 (2TB)
  1 x Crucial CT256M550SSD1 (256GB)
  * pid policy
    READ: bw=1843MiB/s (1932MB/s), 230MiB/s-230MiB/s (242MB/s-242MB/s),
    io=80.0GiB (85.9GB), run=44449-44450msec
  * roundrobin policy
    READ: bw=2485MiB/s (2605MB/s), 311MiB/s-311MiB/s (326MB/s-326MB/s),
    io=80.0GiB (85.9GB), run=32969-32970msec

The penalty is an additional value added to the number of inflight
requests when a scheduled request is non-local, which means that it
starts at a different physical location than the one where the last
request scheduled to the given device ended. By default, it is applied
only in filesystems which have mixed types of devices (non-rotational
and rotational), but it can be configured to apply regardless of that
condition.

This is configured through sysfs:

- /sys/fs/btrfs/[fsid]/read_policies/roundrobin_nonlocal_inc_mixed_only

where 1 (the default) means the penalty is applied only in mixed arrays,
and 0 means it is applied unconditionally.

The exact penalty value is defined separately for non-rotational and
rotational devices. By default, it is 0 for non-rotational devices and 1
for rotational devices. Both values are configurable through sysfs:

- /sys/fs/btrfs/[fsid]/read_policies/roundrobin_nonrot_nonlocal_inc
- /sys/fs/btrfs/[fsid]/read_policies/roundrobin_rot_nonlocal_inc

To sum up, by default the penalty is applied when all of the following
conditions hold:

- the raid1 array consists of mixed types of devices
- the scheduled request is non-local for the given device
- the device is rotational

These defaults give a slight preference to non-rotational disks in mixed
arrays, and they gave the best performance in the tested arrays.

For the array with 3 HDDs, not adding any penalty resulted in 409MiB/s
(429MB/s). Adding a penalty value of 1 dropped the performance to
404MiB/s (424MB/s), and increasing the value towards 10 made it even
worse.
For the array with 2 HDDs and 1 SSD, adding a penalty value of 1 to
rotational disks resulted in the best performance, 541MiB/s (567MB/s).
Both not adding any penalty and increasing the value made the
performance worse.

Adding a penalty to non-rotational disks always decreased the
performance, which is why it defaults to 0. It is still configurable for
testing purposes.

To measure the performance of each policy and find optimal penalty
values, I created scripts, which are available here:

https://gitlab.com/vadorovsky/btrfs-perf
https://github.com/mrostecki/btrfs-perf

Signed-off-by: Michal Rostecki
---
 fs/btrfs/ctree.h   |   3 +
 fs/btrfs/disk-io.c |   3 +
 fs/btrfs/sysfs.c   |  93 +++++++++++++++++++++++++++++-
 fs/btrfs/volumes.c | 137 ++++++++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/volumes.h |  10 ++++
 5 files changed, 242 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a9b0521d9e89..6ff0a18fd219 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -976,6 +976,9 @@ struct btrfs_fs_info {
 	/* Max size to emit ZONE_APPEND write command */
 	u64 max_zone_append_size;
 
+	/* Last mirror picked in round-robin selection */
+	int __percpu *last_mirror;
+
 #ifdef CONFIG_BTRFS_FS_REF_VERIFY
 	spinlock_t ref_verify_lock;
 	struct rb_root block_tree;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 71fab77873a5..937fcadbdd2f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1547,6 +1547,7 @@ void btrfs_free_fs_info(struct btrfs_fs_info *fs_info)
 	btrfs_extent_buffer_leak_debug_check(fs_info);
 	kfree(fs_info->super_copy);
 	kfree(fs_info->super_for_commit);
+	free_percpu(fs_info->last_mirror);
 	kvfree(fs_info);
 }
 
@@ -2857,6 +2858,8 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 	fs_info->swapfile_pins = RB_ROOT;
 
 	fs_info->send_in_progress = 0;
+
+	fs_info->last_mirror = __alloc_percpu(sizeof(int), __alignof__(int));
 }
 
 static int init_mount_fs_info(struct btrfs_fs_info *fs_info, struct super_block *sb)
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index a8f528eb4e50..b9a6d38843ef 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -928,7 +928,7 @@ static bool strmatch(const char *buffer, const char *string)
 	return false;
 }
 
-static const char * const btrfs_read_policy_name[] = { "pid" };
+static const char * const btrfs_read_policy_name[] = { "pid", "roundrobin" };
 
 static ssize_t btrfs_read_policy_show(struct kobject *kobj,
 				      struct kobj_attribute *a, char *buf)
@@ -976,8 +976,99 @@ static ssize_t btrfs_read_policy_store(struct kobject *kobj,
 }
 BTRFS_ATTR_RW(read_policies, policy, btrfs_read_policy_show, btrfs_read_policy_store);
 
+static ssize_t btrfs_roundrobin_nonlocal_inc_mixed_only_show(
+		struct kobject *kobj, struct kobj_attribute *a, char *buf)
+{
+	struct btrfs_fs_devices *fs_devices = to_fs_devs(kobj->parent);
+
+	return scnprintf(buf, PAGE_SIZE, "%d\n",
+		READ_ONCE(fs_devices->roundrobin_nonlocal_inc_mixed_only));
+}
+
+static ssize_t btrfs_roundrobin_nonlocal_inc_mixed_only_store(
+		struct kobject *kobj, struct kobj_attribute *a, const char *buf,
+		size_t len)
+{
+	struct btrfs_fs_devices *fs_devices = to_fs_devs(kobj->parent);
+	bool val;
+	int ret;
+
+	ret = kstrtobool(buf, &val);
+	if (ret)
+		return -EINVAL;
+
+	WRITE_ONCE(fs_devices->roundrobin_nonlocal_inc_mixed_only, val);
+	return len;
+}
+BTRFS_ATTR_RW(read_policies, roundrobin_nonlocal_inc_mixed_only,
+	      btrfs_roundrobin_nonlocal_inc_mixed_only_show,
+	      btrfs_roundrobin_nonlocal_inc_mixed_only_store);
+
+static ssize_t btrfs_roundrobin_nonrot_nonlocal_inc_show(struct kobject *kobj,
+						struct kobj_attribute *a,
+						char *buf)
+{
+	struct btrfs_fs_devices *fs_devices = to_fs_devs(kobj->parent);
+
+	return scnprintf(buf, PAGE_SIZE, "%d\n",
+		READ_ONCE(fs_devices->roundrobin_nonrot_nonlocal_inc));
+}
+
+static ssize_t btrfs_roundrobin_nonrot_nonlocal_inc_store(struct kobject *kobj,
+						struct kobj_attribute *a,
+						const char *buf,
+						size_t len)
+{
+	struct btrfs_fs_devices *fs_devices = to_fs_devs(kobj->parent);
+	u32 val;
+	int ret;
+
+	ret = kstrtou32(buf, 10, &val);
+	if (ret)
+		return -EINVAL;
+
+	WRITE_ONCE(fs_devices->roundrobin_nonrot_nonlocal_inc, val);
+	return len;
+}
+BTRFS_ATTR_RW(read_policies, roundrobin_nonrot_nonlocal_inc,
+	      btrfs_roundrobin_nonrot_nonlocal_inc_show,
+	      btrfs_roundrobin_nonrot_nonlocal_inc_store);
+
+static ssize_t btrfs_roundrobin_rot_nonlocal_inc_show(struct kobject *kobj,
+						struct kobj_attribute *a,
+						char *buf)
+{
+	struct btrfs_fs_devices *fs_devices = to_fs_devs(kobj->parent);
+
+	return scnprintf(buf, PAGE_SIZE, "%d\n",
+		READ_ONCE(fs_devices->roundrobin_rot_nonlocal_inc));
+}
+
+static ssize_t btrfs_roundrobin_rot_nonlocal_inc_store(struct kobject *kobj,
+						struct kobj_attribute *a,
+						const char *buf,
+						size_t len)
+{
+	struct btrfs_fs_devices *fs_devices = to_fs_devs(kobj->parent);
+	u32 val;
+	int ret;
+
+	ret = kstrtou32(buf, 10, &val);
+	if (ret)
+		return -EINVAL;
+
+	WRITE_ONCE(fs_devices->roundrobin_rot_nonlocal_inc, val);
+	return len;
+}
+BTRFS_ATTR_RW(read_policies, roundrobin_rot_nonlocal_inc,
+	      btrfs_roundrobin_rot_nonlocal_inc_show,
+	      btrfs_roundrobin_rot_nonlocal_inc_store);
+
 static const struct attribute *read_policies_attrs[] = {
 	BTRFS_ATTR_PTR(read_policies, policy),
+	BTRFS_ATTR_PTR(read_policies, roundrobin_nonlocal_inc_mixed_only),
+	BTRFS_ATTR_PTR(read_policies, roundrobin_nonrot_nonlocal_inc),
+	BTRFS_ATTR_PTR(read_policies, roundrobin_rot_nonlocal_inc),
 	NULL,
 };
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 1ad30a595722..c6dd393190f6 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1265,6 +1265,11 @@ static int open_fs_devices(struct btrfs_fs_devices *fs_devices,
 	fs_devices->total_rw_bytes = 0;
 	fs_devices->chunk_alloc_policy = BTRFS_CHUNK_ALLOC_REGULAR;
 	fs_devices->read_policy = BTRFS_READ_POLICY_PID;
+	fs_devices->roundrobin_nonlocal_inc_mixed_only = true;
+	fs_devices->roundrobin_nonrot_nonlocal_inc =
+		BTRFS_DEFAULT_ROUNDROBIN_NONROT_NONLOCAL_INC;
+	fs_devices->roundrobin_rot_nonlocal_inc =
+		BTRFS_DEFAULT_ROUNDROBIN_ROT_NONLOCAL_INC;
 
 	return 0;
 }
@@ -5555,8 +5560,125 @@ static u64 stripe_physical(struct map_lookup *map, u32 stripe_index,
 		stripe_nr * map->stripe_len;
 }
 
+/*
+ * Calculates the load of the given mirror. Load is defined as the number of
+ * inflight requests + potential penalty value.
+ *
+ * @fs_info: the filesystem
+ * @map: mapping containing the logical extent
+ * @mirror_index: index of the mirror to check
+ * @stripe_offset: offset of the block in its stripe
+ * @stripe_nr: index of the stripe where the block falls in
+ */
+static int mirror_load(struct btrfs_fs_info *fs_info, struct map_lookup *map,
+		       int mirror_index, u64 stripe_offset, u64 stripe_nr)
+{
+	struct btrfs_fs_devices *fs_devices;
+	struct btrfs_device *dev;
+	int last_offset;
+	u64 physical;
+	int load;
+
+	dev = map->stripes[mirror_index].dev;
+	load = percpu_counter_sum(&dev->inflight);
+	last_offset = atomic_read(&dev->last_offset);
+	physical = stripe_physical(map, mirror_index, stripe_offset, stripe_nr);
+
+	fs_devices = fs_info->fs_devices;
+
+	/*
+	 * If the filesystem has mixed types of devices (or we enable adding a
+	 * penalty value regardless) and the request is non-local, add a
+	 * penalty value.
+	 */
+	if ((!fs_devices->roundrobin_nonlocal_inc_mixed_only ||
+	     fs_devices->mixed) && last_offset != physical) {
+		if (dev->rotating)
+			return load + fs_devices->roundrobin_rot_nonlocal_inc;
+		return load + fs_devices->roundrobin_nonrot_nonlocal_inc;
+	}
+
+	return load;
+}
+
+/*
+ * Checks if the given mirror can process more requests.
+ *
+ * @fs_info: the filesystem
+ * @map: mapping containing the logical extent
+ * @mirror_index: index of the mirror to check
+ * @stripe_offset: offset of the block in its stripe
+ * @stripe_nr: index of the stripe where the block falls in
+ *
+ * Returns true if more requests can be processed, otherwise returns false.
+ */ +static bool mirror_queue_not_filled(struct btrfs_fs_info *fs_info, + struct map_lookup *map, int mirror_index, + u64 stripe_offset, u64 stripe_nr) +{ + struct block_device *bdev; + unsigned int queue_depth; + int inflight; + + bdev = map->stripes[mirror_index].dev->bdev; + inflight = mirror_load(fs_info, map, mirror_index, stripe_offset, + stripe_nr); + queue_depth = blk_queue_depth(bdev->bd_disk->queue); + + return inflight < queue_depth; +} + +/* + * Find a mirror using the round-robin technique which has lower load than + * queue depth. Load is defined as the number of inflight requests + potential + * penalty value. + * + * @fs_info: the filesystem + * @map: mapping containing the logical extent + * @first: index of the first device in the stripes array + * @num_stripes: number of stripes in the stripes array + * @stripe_offset: offset of the block in its stripe + * @stripe_nr: index of the stripe whete the block falls in + * + * Returns the index of selected mirror. + */ +static int find_live_mirror_roundrobin(struct btrfs_fs_info *fs_info, + struct map_lookup *map, int first, + int num_stripes, u64 stripe_offset, + u64 stripe_nr) +{ + int preferred_mirror; + int last_mirror; + int i; + + last_mirror = this_cpu_read(*fs_info->last_mirror); + + for (i = last_mirror; i < first + num_stripes; i++) { + if (mirror_queue_not_filled(fs_info, map, i, stripe_offset, + stripe_nr)) { + preferred_mirror = i; + goto out; + } + } + + for (i = first; i < last_mirror; i++) { + if (mirror_queue_not_filled(fs_info, map, i, stripe_offset, + stripe_nr)) { + preferred_mirror = i; + goto out; + } + } + + preferred_mirror = last_mirror; + +out: + this_cpu_write(*fs_info->last_mirror, preferred_mirror); + return preferred_mirror; +} + static int find_live_mirror(struct btrfs_fs_info *fs_info, struct map_lookup *map, int first, + u64 stripe_offset, u64 stripe_nr, int dev_replace_is_ongoing) { int i; @@ -5584,6 +5706,11 @@ static int find_live_mirror(struct btrfs_fs_info 
*fs_info, case BTRFS_READ_POLICY_PID: preferred_mirror = first + (current->pid % num_stripes); break; + case BTRFS_READ_POLICY_ROUNDROBIN: + preferred_mirror = find_live_mirror_roundrobin( + fs_info, map, first, num_stripes, stripe_offset, + stripe_nr); + break; } if (dev_replace_is_ongoing && @@ -6178,7 +6305,9 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info, stripe_index = mirror_num - 1; else { stripe_index = find_live_mirror(fs_info, map, 0, - dev_replace_is_ongoing); + stripe_offset, + stripe_nr, + dev_replace_is_ongoing); mirror_num = stripe_index + 1; } @@ -6204,8 +6333,10 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info, else { int old_stripe_index = stripe_index; stripe_index = find_live_mirror(fs_info, map, - stripe_index, - dev_replace_is_ongoing); + stripe_index, + stripe_offset, + stripe_nr, + dev_replace_is_ongoing); mirror_num = stripe_index - old_stripe_index + 1; } diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index ee050fd48042..47ca47b60ea9 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -230,9 +230,15 @@ enum btrfs_chunk_allocation_policy { enum btrfs_read_policy { /* Use process PID to choose the stripe */ BTRFS_READ_POLICY_PID, + /* Round robin */ + BTRFS_READ_POLICY_ROUNDROBIN, BTRFS_NR_READ_POLICY, }; +/* Default raid1 policies config */ +#define BTRFS_DEFAULT_ROUNDROBIN_NONROT_NONLOCAL_INC 0 +#define BTRFS_DEFAULT_ROUNDROBIN_ROT_NONLOCAL_INC 1 + struct btrfs_fs_devices { u8 fsid[BTRFS_FSID_SIZE]; /* FS specific uuid */ u8 metadata_uuid[BTRFS_FSID_SIZE]; @@ -294,6 +300,10 @@ struct btrfs_fs_devices { /* Policy used to read the mirrored stripes */ enum btrfs_read_policy read_policy; + /* Policies config */ + bool roundrobin_nonlocal_inc_mixed_only; + u32 roundrobin_nonrot_nonlocal_inc; + u32 roundrobin_rot_nonlocal_inc; }; #define BTRFS_BIO_INLINE_CSUM_SIZE 64