From patchwork Fri Apr 18 01:09:37 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu Kuai X-Patchwork-Id: 14056481 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D63254A2D; Fri, 18 Apr 2025 01:16:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744939014; cv=none; b=mVH1H+X542Z4Xa7gQtWXmveW2Eds/+Qw00/fgRcp8nPvlmPk/3HWxhbo1vYgOaSR/08nyswxUdFOVHCnJIGfQGqspeEJMDKJq+19zkUhKeNY0FndNXVNglpujPW9YSy4+4L3ObpjoRsHgIJafX42ik1mVBtAwj+UyO9L/E+3IEQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744939014; c=relaxed/simple; bh=Jau39SBrXHknyOdYiaz9O82TYiffYp3tf+jbGEJAANo=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=MAEmajT8fdarY5LJ4cNvWdzXWAyZFJYO91vyCiuZXr4A57N3Qkswy0Fi0BpGdPKQJ3q9Yth1TetjYh8Y6/QXd2x6NHw227l0MSPFGodOvnG3ukWFCeRmpZwrKSGqNjPCFWiTtEeXsM/DpWajPLUCE2A++s0Bcr4yw0G6qfxY0Ew= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.235]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4Zdxg13XNhz4f3k68; Fri, 18 Apr 2025 09:16:29 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id DC4041A06D7; Fri, 18 Apr 2025 09:16:47 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP4 (Coremail) with SMTP id gCh0CgDHK2D8pwFoK5w2Jw--.18201S5; Fri, 18 Apr 2025 09:16:47 +0800 (CST) From: Yu Kuai To: axboe@kernel.dk, xni@redhat.com, agk@redhat.com, snitzer@kernel.org, mpatocka@redhat.com, song@kernel.org, yukuai3@huawei.com, viro@zeniv.linux.org.uk, akpm@linux-foundation.org, nadav.amit@gmail.com, ubizjak@gmail.com, cl@linux.com Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, dm-devel@lists.linux.dev, linux-raid@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com, johnny.chenyi@huawei.com Subject: [PATCH v2 1/5] block: cleanup and export bdev IO inflight APIs Date: Fri, 18 Apr 2025 09:09:37 +0800 Message-Id: <20250418010941.667138-2-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20250418010941.667138-1-yukuai1@huaweicloud.com> References: <20250418010941.667138-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-raid@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-CM-TRANSID: gCh0CgDHK2D8pwFoK5w2Jw--.18201S5 X-Coremail-Antispam: 1UD129KBjvJXoW3AF1rtr1rXr4kCr15GF43trb_yoWxtr17pr 45GasxJrWUKr1xur1Utw4xXr1ftw4qkryIvr1fA34akF4DtrnxZF1ktr1kArySvrWkAFW7 uw1YyF9rGF1jkwUanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUmvb4IE77IF4wAFF20E14v26rWj6s0DM7CY07I20VC2zVCF04k2 6cxKx2IYs7xG6rWj6s0DM7CIcVAFz4kK6r1j6r18M28IrcIa0xkI8VA2jI8067AKxVWUGw A2048vs2IY020Ec7CjxVAFwI0_Gr0_Xr1l8cAvFVAK0II2c7xJM28CjxkF64kEwVA0rcxS w2x7M28EF7xvwVC0I7IYx2IY67AKxVWDJVCq3wA2z4x0Y4vE2Ix0cI8IcVCY1x0267AKxV W8Jr0_Cr1UM28EF7xvwVC2z280aVAFwI0_GcCE3s1l84ACjcxK6I8E87Iv6xkF7I0E14v2 6rxl6s0DM2AIxVAIcxkEcVAq07x20xvEncxIr21l5I8CrVACY4xI64kE6c02F40Ex7xfMc Ij6xIIjxv20xvE14v26r1j6r18McIj6I8E87Iv67AKxVWUJVW8JwAm72CE4IkC6x0Yz7v_ Jr0_Gr1lF7xvr2IYc2Ij64vIr41lF7I21c0EjII2zVCS5cI20VAGYxC7M4IIrI8v6xkF7I 0E8cxan2IY04v7MxkF7I0En4kS14v26r4a6rW5MxAIw28IcxkI7VAKI48JMxC20s026xCa FVCjc4AY6r1j6r4UMI8I3I0E5I8CrVAFwI0_Jr0_Jr4lx2IqxVCjr7xvwVAFwI0_JrI_Jr Wlx4CE17CEb7AF67AKxVW8ZVWrXwCIc40Y0x0EwIxGrwCI42IY6xIIjxv20xvE14v26r1j 6r1xMIIF0xvE2Ix0cI8IcVCY1x0267AKxVW8JVWxJwCI42IY6xAIw20EY4v20xvaj40_Jr 0_JF4lIxAIcVC2z280aVAFwI0_Jr0_Gr1lIxAIcVC2z280aVCY1x0267AKxVW8JVW8JrUv cSsGvfC2KfnxnUUI43ZEXa7sRMv31JUUUUU== X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ From: Yu Kuai - rename part_in_{flight, flight_rw} and to bdev_count_{inflight, inflight_rw}. - export bdev_count_inflight_rw, to fix a problem in mdraid that foreground IO can be starved by background sync IO in later patches. - reuse bdev_count_inflight_rw for bdev_count_inflight, also add warning if inflight is negative, means related driver is buggy. - remove unused blk_mq_in_flight - rename blk_mq_in_flight_rw to blk_mq_count_in_driver_rw, to distinguish from bdev_count_inflight_rw. Signed-off-by: Yu Kuai --- block/blk-core.c | 2 +- block/blk-mq.c | 15 +++--------- block/blk-mq.h | 7 +++--- block/blk.h | 1 - block/genhd.c | 48 +++++++++++++++++++-------------------- include/linux/part_stat.h | 10 ++++++++ 6 files changed, 40 insertions(+), 43 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 4623de79effa..ead2072eb3bd 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1018,7 +1018,7 @@ void update_io_ticks(struct block_device *part, unsigned long now, bool end) stamp = READ_ONCE(part->bd_stamp); if (unlikely(time_after(now, stamp)) && likely(try_cmpxchg(&part->bd_stamp, &stamp, now)) && - (end || part_in_flight(part))) + (end || bdev_count_inflight(part))) __part_stat_add(part, io_ticks, now - stamp); if (bdev_is_partition(part)) { diff --git a/block/blk-mq.c b/block/blk-mq.c index c2697db59109..a2bd77601623 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -101,18 +101,9 @@ static bool blk_mq_check_inflight(struct request *rq, void *priv) return true; } -unsigned int blk_mq_in_flight(struct request_queue *q, - struct block_device *part) -{ - struct mq_inflight mi = { .part = part }; - - blk_mq_queue_tag_busy_iter(q, blk_mq_check_inflight, &mi); - - return mi.inflight[0] + mi.inflight[1]; -} - -void blk_mq_in_flight_rw(struct request_queue *q, struct block_device *part, - unsigned int inflight[2]) +void blk_mq_count_in_driver_rw(struct request_queue *q, + struct block_device *part, + unsigned int inflight[2]) { struct mq_inflight mi = { .part = part }; diff --git a/block/blk-mq.h b/block/blk-mq.h index 3011a78cf16a..3897522f014f 100644 --- a/block/blk-mq.h +++ b/block/blk-mq.h @@ -246,10 +246,9 @@ static inline bool blk_mq_hw_queue_mapped(struct blk_mq_hw_ctx *hctx) return hctx->nr_ctx && hctx->tags; } -unsigned int blk_mq_in_flight(struct request_queue *q, - struct block_device *part); -void blk_mq_in_flight_rw(struct request_queue *q, struct block_device *part, - unsigned int inflight[2]); +void blk_mq_count_in_driver_rw(struct request_queue *q, + struct block_device *part, + unsigned int inflight[2]); static inline void blk_mq_put_dispatch_budget(struct request_queue *q, int budget_token) diff --git a/block/blk.h b/block/blk.h index 006e3be433d2..f476f233f195 100644 --- a/block/blk.h +++ b/block/blk.h @@ -418,7 +418,6 @@ void blk_apply_bdi_limits(struct backing_dev_info *bdi, int blk_dev_init(void); void update_io_ticks(struct block_device *part, unsigned long now, bool end); -unsigned int part_in_flight(struct block_device *part); static inline void req_set_nomerge(struct request_queue *q, struct request *req) { diff --git a/block/genhd.c b/block/genhd.c index c2bd86cd09de..8329bd539be3 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -125,37 +125,35 @@ static void part_stat_read_all(struct block_device *part, } } -unsigned int part_in_flight(struct block_device *part) -{ - unsigned int inflight = 0; - int cpu; - - for_each_possible_cpu(cpu) { - inflight += part_stat_local_read_cpu(part, in_flight[0], cpu) + - part_stat_local_read_cpu(part, in_flight[1], cpu); - } - if ((int)inflight < 0) - inflight = 0; - - return inflight; -} - -static void part_in_flight_rw(struct block_device *part, - unsigned int inflight[2]) +/** + * bdev_count_inflight_rw - get the number of inflight IOs for a block device. + * + * @bdev: the block device. + * @inflight: return number of inflight IOs, @inflight[0] for read and + * @inflight[1] for write. + * + * For bio-based block device, IO is inflight from bdev_start_io_acct(), and for + * rq-based block device, IO is inflight from blk_account_io_start(). + * + * Noted, for rq-based block device, use blk_mq_count_in_driver_rw() to get the + * number of requests issued to driver. + */ +void bdev_count_inflight_rw(struct block_device *bdev, unsigned int inflight[2]) { int cpu; inflight[0] = 0; inflight[1] = 0; for_each_possible_cpu(cpu) { - inflight[0] += part_stat_local_read_cpu(part, in_flight[0], cpu); - inflight[1] += part_stat_local_read_cpu(part, in_flight[1], cpu); + inflight[0] += part_stat_local_read_cpu(bdev, in_flight[0], cpu); + inflight[1] += part_stat_local_read_cpu(bdev, in_flight[1], cpu); } - if ((int)inflight[0] < 0) + if (WARN_ON_ONCE((int)inflight[0] < 0)) inflight[0] = 0; - if ((int)inflight[1] < 0) + if (WARN_ON_ONCE((int)inflight[1] < 0)) inflight[1] = 0; } +EXPORT_SYMBOL_GPL(bdev_count_inflight_rw); /* * Can be deleted altogether. Later. @@ -1005,7 +1003,7 @@ ssize_t part_stat_show(struct device *dev, struct disk_stats stat; unsigned int inflight; - inflight = part_in_flight(bdev); + inflight = bdev_count_inflight(bdev); if (inflight) { part_stat_lock(); update_io_ticks(bdev, jiffies, true); @@ -1050,9 +1048,9 @@ ssize_t part_inflight_show(struct device *dev, struct device_attribute *attr, unsigned int inflight[2]; if (queue_is_mq(q)) - blk_mq_in_flight_rw(q, bdev, inflight); + blk_mq_count_in_driver_rw(q, bdev, inflight); else - part_in_flight_rw(bdev, inflight); + bdev_count_inflight_rw(bdev, inflight); return sysfs_emit(buf, "%8u %8u\n", inflight[0], inflight[1]); } @@ -1307,7 +1305,7 @@ static int diskstats_show(struct seq_file *seqf, void *v) if (bdev_is_partition(hd) && !bdev_nr_sectors(hd)) continue; - inflight = part_in_flight(hd); + inflight = bdev_count_inflight(hd); if (inflight) { part_stat_lock(); update_io_ticks(hd, jiffies, true); diff --git a/include/linux/part_stat.h b/include/linux/part_stat.h index c5e9cac0575e..6248e59dc891 100644 --- a/include/linux/part_stat.h +++ b/include/linux/part_stat.h @@ -79,4 +79,14 @@ static inline void part_stat_set_all(struct block_device *part, int value) #define part_stat_local_read_cpu(part, field, cpu) \ local_read(&(part_stat_get_cpu(part, field, cpu))) +void bdev_count_inflight_rw(struct block_device *bdev, unsigned int inflight[2]); + +static inline unsigned int bdev_count_inflight(struct block_device *bdev) +{ + unsigned int inflight[2]; + + bdev_count_inflight_rw(bdev, inflight); + + return inflight[0] + inflight[1]; +} #endif /* _LINUX_PART_STAT_H */ From patchwork Fri Apr 18 01:09:38 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu Kuai X-Patchwork-Id: 14056480 Received: from dggsgout12.his.huawei.com (dggsgout12.his.huawei.com [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3B5A038DDB; Fri, 18 Apr 2025 01:16:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744939014; cv=none; b=SqP7549DQWwd+y5u0tVmNspJSVx6inPpYEQahpnms97QDrzKeTpG8c9a3DQ8z7jVBrFEvLRkmlnRv2BxiKO2kS3HZlzsC59rZfDVdBjpCe3cNiTLJoSiJ55WwvtQjWm7/CrdSm2J26tETfIHBMFepC6+w0meZB/7uR7z7c2T3xQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744939014; c=relaxed/simple; bh=Ubf9NVVxVgXVJwynH42c+GpCfZfjXZQYOXILtxUffBM=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=CcxJ6ZdQu7QqQYB/hO3IqeK7jduYSA4iAikoRWwZC8YlJRBOkjMnWjOFdkMwgjIjP/26/NUihN2LK3mr9vgxmacNFKYpepObJH1p28M05GhjyFLv4S/J1i/mWXVuofodGk79j6TcpfXUgWlar6SJPbq6Fwx9XOjysIJmtueJ+y8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.235]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTP id 4Zdxfw68rMz4f3kFB; Fri, 18 Apr 2025 09:16:24 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id 987CD1A058E; Fri, 18 Apr 2025 09:16:48 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP4 (Coremail) with SMTP id gCh0CgDHK2D8pwFoK5w2Jw--.18201S6; Fri, 18 Apr 2025 09:16:48 +0800 (CST) From: Yu Kuai To: axboe@kernel.dk, xni@redhat.com, agk@redhat.com, snitzer@kernel.org, mpatocka@redhat.com, song@kernel.org, yukuai3@huawei.com, viro@zeniv.linux.org.uk, akpm@linux-foundation.org, nadav.amit@gmail.com, ubizjak@gmail.com, cl@linux.com Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, dm-devel@lists.linux.dev, linux-raid@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com, johnny.chenyi@huawei.com Subject: [PATCH v2 2/5] md: record dm-raid gendisk in mddev Date: Fri, 18 Apr 2025 09:09:38 +0800 Message-Id: <20250418010941.667138-3-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20250418010941.667138-1-yukuai1@huaweicloud.com> References: <20250418010941.667138-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-raid@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-CM-TRANSID: gCh0CgDHK2D8pwFoK5w2Jw--.18201S6 X-Coremail-Antispam: 1UD129KBjvJXoW7AFy8Gw4xtF1kGF4rCr1ftFb_yoW8Wryrpa 1DA3yrArW5A39Fq3WDJr4DZa45X3WqqryrKF9xCaySvayavF98Ww1kGF42qr4DGrZIgFnx ur4jvws8Kry0yrJanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUmY14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_Jryl82xGYIkIc2 x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0 Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJw A2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq3wAS 0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7IYx2 IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4UM4x0 Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2kIc2 xKxwCY1x0262kKe7AKxVW8ZVWrXwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWU JVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67 kF1VAFwI0_GFv_WrylIxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUJVWUCwCI42IY 6xIIjxv20xvEc7CjxVAFwI0_Cr0_Gr1UMIIF0xvE42xK8VAvwI8IcIk0rVWUJVWUCwCI42 IY6I8E87Iv67AKxVWUJVW8JwCI42IY6I8E87Iv6xkF7I0E14v26r4j6r4UJbIYCTnIWIev Ja73UjIFyTuYvjTRNiSHDUUUU X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ From: Yu Kuai Following patch will use gendisk to check if there are normal IO completed or inflight, to fix a problem in mdraid that foreground IO can be starved by background sync IO in later patches. Signed-off-by: Yu Kuai --- drivers/md/dm-raid.c | 3 +++ drivers/md/md.h | 3 ++- 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c index 6adc55fd90d3..127138c61be5 100644 --- a/drivers/md/dm-raid.c +++ b/drivers/md/dm-raid.c @@ -14,6 +14,7 @@ #include "raid5.h" #include "raid10.h" #include "md-bitmap.h" +#include "dm-core.h" #include @@ -3308,6 +3309,7 @@ static int raid_ctr(struct dm_target *ti, unsigned int argc, char **argv) /* Disable/enable discard support on raid set. */ configure_discard_support(rs); + rs->md.dm_gendisk = ti->table->md->disk; mddev_unlock(&rs->md); return 0; @@ -3327,6 +3329,7 @@ static void raid_dtr(struct dm_target *ti) mddev_lock_nointr(&rs->md); md_stop(&rs->md); + rs->md.dm_gendisk = NULL; mddev_unlock(&rs->md); if (work_pending(&rs->md.event_work)) diff --git a/drivers/md/md.h b/drivers/md/md.h index 1cf00a04bcdd..9d55b4630077 100644 --- a/drivers/md/md.h +++ b/drivers/md/md.h @@ -404,7 +404,8 @@ struct mddev { * are happening, so run/ * takeover/stop are not safe */ - struct gendisk *gendisk; + struct gendisk *gendisk; /* mdraid gendisk */ + struct gendisk *dm_gendisk; /* dm-raid gendisk */ struct kobject kobj; int hold_active; From patchwork Fri Apr 18 01:09:39 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu Kuai X-Patchwork-Id: 14056484 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3B5D939ACF; Fri, 18 Apr 2025 01:16:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744939015; cv=none; b=jaqmq8uRby2C5TR7xhcN2eYqXgyzLTewHSph1yi7r63AG1zTXjD/y80dBBAjamc0lkZITH9uyCE5kwAcv90fPOMk1smvfIJf3qQAd+N4Pd2IibgLDwxqiiNF/sOmNZFUkM5Qk/WjZ9i+dZs9B6U3lWuwn02FXCs+ADSc0Gt0nLg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744939015; c=relaxed/simple; bh=KZu7IqF7NwdZavaKYK1LOOM1qL9xtr2+q32UGZ4T7O8=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Y3wwqLyjHe0vKt4izzIgh22JuAzyP0GZ5QnAg3Z4zN5RNBVdF2t6X9hIJHzfIXoHngHPQ6wYVXtWsKWpthOOwgHryy80oqykqfWHEtY8LdWljto1mykd64EYKD3ETBsXJD9ruhsHRF0t/QD6xo0o2GkZqOChOJwFpBKzxZg7QJ4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.235]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4Zdxfw07sgz4f3m6t; Fri, 18 Apr 2025 09:16:24 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id 4C9421A058E; Fri, 18 Apr 2025 09:16:49 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP4 (Coremail) with SMTP id gCh0CgDHK2D8pwFoK5w2Jw--.18201S7; Fri, 18 Apr 2025 09:16:49 +0800 (CST) From: Yu Kuai To: axboe@kernel.dk, xni@redhat.com, agk@redhat.com, snitzer@kernel.org, mpatocka@redhat.com, song@kernel.org, yukuai3@huawei.com, viro@zeniv.linux.org.uk, akpm@linux-foundation.org, nadav.amit@gmail.com, ubizjak@gmail.com, cl@linux.com Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, dm-devel@lists.linux.dev, linux-raid@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com, johnny.chenyi@huawei.com Subject: [PATCH v2 3/5] md: add a new api sync_io_depth Date: Fri, 18 Apr 2025 09:09:39 +0800 Message-Id: <20250418010941.667138-4-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20250418010941.667138-1-yukuai1@huaweicloud.com> References: <20250418010941.667138-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-raid@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-CM-TRANSID: gCh0CgDHK2D8pwFoK5w2Jw--.18201S7 X-Coremail-Antispam: 1UD129KBjvJXoWxKFW7uw47JrWrKw4fWF4xCrg_yoWxtFWrpa y2yFy3Gr1UZFZxXr43JFsxCa4rXr4fK3yUt3y7Gw1xJF13Wr9rGF1SgFW5XF9rWa4fCrnr ZF1UJFZ8ua1xtr7anT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUmY14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JrWl82xGYIkIc2 x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0 Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJw A2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq3wAS 0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7IYx2 IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4UM4x0 Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2kIc2 xKxwCY1x0262kKe7AKxVW8ZVWrXwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWU JVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67 kF1VAFwI0_GFv_WrylIxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUJVWUCwCI42IY 6xIIjxv20xvEc7CjxVAFwI0_Cr0_Gr1UMIIF0xvE42xK8VAvwI8IcIk0rVWUJVWUCwCI42 IY6I8E87Iv67AKxVWUJVW8JwCI42IY6I8E87Iv6xkF7I0E14v26r4j6r4UJbIYCTnIWIev Ja73UjIFyTuYvjTRM6wCDUUUU X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ From: Yu Kuai Currently if sync speed is above speed_min and below speed_max, md_do_sync() will wait for all sync IOs to be done before issuing new sync IO, means sync IO depth is limited to just 1. This limit is too low, in order to prevent sync speed drop conspicuously after fixing is_mddev_idle() in the next patch, add a new api for limiting sync IO depth, the default value is 32. Signed-off-by: Yu Kuai --- drivers/md/md.c | 109 +++++++++++++++++++++++++++++++++++++++--------- drivers/md/md.h | 1 + 2 files changed, 91 insertions(+), 19 deletions(-) diff --git a/drivers/md/md.c b/drivers/md/md.c index 438e71e45c16..52cadfce7e8d 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -111,32 +111,48 @@ static void md_wakeup_thread_directly(struct md_thread __rcu *thread); /* Default safemode delay: 200 msec */ #define DEFAULT_SAFEMODE_DELAY ((200 * HZ)/1000 +1) /* - * Current RAID-1,4,5 parallel reconstruction 'guaranteed speed limit' - * is 1000 KB/sec, so the extra system load does not show up that much. - * Increase it if you want to have more _guaranteed_ speed. Note that - * the RAID driver will use the maximum available bandwidth if the IO - * subsystem is idle. There is also an 'absolute maximum' reconstruction - * speed limit - in case reconstruction slows down your system despite - * idle IO detection. + * Current RAID-1,4,5,6,10 parallel reconstruction 'guaranteed speed limit' + * is sysctl_speed_limit_min, 1000 KB/sec by default, so the extra system load + * does not show up that much. Increase it if you want to have more guaranteed + * speed. Note that the RAID driver will use the maximum bandwidth + * sysctl_speed_limit_max, 200 MB/sec by default, if the IO subsystem is idle. * - * you can change it via /proc/sys/dev/raid/speed_limit_min and _max. - * or /sys/block/mdX/md/sync_speed_{min,max} + * Background sync IO speed control: + * + * - below speed min: + * no limit; + * - above speed min and below speed max: + * a) if mddev is idle, then no limit; + * b) if mddev is busy handling normal IO, then limit inflight sync IO + * to sync_io_depth; + * - above speed max: + * sync IO can't be issued; + * + * Following configurations can be changed via /proc/sys/dev/raid/ for system + * or /sys/block/mdX/md/ for one array. */ - static int sysctl_speed_limit_min = 1000; static int sysctl_speed_limit_max = 200000; -static inline int speed_min(struct mddev *mddev) +static int sysctl_sync_io_depth = 32; + +static int speed_min(struct mddev *mddev) { return mddev->sync_speed_min ? mddev->sync_speed_min : sysctl_speed_limit_min; } -static inline int speed_max(struct mddev *mddev) +static int speed_max(struct mddev *mddev) { return mddev->sync_speed_max ? mddev->sync_speed_max : sysctl_speed_limit_max; } +static int sync_io_depth(struct mddev *mddev) +{ + return mddev->sync_io_depth ? + mddev->sync_io_depth : sysctl_sync_io_depth; +} + static void rdev_uninit_serial(struct md_rdev *rdev) { if (!test_and_clear_bit(CollisionCheck, &rdev->flags)) @@ -293,14 +309,21 @@ static const struct ctl_table raid_table[] = { .procname = "speed_limit_min", .data = &sysctl_speed_limit_min, .maxlen = sizeof(int), - .mode = S_IRUGO|S_IWUSR, + .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "speed_limit_max", .data = &sysctl_speed_limit_max, .maxlen = sizeof(int), - .mode = S_IRUGO|S_IWUSR, + .mode = 0644, + .proc_handler = proc_dointvec, + }, + { + .procname = "sync_io_depth", + .data = &sysctl_sync_io_depth, + .maxlen = sizeof(int), + .mode = 0644, .proc_handler = proc_dointvec, }, }; @@ -5091,7 +5114,7 @@ static ssize_t sync_min_show(struct mddev *mddev, char *page) { return sprintf(page, "%d (%s)\n", speed_min(mddev), - mddev->sync_speed_min ? "local": "system"); + mddev->sync_speed_min ? "local" : "system"); } static ssize_t @@ -5100,7 +5123,7 @@ sync_min_store(struct mddev *mddev, const char *buf, size_t len) unsigned int min; int rv; - if (strncmp(buf, "system", 6)==0) { + if (strncmp(buf, "system", 6) == 0) { min = 0; } else { rv = kstrtouint(buf, 10, &min); @@ -5120,7 +5143,7 @@ static ssize_t sync_max_show(struct mddev *mddev, char *page) { return sprintf(page, "%d (%s)\n", speed_max(mddev), - mddev->sync_speed_max ? "local": "system"); + mddev->sync_speed_max ? "local" : "system"); } static ssize_t @@ -5129,7 +5152,7 @@ sync_max_store(struct mddev *mddev, const char *buf, size_t len) unsigned int max; int rv; - if (strncmp(buf, "system", 6)==0) { + if (strncmp(buf, "system", 6) == 0) { max = 0; } else { rv = kstrtouint(buf, 10, &max); @@ -5145,6 +5168,35 @@ sync_max_store(struct mddev *mddev, const char *buf, size_t len) static struct md_sysfs_entry md_sync_max = __ATTR(sync_speed_max, S_IRUGO|S_IWUSR, sync_max_show, sync_max_store); +static ssize_t +sync_io_depth_show(struct mddev *mddev, char *page) +{ + return sprintf(page, "%d (%s)\n", sync_io_depth(mddev), + mddev->sync_io_depth ? "local" : "system"); +} + +static ssize_t +sync_io_depth_store(struct mddev *mddev, const char *buf, size_t len) +{ + unsigned int max; + int rv; + + if (strncmp(buf, "system", 6) == 0) { + max = 0; + } else { + rv = kstrtouint(buf, 10, &max); + if (rv < 0) + return rv; + if (max == 0) + return -EINVAL; + } + mddev->sync_io_depth = max; + return len; +} + +static struct md_sysfs_entry md_sync_io_depth = +__ATTR_RW(sync_io_depth); + static ssize_t degraded_show(struct mddev *mddev, char *page) { @@ -5671,6 +5723,7 @@ static struct attribute *md_redundancy_attrs[] = { &md_mismatches.attr, &md_sync_min.attr, &md_sync_max.attr, + &md_sync_io_depth.attr, &md_sync_speed.attr, &md_sync_force_parallel.attr, &md_sync_completed.attr, @@ -8927,6 +8980,23 @@ static sector_t md_sync_position(struct mddev *mddev, enum sync_action action) } } +static bool sync_io_within_limit(struct mddev *mddev) +{ + int io_sectors; + + /* + * For raid456, sync IO is stripe(4k) per IO, for other levels, it's + * RESYNC_PAGES(64k) per IO. + */ + if (mddev->level == 4 || mddev->level == 5 || mddev->level == 6) + io_sectors = 8; + else + io_sectors = 128; + + return atomic_read(&mddev->recovery_active) < + io_sectors * sync_io_depth(mddev); +} + #define SYNC_MARKS 10 #define SYNC_MARK_STEP (3*HZ) #define UPDATE_FREQUENCY (5*60*HZ) @@ -9195,7 +9265,8 @@ void md_do_sync(struct md_thread *thread) msleep(500); goto repeat; } - if (!is_mddev_idle(mddev, 0)) { + if (!sync_io_within_limit(mddev) && + !is_mddev_idle(mddev, 0)) { /* * Give other IO more of a chance. * The faster the devices, the less we wait. diff --git a/drivers/md/md.h b/drivers/md/md.h index 9d55b4630077..b57842188f18 100644 --- a/drivers/md/md.h +++ b/drivers/md/md.h @@ -484,6 +484,7 @@ struct mddev { /* if zero, use the system-wide default */ int sync_speed_min; int sync_speed_max; + int sync_io_depth; /* resync even though the same disks are shared among md-devices */ int parallel_resync; From patchwork Fri Apr 18 01:09:40 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu Kuai X-Patchwork-Id: 14056482 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C839A481A3; Fri, 18 Apr 2025 01:16:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744939014; cv=none; b=WWumPtZXayS5hHlkfgFxSV+pvSYTlnQWRdDPIqgPx9ifRHgtfrG3LN+gq/+tqfRfQh0qQSpEnpp6EmFqM4tPaE5k7fYvZUs85cYmsEetHJdeFfCjTqKzwOdQ7NZHHsu2K8MxBGSDxNJIODALAYJxRkff03QUFuW+1mOvGM9MmKU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744939014; c=relaxed/simple; bh=SP6IEDez3ywyfEcGSwvguVNEtstiBWYr8YsoFHc3QlI=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=EDaX7sMknmPaInpni5O7BNJpUFF9VkouiUGwQULCo++T5C5g1yvUNFAPsc/mlyuXmOz8yVWvEuKvnJxy4yJTGAbnAP0xJKqfzBPzWxDy8eM/zAhEeZKvDrOdsjW9hocnXo4oX618ws4O5KuxvzgZ0kv5e3xHfNSrjHI1DWAM6lE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.235]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4Zdxfw5GLTz4f3m7P; Fri, 18 Apr 2025 09:16:24 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id 0A49C1A06D7; Fri, 18 Apr 2025 09:16:50 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP4 (Coremail) with SMTP id gCh0CgDHK2D8pwFoK5w2Jw--.18201S8; Fri, 18 Apr 2025 09:16:49 +0800 (CST) From: Yu Kuai To: axboe@kernel.dk, xni@redhat.com, agk@redhat.com, snitzer@kernel.org, mpatocka@redhat.com, song@kernel.org, yukuai3@huawei.com, viro@zeniv.linux.org.uk, akpm@linux-foundation.org, nadav.amit@gmail.com, ubizjak@gmail.com, cl@linux.com Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, dm-devel@lists.linux.dev, linux-raid@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com, johnny.chenyi@huawei.com Subject: [PATCH v2 4/5] md: fix is_mddev_idle() Date: Fri, 18 Apr 2025 09:09:40 +0800 Message-Id: <20250418010941.667138-5-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20250418010941.667138-1-yukuai1@huaweicloud.com> References: <20250418010941.667138-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-raid@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-CM-TRANSID: gCh0CgDHK2D8pwFoK5w2Jw--.18201S8 X-Coremail-Antispam: 1UD129KBjvJXoW3Xr48WrW5Xw1kJF1DGFW3ZFb_yoW7Ar4xpF W3ua43trW8XFWSqw45JFyv9a4Fg34fJ39xtFy3A34xZF15Grs0kF4agFWYv3s8W3ykZFya qa15KF4DCa4UJFJanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUmI14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCY1x0262kKe7AKxVW8ZVWrXwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkE bVWUJVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67 AF67kF1VAFwI0_GFv_WrylIxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUJVWUCwCI 42IY6xIIjxv20xvEc7CjxVAFwI0_Cr0_Gr1UMIIF0xvE42xK8VAvwI8IcIk0rVWUJVWUCw CI42IY6I8E87Iv67AKxVWUJVW8JwCI42IY6I8E87Iv6xkF7I0E14v26r4j6r4UJbIYCTnI WIevJa73UjIFyTuYvjTRNdb1DUUUU X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ From: Yu Kuai If sync_speed is above speed_min, then is_mddev_idle() will be called for each sync IO to check if the array is idle, and inflihgt sync_io will be limited if the array is not idle. However, while mkfs.ext4 for a large raid5 array while recovery is in progress, it's found that sync_speed is already above speed_min while lots of stripes are used for sync IO, causing long delay for mkfs.ext4. Root cause is the following checking from is_mddev_idle(): t1: submit sync IO: events1 = completed IO - issued sync IO t2: submit next sync IO: events2 = completed IO - issued sync IO if (events2 - events1 > 64) For consequence, the more sync IO issued, the less likely checking will pass. And when completed normal IO is more than issued sync IO, the condition will finally pass and is_mddev_idle() will return false, however, last_events will be updated hence is_mddev_idle() can only return false once in a while. Fix this problem by changing the checking as following: 1) mddev doesn't have normal IO completed; 2) mddev doesn't have normal IO inflight; 3) if any member disks is partition, and all other partitions doesn't have IO completed. Signed-off-by: Yu Kuai --- drivers/md/md.c | 84 +++++++++++++++++++++++++++---------------------- drivers/md/md.h | 3 +- 2 files changed, 48 insertions(+), 39 deletions(-) diff --git a/drivers/md/md.c b/drivers/md/md.c index 52cadfce7e8d..dfd85a5d6112 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -8625,50 +8625,58 @@ void md_cluster_stop(struct mddev *mddev) put_cluster_ops(mddev); } -static int is_mddev_idle(struct mddev *mddev, int init) +static bool is_rdev_holder_idle(struct md_rdev *rdev, bool init) { + unsigned long last_events = rdev->last_events; + + if (!bdev_is_partition(rdev->bdev)) + return true; + + /* + * If rdev is partition, and user doesn't issue IO to the array, the + * array is still not idle if user issues IO to other partitions. + */ + rdev->last_events = part_stat_read_accum(rdev->bdev->bd_disk->part0, + sectors) - + part_stat_read_accum(rdev->bdev, sectors); + + if (!init && rdev->last_events > last_events) + return false; + + return true; +} + +/* + * mddev is idle if following conditions are match since last check: + * 1) mddev doesn't have normal IO completed; + * 2) mddev doesn't have inflight normal IO; + * 3) if any member disk is partition, and other partitions doesn't have IO + * completed; + * + * Noted this checking rely on IO accounting is enabled. + */ +static bool is_mddev_idle(struct mddev *mddev, int init) +{ + unsigned long last_events = mddev->last_events; + struct gendisk *disk; struct md_rdev *rdev; - int idle; - int curr_events; + bool idle = true; - idle = 1; - rcu_read_lock(); - rdev_for_each_rcu(rdev, mddev) { - struct gendisk *disk = rdev->bdev->bd_disk; + disk = mddev_is_dm(mddev) ? mddev->dm_gendisk : mddev->gendisk; + if (!disk) + return true; - if (!init && !blk_queue_io_stat(disk->queue)) - continue; + mddev->last_events = part_stat_read_accum(disk->part0, sectors); + if (!init && (mddev->last_events > last_events || + bdev_count_inflight(disk->part0))) + idle = false; - curr_events = (int)part_stat_read_accum(disk->part0, sectors) - - atomic_read(&disk->sync_io); - /* sync IO will cause sync_io to increase before the disk_stats - * as sync_io is counted when a request starts, and - * disk_stats is counted when it completes. - * So resync activity will cause curr_events to be smaller than - * when there was no such activity. - * non-sync IO will cause disk_stat to increase without - * increasing sync_io so curr_events will (eventually) - * be larger than it was before. Once it becomes - * substantially larger, the test below will cause - * the array to appear non-idle, and resync will slow - * down. - * If there is a lot of outstanding resync activity when - * we set last_event to curr_events, then all that activity - * completing might cause the array to appear non-idle - * and resync will be slowed down even though there might - * not have been non-resync activity. This will only - * happen once though. 'last_events' will soon reflect - * the state where there is little or no outstanding - * resync requests, and further resync activity will - * always make curr_events less than last_events. - * - */ - if (init || curr_events - rdev->last_events > 64) { - rdev->last_events = curr_events; - idle = 0; - } - } + rcu_read_lock(); + rdev_for_each_rcu(rdev, mddev) + if (!is_rdev_holder_idle(rdev, init)) + idle = false; rcu_read_unlock(); + return idle; } diff --git a/drivers/md/md.h b/drivers/md/md.h index b57842188f18..1d51c2405d3d 100644 --- a/drivers/md/md.h +++ b/drivers/md/md.h @@ -132,7 +132,7 @@ struct md_rdev { sector_t sectors; /* Device size (in 512bytes sectors) */ struct mddev *mddev; /* RAID array if running */ - int last_events; /* IO event timestamp */ + unsigned long last_events; /* IO event timestamp */ /* * If meta_bdev is non-NULL, it means that a separate device is @@ -520,6 +520,7 @@ struct mddev { * adding a spare */ + unsigned long last_events; /* IO event timestamp */ atomic_t recovery_active; /* blocks scheduled, but not written */ wait_queue_head_t recovery_wait; sector_t recovery_cp; From patchwork Fri Apr 18 01:09:41 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu Kuai X-Patchwork-Id: 14056483 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B50AD78F32; Fri, 18 Apr 2025 01:16:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744939015; cv=none; b=RC7ZZIcnxnX3r8F+3nEsAaiNbJ9TFDt6MDcrxo6IQPzOsoss63qdolgXx5xl38HfoYH0crNFu1Y6vC9kF6FNkeZ1gB92qpS2EeMkIWx0xqBdfLJos9LlIx4PcGEPNmE+kZ3lHzdd+OvvtbVG/KMuBdoWxGglNQeSodBPgLy7heo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744939015; c=relaxed/simple; bh=J5DTBmaMUMn88myuXmQXHIFEYhw/gvxfPajXf5dOIWk=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=dXQH18DtLjlUnm4EPlabpEKkBqmQFyzlmRARv+kpH+mnUkVCBnKV6njEv2Kq3P8AMZtBwtelTZVezQmK8E3mBL8wYtEqxFxq8jVhFIY6r1piXobSOfmJqW5c+B2UhRlBCd5oUts/vuWcoOxtAPNS0w3O0IZsFLbWNyBlwnYqrwM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.216]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4Zdxg42sZzz4f3jt7; Fri, 18 Apr 2025 09:16:32 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id BEDD01A19CC; Fri, 18 Apr 2025 09:16:50 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP4 (Coremail) with SMTP id gCh0CgDHK2D8pwFoK5w2Jw--.18201S9; Fri, 18 Apr 2025 09:16:50 +0800 (CST) From: Yu Kuai To: axboe@kernel.dk, xni@redhat.com, agk@redhat.com, snitzer@kernel.org, mpatocka@redhat.com, song@kernel.org, yukuai3@huawei.com, viro@zeniv.linux.org.uk, akpm@linux-foundation.org, nadav.amit@gmail.com, ubizjak@gmail.com, cl@linux.com Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, dm-devel@lists.linux.dev, linux-raid@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com, johnny.chenyi@huawei.com Subject: [PATCH v2 5/5] md: cleanup accounting for issued sync IO Date: Fri, 18 Apr 2025 09:09:41 +0800 Message-Id: <20250418010941.667138-6-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20250418010941.667138-1-yukuai1@huaweicloud.com> References: <20250418010941.667138-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-raid@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-CM-TRANSID: gCh0CgDHK2D8pwFoK5w2Jw--.18201S9 X-Coremail-Antispam: 1UD129KBjvJXoW3Ww13JrWrJF4fKFy8uw18uFg_yoW7Zw1Upa 9xXa4xAFWUGrWF93WDJrWqk3ZY9347KrW7ArW7u393u3WFyryDZF1UJFWYqF98AFy5urW5 Xa4kGan8CFs7tFUanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUmS14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCY1x0262kKe7AKxVW8ZVWrXwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkE bVWUJVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67 AF67kF1VAFwI0_GFv_WrylIxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUCVW8JwCI 42IY6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF 4lIxAIcVC2z280aVAFwI0_Jr0_Gr1lIxAIcVC2z280aVCY1x0267AKxVW8Jr0_Cr1UYxBI daVFxhVjvjDU0xZFpf9x0pRQJ5wUUUUU= X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ From: Yu Kuai It's no longer used and can be removed, also remove the field 'gendisk->sync_io'. Signed-off-by: Yu Kuai --- drivers/md/md.h | 11 ----------- drivers/md/raid1.c | 3 --- drivers/md/raid10.c | 9 --------- drivers/md/raid5.c | 8 -------- include/linux/blkdev.h | 1 - 5 files changed, 32 deletions(-) diff --git a/drivers/md/md.h b/drivers/md/md.h index 1d51c2405d3d..07b40a55ed5f 100644 --- a/drivers/md/md.h +++ b/drivers/md/md.h @@ -717,17 +717,6 @@ static inline int mddev_trylock(struct mddev *mddev) } extern void mddev_unlock(struct mddev *mddev); -static inline void md_sync_acct(struct block_device *bdev, unsigned long nr_sectors) -{ - if (blk_queue_io_stat(bdev->bd_disk->queue)) - atomic_add(nr_sectors, &bdev->bd_disk->sync_io); -} - -static inline void md_sync_acct_bio(struct bio *bio, unsigned long nr_sectors) -{ - md_sync_acct(bio->bi_bdev, nr_sectors); -} - struct md_personality { struct md_submodule_head head; diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index de9bccbe7337..657d481525be 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -2382,7 +2382,6 @@ static void sync_request_write(struct mddev *mddev, struct r1bio *r1_bio) wbio->bi_end_io = end_sync_write; atomic_inc(&r1_bio->remaining); - md_sync_acct(conf->mirrors[i].rdev->bdev, bio_sectors(wbio)); submit_bio_noacct(wbio); } @@ -3055,7 +3054,6 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr, bio = r1_bio->bios[i]; if (bio->bi_end_io == end_sync_read) { read_targets--; - md_sync_acct_bio(bio, nr_sectors); if (read_targets == 1) bio->bi_opf &= ~MD_FAILFAST; submit_bio_noacct(bio); @@ -3064,7 +3062,6 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr, } else { atomic_set(&r1_bio->remaining, 1); bio = r1_bio->bios[r1_bio->read_disk]; - md_sync_acct_bio(bio, nr_sectors); if (read_targets == 1) bio->bi_opf &= ~MD_FAILFAST; submit_bio_noacct(bio); diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index ba32bac975b8..dce06bf65016 100644 --- a/drivers/md/raid10.c +++ b/drivers/md/raid10.c @@ -2426,7 +2426,6 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio) atomic_inc(&conf->mirrors[d].rdev->nr_pending); atomic_inc(&r10_bio->remaining); - md_sync_acct(conf->mirrors[d].rdev->bdev, bio_sectors(tbio)); if (test_bit(FailFast, &conf->mirrors[d].rdev->flags)) tbio->bi_opf |= MD_FAILFAST; @@ -2448,8 +2447,6 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio) bio_copy_data(tbio, fbio); d = r10_bio->devs[i].devnum; atomic_inc(&r10_bio->remaining); - md_sync_acct(conf->mirrors[d].replacement->bdev, - bio_sectors(tbio)); submit_bio_noacct(tbio); } @@ -2583,13 +2580,10 @@ static void recovery_request_write(struct mddev *mddev, struct r10bio *r10_bio) d = r10_bio->devs[1].devnum; if (wbio->bi_end_io) { atomic_inc(&conf->mirrors[d].rdev->nr_pending); - md_sync_acct(conf->mirrors[d].rdev->bdev, bio_sectors(wbio)); submit_bio_noacct(wbio); } if (wbio2) { atomic_inc(&conf->mirrors[d].replacement->nr_pending); - md_sync_acct(conf->mirrors[d].replacement->bdev, - bio_sectors(wbio2)); submit_bio_noacct(wbio2); } } @@ -3757,7 +3751,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr, r10_bio->sectors = nr_sectors; if (bio->bi_end_io == end_sync_read) { - md_sync_acct_bio(bio, nr_sectors); bio->bi_status = 0; submit_bio_noacct(bio); } @@ -4880,7 +4873,6 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, r10_bio->sectors = nr_sectors; /* Now submit the read */ - md_sync_acct_bio(read_bio, r10_bio->sectors); atomic_inc(&r10_bio->remaining); read_bio->bi_next = NULL; submit_bio_noacct(read_bio); @@ -4940,7 +4932,6 @@ static void reshape_request_write(struct mddev *mddev, struct r10bio *r10_bio) continue; atomic_inc(&rdev->nr_pending); - md_sync_acct_bio(b, r10_bio->sectors); atomic_inc(&r10_bio->remaining); b->bi_next = NULL; submit_bio_noacct(b); diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 6389383166c0..ca5b0e8ba707 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -1240,10 +1240,6 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s) } if (rdev) { - if (s->syncing || s->expanding || s->expanded - || s->replacing) - md_sync_acct(rdev->bdev, RAID5_STRIPE_SECTORS(conf)); - set_bit(STRIPE_IO_STARTED, &sh->state); bio_init(bi, rdev->bdev, &dev->vec, 1, op | op_flags); @@ -1300,10 +1296,6 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s) submit_bio_noacct(bi); } if (rrdev) { - if (s->syncing || s->expanding || s->expanded - || s->replacing) - md_sync_acct(rrdev->bdev, RAID5_STRIPE_SECTORS(conf)); - set_bit(STRIPE_IO_STARTED, &sh->state); bio_init(rbi, rrdev->bdev, &dev->rvec, 1, op | op_flags); diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index e39c45bc0a97..f3a625b00734 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -182,7 +182,6 @@ struct gendisk { struct list_head slave_bdevs; #endif struct timer_rand_state *random; - atomic_t sync_io; /* RAID */ struct disk_events *ev; #ifdef CONFIG_BLK_DEV_ZONED