From patchwork Tue Feb 18 08:28:54 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nilay Shroff
X-Patchwork-Id: 13979254
From: Nilay Shroff
To: linux-block@vger.kernel.org
Cc: hch@lst.de, ming.lei@redhat.com, dlemoal@kernel.org, axboe@kernel.dk, gjoyce@ibm.com
Subject: [PATCHv2 1/6] blk-sysfs: remove q->sysfs_lock for attributes which don't need it
Date: Tue, 18 Feb 2025 13:58:54 +0530
Message-ID: <20250218082908.265283-2-nilay@linux.ibm.com>
X-Mailer: git-send-email 2.47.1
In-Reply-To: <20250218082908.265283-1-nilay@linux.ibm.com>
References: <20250218082908.265283-1-nilay@linux.ibm.com>
Precedence: bulk
X-Mailing-List: linux-block@vger.kernel.org

There are a few sysfs attributes in the block layer which don't really
need q->sysfs_lock to be held while they are accessed. The reason is
that writing a value to such an attribute is either atomic or can easily
be protected using WRITE_ONCE()/READ_ONCE(). Moreover, sysfs attributes
are inherently protected by sysfs/kernfs internal locking.

So this change helps segregate all existing sysfs attributes for which
we can avoid acquiring q->sysfs_lock. We group all such attributes,
which don't require any sort of locking, using the macro
QUEUE_RO_ENTRY_NOLOCK() or QUEUE_RW_ENTRY_NOLOCK(). The newly introduced
show/store methods (show_nolock/store_nolock) are assigned to attributes
using these new macros.
The show_nolock/store_nolock methods run without holding q->sysfs_lock.

Signed-off-by: Nilay Shroff
Reviewed-by: Christoph Hellwig
---
 block/blk-settings.c |   2 +-
 block/blk-sysfs.c    | 106 ++++++++++++++++++++++++++++++++-----------
 2 files changed, 81 insertions(+), 27 deletions(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index c44dadc35e1e..c541bf22f543 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -21,7 +21,7 @@
 
 void blk_queue_rq_timeout(struct request_queue *q, unsigned int timeout)
 {
-	q->rq_timeout = timeout;
+	WRITE_ONCE(q->rq_timeout, timeout);
 }
 EXPORT_SYMBOL_GPL(blk_queue_rq_timeout);
 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 6f548a4376aa..0c9be7c7ecc1 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -23,9 +23,14 @@
 struct queue_sysfs_entry {
 	struct attribute attr;
 	ssize_t (*show)(struct gendisk *disk, char *page);
+	ssize_t (*show_nolock)(struct gendisk *disk, char *page);
+
 	ssize_t (*store)(struct gendisk *disk, const char *page, size_t count);
+	ssize_t (*store_nolock)(struct gendisk *disk,
+			const char *page, size_t count);
 	int (*store_limit)(struct gendisk *disk, const char *page,
 			size_t count, struct queue_limits *lim);
+
 	void (*load_module)(struct gendisk *disk, const char *page, size_t count);
 };
 
@@ -320,7 +325,12 @@ queue_rq_affinity_store(struct gendisk *disk, const char *page, size_t count)
 	ret = queue_var_store(&val, page, count);
 	if (ret < 0)
 		return ret;
-
+	/*
+	 * Here we update two queue flags each using atomic bitops, although
+	 * updating two flags isn't atomic it should be harmless as those flags
+	 * are accessed individually using atomic test_bit operation. So we
+	 * don't grab any lock while updating these flags.
+	 */
 	if (val == 2) {
 		blk_queue_flag_set(QUEUE_FLAG_SAME_COMP, q);
 		blk_queue_flag_set(QUEUE_FLAG_SAME_FORCE, q);
@@ -353,7 +363,8 @@ static ssize_t queue_poll_store(struct gendisk *disk, const char *page,
 
 static ssize_t queue_io_timeout_show(struct gendisk *disk, char *page)
 {
-	return sysfs_emit(page, "%u\n", jiffies_to_msecs(disk->queue->rq_timeout));
+	return sysfs_emit(page, "%u\n",
+		jiffies_to_msecs(READ_ONCE(disk->queue->rq_timeout)));
 }
 
 static ssize_t queue_io_timeout_store(struct gendisk *disk, const char *page,
@@ -405,6 +416,19 @@ static struct queue_sysfs_entry _prefix##_entry = {	\
 	.show	= _prefix##_show,			\
 };
 
+#define QUEUE_RO_ENTRY_NOLOCK(_prefix, _name)			\
+static struct queue_sysfs_entry _prefix##_entry = {		\
+	.attr		= {.name = _name, .mode = 0444 },	\
+	.show_nolock	= _prefix##_show,			\
+}
+
+#define QUEUE_RW_ENTRY_NOLOCK(_prefix, _name)			\
+static struct queue_sysfs_entry _prefix##_entry = {		\
+	.attr		= {.name = _name, .mode = 0644 },	\
+	.show_nolock	= _prefix##_show,			\
+	.store_nolock	= _prefix##_store,			\
+}
+
 #define QUEUE_RW_ENTRY(_prefix, _name)			\
 static struct queue_sysfs_entry _prefix##_entry = {	\
 	.attr	= { .name = _name, .mode = 0644 },	\
@@ -446,7 +470,7 @@ QUEUE_RO_ENTRY(queue_max_discard_segments, "max_discard_segments");
 QUEUE_RO_ENTRY(queue_discard_granularity, "discard_granularity");
 QUEUE_RO_ENTRY(queue_max_hw_discard_sectors, "discard_max_hw_bytes");
 QUEUE_LIM_RW_ENTRY(queue_max_discard_sectors, "discard_max_bytes");
-QUEUE_RO_ENTRY(queue_discard_zeroes_data, "discard_zeroes_data");
+QUEUE_RO_ENTRY_NOLOCK(queue_discard_zeroes_data, "discard_zeroes_data");
 
 QUEUE_RO_ENTRY(queue_atomic_write_max_sectors, "atomic_write_max_bytes");
 QUEUE_RO_ENTRY(queue_atomic_write_boundary_sectors,
@@ -454,25 +478,25 @@ QUEUE_RO_ENTRY(queue_atomic_write_boundary_sectors,
 QUEUE_RO_ENTRY(queue_atomic_write_unit_max, "atomic_write_unit_max_bytes");
 QUEUE_RO_ENTRY(queue_atomic_write_unit_min, "atomic_write_unit_min_bytes");
 
-QUEUE_RO_ENTRY(queue_write_same_max, "write_same_max_bytes");
+QUEUE_RO_ENTRY_NOLOCK(queue_write_same_max, "write_same_max_bytes");
 QUEUE_RO_ENTRY(queue_max_write_zeroes_sectors, "write_zeroes_max_bytes");
 QUEUE_RO_ENTRY(queue_max_zone_append_sectors, "zone_append_max_bytes");
 QUEUE_RO_ENTRY(queue_zone_write_granularity, "zone_write_granularity");
 
 QUEUE_RO_ENTRY(queue_zoned, "zoned");
-QUEUE_RO_ENTRY(queue_nr_zones, "nr_zones");
+QUEUE_RO_ENTRY_NOLOCK(queue_nr_zones, "nr_zones");
 QUEUE_RO_ENTRY(queue_max_open_zones, "max_open_zones");
 QUEUE_RO_ENTRY(queue_max_active_zones, "max_active_zones");
 
-QUEUE_RW_ENTRY(queue_nomerges, "nomerges");
+QUEUE_RW_ENTRY_NOLOCK(queue_nomerges, "nomerges");
 QUEUE_LIM_RW_ENTRY(queue_iostats_passthrough, "iostats_passthrough");
-QUEUE_RW_ENTRY(queue_rq_affinity, "rq_affinity");
-QUEUE_RW_ENTRY(queue_poll, "io_poll");
-QUEUE_RW_ENTRY(queue_poll_delay, "io_poll_delay");
+QUEUE_RW_ENTRY_NOLOCK(queue_rq_affinity, "rq_affinity");
+QUEUE_RW_ENTRY_NOLOCK(queue_poll, "io_poll");
+QUEUE_RW_ENTRY_NOLOCK(queue_poll_delay, "io_poll_delay");
 QUEUE_LIM_RW_ENTRY(queue_wc, "write_cache");
 QUEUE_RO_ENTRY(queue_fua, "fua");
 QUEUE_RO_ENTRY(queue_dax, "dax");
-QUEUE_RW_ENTRY(queue_io_timeout, "io_timeout");
+QUEUE_RW_ENTRY_NOLOCK(queue_io_timeout, "io_timeout");
 QUEUE_RO_ENTRY(queue_virt_boundary_mask, "virt_boundary_mask");
 QUEUE_RO_ENTRY(queue_dma_alignment, "dma_alignment");
 
@@ -561,9 +585,11 @@ QUEUE_RW_ENTRY(queue_wb_lat, "wbt_lat_usec");
 
 /* Common attributes for bio-based and request-based queues. */
 static struct attribute *queue_attrs[] = {
+	/*
+	 * attributes protected with q->sysfs_lock
+	 */
 	&queue_ra_entry.attr,
 	&queue_max_hw_sectors_entry.attr,
-	&queue_max_sectors_entry.attr,
 	&queue_max_segments_entry.attr,
 	&queue_max_discard_segments_entry.attr,
 	&queue_max_integrity_segments_entry.attr,
@@ -575,46 +601,63 @@ static struct attribute *queue_attrs[] = {
 	&queue_io_min_entry.attr,
 	&queue_io_opt_entry.attr,
 	&queue_discard_granularity_entry.attr,
-	&queue_max_discard_sectors_entry.attr,
 	&queue_max_hw_discard_sectors_entry.attr,
-	&queue_discard_zeroes_data_entry.attr,
 	&queue_atomic_write_max_sectors_entry.attr,
 	&queue_atomic_write_boundary_sectors_entry.attr,
 	&queue_atomic_write_unit_min_entry.attr,
 	&queue_atomic_write_unit_max_entry.attr,
-	&queue_write_same_max_entry.attr,
 	&queue_max_write_zeroes_sectors_entry.attr,
 	&queue_max_zone_append_sectors_entry.attr,
 	&queue_zone_write_granularity_entry.attr,
-	&queue_rotational_entry.attr,
 	&queue_zoned_entry.attr,
-	&queue_nr_zones_entry.attr,
 	&queue_max_open_zones_entry.attr,
 	&queue_max_active_zones_entry.attr,
-	&queue_nomerges_entry.attr,
+	&queue_fua_entry.attr,
+	&queue_dax_entry.attr,
+	&queue_virt_boundary_mask_entry.attr,
+	&queue_dma_alignment_entry.attr,
+
+	/*
+	 * attributes protected with q->limits_lock
+	 */
+	&queue_max_sectors_entry.attr,
+	&queue_max_discard_sectors_entry.attr,
+	&queue_rotational_entry.attr,
 	&queue_iostats_passthrough_entry.attr,
 	&queue_iostats_entry.attr,
 	&queue_stable_writes_entry.attr,
 	&queue_add_random_entry.attr,
-	&queue_poll_entry.attr,
 	&queue_wc_entry.attr,
-	&queue_fua_entry.attr,
-	&queue_dax_entry.attr,
+
+	/*
+	 * attributes which don't require locking
+	 */
+	&queue_nomerges_entry.attr,
+	&queue_poll_entry.attr,
 	&queue_poll_delay_entry.attr,
-	&queue_virt_boundary_mask_entry.attr,
-	&queue_dma_alignment_entry.attr,
+	&queue_discard_zeroes_data_entry.attr,
+	&queue_write_same_max_entry.attr,
+	&queue_nr_zones_entry.attr,
+
 	NULL,
 };
 
 /* Request-based queue attributes that
are not relevant for bio-based queues. */
 static struct attribute *blk_mq_queue_attrs[] = {
+	/*
+	 * attributes protected with q->sysfs_lock
+	 */
 	&queue_requests_entry.attr,
 	&elv_iosched_entry.attr,
-	&queue_rq_affinity_entry.attr,
-	&queue_io_timeout_entry.attr,
 #ifdef CONFIG_BLK_WBT
 	&queue_wb_lat_entry.attr,
 #endif
+	/*
+	 * attributes which don't require locking
+	 */
+	&queue_rq_affinity_entry.attr,
+	&queue_io_timeout_entry.attr,
+
 	NULL,
 };
 
@@ -666,8 +709,12 @@ queue_attr_show(struct kobject *kobj, struct attribute *attr, char *page)
 	struct gendisk *disk = container_of(kobj, struct gendisk, queue_kobj);
 	ssize_t res;
 
-	if (!entry->show)
+	if (!entry->show && !entry->show_nolock)
 		return -EIO;
+
+	if (entry->show_nolock)
+		return entry->show_nolock(disk, page);
+
 	mutex_lock(&disk->queue->sysfs_lock);
 	res = entry->show(disk, page);
 	mutex_unlock(&disk->queue->sysfs_lock);
@@ -684,7 +731,7 @@ queue_attr_store(struct kobject *kobj, struct attribute *attr,
 	unsigned int memflags;
 	ssize_t res;
 
-	if (!entry->store_limit && !entry->store)
+	if (!entry->store_limit && !entry->store_nolock && !entry->store)
 		return -EIO;
 
 	/*
@@ -695,6 +742,13 @@ queue_attr_store(struct kobject *kobj, struct attribute *attr,
 	if (entry->load_module)
 		entry->load_module(disk, page, length);
 
+	if (entry->store_nolock) {
+		memflags = blk_mq_freeze_queue(q);
+		res = entry->store_nolock(disk, page, length);
+		blk_mq_unfreeze_queue(q, memflags);
+		return res;
+	}
+
 	if (entry->store_limit) {
 		struct queue_limits lim = queue_limits_start_update(q);
 

From patchwork Tue Feb 18 08:28:55 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nilay Shroff
X-Patchwork-Id: 13979255
From: Nilay Shroff
To: linux-block@vger.kernel.org
Cc: hch@lst.de, ming.lei@redhat.com, dlemoal@kernel.org, axboe@kernel.dk, gjoyce@ibm.com
Subject: [PATCHv2 2/6] blk-sysfs: acquire q->limits_lock while reading attributes
Date: Tue, 18 Feb 2025 13:58:55 +0530
Message-ID: <20250218082908.265283-3-nilay@linux.ibm.com>
X-Mailer: git-send-email 2.47.1
In-Reply-To: <20250218082908.265283-1-nilay@linux.ibm.com>
References: <20250218082908.265283-1-nilay@linux.ibm.com>
Precedence: bulk
X-Mailing-List: linux-block@vger.kernel.org

There are a few RW sysfs attributes whose store method is protected with
q->limits_lock, while the corresponding show method runs holding
q->sysfs_lock, which doesn't make sense. Hence update the show method of
these sysfs attributes so that reading them acquires q->limits_lock
instead of q->sysfs_lock.

Similarly, there are a few RO sysfs attributes whose show method is
currently protected with q->sysfs_lock. However, updates to these
attributes can occur through the atomic limit update APIs, such as
queue_limits_start_update() and queue_limits_commit_update(), which run
holding q->limits_lock. So reading such attributes while holding
q->sysfs_lock doesn't make sense either. Hence update the show method of
these RO sysfs attributes so that it runs holding q->limits_lock instead
of q->sysfs_lock.

We have defined a new macro QUEUE_LIM_RO_ENTRY() which uses the new
show_limit() method, and it runs holding q->limits_lock.
All existing RO sysfs attributes which need protection using
q->limits_lock while being read have now been moved to this new macro
for attribute initialization.

Similarly, the existing QUEUE_LIM_RW_ENTRY() is updated to use the new
show_limit() method for reading attributes instead of the existing
show() method. As show_limit() runs holding q->limits_lock, the existing
RW sysfs attributes whose read/show method needs protection using
q->limits_lock are now inherently protected.

Signed-off-by: Nilay Shroff
Reviewed-by: Christoph Hellwig
---
 block/blk-sysfs.c | 100 ++++++++++++++++++++++++++--------------------
 1 file changed, 57 insertions(+), 43 deletions(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 0c9be7c7ecc1..7e22ec96f2b3 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -24,6 +24,7 @@ struct queue_sysfs_entry {
 	struct attribute attr;
 	ssize_t (*show)(struct gendisk *disk, char *page);
 	ssize_t (*show_nolock)(struct gendisk *disk, char *page);
+	ssize_t (*show_limit)(struct gendisk *disk, char *page);
 
 	ssize_t (*store)(struct gendisk *disk, const char *page, size_t count);
 	ssize_t (*store_nolock)(struct gendisk *disk,
@@ -436,10 +437,16 @@ static struct queue_sysfs_entry _prefix##_entry = {	\
 	.store	= _prefix##_store,			\
 };
 
+#define QUEUE_LIM_RO_ENTRY(_prefix, _name)			\
+static struct queue_sysfs_entry _prefix##_entry	= {		\
+	.attr		= { .name = _name, .mode = 0444 },	\
+	.show_limit	= _prefix##_show,			\
+}
+
 #define QUEUE_LIM_RW_ENTRY(_prefix, _name)			\
-static struct queue_sysfs_entry _prefix##_entry = {	\
+static struct queue_sysfs_entry _prefix##_entry	= {		\
 	.attr		= { .name = _name, .mode = 0644 },	\
-	.show		= _prefix##_show,			\
+	.show_limit	= _prefix##_show,			\
 	.store_limit	= _prefix##_store,			\
 }
 
@@ -454,39 +461,39 @@ static struct queue_sysfs_entry _prefix##_entry = {	\
 QUEUE_RW_ENTRY(queue_requests, "nr_requests");
 QUEUE_RW_ENTRY(queue_ra, "read_ahead_kb");
 QUEUE_LIM_RW_ENTRY(queue_max_sectors, "max_sectors_kb");
-QUEUE_RO_ENTRY(queue_max_hw_sectors, "max_hw_sectors_kb");
-QUEUE_RO_ENTRY(queue_max_segments, "max_segments");
-QUEUE_RO_ENTRY(queue_max_integrity_segments, "max_integrity_segments");
-QUEUE_RO_ENTRY(queue_max_segment_size, "max_segment_size");
+QUEUE_LIM_RO_ENTRY(queue_max_hw_sectors, "max_hw_sectors_kb");
+QUEUE_LIM_RO_ENTRY(queue_max_segments, "max_segments");
+QUEUE_LIM_RO_ENTRY(queue_max_integrity_segments, "max_integrity_segments");
+QUEUE_LIM_RO_ENTRY(queue_max_segment_size, "max_segment_size");
 QUEUE_RW_LOAD_MODULE_ENTRY(elv_iosched, "scheduler");
 
-QUEUE_RO_ENTRY(queue_logical_block_size, "logical_block_size");
-QUEUE_RO_ENTRY(queue_physical_block_size, "physical_block_size");
-QUEUE_RO_ENTRY(queue_chunk_sectors, "chunk_sectors");
-QUEUE_RO_ENTRY(queue_io_min, "minimum_io_size");
-QUEUE_RO_ENTRY(queue_io_opt, "optimal_io_size");
+QUEUE_LIM_RO_ENTRY(queue_logical_block_size, "logical_block_size");
+QUEUE_LIM_RO_ENTRY(queue_physical_block_size, "physical_block_size");
+QUEUE_LIM_RO_ENTRY(queue_chunk_sectors, "chunk_sectors");
+QUEUE_LIM_RO_ENTRY(queue_io_min, "minimum_io_size");
+QUEUE_LIM_RO_ENTRY(queue_io_opt, "optimal_io_size");
 
-QUEUE_RO_ENTRY(queue_max_discard_segments, "max_discard_segments");
-QUEUE_RO_ENTRY(queue_discard_granularity, "discard_granularity");
-QUEUE_RO_ENTRY(queue_max_hw_discard_sectors, "discard_max_hw_bytes");
+QUEUE_LIM_RO_ENTRY(queue_max_discard_segments, "max_discard_segments");
+QUEUE_LIM_RO_ENTRY(queue_discard_granularity, "discard_granularity");
+QUEUE_LIM_RO_ENTRY(queue_max_hw_discard_sectors, "discard_max_hw_bytes");
 QUEUE_LIM_RW_ENTRY(queue_max_discard_sectors, "discard_max_bytes");
 QUEUE_RO_ENTRY_NOLOCK(queue_discard_zeroes_data, "discard_zeroes_data");
 
-QUEUE_RO_ENTRY(queue_atomic_write_max_sectors, "atomic_write_max_bytes");
-QUEUE_RO_ENTRY(queue_atomic_write_boundary_sectors,
+QUEUE_LIM_RO_ENTRY(queue_atomic_write_max_sectors, "atomic_write_max_bytes");
+QUEUE_LIM_RO_ENTRY(queue_atomic_write_boundary_sectors,
 		"atomic_write_boundary_bytes");
-QUEUE_RO_ENTRY(queue_atomic_write_unit_max, "atomic_write_unit_max_bytes");
-QUEUE_RO_ENTRY(queue_atomic_write_unit_min, "atomic_write_unit_min_bytes");
+QUEUE_LIM_RO_ENTRY(queue_atomic_write_unit_max, "atomic_write_unit_max_bytes");
+QUEUE_LIM_RO_ENTRY(queue_atomic_write_unit_min, "atomic_write_unit_min_bytes");
 
 QUEUE_RO_ENTRY_NOLOCK(queue_write_same_max, "write_same_max_bytes");
-QUEUE_RO_ENTRY(queue_max_write_zeroes_sectors, "write_zeroes_max_bytes");
-QUEUE_RO_ENTRY(queue_max_zone_append_sectors, "zone_append_max_bytes");
-QUEUE_RO_ENTRY(queue_zone_write_granularity, "zone_write_granularity");
+QUEUE_LIM_RO_ENTRY(queue_max_write_zeroes_sectors, "write_zeroes_max_bytes");
+QUEUE_LIM_RO_ENTRY(queue_max_zone_append_sectors, "zone_append_max_bytes");
+QUEUE_LIM_RO_ENTRY(queue_zone_write_granularity, "zone_write_granularity");
 
-QUEUE_RO_ENTRY(queue_zoned, "zoned");
+QUEUE_LIM_RO_ENTRY(queue_zoned, "zoned");
 QUEUE_RO_ENTRY_NOLOCK(queue_nr_zones, "nr_zones");
-QUEUE_RO_ENTRY(queue_max_open_zones, "max_open_zones");
-QUEUE_RO_ENTRY(queue_max_active_zones, "max_active_zones");
+QUEUE_LIM_RO_ENTRY(queue_max_open_zones, "max_open_zones");
+QUEUE_LIM_RO_ENTRY(queue_max_active_zones, "max_active_zones");
 
 QUEUE_RW_ENTRY_NOLOCK(queue_nomerges, "nomerges");
 QUEUE_LIM_RW_ENTRY(queue_iostats_passthrough, "iostats_passthrough");
@@ -494,16 +501,16 @@ QUEUE_RW_ENTRY_NOLOCK(queue_rq_affinity, "rq_affinity");
 QUEUE_RW_ENTRY_NOLOCK(queue_poll, "io_poll");
 QUEUE_RW_ENTRY_NOLOCK(queue_poll_delay, "io_poll_delay");
 QUEUE_LIM_RW_ENTRY(queue_wc, "write_cache");
-QUEUE_RO_ENTRY(queue_fua, "fua");
-QUEUE_RO_ENTRY(queue_dax, "dax");
+QUEUE_LIM_RO_ENTRY(queue_fua, "fua");
+QUEUE_LIM_RO_ENTRY(queue_dax, "dax");
 QUEUE_RW_ENTRY_NOLOCK(queue_io_timeout, "io_timeout");
-QUEUE_RO_ENTRY(queue_virt_boundary_mask, "virt_boundary_mask");
-QUEUE_RO_ENTRY(queue_dma_alignment, "dma_alignment");
+QUEUE_LIM_RO_ENTRY(queue_virt_boundary_mask, "virt_boundary_mask");
+QUEUE_LIM_RO_ENTRY(queue_dma_alignment, "dma_alignment");
 
 /* legacy alias for logical_block_size: */
 static struct queue_sysfs_entry queue_hw_sector_size_entry = {
-	.attr = {.name = "hw_sector_size", .mode = 0444 },
-	.show = queue_logical_block_size_show,
+	.attr		= {.name = "hw_sector_size", .mode = 0444 },
+	.show_limit	= queue_logical_block_size_show,
 };
 
 QUEUE_LIM_RW_ENTRY(queue_rotational, "rotational");
@@ -589,6 +596,18 @@ static struct attribute *queue_attrs[] = {
 	 * attributes protected with q->sysfs_lock
 	 */
 	&queue_ra_entry.attr,
+
+	/*
+	 * attributes protected with q->limits_lock
+	 */
+	&queue_max_sectors_entry.attr,
+	&queue_max_discard_sectors_entry.attr,
+	&queue_rotational_entry.attr,
+	&queue_iostats_passthrough_entry.attr,
+	&queue_iostats_entry.attr,
+	&queue_stable_writes_entry.attr,
+	&queue_add_random_entry.attr,
+	&queue_wc_entry.attr,
 	&queue_max_hw_sectors_entry.attr,
 	&queue_max_segments_entry.attr,
 	&queue_max_discard_segments_entry.attr,
@@ -617,18 +636,6 @@ static struct attribute *queue_attrs[] = {
 	&queue_virt_boundary_mask_entry.attr,
 	&queue_dma_alignment_entry.attr,
 
-	/*
-	 * attributes protected with q->limits_lock
-	 */
-	&queue_max_sectors_entry.attr,
-	&queue_max_discard_sectors_entry.attr,
-	&queue_rotational_entry.attr,
-	&queue_iostats_passthrough_entry.attr,
-	&queue_iostats_entry.attr,
-	&queue_stable_writes_entry.attr,
-	&queue_add_random_entry.attr,
-	&queue_wc_entry.attr,
-
 	/*
 	 * attributes which don't require locking
 	 */
@@ -709,12 +716,19 @@ queue_attr_show(struct kobject *kobj, struct attribute *attr, char *page)
 	struct gendisk *disk = container_of(kobj, struct gendisk, queue_kobj);
 	ssize_t res;
 
-	if (!entry->show && !entry->show_nolock)
+	if (!entry->show && !entry->show_nolock && !entry->show_limit)
 		return -EIO;
 
 	if (entry->show_nolock)
 		return entry->show_nolock(disk, page);
 
+	if (entry->show_limit) {
+		mutex_lock(&disk->queue->limits_lock);
+		res = entry->show_limit(disk, page);
+		mutex_unlock(&disk->queue->limits_lock);
+		return res;
+	}
+
 	mutex_lock(&disk->queue->sysfs_lock);
 	res = entry->show(disk, page);
 	mutex_unlock(&disk->queue->sysfs_lock);

From patchwork Tue Feb 18 08:28:56 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nilay Shroff
X-Patchwork-Id: 13979256
From: Nilay Shroff
To: linux-block@vger.kernel.org
Cc: hch@lst.de, ming.lei@redhat.com, dlemoal@kernel.org, axboe@kernel.dk, gjoyce@ibm.com
Subject: [PATCHv2 3/6] block: Introduce a dedicated lock for protecting queue elevator updates
Date: Tue, 18 Feb 2025 13:58:56 +0530
Message-ID: <20250218082908.265283-4-nilay@linux.ibm.com>
X-Mailer: git-send-email 2.47.1
In-Reply-To: <20250218082908.265283-1-nilay@linux.ibm.com>
References: <20250218082908.265283-1-nilay@linux.ibm.com>
Precedence: bulk
X-Mailing-List: linux-block@vger.kernel.org

A queue's elevator can be updated either when modifying nr_hw_queues or
through the sysfs scheduler attribute. To prevent simultaneous updates
from causing race conditions, introduce a dedicated lock
q->elevator_lock. Currently, elevator switching/updating is protected
using q->sysfs_lock, but this has led to lockdep splats[1] due to
inconsistent lock ordering between q->sysfs_lock and the freeze-lock in
multiple block layer call sites. As the scope of q->sysfs_lock is not
well-defined, its misuse has resulted in numerous lockdep warnings.
To resolve this, replace q->sysfs_lock with a new dedicated q->elevator_lock, which will be exclusively used to protect elevator switching and updates. [1] https://lore.kernel.org/all/67637e70.050a0220.3157ee.000c.GAE@google.com/ Signed-off-by: Nilay Shroff --- block/blk-core.c | 1 + block/blk-mq.c | 12 ++-- block/blk-sysfs.c | 133 ++++++++++++++++++++++++++++------------- block/elevator.c | 18 ++++-- block/genhd.c | 9 ++- include/linux/blkdev.h | 1 + 6 files changed, 119 insertions(+), 55 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index d6c4fa3943b5..222cdcb662c2 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -431,6 +431,7 @@ struct request_queue *blk_alloc_queue(struct queue_limits *lim, int node_id) mutex_init(&q->debugfs_mutex); mutex_init(&q->sysfs_lock); mutex_init(&q->limits_lock); + mutex_init(&q->elevator_lock); mutex_init(&q->rq_qos_mutex); spin_lock_init(&q->queue_lock); diff --git a/block/blk-mq.c b/block/blk-mq.c index 40490ac88045..f58e11dee8a0 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -4467,7 +4467,7 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set, unsigned long i, j; /* protect against switching io scheduler */ - mutex_lock(&q->sysfs_lock); + mutex_lock(&q->elevator_lock); for (i = 0; i < set->nr_hw_queues; i++) { int old_node; int node = blk_mq_get_hctx_node(set, i); @@ -4500,7 +4500,7 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set, xa_for_each_start(&q->hctx_table, j, hctx, j) blk_mq_exit_hctx(q, set, hctx, j); - mutex_unlock(&q->sysfs_lock); + mutex_unlock(&q->elevator_lock); /* unregister cpuhp callbacks for exited hctxs */ blk_mq_remove_hw_queues_cpuhp(q); @@ -4934,7 +4934,7 @@ static bool blk_mq_elv_switch_none(struct list_head *head, return false; /* q->elevator needs protection from ->sysfs_lock */ - mutex_lock(&q->sysfs_lock); + mutex_lock(&q->elevator_lock); /* the check has to be done with holding sysfs_lock */ if (!q->elevator) { @@ -4950,7 +4950,7 @@ static bool
blk_mq_elv_switch_none(struct list_head *head, list_add(&qe->node, head); elevator_disable(q); unlock: - mutex_unlock(&q->sysfs_lock); + mutex_unlock(&q->elevator_lock); return true; } @@ -4980,11 +4980,11 @@ static void blk_mq_elv_switch_back(struct list_head *head, list_del(&qe->node); kfree(qe); - mutex_lock(&q->sysfs_lock); + mutex_lock(&q->elevator_lock); elevator_switch(q, t); /* drop the reference acquired in blk_mq_elv_switch_none */ elevator_put(t); - mutex_unlock(&q->sysfs_lock); + mutex_unlock(&q->elevator_lock); } static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 7e22ec96f2b3..355dfb514712 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -58,7 +58,13 @@ queue_var_store(unsigned long *var, const char *page, size_t count) static ssize_t queue_requests_show(struct gendisk *disk, char *page) { - return queue_var_show(disk->queue->nr_requests, page); + int ret; + + mutex_lock(&disk->queue->sysfs_lock); + ret = queue_var_show(disk->queue->nr_requests, page); + mutex_unlock(&disk->queue->sysfs_lock); + + return ret; } static ssize_t @@ -66,27 +72,42 @@ queue_requests_store(struct gendisk *disk, const char *page, size_t count) { unsigned long nr; int ret, err; + unsigned int memflags; + struct request_queue *q = disk->queue; - if (!queue_is_mq(disk->queue)) - return -EINVAL; + mutex_lock(&q->sysfs_lock); + memflags = blk_mq_freeze_queue(q); + + if (!queue_is_mq(disk->queue)) { + ret = -EINVAL; + goto out; + } ret = queue_var_store(&nr, page, count); if (ret < 0) - return ret; + goto out; if (nr < BLKDEV_MIN_RQ) nr = BLKDEV_MIN_RQ; err = blk_mq_update_nr_requests(disk->queue, nr); if (err) - return err; - + ret = err; +out: + blk_mq_unfreeze_queue(q, memflags); + mutex_unlock(&q->sysfs_lock); return ret; } static ssize_t queue_ra_show(struct gendisk *disk, char *page) { - return queue_var_show(disk->bdi->ra_pages << (PAGE_SHIFT - 10), page); + int ret; + + 
mutex_lock(&disk->queue->sysfs_lock); + ret = queue_var_show(disk->bdi->ra_pages << (PAGE_SHIFT - 10), page); + mutex_unlock(&disk->queue->sysfs_lock); + + return ret; } static ssize_t @@ -94,11 +115,19 @@ queue_ra_store(struct gendisk *disk, const char *page, size_t count) { unsigned long ra_kb; ssize_t ret; + unsigned int memflags; + struct request_queue *q = disk->queue; + + mutex_lock(&q->sysfs_lock); + memflags = blk_mq_freeze_queue(q); ret = queue_var_store(&ra_kb, page, count); if (ret < 0) - return ret; + goto out; disk->bdi->ra_pages = ra_kb >> (PAGE_SHIFT - 10); +out: + blk_mq_unfreeze_queue(q, memflags); + mutex_unlock(&q->sysfs_lock); return ret; } @@ -534,14 +563,24 @@ static ssize_t queue_var_store64(s64 *var, const char *page) static ssize_t queue_wb_lat_show(struct gendisk *disk, char *page) { - if (!wbt_rq_qos(disk->queue)) - return -EINVAL; + int ret; + + mutex_lock(&disk->queue->sysfs_lock); + + if (!wbt_rq_qos(disk->queue)) { + ret = -EINVAL; + goto out; + } if (wbt_disabled(disk->queue)) - return sysfs_emit(page, "0\n"); + ret = sysfs_emit(page, "0\n"); + else + ret = sysfs_emit(page, "%llu\n", + div_u64(wbt_get_min_lat(disk->queue), 1000)); - return sysfs_emit(page, "%llu\n", - div_u64(wbt_get_min_lat(disk->queue), 1000)); +out: + mutex_unlock(&disk->queue->sysfs_lock); + return ret; } static ssize_t queue_wb_lat_store(struct gendisk *disk, const char *page, @@ -551,18 +590,24 @@ static ssize_t queue_wb_lat_store(struct gendisk *disk, const char *page, struct rq_qos *rqos; ssize_t ret; s64 val; + unsigned int memflags; + + mutex_lock(&q->sysfs_lock); + memflags = blk_mq_freeze_queue(q); ret = queue_var_store64(&val, page); if (ret < 0) - return ret; - if (val < -1) - return -EINVAL; + goto out; + if (val < -1) { + ret = -EINVAL; + goto out; + } rqos = wbt_rq_qos(q); if (!rqos) { ret = wbt_init(disk); if (ret) - return ret; + goto out; } if (val == -1) @@ -570,8 +615,10 @@ static ssize_t queue_wb_lat_store(struct gendisk *disk, const char 
*page, else if (val >= 0) val *= 1000ULL; - if (wbt_get_min_lat(q) == val) - return count; + if (wbt_get_min_lat(q) == val) { + ret = count; + goto out; + } /* * Ensure that the queue is idled, in case the latency update @@ -584,7 +631,10 @@ static ssize_t queue_wb_lat_store(struct gendisk *disk, const char *page, blk_mq_unquiesce_queue(q); - return count; +out: + blk_mq_unfreeze_queue(q, memflags); + mutex_unlock(&q->sysfs_lock); + return ret; } QUEUE_RW_ENTRY(queue_wb_lat, "wbt_lat_usec"); @@ -593,7 +643,7 @@ QUEUE_RW_ENTRY(queue_wb_lat, "wbt_lat_usec"); /* Common attributes for bio-based and request-based queues. */ static struct attribute *queue_attrs[] = { /* - * attributes protected with q->sysfs_lock + * attributes which require some form of locking */ &queue_ra_entry.attr, @@ -652,10 +702,10 @@ static struct attribute *queue_attrs[] = { /* Request-based queue attributes that are not relevant for bio-based queues. */ static struct attribute *blk_mq_queue_attrs[] = { /* - * attributes protected with q->sysfs_lock + * attributes which require some form of locking */ - &queue_requests_entry.attr, &elv_iosched_entry.attr, + &queue_requests_entry.attr, #ifdef CONFIG_BLK_WBT &queue_wb_lat_entry.attr, #endif @@ -729,10 +779,7 @@ queue_attr_show(struct kobject *kobj, struct attribute *attr, char *page) return res; } - mutex_lock(&disk->queue->sysfs_lock); - res = entry->show(disk, page); - mutex_unlock(&disk->queue->sysfs_lock); - return res; + return entry->show(disk, page); } static ssize_t @@ -778,12 +825,7 @@ queue_attr_store(struct kobject *kobj, struct attribute *attr, return length; } - mutex_lock(&q->sysfs_lock); - memflags = blk_mq_freeze_queue(q); - res = entry->store(disk, page, length); - blk_mq_unfreeze_queue(q, memflags); - mutex_unlock(&q->sysfs_lock); - return res; + return entry->store(disk, page, length); } static const struct sysfs_ops queue_sysfs_ops = { @@ -852,15 +894,19 @@ int blk_register_queue(struct gendisk *disk) if (ret) goto 
out_debugfs_remove; + ret = blk_crypto_sysfs_register(disk); + if (ret) + goto out_unregister_ia_ranges; + + mutex_lock(&q->elevator_lock); if (q->elevator) { ret = elv_register_queue(q, false); - if (ret) - goto out_unregister_ia_ranges; + if (ret) { + mutex_unlock(&q->elevator_lock); + goto out_crypto_sysfs_unregister; + } } - - ret = blk_crypto_sysfs_register(disk); - if (ret) - goto out_elv_unregister; + mutex_unlock(&q->elevator_lock); blk_queue_flag_set(QUEUE_FLAG_REGISTERED, q); wbt_enable_default(disk); @@ -885,8 +931,8 @@ int blk_register_queue(struct gendisk *disk) return ret; -out_elv_unregister: - elv_unregister_queue(q); +out_crypto_sysfs_unregister: + blk_crypto_sysfs_unregister(disk); out_unregister_ia_ranges: disk_unregister_independent_access_ranges(disk); out_debugfs_remove: @@ -932,8 +978,11 @@ void blk_unregister_queue(struct gendisk *disk) blk_mq_sysfs_unregister(disk); blk_crypto_sysfs_unregister(disk); - mutex_lock(&q->sysfs_lock); + mutex_lock(&q->elevator_lock); elv_unregister_queue(q); + mutex_unlock(&q->elevator_lock); + + mutex_lock(&q->sysfs_lock); disk_unregister_independent_access_ranges(disk); mutex_unlock(&q->sysfs_lock); diff --git a/block/elevator.c b/block/elevator.c index cd2ce4921601..6ee372f0220c 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -457,7 +457,7 @@ int elv_register_queue(struct request_queue *q, bool uevent) struct elevator_queue *e = q->elevator; int error; - lockdep_assert_held(&q->sysfs_lock); + lockdep_assert_held(&q->elevator_lock); error = kobject_add(&e->kobj, &q->disk->queue_kobj, "iosched"); if (!error) { @@ -481,7 +481,7 @@ void elv_unregister_queue(struct request_queue *q) { struct elevator_queue *e = q->elevator; - lockdep_assert_held(&q->sysfs_lock); + lockdep_assert_held(&q->elevator_lock); if (e && test_and_clear_bit(ELEVATOR_FLAG_REGISTERED, &e->flags)) { kobject_uevent(&e->kobj, KOBJ_REMOVE); @@ -618,7 +618,7 @@ int elevator_switch(struct request_queue *q, struct elevator_type *new_e) 
unsigned int memflags; int ret; - lockdep_assert_held(&q->sysfs_lock); + lockdep_assert_held(&q->elevator_lock); memflags = blk_mq_freeze_queue(q); blk_mq_quiesce_queue(q); @@ -655,7 +655,7 @@ void elevator_disable(struct request_queue *q) { unsigned int memflags; - lockdep_assert_held(&q->sysfs_lock); + lockdep_assert_held(&q->elevator_lock); memflags = blk_mq_freeze_queue(q); blk_mq_quiesce_queue(q); @@ -723,11 +723,19 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf, { char elevator_name[ELV_NAME_MAX]; int ret; + unsigned int memflags; + struct request_queue *q = disk->queue; strscpy(elevator_name, buf, sizeof(elevator_name)); + + memflags = blk_mq_freeze_queue(q); + mutex_lock(&q->elevator_lock); ret = elevator_change(disk->queue, strstrip(elevator_name)); + mutex_unlock(&q->elevator_lock); + blk_mq_unfreeze_queue(q, memflags); if (!ret) return count; + return ret; } @@ -738,6 +746,7 @@ ssize_t elv_iosched_show(struct gendisk *disk, char *name) struct elevator_type *cur = NULL, *e; int len = 0; + mutex_lock(&q->elevator_lock); if (!q->elevator) { len += sprintf(name+len, "[none] "); } else { @@ -755,6 +764,7 @@ ssize_t elv_iosched_show(struct gendisk *disk, char *name) spin_unlock(&elv_list_lock); len += sprintf(name+len, "\n"); + mutex_unlock(&q->elevator_lock); return len; } diff --git a/block/genhd.c b/block/genhd.c index e9375e20d866..c2bd86cd09de 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -565,8 +565,11 @@ int __must_check add_disk_fwnode(struct device *parent, struct gendisk *disk, if (disk->major == BLOCK_EXT_MAJOR) blk_free_ext_minor(disk->first_minor); out_exit_elevator: - if (disk->queue->elevator) + if (disk->queue->elevator) { + mutex_lock(&disk->queue->elevator_lock); elevator_exit(disk->queue); + mutex_unlock(&disk->queue->elevator_lock); + } return ret; } EXPORT_SYMBOL_GPL(add_disk_fwnode); @@ -742,9 +745,9 @@ void del_gendisk(struct gendisk *disk) blk_mq_quiesce_queue(q); if (q->elevator) { - mutex_lock(&q->sysfs_lock); 
+ mutex_lock(&q->elevator_lock); elevator_exit(q); - mutex_unlock(&q->sysfs_lock); + mutex_unlock(&q->elevator_lock); } rq_qos_exit(q); blk_mq_unquiesce_queue(q); diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 248416ecd01c..5690d2f3588e 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -562,6 +562,7 @@ struct request_queue { struct mutex sysfs_lock; struct mutex limits_lock; + struct mutex elevator_lock; /* * for reusing dead hctx instance in case of updating
From patchwork Tue Feb 18 08:28:57 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nilay Shroff X-Patchwork-Id: 13979257
From: Nilay Shroff To: linux-block@vger.kernel.org Cc: hch@lst.de, ming.lei@redhat.com, dlemoal@kernel.org, axboe@kernel.dk, gjoyce@ibm.com Subject: [PATCHv2 4/6] blk-sysfs: protect nr_requests update using q->elevator_lock Date: Tue, 18 Feb 2025 13:58:57 +0530 Message-ID: <20250218082908.265283-5-nilay@linux.ibm.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250218082908.265283-1-nilay@linux.ibm.com> References: <20250218082908.265283-1-nilay@linux.ibm.com>
The sysfs attribute nr_requests could be updated simultaneously from the elevator switch/update or the nr_hw_queues update code path.
The update to nr_requests in each of those code paths now runs while holding q->elevator_lock. So we should now protect access to the sysfs attribute nr_requests using q->elevator_lock instead of q->sysfs_lock.
Signed-off-by: Nilay Shroff --- block/blk-sysfs.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 355dfb514712..37ac73468d4e 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -59,10 +59,11 @@ queue_var_store(unsigned long *var, const char *page, size_t count) static ssize_t queue_requests_show(struct gendisk *disk, char *page) { int ret; + struct request_queue *q = disk->queue; - mutex_lock(&disk->queue->sysfs_lock); - ret = queue_var_show(disk->queue->nr_requests, page); - mutex_unlock(&disk->queue->sysfs_lock); + mutex_lock(&q->elevator_lock); + ret = queue_var_show(q->nr_requests, page); + mutex_unlock(&q->elevator_lock); return ret; } @@ -75,8 +76,8 @@ queue_requests_store(struct gendisk *disk, const char *page, size_t count) unsigned int memflags; struct request_queue *q = disk->queue; - mutex_lock(&q->sysfs_lock); memflags = blk_mq_freeze_queue(q); + mutex_lock(&q->elevator_lock); if (!queue_is_mq(disk->queue)) { ret = -EINVAL; @@ -94,8 +95,9 @@ queue_requests_store(struct gendisk *disk, const char *page, size_t count) if (err) ret = err; out: + mutex_unlock(&q->elevator_lock); blk_mq_unfreeze_queue(q, memflags); - mutex_unlock(&q->sysfs_lock); + return ret; }
From patchwork Tue Feb 18 08:28:58 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nilay Shroff X-Patchwork-Id: 13979259
From: Nilay Shroff To: linux-block@vger.kernel.org Cc: hch@lst.de, ming.lei@redhat.com, dlemoal@kernel.org, axboe@kernel.dk, gjoyce@ibm.com Subject: [PATCHv2 5/6] blk-sysfs: protect wbt_lat_usec using q->elevator_lock Date: Tue, 18 Feb 2025 13:58:58 +0530 Message-ID: <20250218082908.265283-6-nilay@linux.ibm.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250218082908.265283-1-nilay@linux.ibm.com> References: <20250218082908.265283-1-nilay@linux.ibm.com>
The wbt latency and state could be updated while initializing or exiting the elevator. The elevator code path is now protected with q->elevator_lock. So we should now protect access to the sysfs attribute wbt_lat_usec using q->elevator_lock instead of q->sysfs_lock.
Signed-off-by: Nilay Shroff --- block/blk-sysfs.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 37ac73468d4e..876376bfdac3 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -567,7 +567,7 @@ static ssize_t queue_wb_lat_show(struct gendisk *disk, char *page) { int ret; - mutex_lock(&disk->queue->sysfs_lock); + mutex_lock(&disk->queue->elevator_lock); if (!wbt_rq_qos(disk->queue)) { ret = -EINVAL; @@ -581,7 +581,7 @@ static ssize_t queue_wb_lat_show(struct gendisk *disk, char *page) div_u64(wbt_get_min_lat(disk->queue), 1000)); out: - mutex_unlock(&disk->queue->sysfs_lock); + mutex_unlock(&disk->queue->elevator_lock); return ret; } @@ -594,8 +594,8 @@ static ssize_t queue_wb_lat_store(struct gendisk *disk, const char *page, s64 val; unsigned int memflags; - mutex_lock(&q->sysfs_lock); memflags = blk_mq_freeze_queue(q); + mutex_lock(&q->elevator_lock); ret = queue_var_store64(&val, page); if (ret < 0) @@ -634,8 +634,8 @@ static ssize_t queue_wb_lat_store(struct gendisk *disk, const char *page, blk_mq_unquiesce_queue(q); out: + mutex_unlock(&q->elevator_lock); blk_mq_unfreeze_queue(q, memflags); - mutex_unlock(&q->sysfs_lock); return ret; } @@ -907,11 +907,11 @@ int blk_register_queue(struct gendisk *disk) mutex_unlock(&q->elevator_lock); goto out_crypto_sysfs_unregister; } + wbt_enable_default(disk); } mutex_unlock(&q->elevator_lock); blk_queue_flag_set(QUEUE_FLAG_REGISTERED, q); - wbt_enable_default(disk); /* Now everything is ready and send out KOBJ_ADD uevent */ kobject_uevent(&disk->queue_kobj, KOBJ_ADD); From patchwork Tue Feb 18 08:28:59 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nilay Shroff X-Patchwork-Id: 13979260 Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) 
by smtp.subspace.kernel.org (Postfix) with ESMTPS id E4BE522AE59 for ; Tue, 18 Feb 2025 08:31:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.163.158.5 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739867514; cv=none; b=nriH4iUa+/OSdG5HcsRf2HwjGrBX0+EpiOsJqBTfKZizJKKQI1kr6RZtcUVy1J34gWolFu8mTQcsPLzwBAdh1bae0rKuaUcj0KMUtLT6EGmanYnxsH4Tmigx8NA/QU2ddzhP9WyGLAspbUlBqTuItxEIzK5Gemi/aZQzSH81iGA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739867514; c=relaxed/simple; bh=05nCs+48O8m7FXfOX8FGjxKSZdMSKaaz2qFSNc/lOn8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Sf3yJWSriGppDJ+MVc/lmiLpT2OQjMY6nZO87Gt9gkQIIGOSVpRXrExn8sqgVMk/2KjMpP9sSo8GZBCaCrDZyYjP3rUpa7/g2xWDoR2WA+mh0qsdY4f8JVzufVlJmloZAGPNIA+WgouZU5P8PR71UKJOLslYbHZY2R+J1+ZUd+w= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.ibm.com; spf=pass smtp.mailfrom=linux.ibm.com; dkim=pass (2048-bit key) header.d=ibm.com header.i=@ibm.com header.b=Qb4QglB1; arc=none smtp.client-ip=148.163.158.5 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.ibm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.ibm.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ibm.com header.i=@ibm.com header.b="Qb4QglB1" Received: from pps.filterd (m0360072.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 51I7WuQE009132; Tue, 18 Feb 2025 08:29:25 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=cc :content-transfer-encoding:date:from:in-reply-to:message-id :mime-version:references:subject:to; s=pp1; bh=F5N6shTMG8AYFaHU2 J/21JJ/FGRMyZ+RiWmVdq0gPiE=; b=Qb4QglB1Df2iA34sdVjbncAUwEz8HI7Ck 635hd4tUF285dcbbRAYrfSQZpRiV/paZRkan9CgUmp4s3UE5rs6uddJnHRODVNpQ 
WT63DTqtt3zZd+t/OdGn4XdusiS5fanfMY/BKy67TEBZPxx4yWQ50oHEKQzOugwK 4KMd/oU5/ili40GoKOzlk7kSNU0dSLqfrQDEx+a4HTPtg1PskjuAdPAK68P9dqSA xDWAOqSbHMGzBhQeFZf08KM1cbik4uj9MubZuEOv1mZxzlW666VBT9qiYPK5xJ16 p35rsU39YngD64NrVQPmdLKwYrkXL18o4PPDv0dWgicIC/I53tP6A== Received: from ppma22.wdc07v.mail.ibm.com (5c.69.3da9.ip4.static.sl-reverse.com [169.61.105.92]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 44vnwpg7sn-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 18 Feb 2025 08:29:25 +0000 (GMT) Received: from pps.filterd (ppma22.wdc07v.mail.ibm.com [127.0.0.1]) by ppma22.wdc07v.mail.ibm.com (8.18.1.2/8.18.1.2) with ESMTP id 51I6QLTO001633; Tue, 18 Feb 2025 08:29:24 GMT Received: from smtprelay04.fra02v.mail.ibm.com ([9.218.2.228]) by ppma22.wdc07v.mail.ibm.com (PPS) with ESMTPS id 44u5myt601-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 18 Feb 2025 08:29:24 +0000 Received: from smtpav04.fra02v.mail.ibm.com (smtpav04.fra02v.mail.ibm.com [10.20.54.103]) by smtprelay04.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 51I8TM6A11338000 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 18 Feb 2025 08:29:22 GMT Received: from smtpav04.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id A969020043; Tue, 18 Feb 2025 08:29:22 +0000 (GMT) Received: from smtpav04.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 177C420040; Tue, 18 Feb 2025 08:29:21 +0000 (GMT) Received: from li-c9696b4c-3419-11b2-a85c-f9edc3bf8a84.in.ibm.com (unknown [9.109.198.198]) by smtpav04.fra02v.mail.ibm.com (Postfix) with ESMTP; Tue, 18 Feb 2025 08:29:20 +0000 (GMT) From: Nilay Shroff To: linux-block@vger.kernel.org Cc: hch@lst.de, ming.lei@redhat.com, dlemoal@kernel.org, axboe@kernel.dk, gjoyce@ibm.com Subject: [PATCHv2 6/6] blk-sysfs: protect read_ahead_kb using q->limits_lock Date: Tue, 18 Feb 2025 13:58:59 +0530 Message-ID: 
 <20250218082908.265283-7-nilay@linux.ibm.com>
X-Mailer: git-send-email 2.47.1
In-Reply-To: <20250218082908.265283-1-nilay@linux.ibm.com>
References: <20250218082908.265283-1-nilay@linux.ibm.com>
Precedence: bulk
X-Mailing-List: linux-block@vger.kernel.org
MIME-Version: 1.0

bdi->ra_pages can be updated under q->limits_lock while applying bdi
limits (see blk_apply_bdi_limits()). So protect access to the sysfs
attribute read_ahead_kb with q->limits_lock instead of q->sysfs_lock.

Signed-off-by: Nilay Shroff
---
 block/blk-sysfs.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 876376bfdac3..a8116d3d9127 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -105,9 +105,9 @@ static ssize_t queue_ra_show(struct gendisk *disk, char *page)
 {
 	int ret;
 
-	mutex_lock(&disk->queue->sysfs_lock);
+	mutex_lock(&disk->queue->limits_lock);
 	ret = queue_var_show(disk->bdi->ra_pages << (PAGE_SHIFT - 10), page);
-	mutex_unlock(&disk->queue->sysfs_lock);
+	mutex_unlock(&disk->queue->limits_lock);
 
 	return ret;
 }
@@ -119,17 +119,24 @@ queue_ra_store(struct gendisk *disk, const char *page, size_t count)
 	ssize_t ret;
 	unsigned int memflags;
 	struct request_queue *q = disk->queue;
-
-	mutex_lock(&q->sysfs_lock);
+	/*
+	 * We don't use the atomic update helpers queue_limits_start_update()
+	 * and queue_limits_commit_update() here for updating ra_pages, because
+	 * blk_apply_bdi_limits(), which is invoked from
+	 * queue_limits_commit_update(), can overwrite the ra_pages value that
+	 * the user actually wants to store here. blk_apply_bdi_limits() sets
+	 * ra_pages based on the optimal I/O size (io_opt).
+	 */
+	mutex_lock(&q->limits_lock);
 	memflags = blk_mq_freeze_queue(q);
-
 	ret = queue_var_store(&ra_kb, page, count);
 	if (ret < 0)
 		goto out;
 
 	disk->bdi->ra_pages = ra_kb >> (PAGE_SHIFT - 10);
 out:
+	mutex_unlock(&q->limits_lock);
 	blk_mq_unfreeze_queue(q, memflags);
-	mutex_unlock(&q->sysfs_lock);
+
 	return ret;
 }