From patchwork Fri Feb 17 12:14:10 2023
X-Patchwork-Submitter: Ojaswin Mujoo
X-Patchwork-Id: 13144605
From: Ojaswin Mujoo
To: linux-ext4@vger.kernel.org, "Theodore Ts'o"
Cc: Ritesh Harjani, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, Jan Kara, rookxu, Ritesh Harjani
Subject: [PATCH v4 1/9] ext4: Stop searching if PA doesn't satisfy non-extent file
Date: Fri, 17 Feb 2023 17:44:10 +0530
Message-Id: <0e6cd1ad4ecf5b22546fe28a7fc795351d1577c9.1676634592.git.ojaswin@linux.ibm.com>
X-Mailer: git-send-email 2.31.1
X-Mailing-List: linux-fsdevel@vger.kernel.org

If we come across a PA that matches the logical offset but cannot
satisfy a non-extent file because its physical start is higher than
what non-extent files support, then simply stop searching for another
PA and break out of the loop. Since PAs don't overlap, we won't be able
to find another inode PA that can satisfy the original request.

Signed-off-by: Ojaswin Mujoo
Reviewed-by: Ritesh Harjani (IBM)
Reviewed-by: Jan Kara
---
 fs/ext4/mballoc.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 5b2ae37a8b80..e0bbca523b4b 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4382,8 +4382,13 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
 		/* non-extent files can't have physical blocks past 2^32 */
 		if (!(ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS)) &&
 		    (pa->pa_pstart + EXT4_C2B(sbi, pa->pa_len) >
-		     EXT4_MAX_BLOCK_FILE_PHYS))
-			continue;
+		     EXT4_MAX_BLOCK_FILE_PHYS)) {
+			/*
+			 * Since PAs don't overlap, we won't find any
+			 * other PA to satisfy this.
+			 */
+			break;
+		}
 
 		/* found preallocated blocks, use them */
 		spin_lock(&pa->pa_lock);

From patchwork Fri Feb 17 12:14:11 2023
X-Patchwork-Submitter: Ojaswin Mujoo
X-Patchwork-Id: 13144606
From: Ojaswin Mujoo
To: linux-ext4@vger.kernel.org, "Theodore Ts'o"
Cc: Ritesh Harjani, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, Jan Kara, rookxu, Ritesh Harjani
Subject: [PATCH v4 2/9] ext4: Refactor code related to freeing PAs
Date: Fri, 17 Feb 2023 17:44:11 +0530
X-Mailer: git-send-email 2.31.1
X-Mailing-List: linux-fsdevel@vger.kernel.org

This patch makes the following changes:

* Rename ext4_mb_pa_free to ext4_mb_pa_put_free
  to better reflect its purpose

* Add new ext4_mb_pa_free() which only handles freeing

* Refactor ext4_mb_pa_callback() to use ext4_mb_pa_free()

There are no functional changes in this patch.

Signed-off-by: Ojaswin Mujoo
Reviewed-by: Ritesh Harjani (IBM)
Reviewed-by: Jan Kara
---
 fs/ext4/mballoc.c | 30 +++++++++++++++++++++---------
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index e0bbca523b4b..a5f2803aff93 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4530,16 +4530,22 @@ static void ext4_mb_mark_pa_deleted(struct super_block *sb,
 	}
 }
 
-static void ext4_mb_pa_callback(struct rcu_head *head)
+static inline void ext4_mb_pa_free(struct ext4_prealloc_space *pa)
 {
-	struct ext4_prealloc_space *pa;
-	pa = container_of(head, struct ext4_prealloc_space, u.pa_rcu);
-
+	BUG_ON(!pa);
 	BUG_ON(atomic_read(&pa->pa_count));
 	BUG_ON(pa->pa_deleted == 0);
 	kmem_cache_free(ext4_pspace_cachep, pa);
 }
 
+static void ext4_mb_pa_callback(struct rcu_head *head)
+{
+	struct ext4_prealloc_space *pa;
+
+	pa = container_of(head, struct ext4_prealloc_space, u.pa_rcu);
+	ext4_mb_pa_free(pa);
+}
+
 /*
  * drops a reference to preallocated space descriptor
  * if this was the last reference and the space is consumed
@@ -5066,14 +5072,20 @@ static int ext4_mb_pa_alloc(struct ext4_allocation_context *ac)
 	return 0;
 }
 
-static void ext4_mb_pa_free(struct ext4_allocation_context *ac)
+static void ext4_mb_pa_put_free(struct ext4_allocation_context *ac)
 {
 	struct ext4_prealloc_space *pa = ac->ac_pa;
 
 	BUG_ON(!pa);
 	ac->ac_pa = NULL;
 	WARN_ON(!atomic_dec_and_test(&pa->pa_count));
-	kmem_cache_free(ext4_pspace_cachep, pa);
+	/*
+	 * current function is only called due to an error or due to
+	 * len of found blocks < len of requested blocks hence the PA has not
+	 * been added to grp->bb_prealloc_list. So we don't need to lock it
+	 */
+	pa->pa_deleted = 1;
+	ext4_mb_pa_free(pa);
 }
 
 #ifdef CONFIG_EXT4_DEBUG
@@ -5616,13 +5628,13 @@ ext4_fsblk_t ext4_mb_new_blocks(handle_t *handle,
 		 * So we have to free this pa here itself.
 		 */
 		if (*errp) {
-			ext4_mb_pa_free(ac);
+			ext4_mb_pa_put_free(ac);
 			ext4_discard_allocated_blocks(ac);
 			goto errout;
 		}
 		if (ac->ac_status == AC_STATUS_FOUND &&
 			ac->ac_o_ex.fe_len >= ac->ac_f_ex.fe_len)
-			ext4_mb_pa_free(ac);
+			ext4_mb_pa_put_free(ac);
 	}
 	if (likely(ac->ac_status == AC_STATUS_FOUND)) {
 		*errp = ext4_mb_mark_diskspace_used(ac, handle, reserv_clstrs);
@@ -5641,7 +5653,7 @@ ext4_fsblk_t ext4_mb_new_blocks(handle_t *handle,
 		 * If block allocation fails then the pa allocated above
 		 * needs to be freed here itself.
 		 */
-		ext4_mb_pa_free(ac);
+		ext4_mb_pa_put_free(ac);
 		*errp = -ENOSPC;
 	}
 

From patchwork Fri Feb 17 12:14:12 2023
X-Patchwork-Submitter: Ojaswin Mujoo
X-Patchwork-Id: 13144607
From: Ojaswin Mujoo
To: linux-ext4@vger.kernel.org, "Theodore Ts'o"
Cc: Ritesh Harjani, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, Jan Kara, rookxu, Ritesh Harjani
Subject: [PATCH v4 3/9] ext4: Refactor code in ext4_mb_normalize_request() and ext4_mb_use_preallocated()
Date: Fri, 17 Feb 2023 17:44:12 +0530
X-Mailer: git-send-email 2.31.1
X-Mailing-List: linux-fsdevel@vger.kernel.org

Change some variable names to be more consistent and
refactor some of the code to make it easier to read.

There are no functional changes in this patch.

Signed-off-by: Ojaswin Mujoo
Reviewed-by: Ritesh Harjani (IBM)
Reviewed-by: Jan Kara
---
 fs/ext4/mballoc.c | 97 ++++++++++++++++++++++++-----------------------
 1 file changed, 49 insertions(+), 48 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index a5f2803aff93..f4e699bce99f 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -3999,7 +3999,8 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac,
 	loff_t orig_size __maybe_unused;
 	ext4_lblk_t start;
 	struct ext4_inode_info *ei = EXT4_I(ac->ac_inode);
-	struct ext4_prealloc_space *pa;
+	struct ext4_prealloc_space *tmp_pa;
+	ext4_lblk_t tmp_pa_start, tmp_pa_end;
 
 	/* do normalize only data requests, metadata requests
 	   do not need preallocation */
@@ -4102,56 +4103,53 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac,
 
 	/* check we don't cross already preallocated blocks */
 	rcu_read_lock();
-	list_for_each_entry_rcu(pa, &ei->i_prealloc_list, pa_inode_list) {
-		ext4_lblk_t pa_end;
-
-		if (pa->pa_deleted)
+	list_for_each_entry_rcu(tmp_pa, &ei->i_prealloc_list, pa_inode_list) {
+		if (tmp_pa->pa_deleted)
 			continue;
-		spin_lock(&pa->pa_lock);
-		if (pa->pa_deleted) {
-			spin_unlock(&pa->pa_lock);
+		spin_lock(&tmp_pa->pa_lock);
+		if (tmp_pa->pa_deleted) {
+			spin_unlock(&tmp_pa->pa_lock);
 			continue;
 		}
 
-		pa_end = pa->pa_lstart + EXT4_C2B(EXT4_SB(ac->ac_sb),
-						  pa->pa_len);
+		tmp_pa_start = tmp_pa->pa_lstart;
+		tmp_pa_end = tmp_pa->pa_lstart + EXT4_C2B(sbi, tmp_pa->pa_len);
 
 		/* PA must not overlap original request */
-		BUG_ON(!(ac->ac_o_ex.fe_logical >= pa_end ||
-			 ac->ac_o_ex.fe_logical < pa->pa_lstart));
+		BUG_ON(!(ac->ac_o_ex.fe_logical >= tmp_pa_end ||
+			ac->ac_o_ex.fe_logical < tmp_pa_start));
 
 		/* skip PAs this normalized request doesn't overlap with */
-		if (pa->pa_lstart >= end || pa_end <= start) {
-			spin_unlock(&pa->pa_lock);
+		if (tmp_pa_start >= end || tmp_pa_end <= start) {
+			spin_unlock(&tmp_pa->pa_lock);
 			continue;
 		}
-		BUG_ON(pa->pa_lstart <= start && pa_end >= end);
+		BUG_ON(tmp_pa_start <= start && tmp_pa_end >= end);
 
 		/* adjust start or end to be adjacent to this pa */
-		if (pa_end <= ac->ac_o_ex.fe_logical) {
-			BUG_ON(pa_end < start);
-			start = pa_end;
-		} else if (pa->pa_lstart > ac->ac_o_ex.fe_logical) {
-			BUG_ON(pa->pa_lstart > end);
-			end = pa->pa_lstart;
+		if (tmp_pa_end <= ac->ac_o_ex.fe_logical) {
+			BUG_ON(tmp_pa_end < start);
+			start = tmp_pa_end;
+		} else if (tmp_pa_start > ac->ac_o_ex.fe_logical) {
+			BUG_ON(tmp_pa_start > end);
+			end = tmp_pa_start;
 		}
-		spin_unlock(&pa->pa_lock);
+		spin_unlock(&tmp_pa->pa_lock);
 	}
 	rcu_read_unlock();
 	size = end - start;
 
 	/* XXX: extra loop to check we really don't overlap preallocations */
 	rcu_read_lock();
-	list_for_each_entry_rcu(pa, &ei->i_prealloc_list, pa_inode_list) {
-		ext4_lblk_t pa_end;
+	list_for_each_entry_rcu(tmp_pa, &ei->i_prealloc_list, pa_inode_list) {
+		spin_lock(&tmp_pa->pa_lock);
+		if (tmp_pa->pa_deleted == 0) {
+			tmp_pa_start = tmp_pa->pa_lstart;
+			tmp_pa_end = tmp_pa->pa_lstart + EXT4_C2B(sbi, tmp_pa->pa_len);
 
-		spin_lock(&pa->pa_lock);
-		if (pa->pa_deleted == 0) {
-			pa_end = pa->pa_lstart + EXT4_C2B(EXT4_SB(ac->ac_sb),
-							  pa->pa_len);
-			BUG_ON(!(start >= pa_end || end <= pa->pa_lstart));
+			BUG_ON(!(start >= tmp_pa_end || end <= tmp_pa_start));
 		}
-		spin_unlock(&pa->pa_lock);
+		spin_unlock(&tmp_pa->pa_lock);
 	}
 	rcu_read_unlock();
 
@@ -4361,7 +4359,8 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
 	int order, i;
 	struct ext4_inode_info *ei = EXT4_I(ac->ac_inode);
 	struct ext4_locality_group *lg;
-	struct ext4_prealloc_space *pa, *cpa = NULL;
+	struct ext4_prealloc_space *tmp_pa, *cpa = NULL;
+	ext4_lblk_t tmp_pa_start, tmp_pa_end;
 	ext4_fsblk_t goal_block;
 
 	/* only data can be preallocated */
@@ -4370,18 +4369,20 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
 
 	/* first, try per-file preallocation */
 	rcu_read_lock();
-	list_for_each_entry_rcu(pa, &ei->i_prealloc_list, pa_inode_list) {
+	list_for_each_entry_rcu(tmp_pa, &ei->i_prealloc_list, pa_inode_list) {
 
 		/* all fields in this condition don't change,
 		 * so we can skip locking for them */
-		if (ac->ac_o_ex.fe_logical < pa->pa_lstart ||
-		    ac->ac_o_ex.fe_logical >= (pa->pa_lstart +
-					       EXT4_C2B(sbi, pa->pa_len)))
+		tmp_pa_start = tmp_pa->pa_lstart;
+		tmp_pa_end = tmp_pa->pa_lstart + EXT4_C2B(sbi, tmp_pa->pa_len);
+
+		if (ac->ac_o_ex.fe_logical < tmp_pa_start ||
+		    ac->ac_o_ex.fe_logical >= tmp_pa_end)
 			continue;
 
 		/* non-extent files can't have physical blocks past 2^32 */
 		if (!(ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS)) &&
-		    (pa->pa_pstart + EXT4_C2B(sbi, pa->pa_len) >
+		    (tmp_pa->pa_pstart + EXT4_C2B(sbi, tmp_pa->pa_len) >
 		     EXT4_MAX_BLOCK_FILE_PHYS)) {
 			/*
 			 * Since PAs don't overlap, we won't find any
@@ -4391,16 +4392,16 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
 		}
 
 		/* found preallocated blocks, use them */
-		spin_lock(&pa->pa_lock);
-		if (pa->pa_deleted == 0 && pa->pa_free) {
-			atomic_inc(&pa->pa_count);
-			ext4_mb_use_inode_pa(ac, pa);
-			spin_unlock(&pa->pa_lock);
+		spin_lock(&tmp_pa->pa_lock);
+		if (tmp_pa->pa_deleted == 0 && tmp_pa->pa_free) {
+			atomic_inc(&tmp_pa->pa_count);
+			ext4_mb_use_inode_pa(ac, tmp_pa);
+			spin_unlock(&tmp_pa->pa_lock);
 			ac->ac_criteria = 10;
 			rcu_read_unlock();
 			return true;
 		}
-		spin_unlock(&pa->pa_lock);
+		spin_unlock(&tmp_pa->pa_lock);
 	}
 	rcu_read_unlock();
 
@@ -4424,16 +4425,16 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
 	 */
 	for (i = order; i < PREALLOC_TB_SIZE; i++) {
 		rcu_read_lock();
-		list_for_each_entry_rcu(pa, &lg->lg_prealloc_list[i],
+		list_for_each_entry_rcu(tmp_pa, &lg->lg_prealloc_list[i],
 					pa_inode_list) {
-			spin_lock(&pa->pa_lock);
-			if (pa->pa_deleted == 0 &&
-					pa->pa_free >= ac->ac_o_ex.fe_len) {
+			spin_lock(&tmp_pa->pa_lock);
+			if (tmp_pa->pa_deleted == 0 &&
+					tmp_pa->pa_free >= ac->ac_o_ex.fe_len) {
 
 				cpa = ext4_mb_check_group_pa(goal_block,
-								pa, cpa);
+								tmp_pa, cpa);
 			}
-			spin_unlock(&pa->pa_lock);
+			spin_unlock(&tmp_pa->pa_lock);
 		}
 		rcu_read_unlock();
 	}

From patchwork Fri Feb 17 12:14:13 2023
X-Patchwork-Submitter: Ojaswin Mujoo
X-Patchwork-Id: 13144608
From: Ojaswin Mujoo
To: linux-ext4@vger.kernel.org, "Theodore Ts'o"
Cc: Ritesh Harjani, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, Jan Kara, rookxu, Ritesh Harjani
Subject: [PATCH v4 4/9] ext4: Move overlap assert logic into a separate function
Date: Fri, 17 Feb 2023 17:44:13 +0530
Message-Id: <7beaf71135b70b7ed4bac4f63bca362548192fd0.1676634592.git.ojaswin@linux.ibm.com>
X-Mailer: git-send-email 2.31.1
X-Mailing-List: linux-fsdevel@vger.kernel.org

Abstract out the logic to double check for overlaps in
ext4_mb_normalize_request() into a separate function. Since there have
been no reports in the past of an overlap hitting this BUG_ON(), in the
future we can consider calling this function only under
"#ifdef AGGRESSIVE_CHECK".

There are no functional changes in this patch.

Signed-off-by: Ojaswin Mujoo
Reviewed-by: Ritesh Harjani (IBM)
Reviewed-by: Jan Kara
---
 fs/ext4/mballoc.c | 36 ++++++++++++++++++++++++------------
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index f4e699bce99f..1628b008a096 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -3984,6 +3984,29 @@ static void ext4_mb_normalize_group_request(struct ext4_allocation_context *ac)
 	mb_debug(sb, "goal %u blocks for locality group\n", ac->ac_g_ex.fe_len);
 }
 
+static inline void
+ext4_mb_pa_assert_overlap(struct ext4_allocation_context *ac,
+			  ext4_lblk_t start, ext4_lblk_t end)
+{
+	struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
+	struct ext4_inode_info *ei = EXT4_I(ac->ac_inode);
+	struct ext4_prealloc_space *tmp_pa;
+	ext4_lblk_t tmp_pa_start, tmp_pa_end;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(tmp_pa, &ei->i_prealloc_list, pa_inode_list) {
+		spin_lock(&tmp_pa->pa_lock);
+		if (tmp_pa->pa_deleted == 0) {
+			tmp_pa_start = tmp_pa->pa_lstart;
+			tmp_pa_end = tmp_pa->pa_lstart + EXT4_C2B(sbi, tmp_pa->pa_len);
+
+			BUG_ON(!(start >= tmp_pa_end || end <= tmp_pa_start));
+		}
+		spin_unlock(&tmp_pa->pa_lock);
+	}
+	rcu_read_unlock();
+}
+
 /*
  * Normalization means making request better in terms of
  * size and alignment
@@ -4140,18 +4163,7 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac,
 	size = end - start;
 
 	/* XXX: extra loop to check we really don't overlap preallocations */
-	rcu_read_lock();
-	list_for_each_entry_rcu(tmp_pa, &ei->i_prealloc_list, pa_inode_list) {
-		spin_lock(&tmp_pa->pa_lock);
-		if (tmp_pa->pa_deleted == 0) {
-			tmp_pa_start = tmp_pa->pa_lstart;
-			tmp_pa_end = tmp_pa->pa_lstart + EXT4_C2B(sbi, tmp_pa->pa_len);
-
-			BUG_ON(!(start >= tmp_pa_end || end <= tmp_pa_start));
-		}
-		spin_unlock(&tmp_pa->pa_lock);
-	}
-	rcu_read_unlock();
+	ext4_mb_pa_assert_overlap(ac, start, end);
 
 	/*
 	 * In this function "start" and "size" are normalized for better
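
The condition asserted by ext4_mb_pa_assert_overlap() is a half-open
interval test: the normalized range [start, end) must not intersect any
live PA range [pa_lstart, pa_lstart + len). A minimal userspace sketch of
that test follows. It is not kernel code: struct pa_range,
ranges_overlap(), range_overlaps_any_pa() and assert_no_overlap() are
hypothetical names, locking and RCU are omitted, and "len" is assumed to
be already converted from clusters to blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the PA fields the kernel check reads. */
struct pa_range {
	unsigned int lstart;	/* first logical block, like pa_lstart */
	unsigned int len;	/* length in blocks (assumed already C2B-converted) */
	bool deleted;		/* like pa_deleted: deleted PAs are ignored */
};

/*
 * Half-open ranges [start, end) and [pa_start, pa_end) overlap unless
 * one of them ends at or before the point where the other begins.
 * This is the negation of the expression inside the BUG_ON() above.
 */
bool ranges_overlap(unsigned int start, unsigned int end,
		    unsigned int pa_start, unsigned int pa_end)
{
	return !(start >= pa_end || end <= pa_start);
}

/* Return true if [start, end) intersects any non-deleted PA in the array. */
bool range_overlaps_any_pa(const struct pa_range *pas, size_t n,
			   unsigned int start, unsigned int end)
{
	for (size_t i = 0; i < n; i++) {
		if (pas[i].deleted)
			continue;
		if (ranges_overlap(start, end, pas[i].lstart,
				   pas[i].lstart + pas[i].len))
			return true;
	}
	return false;
}

/* Userspace analogue of the BUG_ON(): abort if any live PA overlaps. */
void assert_no_overlap(const struct pa_range *pas, size_t n,
		       unsigned int start, unsigned int end)
{
	assert(!range_overlaps_any_pa(pas, n, start, end));
}
```

With a live PA covering blocks [100, 150), a request for [150, 200) is
adjacent but not overlapping, while [149, 160) overlaps by one block and
would trip the assertion.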
From patchwork Fri Feb 17 12:14:14 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ojaswin Mujoo X-Patchwork-Id: 13144609 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D1F00C636D4 for ; Fri, 17 Feb 2023 12:15:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230022AbjBQMPT (ORCPT ); Fri, 17 Feb 2023 07:15:19 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51130 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229785AbjBQMO7 (ORCPT ); Fri, 17 Feb 2023 07:14:59 -0500 Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C0E9C66061; Fri, 17 Feb 2023 04:14:45 -0800 (PST) Received: from pps.filterd (m0098396.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 31HACSK9026191; Fri, 17 Feb 2023 12:14:41 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pp1; bh=SeldDRZQ0+W2L3R9AmlihtnMi8L2bo3aoXvILnti/ts=; b=GxIz0deYjq45TZIcu8oN+GJAwjSWph/Blq0qNVfLifhzAqpIQifAARiq30OIKgyUcSBL r4pwREwRy7OFeBkd6PLEap9G9jhwi6cGvyxZvC83S66t9nzKdfcBBHkPRdebYyzCT7U7 pnZEz6LYYPJiIi444Gf8qPQL1rKohLxMlBbCxSdrkTPtMQtZG+qVBhThPnx3J5+/JGGO GabOZfTnlWdGnjIifKb17X/GIqzq2zsQQNiri/bPvenSxh3nNedudYP5ZvoQgaM7CD/q NBY/yPXFSXPNQjNoGq0bQnzOulW1DxrO7zNgZ0YkSS27EzDKaDkdDnOyUuJRCNkIx5cu xg== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3nsv6f1mnd-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 17 Feb 2023 12:14:41 +0000 
Received: from m0098396.ppops.net (m0098396.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 31HAZM9r027239; Fri, 17 Feb 2023 12:14:41 GMT Received: from ppma04fra.de.ibm.com (6a.4a.5195.ip4.static.sl-reverse.com [149.81.74.106]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3nsv6f1mmb-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 17 Feb 2023 12:14:40 +0000 Received: from pps.filterd (ppma04fra.de.ibm.com [127.0.0.1]) by ppma04fra.de.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 31GLHjsk006523; Fri, 17 Feb 2023 12:14:38 GMT Received: from smtprelay03.fra02v.mail.ibm.com ([9.218.2.224]) by ppma04fra.de.ibm.com (PPS) with ESMTPS id 3np2n6dwmb-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 17 Feb 2023 12:14:38 +0000 Received: from smtpav01.fra02v.mail.ibm.com (smtpav01.fra02v.mail.ibm.com [10.20.54.100]) by smtprelay03.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 31HCEZGS36372946 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 17 Feb 2023 12:14:35 GMT Received: from smtpav01.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 9625C2004B; Fri, 17 Feb 2023 12:14:35 +0000 (GMT) Received: from smtpav01.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 938EA20040; Fri, 17 Feb 2023 12:14:33 +0000 (GMT) Received: from li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com (unknown [9.43.3.39]) by smtpav01.fra02v.mail.ibm.com (Postfix) with ESMTP; Fri, 17 Feb 2023 12:14:33 +0000 (GMT) From: Ojaswin Mujoo To: linux-ext4@vger.kernel.org, "Theodore Ts'o" Cc: Ritesh Harjani , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Jan Kara , rookxu , Ritesh Harjani Subject: [PATCH v4 5/9] ext4: Abstract out overlap fix/check logic in ext4_mb_normalize_request() Date: Fri, 17 Feb 2023 17:44:14 +0530 Message-Id: X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 
X-TM-AS-GCONF: 00 X-Proofpoint-GUID: UA-nwkYjKHkmUWxcCI5ke3in8CGyWT-g X-Proofpoint-ORIG-GUID: p5aEubmqPazTZ2ExpfZqrBk4RzLp2PAn X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.219,Aquarius:18.0.930,Hydra:6.0.562,FMLib:17.11.170.22 definitions=2023-02-17_06,2023-02-17_01,2023-02-09_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 suspectscore=0 bulkscore=0 priorityscore=1501 malwarescore=0 impostorscore=0 adultscore=0 lowpriorityscore=0 mlxlogscore=999 clxscore=1015 mlxscore=0 spamscore=0 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2212070000 definitions=main-2302170109 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Abstract out the logic of fixing PA overlaps in ext4_mb_normalize_request() to improve code readability. This also makes it easier to change the overlap logic in the future. There are no functional changes in this patch. Signed-off-by: Ojaswin Mujoo Reviewed-by: Ritesh Harjani (IBM) Reviewed-by: Jan Kara --- fs/ext4/mballoc.c | 110 +++++++++++++++++++++++++++++----------------- 1 file changed, 69 insertions(+), 41 deletions(-) diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 1628b008a096..fdb9d0a8f35d 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -4007,6 +4007,74 @@ ext4_mb_pa_assert_overlap(struct ext4_allocation_context *ac, rcu_read_unlock(); } +/* + * Given an allocation context "ac" and a range "start", "end", check + * and adjust boundaries if the range overlaps with any of the existing + * preallocations stored in the corresponding inode of the allocation context.
+ * + *Parameters: + * ac allocation context + * start start of the new range + * end end of the new range + */ +static inline void +ext4_mb_pa_adjust_overlap(struct ext4_allocation_context *ac, + ext4_lblk_t *start, ext4_lblk_t *end) +{ + struct ext4_inode_info *ei = EXT4_I(ac->ac_inode); + struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb); + struct ext4_prealloc_space *tmp_pa; + ext4_lblk_t new_start, new_end; + ext4_lblk_t tmp_pa_start, tmp_pa_end; + + new_start = *start; + new_end = *end; + + /* check we don't cross already preallocated blocks */ + rcu_read_lock(); + list_for_each_entry_rcu(tmp_pa, &ei->i_prealloc_list, pa_inode_list) { + if (tmp_pa->pa_deleted) + continue; + spin_lock(&tmp_pa->pa_lock); + if (tmp_pa->pa_deleted) { + spin_unlock(&tmp_pa->pa_lock); + continue; + } + + tmp_pa_start = tmp_pa->pa_lstart; + tmp_pa_end = tmp_pa->pa_lstart + EXT4_C2B(sbi, tmp_pa->pa_len); + + /* PA must not overlap original request */ + BUG_ON(!(ac->ac_o_ex.fe_logical >= tmp_pa_end || + ac->ac_o_ex.fe_logical < tmp_pa_start)); + + /* skip PAs this normalized request doesn't overlap with */ + if (tmp_pa_start >= new_end || tmp_pa_end <= new_start) { + spin_unlock(&tmp_pa->pa_lock); + continue; + } + BUG_ON(tmp_pa_start <= new_start && tmp_pa_end >= new_end); + + /* adjust start or end to be adjacent to this pa */ + if (tmp_pa_end <= ac->ac_o_ex.fe_logical) { + BUG_ON(tmp_pa_end < new_start); + new_start = tmp_pa_end; + } else if (tmp_pa_start > ac->ac_o_ex.fe_logical) { + BUG_ON(tmp_pa_start > new_end); + new_end = tmp_pa_start; + } + spin_unlock(&tmp_pa->pa_lock); + } + rcu_read_unlock(); + + /* XXX: extra loop to check we really don't overlap preallocations */ + ext4_mb_pa_assert_overlap(ac, new_start, new_end); + + *start = new_start; + *end = new_end; + return; +} + /* * Normalization means making request better in terms of * size and alignment @@ -4021,9 +4089,6 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac, loff_t size, start_off; loff_t orig_size 
__maybe_unused; ext4_lblk_t start; - struct ext4_inode_info *ei = EXT4_I(ac->ac_inode); - struct ext4_prealloc_space *tmp_pa; - ext4_lblk_t tmp_pa_start, tmp_pa_end; /* do normalize only data requests, metadata requests do not need preallocation */ @@ -4124,47 +4189,10 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac, end = start + size; - /* check we don't cross already preallocated blocks */ - rcu_read_lock(); - list_for_each_entry_rcu(tmp_pa, &ei->i_prealloc_list, pa_inode_list) { - if (tmp_pa->pa_deleted) - continue; - spin_lock(&tmp_pa->pa_lock); - if (tmp_pa->pa_deleted) { - spin_unlock(&tmp_pa->pa_lock); - continue; - } - - tmp_pa_start = tmp_pa->pa_lstart; - tmp_pa_end = tmp_pa->pa_lstart + EXT4_C2B(sbi, tmp_pa->pa_len); - - /* PA must not overlap original request */ - BUG_ON(!(ac->ac_o_ex.fe_logical >= tmp_pa_end || - ac->ac_o_ex.fe_logical < tmp_pa_start)); - - /* skip PAs this normalized request doesn't overlap with */ - if (tmp_pa_start >= end || tmp_pa_end <= start) { - spin_unlock(&tmp_pa->pa_lock); - continue; - } - BUG_ON(tmp_pa_start <= start && tmp_pa_end >= end); + ext4_mb_pa_adjust_overlap(ac, &start, &end); - /* adjust start or end to be adjacent to this pa */ - if (tmp_pa_end <= ac->ac_o_ex.fe_logical) { - BUG_ON(tmp_pa_end < start); - start = tmp_pa_end; - } else if (tmp_pa_start > ac->ac_o_ex.fe_logical) { - BUG_ON(tmp_pa_start > end); - end = tmp_pa_start; - } - spin_unlock(&tmp_pa->pa_lock); - } - rcu_read_unlock(); size = end - start; - /* XXX: extra loop to check we really don't overlap preallocations */ - ext4_mb_pa_assert_overlap(ac, start, end); - /* * In this function "start" and "size" are normalized for better * alignment and length such that we could preallocate more blocks. 
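For readers following the refactor, the boundary adjustment performed by ext4_mb_pa_adjust_overlap() can be sketched outside the kernel roughly as below. This is only an illustration under stated assumptions: struct range and adjust_overlap() are hypothetical names, preallocations are modelled as a plain array of half-open ranges, and the RCU, spinlock and pa_deleted handling from the patch is left out entirely.

```c
#include <assert.h>

/* Hypothetical stand-in for a preallocation: half-open logical range [start, end). */
struct range {
	unsigned int start;
	unsigned int end;
};

/*
 * Shrink [*start, *end) so that it no longer overlaps any range in pas[],
 * while still containing the block "orig" (the original request).
 * Assumes no range in pas[] contains "orig"; the patch checks the
 * equivalent condition with BUG_ON, modelled here with assert().
 */
static void adjust_overlap(const struct range *pas, int n, unsigned int orig,
			   unsigned int *start, unsigned int *end)
{
	unsigned int new_start = *start, new_end = *end;

	for (int i = 0; i < n; i++) {
		/* a preallocation must not cover the original request */
		assert(!(orig >= pas[i].start && orig < pas[i].end));

		/* skip ranges the current window does not overlap */
		if (pas[i].start >= new_end || pas[i].end <= new_start)
			continue;

		if (pas[i].end <= orig)
			new_start = pas[i].end;	/* range lies left of orig */
		else if (pas[i].start > orig)
			new_end = pas[i].start;	/* range lies right of orig */
	}

	*start = new_start;
	*end = new_end;
}
```

With preallocations [0, 8) and [40, 48), an original block of 20 and a normalized window of [4, 44), the sketch trims the window to [8, 40).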
From patchwork Fri Feb 17 12:14:15 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ojaswin Mujoo X-Patchwork-Id: 13144610 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 53032C05027 for ; Fri, 17 Feb 2023 12:15:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230082AbjBQMPZ (ORCPT ); Fri, 17 Feb 2023 07:15:25 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51142 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230093AbjBQMO7 (ORCPT ); Fri, 17 Feb 2023 07:14:59 -0500 Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E950E6605B; Fri, 17 Feb 2023 04:14:46 -0800 (PST) Received: from pps.filterd (m0098399.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 31HB0S8H012085; Fri, 17 Feb 2023 12:14:43 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : content-transfer-encoding : mime-version; s=pp1; bh=3Aoc7po0EoQIZm0dBtmhB5yc7Mo+m+A9HqmnpsDSHAY=; b=rSSZ7AxfuV2ffljsEKifuU5MoeGW1dltZJVZIvBoqzKUhNZlSD8UmK2BiycZqVOfcjsf FTolX5M7cNte1eyKT5wKUsoMKghKq3ClllkWQYaY05YUA88V/9JKXU6SvTyrZRDbcU+z ticaRkDqas6BiAZOu0dZJEZHrHJaTCRZbGQV68bMgAnt4swFNyfPG+h3QVqGFBnfwudr 5eNddnpQK265w4ph+v/VWUJxcjp+geA/NEGUt6q1lXWlNuvz7naHOFLVwITDImyR+UyT JzKxVGSYcn9ykcUjpwGrKdp4CnmYxahShiWg4oNyPvUUzAjmv0tIaRaZDY/x68Q0aCEQ bw== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3nt3310tb4-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 17 Feb 2023 12:14:43 +0000 
Received: from m0098399.ppops.net (m0098399.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 31HBN3RL025225; Fri, 17 Feb 2023 12:14:42 GMT Received: from ppma04ams.nl.ibm.com (63.31.33a9.ip4.static.sl-reverse.com [169.51.49.99]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3nt3310taf-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 17 Feb 2023 12:14:42 +0000 Received: from pps.filterd (ppma04ams.nl.ibm.com [127.0.0.1]) by ppma04ams.nl.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 31HBUJBU010698; Fri, 17 Feb 2023 12:14:40 GMT Received: from smtprelay05.fra02v.mail.ibm.com ([9.218.2.225]) by ppma04ams.nl.ibm.com (PPS) with ESMTPS id 3np2n700h3-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 17 Feb 2023 12:14:40 +0000 Received: from smtpav01.fra02v.mail.ibm.com (smtpav01.fra02v.mail.ibm.com [10.20.54.100]) by smtprelay05.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 31HCEbEq46596600 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 17 Feb 2023 12:14:38 GMT Received: from smtpav01.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id CFBAA2004D; Fri, 17 Feb 2023 12:14:37 +0000 (GMT) Received: from smtpav01.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 0047620040; Fri, 17 Feb 2023 12:14:36 +0000 (GMT) Received: from li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com (unknown [9.43.3.39]) by smtpav01.fra02v.mail.ibm.com (Postfix) with ESMTP; Fri, 17 Feb 2023 12:14:35 +0000 (GMT) From: Ojaswin Mujoo To: linux-ext4@vger.kernel.org, "Theodore Ts'o" Cc: Ritesh Harjani , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Jan Kara , rookxu Subject: [PATCH v4 6/9] ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa() Date: Fri, 17 Feb 2023 17:44:15 +0530 Message-Id: <79b5240a6168171577b1bb9ef7a27c0c52676d37.1676634592.git.ojaswin@linux.ibm.com> X-Mailer: git-send-email 
2.31.1 In-Reply-To: References: X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: vPwLMry1_oXZ-X3kOsVT1PbtfwuiDn8g X-Proofpoint-GUID: 89MkkpUqUWSEHQxDnsMKRrnpyxAvFgL5 X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.219,Aquarius:18.0.930,Hydra:6.0.562,FMLib:17.11.170.22 definitions=2023-02-17_06,2023-02-17_01,2023-02-09_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 lowpriorityscore=0 suspectscore=0 mlxlogscore=999 adultscore=0 spamscore=0 phishscore=0 clxscore=1015 priorityscore=1501 impostorscore=0 malwarescore=0 bulkscore=0 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2212070000 definitions=main-2302170109 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org When the length of the best extent found is less than the length of the goal extent, we need to make sure that the best extent at least covers the start of the original request. This is done by adjusting ac_b_ex.fe_logical (the logical start) of the extent. While doing so, the current logic sometimes results in the best extent's logical range overflowing the goal extent. Since this best extent is later added to the inode preallocation list, this can introduce overlapping preallocations. This is discussed in detail here [1]. To fix this, replace the existing logic with the logic below for adjusting the best extent, as it keeps fragmentation in check while ensuring that the logical range of the best extent doesn't overflow out of the goal extent: 1. Check if the best extent can be kept at the end of the goal range and still cover the original start. 2. Else, check if the best extent can be kept at the start of the goal range and still cover the original start. 3. Else, keep the best extent at the start of the original request. Also, add a few extra BUG_ONs that might help catch errors faster.
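The three placement steps listed above can be tried out in isolation with a small userspace sketch. It is a simplified model, not the kernel code: struct extent and place_best_extent() are hypothetical names, and the cluster-to-block conversion (EXT4_C2B) is dropped by assuming one cluster equals one block.

```c
#include <assert.h>

/* Hypothetical simplified extent: logical start block and length in blocks. */
struct extent {
	unsigned int logical;
	unsigned int len;
};

/*
 * Pick a logical start for a best extent of length best_len (shorter than
 * the goal) so that it covers orig_start and stays inside the goal range.
 * Assumes orig_start lies inside the goal range.
 */
static unsigned int place_best_extent(struct extent goal, unsigned int orig_start,
				      unsigned int best_len)
{
	unsigned int goal_end = goal.logical + goal.len;
	unsigned int new_start, new_end;

	assert(best_len > 0 && best_len < goal.len);
	assert(orig_start >= goal.logical && orig_start < goal_end);

	/* 1. Try placing the best extent at the end of the goal range. */
	new_end = goal_end;
	new_start = new_end - best_len;
	if (orig_start >= new_start)
		goto done;

	/* 2. Else, try placing it at the start of the goal range. */
	new_start = goal.logical;
	new_end = new_start + best_len;
	if (orig_start < new_end)
		goto done;

	/* 3. Else, place it at the start of the original request. */
	new_start = orig_start;
	new_end = new_start + best_len;

done:
	/* the result covers the original start and stays inside the goal */
	assert(new_start <= orig_start);
	assert(new_end <= goal_end);
	return new_start;
}
```

Step 3 cannot overflow the goal range: it is only reached when step 1 failed, which means orig_start + best_len is already smaller than the end of the goal.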
[1] https://lore.kernel.org/r/Y+OGkVvzPN0RMv0O@li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com Signed-off-by: Ojaswin Mujoo Reviewed-by: Ritesh Harjani (IBM) Reviewed-by: Jan Kara --- fs/ext4/mballoc.c | 49 ++++++++++++++++++++++++++++++----------------- 1 file changed, 31 insertions(+), 18 deletions(-) diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index fdb9d0a8f35d..ba9d26e2f2aa 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -4330,6 +4330,7 @@ static void ext4_mb_use_inode_pa(struct ext4_allocation_context *ac, BUG_ON(start < pa->pa_pstart); BUG_ON(end > pa->pa_pstart + EXT4_C2B(sbi, pa->pa_len)); BUG_ON(pa->pa_free < len); + BUG_ON(ac->ac_b_ex.fe_len <= 0); pa->pa_free -= len; mb_debug(ac->ac_sb, "use %llu/%d from inode pa %p\n", start, len, pa); @@ -4668,10 +4669,8 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac) pa = ac->ac_pa; if (ac->ac_b_ex.fe_len < ac->ac_g_ex.fe_len) { - int winl; - int wins; - int win; - int offs; + int new_bex_start; + int new_bex_end; /* we can't allocate as much as normalizer wants. * so, found space must get proper lstart @@ -4679,26 +4678,40 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac) BUG_ON(ac->ac_g_ex.fe_logical > ac->ac_o_ex.fe_logical); BUG_ON(ac->ac_g_ex.fe_len < ac->ac_o_ex.fe_len); - /* we're limited by original request in that - * logical block must be covered any way - * winl is window we can move our chunk within */ - winl = ac->ac_o_ex.fe_logical - ac->ac_g_ex.fe_logical; + /* + * Use the below logic for adjusting best extent as it keeps + * fragmentation in check while ensuring logical range of best + * extent doesn't overflow out of goal extent: + * + * 1. Check if best ex can be kept at end of goal and still + * cover original start + * 2. Else, check if best ex can be kept at start of goal and + * still cover original start + * 3. Else, keep the best ex at start of original request. 
+ */ + new_bex_end = ac->ac_g_ex.fe_logical + + EXT4_C2B(sbi, ac->ac_g_ex.fe_len); + new_bex_start = new_bex_end - EXT4_C2B(sbi, ac->ac_b_ex.fe_len); + if (ac->ac_o_ex.fe_logical >= new_bex_start) + goto adjust_bex; - /* also, we should cover whole original request */ - wins = EXT4_C2B(sbi, ac->ac_b_ex.fe_len - ac->ac_o_ex.fe_len); + new_bex_start = ac->ac_g_ex.fe_logical; + new_bex_end = + new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len); + if (ac->ac_o_ex.fe_logical < new_bex_end) + goto adjust_bex; - /* the smallest one defines real window */ - win = min(winl, wins); + new_bex_start = ac->ac_o_ex.fe_logical; + new_bex_end = + new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len); - offs = ac->ac_o_ex.fe_logical % - EXT4_C2B(sbi, ac->ac_b_ex.fe_len); - if (offs && offs < win) - win = offs; +adjust_bex: + ac->ac_b_ex.fe_logical = new_bex_start; - ac->ac_b_ex.fe_logical = ac->ac_o_ex.fe_logical - - EXT4_NUM_B2C(sbi, win); BUG_ON(ac->ac_o_ex.fe_logical < ac->ac_b_ex.fe_logical); BUG_ON(ac->ac_o_ex.fe_len > ac->ac_b_ex.fe_len); + BUG_ON(new_bex_end > (ac->ac_g_ex.fe_logical + + EXT4_C2B(sbi, ac->ac_g_ex.fe_len))); } /* preallocation can change ac_b_ex, thus we store actually From patchwork Fri Feb 17 12:14:16 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ojaswin Mujoo X-Patchwork-Id: 13144611 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D8360C05027 for ; Fri, 17 Feb 2023 12:15:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230325AbjBQMPi (ORCPT ); Fri, 17 Feb 2023 07:15:38 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51684 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230149AbjBQMPR (ORCPT ); Fri, 17 Feb 2023 
07:15:17 -0500 Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 074C26A05B; Fri, 17 Feb 2023 04:14:49 -0800 (PST) Received: from pps.filterd (m0187473.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 31HBSgYC040059; Fri, 17 Feb 2023 12:14:46 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pp1; bh=v53ZXlSNypmup6yBGc8VwIs3WgyLSerKKfvOgLaQpoU=; b=fko5isSZKqV9szcGCcMI1ps92Hr8uoxzBP2hCZC7sg2vvuVdNRfmdSbKkaRUxbtsixqc cAkvpV4ImPk87ro+Mdaee7IL2EnGsBxCYyypzCKaAQWCAXATw2BGblYFMKUbEUSNWt4z 5tzjkyF2zr+Ral9EBmepYxBwHgDgt2MdIQEzbFecXyIBrXT22Z8UJF414Ee1VkEZKKsY +BS8nShFkVj6v+G0Z5CgSKEsUhhR78BLtmt/JB8TcB/sDNF8Vcy5eTUyiu4gtmN/3dHl Pp3vjkFpkfJG0z7Duvpppccp4cHNFn1an1sF+Eiz+3CxK1hoekJZqHrCSVU3yV9Pepv4 RA== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3nt3ngr4m6-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 17 Feb 2023 12:14:45 +0000 Received: from m0187473.ppops.net (m0187473.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 31HBZJLJ015429; Fri, 17 Feb 2023 12:14:45 GMT Received: from ppma02fra.de.ibm.com (47.49.7a9f.ip4.static.sl-reverse.com [159.122.73.71]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3nt3ngr4kt-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 17 Feb 2023 12:14:45 +0000 Received: from pps.filterd (ppma02fra.de.ibm.com [127.0.0.1]) by ppma02fra.de.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 31GNkTcK027510; Fri, 17 Feb 2023 12:14:42 GMT Received: from smtprelay01.fra02v.mail.ibm.com ([9.218.2.227]) by ppma02fra.de.ibm.com (PPS) with ESMTPS id 3np2n6nwem-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 17 Feb 2023 
12:14:42 +0000 Received: from smtpav01.fra02v.mail.ibm.com (smtpav01.fra02v.mail.ibm.com [10.20.54.100]) by smtprelay01.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 31HCEeRZ46072098 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 17 Feb 2023 12:14:40 GMT Received: from smtpav01.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 4F4C02004B; Fri, 17 Feb 2023 12:14:40 +0000 (GMT) Received: from smtpav01.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 3B1CE20043; Fri, 17 Feb 2023 12:14:38 +0000 (GMT) Received: from li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com (unknown [9.43.3.39]) by smtpav01.fra02v.mail.ibm.com (Postfix) with ESMTP; Fri, 17 Feb 2023 12:14:38 +0000 (GMT) From: Ojaswin Mujoo To: linux-ext4@vger.kernel.org, "Theodore Ts'o" Cc: Ritesh Harjani , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Jan Kara , rookxu , Ritesh Harjani Subject: [PATCH v4 7/9] ext4: Convert pa->pa_inode_list and pa->pa_obj_lock into a union Date: Fri, 17 Feb 2023 17:44:16 +0530 Message-Id: <17125fbc2d6311515470320c1f6b1c71e66b6417.1676634592.git.ojaswin@linux.ibm.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-GUID: ijJld_KUbTqUGjf3zmFL1rceLiV4WaB9 X-Proofpoint-ORIG-GUID: CKv3bqWvqxgBeqJ_JFjp4zriCUrF3t6L X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.219,Aquarius:18.0.930,Hydra:6.0.562,FMLib:17.11.170.22 definitions=2023-02-17_06,2023-02-17_01,2023-02-09_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 clxscore=1015 mlxscore=0 malwarescore=0 phishscore=0 suspectscore=0 bulkscore=0 spamscore=0 impostorscore=0 lowpriorityscore=0 adultscore=0 mlxlogscore=999 priorityscore=1501 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2212070000 definitions=main-2302170109 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org ** Splitting 
pa->pa_inode_list ** Currently, we use the same pa->pa_inode_list to add a PA to either the inode preallocation list or the locality group preallocation list. For better clarity, split this list into a union of 2 list_heads and use either of them based on the type of PA. ** Splitting pa->pa_obj_lock ** Currently, pa->pa_obj_lock is assigned either &ei->i_prealloc_lock for inode PAs or &lg->lg_prealloc_lock for lg PAs, and is then used to lock the lists containing these PAs. Make the distinction between the 2 PA types clear by changing this lock to a union of 2 locks. Explicitly use pa_node_lock.inode_lock for inode PAs and pa_node_lock.lg_lock for lg PAs. This patch is required so that the locality group preallocation code can remain unchanged when upcoming patches move the inode preallocation code from a list to an rbtree-based implementation. This patch also makes it easier to review the upcoming patches. There are no functional changes in this patch. Signed-off-by: Ojaswin Mujoo Reviewed-by: Ritesh Harjani (IBM) Reviewed-by: Jan Kara --- fs/ext4/mballoc.c | 76 +++++++++++++++++++++++++++-------------------- fs/ext4/mballoc.h | 10 +++++-- 2 files changed, 52 insertions(+), 34 deletions(-) diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index ba9d26e2f2aa..7b8bbfb9ad58 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -3994,7 +3994,7 @@ ext4_mb_pa_assert_overlap(struct ext4_allocation_context *ac, ext4_lblk_t tmp_pa_start, tmp_pa_end; rcu_read_lock(); - list_for_each_entry_rcu(tmp_pa, &ei->i_prealloc_list, pa_inode_list) { + list_for_each_entry_rcu(tmp_pa, &ei->i_prealloc_list, pa_node.inode_list) { spin_lock(&tmp_pa->pa_lock); if (tmp_pa->pa_deleted == 0) { tmp_pa_start = tmp_pa->pa_lstart; @@ -4032,7 +4032,7 @@ ext4_mb_pa_adjust_overlap(struct ext4_allocation_context *ac, /* check we don't cross already preallocated blocks */ rcu_read_lock(); - list_for_each_entry_rcu(tmp_pa, &ei->i_prealloc_list, pa_inode_list) { +
list_for_each_entry_rcu(tmp_pa, &ei->i_prealloc_list, pa_node.inode_list) { if (tmp_pa->pa_deleted) continue; spin_lock(&tmp_pa->pa_lock); @@ -4410,7 +4410,7 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac) /* first, try per-file preallocation */ rcu_read_lock(); - list_for_each_entry_rcu(tmp_pa, &ei->i_prealloc_list, pa_inode_list) { + list_for_each_entry_rcu(tmp_pa, &ei->i_prealloc_list, pa_node.inode_list) { /* all fields in this condition don't change, * so we can skip locking for them */ @@ -4467,7 +4467,7 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac) for (i = order; i < PREALLOC_TB_SIZE; i++) { rcu_read_lock(); list_for_each_entry_rcu(tmp_pa, &lg->lg_prealloc_list[i], - pa_inode_list) { + pa_node.lg_list) { spin_lock(&tmp_pa->pa_lock); if (tmp_pa->pa_deleted == 0 && tmp_pa->pa_free >= ac->ac_o_ex.fe_len) { @@ -4641,9 +4641,15 @@ static void ext4_mb_put_pa(struct ext4_allocation_context *ac, list_del(&pa->pa_group_list); ext4_unlock_group(sb, grp); - spin_lock(pa->pa_obj_lock); - list_del_rcu(&pa->pa_inode_list); - spin_unlock(pa->pa_obj_lock); + if (pa->pa_type == MB_INODE_PA) { + spin_lock(pa->pa_node_lock.inode_lock); + list_del_rcu(&pa->pa_node.inode_list); + spin_unlock(pa->pa_node_lock.inode_lock); + } else { + spin_lock(pa->pa_node_lock.lg_lock); + list_del_rcu(&pa->pa_node.lg_list); + spin_unlock(pa->pa_node_lock.lg_lock); + } call_rcu(&(pa)->u.pa_rcu, ext4_mb_pa_callback); } @@ -4723,7 +4729,7 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac) pa->pa_len = ac->ac_b_ex.fe_len; pa->pa_free = pa->pa_len; spin_lock_init(&pa->pa_lock); - INIT_LIST_HEAD(&pa->pa_inode_list); + INIT_LIST_HEAD(&pa->pa_node.inode_list); INIT_LIST_HEAD(&pa->pa_group_list); pa->pa_deleted = 0; pa->pa_type = MB_INODE_PA; @@ -4738,14 +4744,14 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac) ei = EXT4_I(ac->ac_inode); grp = ext4_get_group_info(sb, ac->ac_b_ex.fe_group); - pa->pa_obj_lock = &ei->i_prealloc_lock; + 
pa->pa_node_lock.inode_lock = &ei->i_prealloc_lock; pa->pa_inode = ac->ac_inode; list_add(&pa->pa_group_list, &grp->bb_prealloc_list); - spin_lock(pa->pa_obj_lock); - list_add_rcu(&pa->pa_inode_list, &ei->i_prealloc_list); - spin_unlock(pa->pa_obj_lock); + spin_lock(pa->pa_node_lock.inode_lock); + list_add_rcu(&pa->pa_node.inode_list, &ei->i_prealloc_list); + spin_unlock(pa->pa_node_lock.inode_lock); atomic_inc(&ei->i_prealloc_active); } @@ -4777,7 +4783,7 @@ ext4_mb_new_group_pa(struct ext4_allocation_context *ac) pa->pa_len = ac->ac_b_ex.fe_len; pa->pa_free = pa->pa_len; spin_lock_init(&pa->pa_lock); - INIT_LIST_HEAD(&pa->pa_inode_list); + INIT_LIST_HEAD(&pa->pa_node.lg_list); INIT_LIST_HEAD(&pa->pa_group_list); pa->pa_deleted = 0; pa->pa_type = MB_GROUP_PA; @@ -4793,7 +4799,7 @@ ext4_mb_new_group_pa(struct ext4_allocation_context *ac) lg = ac->ac_lg; BUG_ON(lg == NULL); - pa->pa_obj_lock = &lg->lg_prealloc_lock; + pa->pa_node_lock.lg_lock = &lg->lg_prealloc_lock; pa->pa_inode = NULL; list_add(&pa->pa_group_list, &grp->bb_prealloc_list); @@ -4969,9 +4975,15 @@ ext4_mb_discard_group_preallocations(struct super_block *sb, list_for_each_entry_safe(pa, tmp, &list, u.pa_tmp_list) { /* remove from object (inode or locality group) */ - spin_lock(pa->pa_obj_lock); - list_del_rcu(&pa->pa_inode_list); - spin_unlock(pa->pa_obj_lock); + if (pa->pa_type == MB_GROUP_PA) { + spin_lock(pa->pa_node_lock.lg_lock); + list_del_rcu(&pa->pa_node.lg_list); + spin_unlock(pa->pa_node_lock.lg_lock); + } else { + spin_lock(pa->pa_node_lock.inode_lock); + list_del_rcu(&pa->pa_node.inode_list); + spin_unlock(pa->pa_node_lock.inode_lock); + } if (pa->pa_type == MB_GROUP_PA) ext4_mb_release_group_pa(&e4b, pa); @@ -5034,8 +5046,8 @@ void ext4_discard_preallocations(struct inode *inode, unsigned int needed) spin_lock(&ei->i_prealloc_lock); while (!list_empty(&ei->i_prealloc_list) && needed) { pa = list_entry(ei->i_prealloc_list.prev, - struct ext4_prealloc_space, pa_inode_list); - 
BUG_ON(pa->pa_obj_lock != &ei->i_prealloc_lock); + struct ext4_prealloc_space, pa_node.inode_list); + BUG_ON(pa->pa_node_lock.inode_lock != &ei->i_prealloc_lock); spin_lock(&pa->pa_lock); if (atomic_read(&pa->pa_count)) { /* this shouldn't happen often - nobody should @@ -5052,7 +5064,7 @@ void ext4_discard_preallocations(struct inode *inode, unsigned int needed) if (pa->pa_deleted == 0) { ext4_mb_mark_pa_deleted(sb, pa); spin_unlock(&pa->pa_lock); - list_del_rcu(&pa->pa_inode_list); + list_del_rcu(&pa->pa_node.inode_list); list_add(&pa->u.pa_tmp_list, &list); needed--; continue; @@ -5342,7 +5354,7 @@ ext4_mb_discard_lg_preallocations(struct super_block *sb, spin_lock(&lg->lg_prealloc_lock); list_for_each_entry_rcu(pa, &lg->lg_prealloc_list[order], - pa_inode_list, + pa_node.lg_list, lockdep_is_held(&lg->lg_prealloc_lock)) { spin_lock(&pa->pa_lock); if (atomic_read(&pa->pa_count)) { @@ -5365,7 +5377,7 @@ ext4_mb_discard_lg_preallocations(struct super_block *sb, ext4_mb_mark_pa_deleted(sb, pa); spin_unlock(&pa->pa_lock); - list_del_rcu(&pa->pa_inode_list); + list_del_rcu(&pa->pa_node.lg_list); list_add(&pa->u.pa_tmp_list, &discard_list); total_entries--; @@ -5426,7 +5438,7 @@ static void ext4_mb_add_n_trim(struct ext4_allocation_context *ac) /* Add the prealloc space to lg */ spin_lock(&lg->lg_prealloc_lock); list_for_each_entry_rcu(tmp_pa, &lg->lg_prealloc_list[order], - pa_inode_list, + pa_node.lg_list, lockdep_is_held(&lg->lg_prealloc_lock)) { spin_lock(&tmp_pa->pa_lock); if (tmp_pa->pa_deleted) { @@ -5435,8 +5447,8 @@ static void ext4_mb_add_n_trim(struct ext4_allocation_context *ac) } if (!added && pa->pa_free < tmp_pa->pa_free) { /* Add to the tail of the previous entry */ - list_add_tail_rcu(&pa->pa_inode_list, - &tmp_pa->pa_inode_list); + list_add_tail_rcu(&pa->pa_node.lg_list, + &tmp_pa->pa_node.lg_list); added = 1; /* * we want to count the total @@ -5447,7 +5459,7 @@ static void ext4_mb_add_n_trim(struct ext4_allocation_context *ac) lg_prealloc_count++; } 
if (!added) - list_add_tail_rcu(&pa->pa_inode_list, + list_add_tail_rcu(&pa->pa_node.lg_list, &lg->lg_prealloc_list[order]); spin_unlock(&lg->lg_prealloc_lock); @@ -5503,9 +5515,9 @@ static int ext4_mb_release_context(struct ext4_allocation_context *ac) * doesn't grow big. */ if (likely(pa->pa_free)) { - spin_lock(pa->pa_obj_lock); - list_del_rcu(&pa->pa_inode_list); - spin_unlock(pa->pa_obj_lock); + spin_lock(pa->pa_node_lock.lg_lock); + list_del_rcu(&pa->pa_node.lg_list); + spin_unlock(pa->pa_node_lock.lg_lock); ext4_mb_add_n_trim(ac); } } @@ -5515,9 +5527,9 @@ static int ext4_mb_release_context(struct ext4_allocation_context *ac) * treat per-inode prealloc list as a lru list, then try * to trim the least recently used PA. */ - spin_lock(pa->pa_obj_lock); - list_move(&pa->pa_inode_list, &ei->i_prealloc_list); - spin_unlock(pa->pa_obj_lock); + spin_lock(pa->pa_node_lock.inode_lock); + list_move(&pa->pa_node.inode_list, &ei->i_prealloc_list); + spin_unlock(pa->pa_node_lock.inode_lock); } ext4_mb_put_pa(ac, ac->ac_sb, pa); diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h index dcda2a943cee..398a6688c341 100644 --- a/fs/ext4/mballoc.h +++ b/fs/ext4/mballoc.h @@ -114,7 +114,10 @@ struct ext4_free_data { }; struct ext4_prealloc_space { - struct list_head pa_inode_list; + union { + struct list_head inode_list; /* for inode PAs */ + struct list_head lg_list; /* for lg PAs */ + } pa_node; struct list_head pa_group_list; union { struct list_head pa_tmp_list; @@ -128,7 +131,10 @@ struct ext4_prealloc_space { ext4_grpblk_t pa_len; /* len of preallocated chunk */ ext4_grpblk_t pa_free; /* how many blocks are free */ unsigned short pa_type; /* pa type. 
inode or group */ - spinlock_t *pa_obj_lock; + union { + spinlock_t *inode_lock; /* locks the inode list holding this PA */ + spinlock_t *lg_lock; /* locks the lg list holding this PA */ + } pa_node_lock; struct inode *pa_inode; /* hack, for history only */ }; From patchwork Fri Feb 17 12:14:17 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ojaswin Mujoo X-Patchwork-Id: 13144612 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 90758C05027 for ; Fri, 17 Feb 2023 12:15:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230280AbjBQMPt (ORCPT ); Fri, 17 Feb 2023 07:15:49 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51732 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230040AbjBQMPT (ORCPT ); Fri, 17 Feb 2023 07:15:19 -0500 Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 281CA66069; Fri, 17 Feb 2023 04:14:51 -0800 (PST) Received: from pps.filterd (m0098421.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 31HBv9RW014453; Fri, 17 Feb 2023 12:14:47 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pp1; bh=g32XP0zH0GfVXglUsilNDtNqwvy7NFRZ+hbcInZJPFI=; b=BsNqGcslGw7g+V/dzgut2kdT1hd/HKU/xktV5++adjRmT/DQgCHu/LXYNIANOWFli6Rt 2VXtNN8kRz5EYF47EI90WC7mRMS2YWSlIEvNPnJMikyl7m3K8EuPvR5Pe8NO9TfpZugU 718o8g6dKbdEsUfMkcgiGPeCSdkTnO9JnjfZcX7+Zm2PJj5xB+rpb8ip2Ttvfd6uScce ClMapQK+uUnaHRnC6NiRaZ7c+Hs9Fr3SuM5qzybhm2knFC5eSza4HoKQD2ilq/hs6qQW 
 Zar5lcKQ+HuwpSoSo2jCusy9751RIZkXk4loCTPgVEPRmT8UO1snc7fVQGlXipB0BMrC Yw==
From: Ojaswin Mujoo
To: linux-ext4@vger.kernel.org, "Theodore Ts'o"
Cc: Ritesh Harjani , linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, Jan Kara , rookxu
Subject: [PATCH v4 8/9] ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list
Date: Fri, 17 Feb 2023 17:44:17 +0530
Message-Id:
X-Mailer: git-send-email 2.31.1
In-Reply-To:
References:
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

Currently, the kernel uses i_prealloc_list to hold all the inode
preallocations. This is known to degrade performance in workloads which
perform a large number of sparse writes on a single file. This is mainly
because functions like ext4_mb_normalize_request() and
ext4_mb_use_preallocated() iterate over the complete list, resulting in
slowdowns when a large number of PAs are present.

Patch 27bc446e2 partially fixed this by enforcing a limit of 512 for
the inode preallocation list and adding logic to continually trim the
list if it grows above the threshold. However, our testing revealed that
a hardcoded value is not suitable for all kinds of workloads.

To optimize this, add an rbtree to the inode and hold the inode
preallocations in this rbtree. This will make searching the inode PAs
faster and scale much better than a linked list. Additionally, we had to
remove the LRU logic that was added during trimming of the list (in
ext4_mb_release_context()), as it would add extra overhead with an
rbtree. Discards now happen in lowest-logical-offset-first order.

** Locking notes **

With the introduction of an rbtree to maintain inode PAs, we can't use
RCU to walk the tree when searching, since that can result in partial
traversals which might miss some nodes (or entire subtrees) while
discards happen in parallel (under a lock). Hence this patch converts
the ei->i_prealloc_lock spinlock to an rwlock.

Almost all the codepaths that read/modify the PA rbtrees are protected
by the higher level inode->i_data_sem (except
ext4_mb_discard_group_preallocations() and ext4_clear_inode()). IIUC,
the only place we need lock protection is when one thread is searching
the PA rbtree (earlier protected by rcu_read_lock()) while another is
deleting PAs in ext4_mb_discard_group_preallocations(), which iterates
over all the PAs using grp->bb_prealloc_list and deletes PAs from the
tree without taking any inode lock (i_data_sem).

So, this patch converts all rcu_read_lock/unlock() paths for the inode
PA list to use read_lock(), and all places that were taking the
ei->i_prealloc_lock spinlock now use write_lock().

Note that this makes the fast path (searching for the right PA, e.g. in
ext4_mb_use_preallocated() or ext4_mb_normalize_request()) use
read_lock() instead of rcu_read_lock/unlock(). The fast path can
therefore now block on the slow discard path
(ext4_mb_discard_group_preallocations()), which uses write_lock().

But this is not as bad as it looks, because:

1. The slow path only occurs when the normal allocation failed and we
   can say that we are low on disk space. One can argue this scenario
   won't be very frequent.

2. ext4_mb_discard_group_preallocations() locks and unlocks the rwlock
   for deleting every individual PA. This gives enough opportunity for
   the fast path to acquire the read_lock for searching the inode PA
   rbtree.
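The reader/writer split described in the locking notes can be sketched in
userspace C. This is only an illustration under stated assumptions, not the
code in this patch: it uses a POSIX pthread_rwlock_t in place of the kernel
rwlock_t and a plain unbalanced binary search tree in place of the kernel
rbtree, and every name in it (pa_node, pa_tree, pa_next, pa_insert,
pa_lookup, pa_erase, pa_demo) is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode PA covering [start, start + len). */
struct pa_node {
	unsigned int start, len;
	struct pa_node *left, *right;
};

/* Hypothetical stand-in for the per-inode tree and its rwlock. */
struct pa_tree {
	struct pa_node *root;
	pthread_rwlock_t lock;
};

/* Choose the child link to follow next, keyed on the start offset. */
static struct pa_node **pa_next(struct pa_node **iter, unsigned int start)
{
	return start < (*iter)->start ? &(*iter)->left : &(*iter)->right;
}

/* Writer: link a new PA under the write lock (no rebalancing here). */
static void pa_insert(struct pa_tree *t, struct pa_node *new)
{
	struct pa_node **iter = &t->root;

	assert(new != NULL);
	new->left = new->right = NULL;
	pthread_rwlock_wrlock(&t->lock);
	while (*iter)
		iter = pa_next(iter, new->start);
	*iter = new;
	pthread_rwlock_unlock(&t->lock);
}

/* Reader: search under the read lock; many readers may run at once. */
static struct pa_node *pa_lookup(struct pa_tree *t, unsigned int lblk)
{
	struct pa_node **iter, *found = NULL;

	pthread_rwlock_rdlock(&t->lock);
	for (iter = &t->root; *iter; iter = pa_next(iter, lblk)) {
		if (lblk >= (*iter)->start &&
		    lblk - (*iter)->start < (*iter)->len) {
			found = *iter;
			break;
		}
	}
	pthread_rwlock_unlock(&t->lock);
	return found;
}

/* Writer: unlink one PA, holding the write lock only for this removal. */
static void pa_erase(struct pa_tree *t, struct pa_node *victim)
{
	struct pa_node **iter = &t->root, **succ, *s;

	pthread_rwlock_wrlock(&t->lock);
	while (*iter && *iter != victim)
		iter = pa_next(iter, victim->start);
	if (*iter && victim->left && victim->right) {
		/* Two children: splice in the in-order successor. */
		succ = &victim->right;
		while ((*succ)->left)
			succ = &(*succ)->left;
		s = *succ;
		*succ = s->right;
		s->left = victim->left;
		s->right = victim->right;
		*iter = s;
	} else if (*iter) {
		*iter = victim->left ? victim->left : victim->right;
	}
	pthread_rwlock_unlock(&t->lock);
}

/* Small self-check: returns 0 when every step behaves as expected. */
static int pa_demo(void)
{
	struct pa_tree t = { .root = NULL };
	struct pa_node a = { .start = 100, .len = 10 };
	struct pa_node b = { .start = 20, .len = 5 };
	struct pa_node c = { .start = 200, .len = 50 };
	int ret = 0;

	pthread_rwlock_init(&t.lock, NULL);
	pa_insert(&t, &a);
	pa_insert(&t, &b);
	pa_insert(&t, &c);
	/* Offset 25 is one past the end of b, so it must not match. */
	if (pa_lookup(&t, 105) != &a || pa_lookup(&t, 24) != &b || pa_lookup(&t, 25))
		ret = 1;
	pa_erase(&t, &a);	/* removes the root, which has two children */
	if (pa_lookup(&t, 105) != NULL || pa_lookup(&t, 249) != &c)
		ret = 2;
	return ret;
}
```

In this sketch pa_lookup() returns the node after dropping the read lock,
which is only safe because the demo nodes live on the caller's stack; the
patch itself pins the PA before the lock is released. Taking the write lock
once per removal in pa_erase() mirrors point 2 above: readers get a chance
to run between individual deletions.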
Signed-off-by: Ojaswin Mujoo Reviewed-by: Ritesh Harjani (IBM) Reviewed-by: Jan Kara --- fs/ext4/ext4.h | 4 +- fs/ext4/mballoc.c | 284 +++++++++++++++++++++++++++++++++------------- fs/ext4/mballoc.h | 6 +- fs/ext4/super.c | 4 +- 4 files changed, 211 insertions(+), 87 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 140e1eb300d1..fad5f087e4c6 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -1120,8 +1120,8 @@ struct ext4_inode_info { /* mballoc */ atomic_t i_prealloc_active; - struct list_head i_prealloc_list; - spinlock_t i_prealloc_lock; + struct rb_root i_prealloc_node; + rwlock_t i_prealloc_lock; /* extents status tree */ struct ext4_es_tree i_es_tree; diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 7b8bbfb9ad58..1bee8a46662b 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -3984,6 +3984,24 @@ static void ext4_mb_normalize_group_request(struct ext4_allocation_context *ac) mb_debug(sb, "goal %u blocks for locality group\n", ac->ac_g_ex.fe_len); } +/* + * This function returns the next element to look at during inode + * PA rbtree walk. 
We assume that we have held the inode PA rbtree lock + * (ei->i_prealloc_lock) + * + * new_start The start of the range we want to compare + * cur_start The existing start that we are comparing against + * node The node of the rb_tree + */ +static inline struct rb_node* +ext4_mb_pa_rb_next_iter(ext4_lblk_t new_start, ext4_lblk_t cur_start, struct rb_node *node) +{ + if (new_start < cur_start) + return node->rb_left; + else + return node->rb_right; +} + static inline void ext4_mb_pa_assert_overlap(struct ext4_allocation_context *ac, ext4_lblk_t start, ext4_lblk_t end) @@ -3992,80 +4010,162 @@ ext4_mb_pa_assert_overlap(struct ext4_allocation_context *ac, struct ext4_inode_info *ei = EXT4_I(ac->ac_inode); struct ext4_prealloc_space *tmp_pa; ext4_lblk_t tmp_pa_start, tmp_pa_end; + struct rb_node *iter; - rcu_read_lock(); - list_for_each_entry_rcu(tmp_pa, &ei->i_prealloc_list, pa_node.inode_list) { - spin_lock(&tmp_pa->pa_lock); - if (tmp_pa->pa_deleted == 0) { - tmp_pa_start = tmp_pa->pa_lstart; - tmp_pa_end = tmp_pa->pa_lstart + EXT4_C2B(sbi, tmp_pa->pa_len); + read_lock(&ei->i_prealloc_lock); + for (iter = ei->i_prealloc_node.rb_node; iter; + iter = ext4_mb_pa_rb_next_iter(start, tmp_pa_start, iter)) { + tmp_pa = rb_entry(iter, struct ext4_prealloc_space, + pa_node.inode_node); + tmp_pa_start = tmp_pa->pa_lstart; + tmp_pa_end = tmp_pa->pa_lstart + EXT4_C2B(sbi, tmp_pa->pa_len); + spin_lock(&tmp_pa->pa_lock); + if (tmp_pa->pa_deleted == 0) BUG_ON(!(start >= tmp_pa_end || end <= tmp_pa_start)); - } spin_unlock(&tmp_pa->pa_lock); } - rcu_read_unlock(); + read_unlock(&ei->i_prealloc_lock); } - /* * Given an allocation context "ac" and a range "start", "end", check * and adjust boundaries if the range overlaps with any of the existing * preallocatoins stored in the corresponding inode of the allocation context. 
* - *Parameters: + * Parameters: * ac allocation context * start start of the new range * end end of the new range */ static inline void ext4_mb_pa_adjust_overlap(struct ext4_allocation_context *ac, - ext4_lblk_t *start, ext4_lblk_t *end) + ext4_lblk_t *start, ext4_lblk_t *end) { struct ext4_inode_info *ei = EXT4_I(ac->ac_inode); struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb); - struct ext4_prealloc_space *tmp_pa; + struct ext4_prealloc_space *tmp_pa = NULL, *left_pa = NULL, *right_pa = NULL; + struct rb_node *iter; ext4_lblk_t new_start, new_end; - ext4_lblk_t tmp_pa_start, tmp_pa_end; + ext4_lblk_t tmp_pa_start, tmp_pa_end, left_pa_end = -1 , right_pa_start = -1; new_start = *start; new_end = *end; - /* check we don't cross already preallocated blocks */ - rcu_read_lock(); - list_for_each_entry_rcu(tmp_pa, &ei->i_prealloc_list, pa_node.inode_list) { - if (tmp_pa->pa_deleted) - continue; - spin_lock(&tmp_pa->pa_lock); - if (tmp_pa->pa_deleted) { - spin_unlock(&tmp_pa->pa_lock); - continue; - } - + /* + * Adjust the normalized range so that it doesn't overlap with any + * existing preallocated blocks(PAs). Make sure to hold the rbtree lock + * so it doesn't change underneath us. 
+ */ + read_lock(&ei->i_prealloc_lock); + + /* Step 1: find any one immediate neighboring PA of the normalized range */ + for (iter = ei->i_prealloc_node.rb_node; iter; + iter = ext4_mb_pa_rb_next_iter(ac->ac_o_ex.fe_logical, + tmp_pa_start, iter)) { + tmp_pa = rb_entry(iter, struct ext4_prealloc_space, + pa_node.inode_node); tmp_pa_start = tmp_pa->pa_lstart; tmp_pa_end = tmp_pa->pa_lstart + EXT4_C2B(sbi, tmp_pa->pa_len); /* PA must not overlap original request */ - BUG_ON(!(ac->ac_o_ex.fe_logical >= tmp_pa_end || - ac->ac_o_ex.fe_logical < tmp_pa_start)); + spin_lock(&tmp_pa->pa_lock); + if (tmp_pa->pa_deleted == 0) + BUG_ON(!(ac->ac_o_ex.fe_logical >= tmp_pa_end || + ac->ac_o_ex.fe_logical < tmp_pa_start)); + spin_unlock(&tmp_pa->pa_lock); + } - /* skip PAs this normalized request doesn't overlap with */ - if (tmp_pa_start >= new_end || tmp_pa_end <= new_start) { + /* + * Step 2: check if the found PA is left or right neighbor and + * get the other neighbor + */ + if (tmp_pa) { + if (tmp_pa->pa_lstart < ac->ac_o_ex.fe_logical) { + struct rb_node *tmp; + + left_pa = tmp_pa; + tmp = rb_next(&left_pa->pa_node.inode_node); + if (tmp) { + right_pa = rb_entry(tmp, + struct ext4_prealloc_space, + pa_node.inode_node); + } + } else { + struct rb_node *tmp; + + right_pa = tmp_pa; + tmp = rb_prev(&right_pa->pa_node.inode_node); + if (tmp) { + left_pa = rb_entry(tmp, + struct ext4_prealloc_space, + pa_node.inode_node); + } + } + } + + /* Step 3: get the non deleted neighbors */ + if (left_pa) { + for (iter = &left_pa->pa_node.inode_node;; + iter = rb_prev(iter)) { + if (!iter) { + left_pa = NULL; + break; + } + + tmp_pa = rb_entry(iter, struct ext4_prealloc_space, + pa_node.inode_node); + left_pa = tmp_pa; + spin_lock(&tmp_pa->pa_lock); + if (tmp_pa->pa_deleted == 0) { + spin_unlock(&tmp_pa->pa_lock); + break; + } spin_unlock(&tmp_pa->pa_lock); - continue; } - BUG_ON(tmp_pa_start <= new_start && tmp_pa_end >= new_end); - - /* adjust start or end to be adjacent to this pa */ 
- if (tmp_pa_end <= ac->ac_o_ex.fe_logical) { - BUG_ON(tmp_pa_end < new_start); - new_start = tmp_pa_end; - } else if (tmp_pa_start > ac->ac_o_ex.fe_logical) { - BUG_ON(tmp_pa_start > new_end); - new_end = tmp_pa_start; + } + + if (right_pa) { + for (iter = &right_pa->pa_node.inode_node;; + iter = rb_next(iter)) { + if (!iter) { + right_pa = NULL; + break; + } + + tmp_pa = rb_entry(iter, struct ext4_prealloc_space, + pa_node.inode_node); + right_pa = tmp_pa; + spin_lock(&tmp_pa->pa_lock); + if (tmp_pa->pa_deleted == 0) { + spin_unlock(&tmp_pa->pa_lock); + break; + } + spin_unlock(&tmp_pa->pa_lock); } - spin_unlock(&tmp_pa->pa_lock); } - rcu_read_unlock(); + + if (left_pa) { + left_pa_end = + left_pa->pa_lstart + EXT4_C2B(sbi, left_pa->pa_len); + BUG_ON(left_pa_end > ac->ac_o_ex.fe_logical); + } + + if (right_pa) { + right_pa_start = right_pa->pa_lstart; + BUG_ON(right_pa_start <= ac->ac_o_ex.fe_logical); + } + + /* Step 4: trim our normalized range to not overlap with the neighbors */ + if (left_pa) { + if (left_pa_end > new_start) + new_start = left_pa_end; + } + + if (right_pa) { + if (right_pa_start < new_end) + new_end = right_pa_start; + } + read_unlock(&ei->i_prealloc_lock); /* XXX: extra loop to check we really don't overlap preallocations */ ext4_mb_pa_assert_overlap(ac, new_start, new_end); @@ -4402,6 +4502,7 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac) struct ext4_locality_group *lg; struct ext4_prealloc_space *tmp_pa, *cpa = NULL; ext4_lblk_t tmp_pa_start, tmp_pa_end; + struct rb_node *iter; ext4_fsblk_t goal_block; /* only data can be preallocated */ @@ -4409,14 +4510,17 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac) return false; /* first, try per-file preallocation */ - rcu_read_lock(); - list_for_each_entry_rcu(tmp_pa, &ei->i_prealloc_list, pa_node.inode_list) { + read_lock(&ei->i_prealloc_lock); + for (iter = ei->i_prealloc_node.rb_node; iter; + iter = ext4_mb_pa_rb_next_iter(ac->ac_o_ex.fe_logical, tmp_pa_start, 
iter)) { + tmp_pa = rb_entry(iter, struct ext4_prealloc_space, pa_node.inode_node); /* all fields in this condition don't change, * so we can skip locking for them */ tmp_pa_start = tmp_pa->pa_lstart; tmp_pa_end = tmp_pa->pa_lstart + EXT4_C2B(sbi, tmp_pa->pa_len); + /* original request start doesn't lie in this PA */ if (ac->ac_o_ex.fe_logical < tmp_pa_start || ac->ac_o_ex.fe_logical >= tmp_pa_end) continue; @@ -4439,12 +4543,12 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac) ext4_mb_use_inode_pa(ac, tmp_pa); spin_unlock(&tmp_pa->pa_lock); ac->ac_criteria = 10; - rcu_read_unlock(); + read_unlock(&ei->i_prealloc_lock); return true; } spin_unlock(&tmp_pa->pa_lock); } - rcu_read_unlock(); + read_unlock(&ei->i_prealloc_lock); /* can we use group allocation? */ if (!(ac->ac_flags & EXT4_MB_HINT_GROUP_ALLOC)) @@ -4597,6 +4701,7 @@ static void ext4_mb_put_pa(struct ext4_allocation_context *ac, { ext4_group_t grp; ext4_fsblk_t grp_blk; + struct ext4_inode_info *ei = EXT4_I(ac->ac_inode); /* in this short window concurrent discard can set pa_deleted */ spin_lock(&pa->pa_lock); @@ -4642,16 +4747,41 @@ static void ext4_mb_put_pa(struct ext4_allocation_context *ac, ext4_unlock_group(sb, grp); if (pa->pa_type == MB_INODE_PA) { - spin_lock(pa->pa_node_lock.inode_lock); - list_del_rcu(&pa->pa_node.inode_list); - spin_unlock(pa->pa_node_lock.inode_lock); + write_lock(pa->pa_node_lock.inode_lock); + rb_erase(&pa->pa_node.inode_node, &ei->i_prealloc_node); + write_unlock(pa->pa_node_lock.inode_lock); + ext4_mb_pa_free(pa); } else { spin_lock(pa->pa_node_lock.lg_lock); list_del_rcu(&pa->pa_node.lg_list); spin_unlock(pa->pa_node_lock.lg_lock); + call_rcu(&(pa)->u.pa_rcu, ext4_mb_pa_callback); + } +} + +static void ext4_mb_pa_rb_insert(struct rb_root *root, struct rb_node *new) +{ + struct rb_node **iter = &root->rb_node, *parent = NULL; + struct ext4_prealloc_space *iter_pa, *new_pa; + ext4_lblk_t iter_start, new_start; + + while (*iter) { + iter_pa = rb_entry(*iter, 
struct ext4_prealloc_space, + pa_node.inode_node); + new_pa = rb_entry(new, struct ext4_prealloc_space, + pa_node.inode_node); + iter_start = iter_pa->pa_lstart; + new_start = new_pa->pa_lstart; + + parent = *iter; + if (new_start < iter_start) + iter = &((*iter)->rb_left); + else + iter = &((*iter)->rb_right); } - call_rcu(&(pa)->u.pa_rcu, ext4_mb_pa_callback); + rb_link_node(new, parent, iter); + rb_insert_color(new, root); } /* @@ -4729,7 +4859,6 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac) pa->pa_len = ac->ac_b_ex.fe_len; pa->pa_free = pa->pa_len; spin_lock_init(&pa->pa_lock); - INIT_LIST_HEAD(&pa->pa_node.inode_list); INIT_LIST_HEAD(&pa->pa_group_list); pa->pa_deleted = 0; pa->pa_type = MB_INODE_PA; @@ -4749,9 +4878,9 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac) list_add(&pa->pa_group_list, &grp->bb_prealloc_list); - spin_lock(pa->pa_node_lock.inode_lock); - list_add_rcu(&pa->pa_node.inode_list, &ei->i_prealloc_list); - spin_unlock(pa->pa_node_lock.inode_lock); + write_lock(pa->pa_node_lock.inode_lock); + ext4_mb_pa_rb_insert(&ei->i_prealloc_node, &pa->pa_node.inode_node); + write_unlock(pa->pa_node_lock.inode_lock); atomic_inc(&ei->i_prealloc_active); } @@ -4917,6 +5046,7 @@ ext4_mb_discard_group_preallocations(struct super_block *sb, struct ext4_prealloc_space *pa, *tmp; struct list_head list; struct ext4_buddy e4b; + struct ext4_inode_info *ei; int err; int free = 0; @@ -4980,18 +5110,21 @@ ext4_mb_discard_group_preallocations(struct super_block *sb, list_del_rcu(&pa->pa_node.lg_list); spin_unlock(pa->pa_node_lock.lg_lock); } else { - spin_lock(pa->pa_node_lock.inode_lock); - list_del_rcu(&pa->pa_node.inode_list); - spin_unlock(pa->pa_node_lock.inode_lock); + write_lock(pa->pa_node_lock.inode_lock); + ei = EXT4_I(pa->pa_inode); + rb_erase(&pa->pa_node.inode_node, &ei->i_prealloc_node); + write_unlock(pa->pa_node_lock.inode_lock); } - if (pa->pa_type == MB_GROUP_PA) + list_del(&pa->u.pa_tmp_list); + + if (pa->pa_type == 
MB_GROUP_PA) { ext4_mb_release_group_pa(&e4b, pa); - else + call_rcu(&(pa)->u.pa_rcu, ext4_mb_pa_callback); + } else { ext4_mb_release_inode_pa(&e4b, bitmap_bh, pa); - - list_del(&pa->u.pa_tmp_list); - call_rcu(&(pa)->u.pa_rcu, ext4_mb_pa_callback); + ext4_mb_pa_free(pa); + } } ext4_unlock_group(sb, group); @@ -5021,6 +5154,7 @@ void ext4_discard_preallocations(struct inode *inode, unsigned int needed) ext4_group_t group = 0; struct list_head list; struct ext4_buddy e4b; + struct rb_node *iter; int err; if (!S_ISREG(inode->i_mode)) { @@ -5043,17 +5177,18 @@ void ext4_discard_preallocations(struct inode *inode, unsigned int needed) repeat: /* first, collect all pa's in the inode */ - spin_lock(&ei->i_prealloc_lock); - while (!list_empty(&ei->i_prealloc_list) && needed) { - pa = list_entry(ei->i_prealloc_list.prev, - struct ext4_prealloc_space, pa_node.inode_list); + write_lock(&ei->i_prealloc_lock); + for (iter = rb_first(&ei->i_prealloc_node); iter && needed; iter = rb_next(iter)) { + pa = rb_entry(iter, struct ext4_prealloc_space, + pa_node.inode_node); BUG_ON(pa->pa_node_lock.inode_lock != &ei->i_prealloc_lock); + spin_lock(&pa->pa_lock); if (atomic_read(&pa->pa_count)) { /* this shouldn't happen often - nobody should * use preallocation while we're discarding it */ spin_unlock(&pa->pa_lock); - spin_unlock(&ei->i_prealloc_lock); + write_unlock(&ei->i_prealloc_lock); ext4_msg(sb, KERN_ERR, "uh-oh! 
used pa while discarding"); WARN_ON(1); @@ -5064,7 +5199,7 @@ void ext4_discard_preallocations(struct inode *inode, unsigned int needed) if (pa->pa_deleted == 0) { ext4_mb_mark_pa_deleted(sb, pa); spin_unlock(&pa->pa_lock); - list_del_rcu(&pa->pa_node.inode_list); + rb_erase(&pa->pa_node.inode_node, &ei->i_prealloc_node); list_add(&pa->u.pa_tmp_list, &list); needed--; continue; @@ -5072,7 +5207,7 @@ void ext4_discard_preallocations(struct inode *inode, unsigned int needed) /* someone is deleting pa right now */ spin_unlock(&pa->pa_lock); - spin_unlock(&ei->i_prealloc_lock); + write_unlock(&ei->i_prealloc_lock); /* we have to wait here because pa_deleted * doesn't mean pa is already unlinked from @@ -5089,7 +5224,7 @@ void ext4_discard_preallocations(struct inode *inode, unsigned int needed) schedule_timeout_uninterruptible(HZ); goto repeat; } - spin_unlock(&ei->i_prealloc_lock); + write_unlock(&ei->i_prealloc_lock); list_for_each_entry_safe(pa, tmp, &list, u.pa_tmp_list) { BUG_ON(pa->pa_type != MB_INODE_PA); @@ -5121,7 +5256,7 @@ void ext4_discard_preallocations(struct inode *inode, unsigned int needed) put_bh(bitmap_bh); list_del(&pa->u.pa_tmp_list); - call_rcu(&(pa)->u.pa_rcu, ext4_mb_pa_callback); + ext4_mb_pa_free(pa); } } @@ -5495,7 +5630,6 @@ static void ext4_mb_trim_inode_pa(struct inode *inode) static int ext4_mb_release_context(struct ext4_allocation_context *ac) { struct inode *inode = ac->ac_inode; - struct ext4_inode_info *ei = EXT4_I(inode); struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb); struct ext4_prealloc_space *pa = ac->ac_pa; if (pa) { @@ -5522,16 +5656,6 @@ static int ext4_mb_release_context(struct ext4_allocation_context *ac) } } - if (pa->pa_type == MB_INODE_PA) { - /* - * treat per-inode prealloc list as a lru list, then try - * to trim the least recently used PA. 
- */ - spin_lock(pa->pa_node_lock.inode_lock); - list_move(&pa->pa_node.inode_list, &ei->i_prealloc_list); - spin_unlock(pa->pa_node_lock.inode_lock); - } - ext4_mb_put_pa(ac, ac->ac_sb, pa); } if (ac->ac_bitmap_page) diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h index 398a6688c341..f8e8ee493867 100644 --- a/fs/ext4/mballoc.h +++ b/fs/ext4/mballoc.h @@ -115,7 +115,7 @@ struct ext4_free_data { struct ext4_prealloc_space { union { - struct list_head inode_list; /* for inode PAs */ + struct rb_node inode_node; /* for inode PA rbtree */ struct list_head lg_list; /* for lg PAs */ } pa_node; struct list_head pa_group_list; @@ -132,10 +132,10 @@ struct ext4_prealloc_space { ext4_grpblk_t pa_free; /* how many blocks are free */ unsigned short pa_type; /* pa type. inode or group */ union { - spinlock_t *inode_lock; /* locks the inode list holding this PA */ + rwlock_t *inode_lock; /* locks the rbtree holding this PA */ spinlock_t *lg_lock; /* locks the lg list holding this PA */ } pa_node_lock; - struct inode *pa_inode; /* hack, for history only */ + struct inode *pa_inode; /* used to get the inode during group discard */ }; enum { diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 72ead3b56706..5fb3e401de6b 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1325,9 +1325,9 @@ static struct inode *ext4_alloc_inode(struct super_block *sb) inode_set_iversion(&ei->vfs_inode, 1); ei->i_flags = 0; spin_lock_init(&ei->i_raw_lock); - INIT_LIST_HEAD(&ei->i_prealloc_list); + ei->i_prealloc_node = RB_ROOT; atomic_set(&ei->i_prealloc_active, 0); - spin_lock_init(&ei->i_prealloc_lock); + rwlock_init(&ei->i_prealloc_lock); ext4_es_init_tree(&ei->i_es_tree); rwlock_init(&ei->i_es_lock); INIT_LIST_HEAD(&ei->i_es_list); From patchwork Fri Feb 17 12:14:18 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ojaswin Mujoo X-Patchwork-Id: 13144613 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) 
 on aws-us-west-2-korg-lkml-1.web.codeaurora.org
From: Ojaswin Mujoo
To: linux-ext4@vger.kernel.org, "Theodore Ts'o"
Cc: Ritesh Harjani , linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, Jan Kara , rookxu , Ritesh Harjani
Subject: [PATCH v4 9/9] ext4: Remove the logic to trim inode PAs
Date: Fri, 17 Feb 2023 17:44:18 +0530
Message-Id:
X-Mailer: git-send-email 2.31.1
In-Reply-To:
References:
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

Earlier, inode PAs were stored in a linked list. This required
periodically trimming the list in order to keep it from growing very
large, as that would severely affect performance during list iteration.
Recent patches changed this list to an rbtree, and since the tree scales
much better, we no longer need the trim functionality. Hence, remove it.

Signed-off-by: Ojaswin Mujoo
Reviewed-by: Ritesh Harjani (IBM)
Reviewed-by: Jan Kara
---
 Documentation/admin-guide/ext4.rst |  3 ---
 fs/ext4/ext4.h                     |  1 -
 fs/ext4/mballoc.c                  | 20 --------------------
 fs/ext4/mballoc.h                  |  5 -----
 fs/ext4/sysfs.c                    |  2 --
 5 files changed, 31 deletions(-)

diff --git a/Documentation/admin-guide/ext4.rst b/Documentation/admin-guide/ext4.rst
index 4c559e08d11e..5740d85439ff 100644
--- a/Documentation/admin-guide/ext4.rst
+++ b/Documentation/admin-guide/ext4.rst
@@ -489,9 +489,6 @@ Files in /sys/fs/ext4/:
         multiple of this tuning parameter if the stripe size is not set in the
         ext4 superblock
 
-  mb_max_inode_prealloc
-        The maximum length of per-inode ext4_prealloc_space list.
-
   mb_max_to_scan
         The maximum number of extents the multiblock allocator will search to
         find the best extent.
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index fad5f087e4c6..d2869ad7d885 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1612,7 +1612,6 @@ struct ext4_sb_info {
 	unsigned int s_mb_stats;
 	unsigned int s_mb_order2_reqs;
 	unsigned int s_mb_group_prealloc;
-	unsigned int s_mb_max_inode_prealloc;
 	unsigned int s_max_dir_size_kb;
 	/* where last allocation was done - for stream allocation */
 	unsigned long s_mb_last_group;
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 1bee8a46662b..dfd90d7a9735 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -3419,7 +3419,6 @@ int ext4_mb_init(struct super_block *sb)
 	sbi->s_mb_stats = MB_DEFAULT_STATS;
 	sbi->s_mb_stream_request = MB_DEFAULT_STREAM_THRESHOLD;
 	sbi->s_mb_order2_reqs = MB_DEFAULT_ORDER2_REQS;
-	sbi->s_mb_max_inode_prealloc = MB_DEFAULT_MAX_INODE_PREALLOC;
 	/*
 	 * The default group preallocation is 512, which for 4k block
 	 * sizes translates to 2 megabytes. However for bigalloc file
@@ -5607,29 +5606,11 @@ static void ext4_mb_add_n_trim(struct ext4_allocation_context *ac)
 	return ;
 }
 
-/*
- * if per-inode prealloc list is too long, trim some PA
- */
-static void ext4_mb_trim_inode_pa(struct inode *inode)
-{
-	struct ext4_inode_info *ei = EXT4_I(inode);
-	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
-	int count, delta;
-
-	count = atomic_read(&ei->i_prealloc_active);
-	delta = (sbi->s_mb_max_inode_prealloc >> 2) + 1;
-	if (count > sbi->s_mb_max_inode_prealloc + delta) {
-		count -= sbi->s_mb_max_inode_prealloc;
-		ext4_discard_preallocations(inode, count);
-	}
-}
-
 /*
  * release all resource we used in allocation
  */
 static int ext4_mb_release_context(struct ext4_allocation_context *ac)
 {
-	struct inode *inode = ac->ac_inode;
 	struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
 	struct ext4_prealloc_space *pa = ac->ac_pa;
 	if (pa) {
@@ -5665,7 +5646,6 @@ static int ext4_mb_release_context(struct ext4_allocation_context *ac)
 	if (ac->ac_flags & EXT4_MB_HINT_GROUP_ALLOC)
 		mutex_unlock(&ac->ac_lg->lg_mutex);
 	ext4_mb_collect_stats(ac);
-	ext4_mb_trim_inode_pa(inode);
 	return 0;
 }
 
diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h
index f8e8ee493867..6d85ee8674a6 100644
--- a/fs/ext4/mballoc.h
+++ b/fs/ext4/mballoc.h
@@ -73,11 +73,6 @@
  */
 #define MB_DEFAULT_GROUP_PREALLOC	512
 
-/*
- * maximum length of inode prealloc list
- */
-#define MB_DEFAULT_MAX_INODE_PREALLOC	512
-
 /*
  * Number of groups to search linearly before performing group scanning
  * optimization.
diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
index d233c24ea342..f0d42cf44c71 100644
--- a/fs/ext4/sysfs.c
+++ b/fs/ext4/sysfs.c
@@ -214,7 +214,6 @@ EXT4_RW_ATTR_SBI_UI(mb_min_to_scan, s_mb_min_to_scan);
 EXT4_RW_ATTR_SBI_UI(mb_order2_req, s_mb_order2_reqs);
 EXT4_RW_ATTR_SBI_UI(mb_stream_req, s_mb_stream_request);
 EXT4_RW_ATTR_SBI_UI(mb_group_prealloc, s_mb_group_prealloc);
-EXT4_RW_ATTR_SBI_UI(mb_max_inode_prealloc, s_mb_max_inode_prealloc);
 EXT4_RW_ATTR_SBI_UI(mb_max_linear_groups, s_mb_max_linear_groups);
 EXT4_RW_ATTR_SBI_UI(extent_max_zeroout_kb, s_extent_max_zeroout_kb);
 EXT4_ATTR(trigger_fs_error, 0200, trigger_test_error);
@@ -264,7 +263,6 @@ static struct attribute *ext4_attrs[] = {
 	ATTR_LIST(mb_order2_req),
 	ATTR_LIST(mb_stream_req),
 	ATTR_LIST(mb_group_prealloc),
-	ATTR_LIST(mb_max_inode_prealloc),
 	ATTR_LIST(mb_max_linear_groups),
 	ATTR_LIST(max_writeback_mb_bump),
 	ATTR_LIST(extent_max_zeroout_kb),