From patchwork Thu May 25 11:32:55 2023
X-Patchwork-Id: 13255089
From: Ojaswin Mujoo
To: linux-ext4@vger.kernel.org, "Theodore Ts'o"
Cc: Ritesh Harjani, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    Jan Kara, Kemeng Shi, Ritesh Harjani
Subject: [PATCH 01/13] Revert "ext4: remove ac->ac_found > sbi->s_mb_min_to_scan dead check in ext4_mb_check_limits"
Date: Thu, 25 May 2023 17:02:55 +0530

This reverts commit 32c0869370194ae5ac9f9f501953ef693040f6a1.

The reverted commit was intended to remove a dead check; however, it was
observed that this check was actually being used to exit early, instead
of looping sbi->s_mb_max_to_scan times, when we are able to find a free
extent bigger than the goal extent. Due to this, my performance tests
(fsmark, parallel file writes in a highly fragmented FS) were seeing a
2x-3x regression.

For example, the default values of the relevant variables are:

  sbi->s_mb_max_to_scan = 200
  sbi->s_mb_min_to_scan = 10

In ext4_mb_check_limits(), if we find an extent smaller than the goal,
we return early and try again. This loop goes on until we have processed
sbi->s_mb_max_to_scan (= 200) free extents, at which point we exit and
just use whatever we have, even if it is smaller than the goal extent.

Now, the regression comes when we find an extent bigger than the goal.
Earlier, in this case we would loop only sbi->s_mb_min_to_scan (= 10)
times and then just use the bigger extent. However, with commit 32c08693
that check was removed, so we would loop sbi->s_mb_max_to_scan (= 200)
times even though we had a big enough free extent to satisfy the
request. The only time we would exit early would be when the free extent
is *exactly* the size of our goal, which is a pretty uncommon
occurrence, so we would almost always end up looping 200 times.

Hence, revert the commit by adding the check back to fix the regression.
Also add a comment to outline this policy.

Signed-off-by: Ojaswin Mujoo
Reviewed-by: Ritesh Harjani (IBM)
Reviewed-by: Kemeng Shi
---
 fs/ext4/mballoc.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 9c7881a4ea75..2e1a5f001883 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2062,7 +2062,7 @@ static void ext4_mb_check_limits(struct ext4_allocation_context *ac,
 	if (bex->fe_len < gex->fe_len)
 		return;
 
-	if (finish_group)
+	if (finish_group || ac->ac_found > sbi->s_mb_min_to_scan)
 		ext4_mb_use_best_found(ac, e4b);
 }
 
@@ -2074,6 +2074,20 @@ static void ext4_mb_check_limits(struct ext4_allocation_context *ac,
  * in the context. Later, the best found extent will be used, if
  * mballoc can't find good enough extent.
  *
+ * The algorithm used is roughly as follows:
+ *
+ * * If free extent found is exactly as big as goal, then
+ *   stop the scan and use it immediately
+ *
+ * * If free extent found is smaller than goal, then keep retrying
+ *   up to a max of sbi->s_mb_max_to_scan times (default 200). After
+ *   that stop scanning and use whatever we have.
+ *
+ * * If free extent found is bigger than goal, then keep retrying
+ *   up to a max of sbi->s_mb_min_to_scan times (default 10) before
+ *   stopping the scan and using the extent.
+ *
+ * * FIXME: real allocation policy is to be designed yet!
  */
 static void ext4_mb_measure_extent(struct ext4_allocation_context *ac,
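Putting the revert and the new comment together, the restored policy can be
read as the following simplified sketch of ext4_mb_check_limits(). This is
not the verbatim kernel code: the AC_STATUS_BREAK early exit and the
EXT4_MB_HINT_FIRST exemption are assumed from the surrounding mballoc code
rather than shown in the hunks above.

	/* sketch: called for each candidate free extent found */
	if (ac->ac_found > sbi->s_mb_max_to_scan &&
	    !(ac->ac_flags & EXT4_MB_HINT_FIRST)) {
		/* scanned max_to_scan extents: settle for the best so far */
		ac->ac_status = AC_STATUS_BREAK;
		return;
	}

	if (bex->fe_len < gex->fe_len)
		return;		/* best found smaller than goal: keep scanning */

	/*
	 * Best found is goal-sized (finish_group) or bigger than the goal
	 * after min_to_scan tries: stop early and use it.
	 */
	if (finish_group || ac->ac_found > sbi->s_mb_min_to_scan)
		ext4_mb_use_best_found(ac, e4b);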
From patchwork Thu May 25 11:32:56 2023
X-Patchwork-Id: 13255091
From: Ojaswin Mujoo
To: linux-ext4@vger.kernel.org, "Theodore Ts'o"
Cc: Ritesh Harjani, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    Jan Kara, Kemeng Shi, "Ritesh Harjani (IBM)"
Subject: [PATCH 02/13] ext4: mballoc: Remove useless setting of ac_criteria
Date: Thu, 25 May 2023 17:02:56 +0530
Message-Id: <9190a546a98e053364583b499804472ec04d747b.1685009579.git.ojaswin@linux.ibm.com>

From: "Ritesh Harjani (IBM)"

There are changes coming in future patches which will introduce a new
criteria for block allocation. This patch removes the useless setting of
ac_criteria. AFAIU, this was only used to differentiate between whether
preallocated blocks were used or the regular allocator was called to
allocate blocks. Hence this also adds debug prints to ext4_mb_show_ac()
to identify what type of block allocation was done.

Signed-off-by: Ritesh Harjani (IBM)
Signed-off-by: Ojaswin Mujoo
Reviewed-by: Jan Kara
---
 fs/ext4/mballoc.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 2e1a5f001883..288d504ee744 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4582,7 +4582,6 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
 			atomic_inc(&tmp_pa->pa_count);
 			ext4_mb_use_inode_pa(ac, tmp_pa);
 			spin_unlock(&tmp_pa->pa_lock);
-			ac->ac_criteria = 10;
 			read_unlock(&ei->i_prealloc_lock);
 			return true;
 		}
@@ -4625,7 +4624,6 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
 	}
 	if (cpa) {
 		ext4_mb_use_group_pa(ac, cpa);
-		ac->ac_criteria = 20;
 		return true;
 	}
 	return false;
@@ -5407,6 +5405,10 @@ static void ext4_mb_show_ac(struct ext4_allocation_context *ac)
 			(unsigned long)ac->ac_b_ex.fe_logical,
 			(int)ac->ac_criteria);
 	mb_debug(sb, "%u found", ac->ac_found);
+	mb_debug(sb, "used pa: %s, ", ac->ac_pa ? "yes" : "no");
+	if (ac->ac_pa)
+		mb_debug(sb, "pa_type %s\n", ac->ac_pa->pa_type == MB_GROUP_PA ?
+			 "group pa" : "inode pa");
 	ext4_mb_show_pa(sb);
 }
 #else
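Since ac_criteria is no longer overloaded by the preallocation paths, the
allocation source is now inferred from ac_pa, exactly as the new debug
prints do. A small hypothetical helper (illustrative only, not part of the
patch) makes the mapping explicit:

	/* illustrative only: classify the allocation source the way the
	 * new mb_debug() lines in ext4_mb_show_ac() report it */
	static const char *alloc_source(struct ext4_allocation_context *ac)
	{
		if (!ac->ac_pa)
			return "regular allocator";	/* no preallocation used */
		return ac->ac_pa->pa_type == MB_GROUP_PA ? "group pa"
							 : "inode pa";
	}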
From patchwork Thu May 25 11:32:57 2023
X-Patchwork-Id: 13255103
From: Ojaswin Mujoo
To: linux-ext4@vger.kernel.org, "Theodore Ts'o"
Cc: Ritesh Harjani, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    Jan Kara, Kemeng Shi, "Ritesh Harjani (IBM)"
Subject: [PATCH 03/13] ext4: Remove unused extern variables declaration
Date: Thu, 25 May 2023 17:02:57 +0530
Message-Id: <1784dcc5422f1f37068704be565b8e16f4d51f19.1685009579.git.ojaswin@linux.ibm.com>

From: "Ritesh Harjani (IBM)"

ext4_mb_stats & ext4_mb_max_to_scan are never used. We use
sbi->s_mb_stats and sbi->s_mb_max_to_scan instead. Hence kill these
extern declarations.

Signed-off-by: Ritesh Harjani (IBM)
Signed-off-by: Ojaswin Mujoo
Reviewed-by: Jan Kara
---
 fs/ext4/ext4.h    | 2 --
 fs/ext4/mballoc.h | 2 +-
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 7e8f66ba17f4..80e01fbcd0a3 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2815,8 +2815,6 @@ int ext4_fc_record_regions(struct super_block *sb, int ino,
 /* mballoc.c */
 extern const struct seq_operations ext4_mb_seq_groups_ops;
 extern const struct seq_operations ext4_mb_seq_structs_summary_ops;
-extern long ext4_mb_stats;
-extern long ext4_mb_max_to_scan;
 extern int ext4_seq_mb_stats_show(struct seq_file *seq, void *offset);
 extern int ext4_mb_init(struct super_block *);
 extern int ext4_mb_release(struct super_block *);
diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h
index 6d85ee8674a6..24b666e558f1 100644
--- a/fs/ext4/mballoc.h
+++ b/fs/ext4/mballoc.h
@@ -49,7 +49,7 @@
 #define MB_DEFAULT_MIN_TO_SCAN	10
 
 /*
- * with 'ext4_mb_stats' allocator will collect stats that will be
+ * with 's_mb_stats' allocator will collect stats that will be
  * shown at umount. The collecting costs though!
  */
 #define MB_DEFAULT_STATS	0
From patchwork Thu May 25 11:32:58 2023
X-Patchwork-Id: 13255090
From: Ojaswin Mujoo
To: linux-ext4@vger.kernel.org, "Theodore Ts'o"
Cc: Ritesh Harjani, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    Jan Kara, Kemeng Shi
Subject: [PATCH 04/13] ext4: Fix a small typo in ext4_mb_prefetch_fini()
Date: Thu, 25 May 2023 17:02:58 +0530
Message-Id: <39e9a6c41244ecd49552262af51aa8aa5ae1bc9d.1685009579.git.ojaswin@linux.ibm.com>

Fix a small typo in the if condition introduced in commit b6bef1b5: the
condition checked grp twice and gdp not at all.

Signed-off-by: Ojaswin Mujoo
---
 fs/ext4/mballoc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 288d504ee744..72c5d1a33bad 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2609,7 +2609,7 @@ void ext4_mb_prefetch_fini(struct super_block *sb, ext4_group_t group,
 		gdp = ext4_get_group_desc(sb, group, NULL);
 		grp = ext4_get_group_info(sb, group);
 
-		if (grp && grp && EXT4_MB_GRP_NEED_INIT(grp) &&
+		if (gdp && grp && EXT4_MB_GRP_NEED_INIT(grp) &&
 		    ext4_free_group_clusters(sb, gdp) > 0 &&
 		    !(ext4_has_group_desc_csum(sb) &&
 		      (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)))) {
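The fix matters because the old condition validated the same pointer twice
and then dereferenced the unchecked one. A minimal standalone illustration
(hypothetical names, not ext4 code):

	struct desc { int free; };

	/* gdp/grp stand in for the results of ext4_get_group_desc() and
	 * ext4_get_group_info(), both of which can return NULL */
	static int worth_prefetching(struct desc *gdp, struct desc *grp)
	{
		/* old: if (grp && grp && ...) -- gdp was never checked */
		if (gdp && grp)			/* fixed: test both pointers */
			return gdp->free > 0;	/* safe: gdp is non-NULL here */
		return 0;
	}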
From patchwork Thu May 25 11:32:59 2023
X-Patchwork-Id: 13255092
From: Ojaswin Mujoo
To: linux-ext4@vger.kernel.org, "Theodore Ts'o"
Cc: Ritesh Harjani, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    Jan Kara, Kemeng Shi, Ritesh Harjani
Subject: [PATCH 05/13] ext4: Convert mballoc cr (criteria) to enum
Date: Thu, 25 May 2023 17:02:59 +0530
Message-Id: <263f1c5774fd04550a9c04f88ca583bb693eb604.1685009579.git.ojaswin@linux.ibm.com>

Convert criteria to an enum so it is easier to maintain, and update the
tracefiles to use the enum names. This change also makes it easier to
insert new criteria in the future. There is no functional change in this
patch.

Signed-off-by: Ojaswin Mujoo
Reviewed-by: Ritesh Harjani (IBM)
---
 fs/ext4/ext4.h              | 23 +++++++--
 fs/ext4/mballoc.c           | 96 ++++++++++++++++++-------------------
 include/trace/events/ext4.h | 16 ++++++-
 3 files changed, 82 insertions(+), 53 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 80e01fbcd0a3..2fa4e77eb3a1 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -127,6 +127,23 @@ enum SHIFT_DIRECTION {
 	SHIFT_RIGHT,
 };
 
+/*
+ * Number of criteria defined. For each criteria, mballoc has a slightly
+ * different way of finding the required blocks and usually, the higher
+ * the criteria the slower the allocation. We start at lower criteria
+ * and keep falling back to higher ones if we are not able to find any
+ * blocks.
+ */
+#define EXT4_MB_NUM_CRS 4
+/*
+ * All possible allocation criteria for mballoc
+ */
+enum criteria {
+	CR0,
+	CR1,
+	CR2,
+	CR3,
+};
+
 /*
  * Flags used in mballoc's allocation_context flags field.
  *
@@ -1542,9 +1559,9 @@ struct ext4_sb_info {
 	atomic_t s_bal_2orders;	/* 2^order hits */
 	atomic_t s_bal_cr0_bad_suggestions;
 	atomic_t s_bal_cr1_bad_suggestions;
-	atomic64_t s_bal_cX_groups_considered[4];
-	atomic64_t s_bal_cX_hits[4];
-	atomic64_t s_bal_cX_failed[4];		/* cX loop didn't find blocks */
+	atomic64_t s_bal_cX_groups_considered[EXT4_MB_NUM_CRS];
+	atomic64_t s_bal_cX_hits[EXT4_MB_NUM_CRS];
+	atomic64_t s_bal_cX_failed[EXT4_MB_NUM_CRS];	/* cX loop didn't find blocks */
 	atomic_t s_mb_buddies_generated;	/* number of buddies generated */
 	atomic64_t s_mb_generation_time;
 	atomic_t s_mb_lost_chunks;
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 72c5d1a33bad..10dd86a02997 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -154,19 +154,19 @@
  * structures to decide the order in which groups are to be traversed for
  * fulfilling an allocation request.
  *
- * At CR = 0, we look for groups which have the largest_free_order >= the order
+ * At CR0, we look for groups which have the largest_free_order >= the order
  * of the request. We directly look at the largest free order list in the data
  * structure (1) above where largest_free_order = order of the request. If that
  * list is empty, we look at remaining list in the increasing order of
- * largest_free_order. This allows us to perform CR = 0 lookup in O(1) time.
+ * largest_free_order. This allows us to perform CR0 lookup in O(1) time.
  *
- * At CR = 1, we only consider groups where average fragment size > request
+ * At CR1, we only consider groups where average fragment size > request
  * size. So, we lookup a group which has average fragment size just above or
  * equal to request size using our average fragment size group lists (data
  * structure 2) in O(1) time.
  *
  * If "mb_optimize_scan" mount option is not set, mballoc traverses groups in
- * linear order which requires O(N) search time for each CR 0 and CR 1 phase.
+ * linear order which requires O(N) search time for each CR0 and CR1 phase.
  *
  * The regular allocator (using the buddy cache) supports a few tunables.
  *
@@ -409,7 +409,7 @@ static void ext4_mb_generate_from_freelist(struct super_block *sb, void *bitmap,
 static void ext4_mb_new_preallocation(struct ext4_allocation_context *ac);
 
 static bool ext4_mb_good_group(struct ext4_allocation_context *ac,
-			       ext4_group_t group, int cr);
+			       ext4_group_t group, enum criteria cr);
 
 static int ext4_try_to_trim_range(struct super_block *sb,
 		struct ext4_buddy *e4b, ext4_grpblk_t start,
@@ -859,7 +859,7 @@ mb_update_avg_fragment_size(struct super_block *sb, struct ext4_group_info *grp)
  * cr level needs an update.
  */
 static void ext4_mb_choose_next_group_cr0(struct ext4_allocation_context *ac,
-			int *new_cr, ext4_group_t *group, ext4_group_t ngroups)
+			enum criteria *new_cr, ext4_group_t *group, ext4_group_t ngroups)
 {
 	struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
 	struct ext4_group_info *iter, *grp;
@@ -884,8 +884,8 @@ static void ext4_mb_choose_next_group_cr0(struct ext4_allocation_context *ac,
 		list_for_each_entry(iter, &sbi->s_mb_largest_free_orders[i],
 				    bb_largest_free_order_node) {
 			if (sbi->s_mb_stats)
-				atomic64_inc(&sbi->s_bal_cX_groups_considered[0]);
-			if (likely(ext4_mb_good_group(ac, iter->bb_group, 0))) {
+				atomic64_inc(&sbi->s_bal_cX_groups_considered[CR0]);
+			if (likely(ext4_mb_good_group(ac, iter->bb_group, CR0))) {
 				grp = iter;
 				break;
 			}
@@ -897,7 +897,7 @@ static void ext4_mb_choose_next_group_cr0(struct ext4_allocation_context *ac,
 
 	if (!grp) {
 		/* Increment cr and search again */
-		*new_cr = 1;
+		*new_cr = CR1;
 	} else {
 		*group = grp->bb_group;
 		ac->ac_flags |= EXT4_MB_CR0_OPTIMIZED;
@@ -909,7 +909,7 @@ static void ext4_mb_choose_next_group_cr0(struct ext4_allocation_context *ac,
  * order. Updates *new_cr if cr level needs an update.
  */
 static void ext4_mb_choose_next_group_cr1(struct ext4_allocation_context *ac,
-		int *new_cr, ext4_group_t *group, ext4_group_t ngroups)
+		enum criteria *new_cr, ext4_group_t *group, ext4_group_t ngroups)
 {
 	struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
 	struct ext4_group_info *grp = NULL, *iter;
@@ -932,8 +932,8 @@ static void ext4_mb_choose_next_group_cr1(struct ext4_allocation_context *ac,
 		list_for_each_entry(iter, &sbi->s_mb_avg_fragment_size[i],
 				    bb_avg_fragment_size_node) {
 			if (sbi->s_mb_stats)
-				atomic64_inc(&sbi->s_bal_cX_groups_considered[1]);
-			if (likely(ext4_mb_good_group(ac, iter->bb_group, 1))) {
+				atomic64_inc(&sbi->s_bal_cX_groups_considered[CR1]);
+			if (likely(ext4_mb_good_group(ac, iter->bb_group, CR1))) {
 				grp = iter;
 				break;
 			}
@@ -947,7 +947,7 @@ static void ext4_mb_choose_next_group_cr1(struct ext4_allocation_context *ac,
 		*group = grp->bb_group;
 		ac->ac_flags |= EXT4_MB_CR1_OPTIMIZED;
 	} else {
-		*new_cr = 2;
+		*new_cr = CR2;
 	}
 }
 
@@ -955,7 +955,7 @@ static inline int should_optimize_scan(struct ext4_allocation_context *ac)
 {
 	if (unlikely(!test_opt2(ac->ac_sb, MB_OPTIMIZE_SCAN)))
 		return 0;
-	if (ac->ac_criteria >= 2)
+	if (ac->ac_criteria >= CR2)
 		return 0;
 	if (!ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS))
 		return 0;
@@ -1000,7 +1000,7 @@ next_linear_group(struct ext4_allocation_context *ac, int group, int ngroups)
  * @ngroups Total number of groups
  */
 static void ext4_mb_choose_next_group(struct ext4_allocation_context *ac,
-		int *new_cr, ext4_group_t *group, ext4_group_t ngroups)
+		enum criteria *new_cr, ext4_group_t *group, ext4_group_t ngroups)
 {
 	*new_cr = ac->ac_criteria;
 
@@ -1009,9 +1009,9 @@ static void ext4_mb_choose_next_group(struct ext4_allocation_context *ac,
 		return;
 	}
 
-	if (*new_cr == 0) {
+	if (*new_cr == CR0) {
 		ext4_mb_choose_next_group_cr0(ac, new_cr, group, ngroups);
-	} else if (*new_cr == 1) {
+	} else if (*new_cr == CR1) {
 		ext4_mb_choose_next_group_cr1(ac, new_cr, group, ngroups);
 	} else {
 		/*
@@ -2406,13 +2406,13 @@ void ext4_mb_scan_aligned(struct ext4_allocation_context *ac,
  * for the allocation or not.
  */
 static bool ext4_mb_good_group(struct ext4_allocation_context *ac,
-				ext4_group_t group, int cr)
+				ext4_group_t group, enum criteria cr)
 {
 	ext4_grpblk_t free, fragments;
 	int flex_size = ext4_flex_bg_size(EXT4_SB(ac->ac_sb));
 	struct ext4_group_info *grp = ext4_get_group_info(ac->ac_sb, group);
 
-	BUG_ON(cr < 0 || cr >= 4);
+	BUG_ON(cr < CR0 || cr >= EXT4_MB_NUM_CRS);
 
 	if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(grp) || !grp))
 		return false;
@@ -2426,7 +2426,7 @@ static bool ext4_mb_good_group(struct ext4_allocation_context *ac,
 		return false;
 
 	switch (cr) {
-	case 0:
+	case CR0:
 		BUG_ON(ac->ac_2order == 0);
 
 		/* Avoid using the first bg of a flexgroup for data files */
@@ -2445,15 +2445,15 @@ static bool ext4_mb_good_group(struct ext4_allocation_context *ac,
 			return false;
 
 		return true;
-	case 1:
+	case CR1:
 		if ((free / fragments) >= ac->ac_g_ex.fe_len)
 			return true;
 		break;
-	case 2:
+	case CR2:
 		if (free >= ac->ac_g_ex.fe_len)
 			return true;
 		break;
-	case 3:
+	case CR3:
 		return true;
 	default:
 		BUG();
@@ -2474,7 +2474,7 @@ static bool ext4_mb_good_group(struct ext4_allocation_context *ac,
  * out"!
  */
 static int ext4_mb_good_group_nolock(struct ext4_allocation_context *ac,
-				     ext4_group_t group, int cr)
+				     ext4_group_t group, enum criteria cr)
 {
 	struct ext4_group_info *grp = ext4_get_group_info(ac->ac_sb, group);
 	struct super_block *sb = ac->ac_sb;
@@ -2494,7 +2494,7 @@ static int ext4_mb_good_group_nolock(struct ext4_allocation_context *ac,
 	free = grp->bb_free;
 	if (free == 0)
 		goto out;
-	if (cr <= 2 && free < ac->ac_g_ex.fe_len)
+	if (cr <= CR2 && free < ac->ac_g_ex.fe_len)
 		goto out;
 	if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(grp)))
 		goto out;
@@ -2509,7 +2509,7 @@ static int ext4_mb_good_group_nolock(struct ext4_allocation_context *ac,
 			ext4_get_group_desc(sb, group, NULL);
 		int ret;
 
-		/* cr=0/1 is a very optimistic search to find large
+		/* cr=CR0/CR1 is a very optimistic search to find large
 		 * good chunks almost for free. If buddy data is not
 		 * ready, then this optimization makes no sense. But
 		 * we never skip the first block group in a flex_bg,
@@ -2517,7 +2517,7 @@ static int ext4_mb_good_group_nolock(struct ext4_allocation_context *ac,
 		 * and we want to make sure we locate metadata blocks
 		 * in the first block group in the flex_bg if possible.
 		 */
-		if (cr < 2 &&
+		if (cr < CR2 &&
 		    (!sbi->s_log_groups_per_flex ||
 		     ((group & ((1 << sbi->s_log_groups_per_flex) - 1)) != 0)) &&
 		    !(ext4_has_group_desc_csum(sb) &&
@@ -2623,7 +2623,7 @@ static noinline_for_stack int
 ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
 {
 	ext4_group_t prefetch_grp = 0, ngroups, group, i;
-	int cr = -1, new_cr;
+	enum criteria cr, new_cr;
 	int err = 0, first_err = 0;
 	unsigned int nr = 0, prefetch_ios = 0;
 	struct ext4_sb_info *sbi;
@@ -2681,13 +2681,13 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
 	}
 
 	/* Let's just scan groups to find more-less suitable blocks */
-	cr = ac->ac_2order ? 0 : 1;
+	cr = ac->ac_2order ? CR0 : CR1;
 	/*
-	 * cr == 0 try to get exact allocation,
-	 * cr == 3 try to get anything
+	 * cr == CR0 try to get exact allocation,
+	 * cr == CR3 try to get anything
 	 */
 repeat:
-	for (; cr < 4 && ac->ac_status == AC_STATUS_CONTINUE; cr++) {
+	for (; cr < EXT4_MB_NUM_CRS && ac->ac_status == AC_STATUS_CONTINUE; cr++) {
 		ac->ac_criteria = cr;
 		/*
 		 * searching for the right group start
@@ -2714,7 +2714,7 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
 			 * spend a lot of time loading imperfect groups
 			 */
 			if ((prefetch_grp == group) &&
-			    (cr > 1 ||
+			    (cr > CR1 ||
 			     prefetch_ios < sbi->s_mb_prefetch_limit)) {
 				unsigned int curr_ios = prefetch_ios;
@@ -2756,9 +2756,9 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
 			}
 
 			ac->ac_groups_scanned++;
-			if (cr == 0)
+			if (cr == CR0)
 				ext4_mb_simple_scan_group(ac, &e4b);
-			else if (cr == 1 && sbi->s_stripe &&
+			else if (cr == CR1 && sbi->s_stripe &&
 				 !(ac->ac_g_ex.fe_len % sbi->s_stripe))
 				ext4_mb_scan_aligned(ac, &e4b);
 			else
@@ -2798,7 +2798,7 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
 			ac->ac_b_ex.fe_len = 0;
 			ac->ac_status = AC_STATUS_CONTINUE;
 			ac->ac_flags |= EXT4_MB_HINT_FIRST;
-			cr = 3;
+			cr = CR3;
 			goto repeat;
 		}
 	}
@@ -2923,36 +2923,36 @@ int ext4_seq_mb_stats_show(struct seq_file *seq, void *offset)
 	seq_printf(seq, "\tgroups_scanned: %u\n",  atomic_read(&sbi->s_bal_groups_scanned));
 
 	seq_puts(seq, "\tcr0_stats:\n");
-	seq_printf(seq, "\t\thits: %llu\n", atomic64_read(&sbi->s_bal_cX_hits[0]));
+	seq_printf(seq, "\t\thits: %llu\n", atomic64_read(&sbi->s_bal_cX_hits[CR0]));
 	seq_printf(seq, "\t\tgroups_considered: %llu\n",
-		   atomic64_read(&sbi->s_bal_cX_groups_considered[0]));
+		   atomic64_read(&sbi->s_bal_cX_groups_considered[CR0]));
 	seq_printf(seq, "\t\tuseless_loops: %llu\n",
-		   atomic64_read(&sbi->s_bal_cX_failed[0]));
+		   atomic64_read(&sbi->s_bal_cX_failed[CR0]));
 	seq_printf(seq, "\t\tbad_suggestions: %u\n",
 		   atomic_read(&sbi->s_bal_cr0_bad_suggestions));
 
 	seq_puts(seq, "\tcr1_stats:\n");
-	seq_printf(seq, "\t\thits: %llu\n", atomic64_read(&sbi->s_bal_cX_hits[1]));
+	seq_printf(seq, "\t\thits: %llu\n", atomic64_read(&sbi->s_bal_cX_hits[CR1]));
 	seq_printf(seq, "\t\tgroups_considered: %llu\n",
-		   atomic64_read(&sbi->s_bal_cX_groups_considered[1]));
+		   atomic64_read(&sbi->s_bal_cX_groups_considered[CR1]));
 	seq_printf(seq, "\t\tuseless_loops: %llu\n",
-		   atomic64_read(&sbi->s_bal_cX_failed[1]));
+		   atomic64_read(&sbi->s_bal_cX_failed[CR1]));
 	seq_printf(seq, "\t\tbad_suggestions: %u\n",
 		   atomic_read(&sbi->s_bal_cr1_bad_suggestions));
 
 	seq_puts(seq, "\tcr2_stats:\n");
-	seq_printf(seq, "\t\thits: %llu\n", atomic64_read(&sbi->s_bal_cX_hits[2]));
+	seq_printf(seq, "\t\thits: %llu\n", atomic64_read(&sbi->s_bal_cX_hits[CR2]));
 	seq_printf(seq, "\t\tgroups_considered: %llu\n",
-		   atomic64_read(&sbi->s_bal_cX_groups_considered[2]));
+		   atomic64_read(&sbi->s_bal_cX_groups_considered[CR2]));
 	seq_printf(seq, "\t\tuseless_loops: %llu\n",
-		   atomic64_read(&sbi->s_bal_cX_failed[2]));
+		   atomic64_read(&sbi->s_bal_cX_failed[CR2]));
 
 	seq_puts(seq, "\tcr3_stats:\n");
-	seq_printf(seq, "\t\thits: %llu\n", atomic64_read(&sbi->s_bal_cX_hits[3]));
+	seq_printf(seq, "\t\thits: %llu\n", atomic64_read(&sbi->s_bal_cX_hits[CR3]));
 	seq_printf(seq, "\t\tgroups_considered: %llu\n",
-		   atomic64_read(&sbi->s_bal_cX_groups_considered[3]));
+		   atomic64_read(&sbi->s_bal_cX_groups_considered[CR3]));
 	seq_printf(seq, "\t\tuseless_loops: %llu\n",
-		   atomic64_read(&sbi->s_bal_cX_failed[3]));
+		   atomic64_read(&sbi->s_bal_cX_failed[CR3]));
 
 	seq_printf(seq, "\textents_scanned: %u\n", atomic_read(&sbi->s_bal_ex_scanned));
 	seq_printf(seq, "\t\tgoal_hits: %u\n", atomic_read(&sbi->s_bal_goals));
 	seq_printf(seq, "\t\t2^n_hits: %u\n", atomic_read(&sbi->s_bal_2orders));
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index ebccf6a6aa1b..f062147ca32b 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -120,6 +120,18 @@ TRACE_DEFINE_ENUM(EXT4_FC_REASON_MAX);
 	{ EXT4_FC_REASON_INODE_JOURNAL_DATA,	"INODE_JOURNAL_DATA"}, \
 	{ EXT4_FC_REASON_ENCRYPTED_FILENAME,	"ENCRYPTED_FILENAME"})
 
+TRACE_DEFINE_ENUM(CR0);
+TRACE_DEFINE_ENUM(CR1);
+TRACE_DEFINE_ENUM(CR2);
+TRACE_DEFINE_ENUM(CR3);
+
+#define show_criteria(cr) \
+	__print_symbolic(cr, \
+		{ CR0, "CR0" }, \
+		{ CR1, "CR1" }, \
+		{ CR2, "CR2" }, \
+		{ CR3, "CR3" })
+
 TRACE_EVENT(ext4_other_inode_update_time,
 	TP_PROTO(struct inode *inode, ino_t orig_ino),
 
@@ -1063,7 +1075,7 @@ TRACE_EVENT(ext4_mballoc_alloc,
 	),
 
 	TP_printk("dev %d,%d inode %lu orig %u/%d/%u@%u goal %u/%d/%u@%u "
-		  "result %u/%d/%u@%u blks %u grps %u cr %u flags %s "
+		  "result %u/%d/%u@%u blks %u grps %u cr %s flags %s "
		  "tail %u broken %u",
		  MAJOR(__entry->dev), MINOR(__entry->dev),
		  (unsigned long) __entry->ino,
@@ -1073,7 +1085,7 @@ TRACE_EVENT(ext4_mballoc_alloc,
		  __entry->goal_len, __entry->goal_logical,
		  __entry->result_group, __entry->result_start,
		  __entry->result_len, __entry->result_logical,
-		  __entry->found, __entry->groups, __entry->cr,
+		  __entry->found, __entry->groups, show_criteria(__entry->cr),
		  show_mballoc_flags(__entry->flags), __entry->tail,
		  __entry->buddy ? 1 << __entry->buddy : 0)
 );
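The shape of the allocator's criteria loop after this conversion can be seen
in a minimal standalone sketch (simplified from the
ext4_mb_regular_allocator() hunks above; group iteration, prefetch and error
handling omitted):

	enum criteria { CR0, CR1, CR2, CR3 };
	#define EXT4_MB_NUM_CRS 4

	/* power-of-two sized requests may start at CR0 (exact buddy-order
	 * lookup); everything else starts at the average-fragment-size
	 * search of CR1, then falls back to ever-looser criteria */
	enum criteria cr = ac->ac_2order ? CR0 : CR1;

	for (; cr < EXT4_MB_NUM_CRS && ac->ac_status == AC_STATUS_CONTINUE; cr++) {
		ac->ac_criteria = cr;
		/* ... scan block groups acceptable under cr ... */
	}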
From patchwork Thu May 25 11:33:00 2023
X-Patchwork-Id: 13255094
From: Ojaswin Mujoo
To: linux-ext4@vger.kernel.org, "Theodore Ts'o"
Cc: Ritesh Harjani, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    Jan Kara, Kemeng Shi, Ritesh Harjani
Subject: [PATCH 06/13] ext4: Add per CR extent scanned counter
Date: Thu, 25 May 2023 17:03:00 +0530
Message-Id: <271fd97bd7e4521b7be86ff4db33fb0b8a495af1.1685009579.git.ojaswin@linux.ibm.com>

This gives better visibility into the number of extents scanned in each
particular CR. For example, this information can be used to see how our
block group scanning logic is performing when the BG is fragmented.

Signed-off-by: Ojaswin Mujoo
Reviewed-by: Ritesh Harjani (IBM)
Reviewed-by: Jan Kara
---
 fs/ext4/ext4.h    |  1 +
 fs/ext4/mballoc.c | 12 ++++++++++++
 fs/ext4/mballoc.h |  1 +
 3 files changed, 14 insertions(+)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 2fa4e77eb3a1..7b460a31ac82 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1553,6 +1553,7 @@ struct ext4_sb_info {
 	atomic_t s_bal_success;	/* we found long enough chunks */
 	atomic_t s_bal_allocated;	/* in blocks */
 	atomic_t s_bal_ex_scanned;	/* total extents scanned */
+	atomic_t s_bal_cX_ex_scanned[EXT4_MB_NUM_CRS];	/* total extents scanned */
 	atomic_t s_bal_groups_scanned;	/* number of groups scanned */
 	atomic_t s_bal_goals;	/* goal hits */
 	atomic_t s_bal_breaks;	/* too long searches */
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 10dd86a02997..98d93d2c5401 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2103,6 +2103,7 @@ static void ext4_mb_measure_extent(struct ext4_allocation_context *ac,
 	BUG_ON(ac->ac_status != AC_STATUS_CONTINUE);
 
 	ac->ac_found++;
+	ac->ac_cX_found[ac->ac_criteria]++;
 
 	/*
 	 * The special case - take what you catch first
@@ -2277,6 +2278,7 @@ void ext4_mb_simple_scan_group(struct ext4_allocation_context *ac,
 			break;
 		}
 		ac->ac_found++;
+		ac->ac_cX_found[ac->ac_criteria]++;
 
 		ac->ac_b_ex.fe_len = 1 << i;
 		ac->ac_b_ex.fe_start = k << i;
@@ -2390,6 +2392,7 @@ void ext4_mb_scan_aligned(struct ext4_allocation_context *ac,
 		max = mb_find_extent(e4b, i, sbi->s_stripe, &ex);
 		if (max >= sbi->s_stripe) {
 			ac->ac_found++;
+			ac->ac_cX_found[ac->ac_criteria]++;
 			ex.fe_logical = 0xDEADF00D; /* debug value */
 			ac->ac_b_ex = ex;
 			ext4_mb_use_best_found(ac, e4b);
@@ -2926,6 +2929,7 @@ int ext4_seq_mb_stats_show(struct seq_file *seq, void *offset)
 	seq_printf(seq, "\t\thits: %llu\n", atomic64_read(&sbi->s_bal_cX_hits[CR0]));
 	seq_printf(seq, "\t\tgroups_considered: %llu\n",
 		   atomic64_read(&sbi->s_bal_cX_groups_considered[CR0]));
+	seq_printf(seq, "\t\textents_scanned: %u\n", atomic_read(&sbi->s_bal_cX_ex_scanned[CR0]));
 	seq_printf(seq, "\t\tuseless_loops: %llu\n",
 		   atomic64_read(&sbi->s_bal_cX_failed[CR0]));
 	seq_printf(seq, "\t\tbad_suggestions: %u\n",
@@ -2935,6 +2939,7 @@ int ext4_seq_mb_stats_show(struct seq_file *seq, void *offset)
 	seq_printf(seq, "\t\thits: %llu\n", atomic64_read(&sbi->s_bal_cX_hits[CR1]));
 	seq_printf(seq, "\t\tgroups_considered: %llu\n",
 		   atomic64_read(&sbi->s_bal_cX_groups_considered[CR1]));
+	seq_printf(seq, "\t\textents_scanned: %u\n", atomic_read(&sbi->s_bal_cX_ex_scanned[CR1]));
 	seq_printf(seq, "\t\tuseless_loops: %llu\n",
 		   atomic64_read(&sbi->s_bal_cX_failed[CR1]));
 	seq_printf(seq, "\t\tbad_suggestions: %u\n",
@@ -2944,6 +2949,7 @@ int ext4_seq_mb_stats_show(struct seq_file *seq, void *offset)
 	seq_printf(seq, "\t\thits: %llu\n", atomic64_read(&sbi->s_bal_cX_hits[CR2]));
 	seq_printf(seq, "\t\tgroups_considered: %llu\n",
 		   atomic64_read(&sbi->s_bal_cX_groups_considered[CR2]));
+	seq_printf(seq, "\t\textents_scanned: %u\n", atomic_read(&sbi->s_bal_cX_ex_scanned[CR2]));
 	seq_printf(seq, "\t\tuseless_loops: %llu\n",
 		   atomic64_read(&sbi->s_bal_cX_failed[CR2]));
 
@@ -2951,6 +2957,7 @@ int ext4_seq_mb_stats_show(struct seq_file *seq, void *offset)
 	seq_printf(seq, "\t\thits: %llu\n", atomic64_read(&sbi->s_bal_cX_hits[CR3]));
 	seq_printf(seq, "\t\tgroups_considered: %llu\n",
 		   atomic64_read(&sbi->s_bal_cX_groups_considered[CR3]));
+	seq_printf(seq, "\t\textents_scanned: %u\n", atomic_read(&sbi->s_bal_cX_ex_scanned[CR3]));
 	seq_printf(seq, "\t\tuseless_loops: %llu\n",
 		   atomic64_read(&sbi->s_bal_cX_failed[CR3]));
 
 	seq_printf(seq, "\textents_scanned: %u\n", atomic_read(&sbi->s_bal_ex_scanned));
@@ -4390,7 +4397,12 @@ static void ext4_mb_collect_stats(struct ext4_allocation_context *ac)
 	atomic_add(ac->ac_b_ex.fe_len, &sbi->s_bal_allocated);
 	if (ac->ac_b_ex.fe_len >= ac->ac_o_ex.fe_len)
 		atomic_inc(&sbi->s_bal_success);
+
 	atomic_add(ac->ac_found, &sbi->s_bal_ex_scanned);
+	for (int i = 0; i < EXT4_MB_NUM_CRS; i++) {
+		atomic_add(ac->ac_cX_found[i], &sbi->s_bal_cX_ex_scanned[i]);
+	}
+
 	atomic_add(ac->ac_groups_scanned, &sbi->s_bal_groups_scanned);
 	if (ac->ac_g_ex.fe_start == ac->ac_b_ex.fe_start &&
 	    ac->ac_g_ex.fe_group == ac->ac_b_ex.fe_group)
diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h
index 24b666e558f1..acfdc204e15d 100644
--- a/fs/ext4/mballoc.h
+++ b/fs/ext4/mballoc.h
@@ -184,6 +184,7 @@ struct ext4_allocation_context {
 	__u16 ac_groups_scanned;
 	__u16 ac_groups_linear_remaining;
 	__u16 ac_found;
+	__u16 ac_cX_found[EXT4_MB_NUM_CRS];
 	__u16 ac_tail;
 	__u16 ac_buddy;
 	__u8 ac_status;
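The bookkeeping added here follows one pattern in all three scan paths plus
one aggregation step at the end; a condensed sketch (field names as
introduced by this patch):

	/* in each scan path, count the extent against the current CR: */
	ac->ac_found++;
	ac->ac_cX_found[ac->ac_criteria]++;

	/* in ext4_mb_collect_stats(), fold the per-CR counts into the
	 * superblock-wide stats reported by the mb_stats seq file: */
	for (int i = 0; i < EXT4_MB_NUM_CRS; i++)
		atomic_add(ac->ac_cX_found[i], &sbi->s_bal_cX_ex_scanned[i]);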
From patchwork Thu May 25 11:33:01 2023
X-Patchwork-Submitter: Ojaswin Mujoo
X-Patchwork-Id: 13255095
From: Ojaswin Mujoo
To: linux-ext4@vger.kernel.org, "Theodore Ts'o"
Cc: Ritesh Harjani, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Jan Kara, Kemeng Shi, Ritesh Harjani
Subject: [PATCH 07/13] ext4: Add counter to track successful allocation of goal length
Date: Thu, 25 May 2023 17:03:01 +0530
Message-Id: <51cc5ea958b734057a8b31289f6973edec8ab3e4.1685009579.git.ojaswin@linux.ibm.com>

Track the number of allocations where the length of blocks allocated is equal to the length of goal blocks (post normalization). This metric could be useful when making changes to the allocator logic in the future, as it gives us visibility into how often we trim our requests.

PS: ac_b_ex.fe_len might get modified due to preallocation efforts and hence we use ac_f_ex.fe_len instead, since we want to compare how much the allocator was actually able to find.
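To make the two goal metrics concrete, a minimal C sketch of the conditions involved might look like the following. This is illustrative only: the struct and helpers are simplified stand-ins, not the kernel's ext4_allocation_context.

    #include <stdbool.h>

    /* Simplified stand-ins for the extents tracked per allocation. */
    struct extent { unsigned group, start, len; };

    /* s_bal_goals: we allocated at exactly the goal position. */
    static bool is_goal_pos_hit(const struct extent *goal,
                                const struct extent *best)
    {
            return goal->group == best->group && goal->start == best->start;
    }

    /* s_bal_len_goals: the allocator *found* as many blocks as the
     * (normalized) goal asked for. It checks the found extent rather
     * than the best extent, since preallocation may modify the latter. */
    static bool is_goal_len_hit(const struct extent *goal,
                                const struct extent *found)
    {
            return found->len == goal->len;
    }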
Signed-off-by: Ojaswin Mujoo Reviewed-by: Ritesh Harjani (IBM) Reviewed-by: Jan Kara --- fs/ext4/ext4.h | 1 + fs/ext4/mballoc.c | 3 +++ 2 files changed, 4 insertions(+) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 7b460a31ac82..8bb1edcd2dda 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -1556,6 +1556,7 @@ struct ext4_sb_info { atomic_t s_bal_cX_ex_scanned[EXT4_MB_NUM_CRS]; /* total extents scanned */ atomic_t s_bal_groups_scanned; /* number of groups scanned */ atomic_t s_bal_goals; /* goal hits */ + atomic_t s_bal_len_goals; /* len goal hits */ atomic_t s_bal_breaks; /* too long searches */ atomic_t s_bal_2orders; /* 2^order hits */ atomic_t s_bal_cr0_bad_suggestions; diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 98d93d2c5401..8786aa0dd57a 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -2962,6 +2962,7 @@ int ext4_seq_mb_stats_show(struct seq_file *seq, void *offset) atomic64_read(&sbi->s_bal_cX_failed[CR3])); seq_printf(seq, "\textents_scanned: %u\n", atomic_read(&sbi->s_bal_ex_scanned)); seq_printf(seq, "\t\tgoal_hits: %u\n", atomic_read(&sbi->s_bal_goals)); + seq_printf(seq, "\t\tlen_goal_hits: %u\n", atomic_read(&sbi->s_bal_len_goals)); seq_printf(seq, "\t\t2^n_hits: %u\n", atomic_read(&sbi->s_bal_2orders)); seq_printf(seq, "\t\tbreaks: %u\n", atomic_read(&sbi->s_bal_breaks)); seq_printf(seq, "\t\tlost: %u\n", atomic_read(&sbi->s_mb_lost_chunks)); @@ -4407,6 +4408,8 @@ static void ext4_mb_collect_stats(struct ext4_allocation_context *ac) if (ac->ac_g_ex.fe_start == ac->ac_b_ex.fe_start && ac->ac_g_ex.fe_group == ac->ac_b_ex.fe_group) atomic_inc(&sbi->s_bal_goals); + if (ac->ac_f_ex.fe_len == ac->ac_g_ex.fe_len) + atomic_inc(&sbi->s_bal_len_goals); if (ac->ac_found > sbi->s_mb_max_to_scan) atomic_inc(&sbi->s_bal_breaks); }
From patchwork Thu May 25 11:33:02 2023
X-Patchwork-Submitter: Ojaswin Mujoo
X-Patchwork-Id: 13255093
From: Ojaswin Mujoo
To: linux-ext4@vger.kernel.org, "Theodore Ts'o"
Cc: Ritesh Harjani, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Jan Kara, Kemeng Shi, Ritesh Harjani
Subject: [PATCH 08/13] ext4: Avoid scanning smaller extents in BG during CR1
Date: Thu, 25 May 2023 17:03:02 +0530

When we are inside ext4_mb_complex_scan_group() in CR1, we can be sure that this group has at least 1 big enough continuous free extent to satisfy our request because (free / fragments) > goal length. Hence, instead of wasting time looping over smaller free extents, only consider a free extent if we are sure that it has enough continuous free space to satisfy the goal length.
This is particularly useful when scanning highly fragmented BGs in CR1 as, without this patch, the allocator might stop scanning early before reaching the big enough free extent (due to ac_found > mb_max_to_scan), which causes us to unnecessarily trim the request. Signed-off-by: Ojaswin Mujoo Reviewed-by: Ritesh Harjani (IBM) Reviewed-by: Jan Kara --- fs/ext4/mballoc.c | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 8786aa0dd57a..855fb7d440f1 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -2307,7 +2307,7 @@ void ext4_mb_complex_scan_group(struct ext4_allocation_context *ac, struct super_block *sb = ac->ac_sb; void *bitmap = e4b->bd_bitmap; struct ext4_free_extent ex; - int i; + int i, j, freelen; int free; free = e4b->bd_info->bb_free; @@ -2334,6 +2334,23 @@ void ext4_mb_complex_scan_group(struct ext4_allocation_context *ac, break; } + if (ac->ac_criteria < CR2) { + /* + * In CR1, we are sure that this group will + * have a large enough continuous free extent, so skip + * over the smaller free extents + */ + j = mb_find_next_bit(bitmap, + EXT4_CLUSTERS_PER_GROUP(sb), i); + freelen = j - i; + + if (freelen < ac->ac_g_ex.fe_len) { + i = j; + free -= freelen; + continue; + } + } + mb_find_extent(e4b, i, ac->ac_g_ex.fe_len, &ex); if (WARN_ON(ex.fe_len <= 0)) break;
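The skip in the hunk above can be distilled into plain C. The sketch below is illustrative only: a toy byte-array bitmap where 0-bits mark free clusters stands in for the buddy bitmap, and next_zero()/next_set() approximate mb_find_next_zero_bit()/mb_find_next_bit():

    #include <stddef.h>

    static size_t next_set(const unsigned char *map, size_t max, size_t i)
    {
            while (i < max && !(map[i / 8] & (1 << (i % 8))))
                    i++;
            return i;
    }

    static size_t next_zero(const unsigned char *map, size_t max, size_t i)
    {
            while (i < max && (map[i / 8] & (1 << (i % 8))))
                    i++;
            return i;
    }

    /* Return the start of the first free run of at least goal_len
     * clusters, skipping shorter runs instead of measuring each one
     * in detail; returns max if no such run exists. */
    static size_t find_goal_sized_run(const unsigned char *map, size_t max,
                                      size_t goal_len)
    {
            size_t i = next_zero(map, max, 0);      /* first free cluster */

            while (i < max) {
                    size_t j = next_set(map, max, i);  /* end of free run */

                    if (j - i >= goal_len)
                            return i;       /* big enough, scan it properly */
                    i = next_zero(map, max, j);     /* skip the small run */
            }
            return max;
    }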
From patchwork Thu May 25 11:33:03 2023
X-Patchwork-Submitter: Ojaswin Mujoo
X-Patchwork-Id: 13255097
From: Ojaswin Mujoo
To: linux-ext4@vger.kernel.org, "Theodore Ts'o"
Cc: Ritesh Harjani, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Jan Kara, Kemeng Shi, Ritesh Harjani
Subject: [PATCH 09/13] ext4: Don't skip prefetching BLOCK_UNINIT groups
Date: Thu, 25 May 2023 17:03:03 +0530
Message-Id: <7e648187c5cd4174ba798fc8305dfc7bedd8e89a.1685009579.git.ojaswin@linux.ibm.com>

Currently, ext4_mb_prefetch() and ext4_mb_prefetch_fini() skip BLOCK_UNINIT groups since fetching their bitmaps doesn't need disk IO. As a consequence, we end up not initializing the buddy structures and CR0/1 lists for these BGs, even though that can be done without any disk IO overhead. Hence, don't skip such BGs during prefetch and prefetch_fini.

This improves the accuracy of CR0/1 allocation: earlier, essentially empty BLOCK_UNINIT groups could be ignored by CR0/1 because their buddy was not initialized, leading to slower CR2 allocations. With this patch CR0/1 will be able to discover these groups as well, thus improving performance.
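The eligibility change boils down to dropping one term from a predicate. The following is a simplified illustration only (hypothetical struct and helpers; the real checks appear in the diff below):

    #include <stdbool.h>

    struct bg_state {
            bool need_init;         /* buddy not initialized yet */
            bool has_free;          /* free clusters > 0 */
            bool block_uninit;      /* EXT4_BG_BLOCK_UNINIT set */
    };

    /* Before: BLOCK_UNINIT groups were skipped because reading their
     * bitmap needs no IO -- which also skipped buddy initialization. */
    static bool eligible_before(const struct bg_state *g)
    {
            return g->need_init && g->has_free && !g->block_uninit;
    }

    /* After: BLOCK_UNINIT no longer disqualifies a group, so empty
     * uninitialized groups become visible to CR0/1. */
    static bool eligible_after(const struct bg_state *g)
    {
            return g->need_init && g->has_free;
    }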
Signed-off-by: Ojaswin Mujoo Reviewed-by: Ritesh Harjani (IBM) Reviewed-by: Jan Kara --- fs/ext4/mballoc.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 855fb7d440f1..b35c408cccc2 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -2587,9 +2587,7 @@ ext4_group_t ext4_mb_prefetch(struct super_block *sb, ext4_group_t group, */ if (gdp && grp && !EXT4_MB_GRP_TEST_AND_SET_READ(grp) && EXT4_MB_GRP_NEED_INIT(grp) && - ext4_free_group_clusters(sb, gdp) > 0 && - !(ext4_has_group_desc_csum(sb) && - (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)))) { + ext4_free_group_clusters(sb, gdp) > 0) { bh = ext4_read_block_bitmap_nowait(sb, group, true); if (bh && !IS_ERR(bh)) { if (!buffer_uptodate(bh) && cnt) @@ -2630,9 +2628,7 @@ void ext4_mb_prefetch_fini(struct super_block *sb, ext4_group_t group, grp = ext4_get_group_info(sb, group); if (gdp && grp && EXT4_MB_GRP_NEED_INIT(grp) && - ext4_free_group_clusters(sb, gdp) > 0 && - !(ext4_has_group_desc_csum(sb) && - (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)))) { + ext4_free_group_clusters(sb, gdp) > 0) { if (ext4_mb_init_group(sb, group, GFP_NOFS)) break; }
From patchwork Thu May 25 11:33:04 2023
X-Patchwork-Submitter: Ojaswin Mujoo
X-Patchwork-Id: 13255096
From: Ojaswin Mujoo
To: linux-ext4@vger.kernel.org, "Theodore Ts'o"
Cc: Ritesh Harjani, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Jan Kara, Kemeng Shi, Ritesh Harjani
Subject: [PATCH 10/13] ext4: Ensure ext4_mb_prefetch_fini() is called for all prefetched BGs
Date: Thu, 25 May 2023 17:03:04 +0530

Before this patch, the call stack in ext4_run_li_request is as follows:

  /*
   * nr = no. of BGs we want to fetch (=s_mb_prefetch)
   * prefetch_ios = no. of BGs not uptodate after
   *                ext4_read_block_bitmap_nowait()
   */
  next_group = ext4_mb_prefetch(sb, group, nr, prefetch_ios);
  ext4_mb_prefetch_fini(sb, next_group, prefetch_ios);

ext4_mb_prefetch_fini() will only try to initialize buddies for BGs in the range [next_group - prefetch_ios, next_group). This is incorrect since sometimes (prefetch_ios < nr), which causes ext4_mb_prefetch_fini() to incorrectly ignore some of the BGs that might need initialization. This issue is more notable now with the previous patch enabling "fetching" of BLOCK_UNINIT BGs, which are marked buffer_uptodate by default.

Fix this by passing nr to ext4_mb_prefetch_fini() instead of prefetch_ios so that it considers the right range of groups.

Similarly, make sure we don't pass nr=0 to ext4_mb_prefetch_fini() in ext4_mb_regular_allocator() since we might have prefetched BLOCK_UNINIT groups that would need buddy initialization. (A worked example of the range mismatch is sketched below.)
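To see the gap concretely, here is a small self-contained C example with hypothetical numbers:

    #include <stdio.h>

    int main(void)
    {
            unsigned int group = 0;         /* first group prefetched */
            unsigned int nr = 32;           /* BGs submitted for prefetch */
            unsigned int prefetch_ios = 4;  /* only 4 needed real disk IO */
            unsigned int next_group = group + nr;

            /* old: fini only walks the last prefetch_ios groups */
            printf("fini(prefetch_ios): [%u, %u) -> %u BGs\n",
                   next_group - prefetch_ios, next_group, prefetch_ios);

            /* fixed: fini walks everything that was prefetched */
            printf("fini(nr):           [%u, %u) -> %u BGs\n",
                   next_group - nr, next_group, nr);

            /* the other 28 groups (e.g. already-uptodate BLOCK_UNINIT
             * ones) would otherwise never get buddy initialization */
            return 0;
    }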
Signed-off-by: Ojaswin Mujoo Reviewed-by: Ritesh Harjani (IBM) Reviewed-by: Jan Kara --- fs/ext4/mballoc.c | 4 ---- fs/ext4/super.c | 11 ++++------- 2 files changed, 4 insertions(+), 11 deletions(-) diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index b35c408cccc2..b6bb314d778e 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -2732,8 +2732,6 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac) if ((prefetch_grp == group) && (cr > CR1 || prefetch_ios < sbi->s_mb_prefetch_limit)) { - unsigned int curr_ios = prefetch_ios; - nr = sbi->s_mb_prefetch; if (ext4_has_feature_flex_bg(sb)) { nr = 1 << sbi->s_log_groups_per_flex; @@ -2742,8 +2740,6 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac) } prefetch_grp = ext4_mb_prefetch(sb, group, nr, &prefetch_ios); - if (prefetch_ios == curr_ios) - nr = 0; } /* This now checks without needing the buddy page */ diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 495f99c10085..37a6fa118ced 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -3694,16 +3694,13 @@ static int ext4_run_li_request(struct ext4_li_request *elr) ext4_group_t group = elr->lr_next_group; unsigned int prefetch_ios = 0; int ret = 0; + int nr = EXT4_SB(sb)->s_mb_prefetch; u64 start_time; if (elr->lr_mode == EXT4_LI_MODE_PREFETCH_BBITMAP) { - elr->lr_next_group = ext4_mb_prefetch(sb, group, - EXT4_SB(sb)->s_mb_prefetch, &prefetch_ios); - if (prefetch_ios) - ext4_mb_prefetch_fini(sb, elr->lr_next_group, - prefetch_ios); - trace_ext4_prefetch_bitmaps(sb, group, elr->lr_next_group, - prefetch_ios); + elr->lr_next_group = ext4_mb_prefetch(sb, group, nr, &prefetch_ios); + ext4_mb_prefetch_fini(sb, elr->lr_next_group, nr); + trace_ext4_prefetch_bitmaps(sb, group, elr->lr_next_group, nr); if (group >= elr->lr_next_group) { ret = 1; if (elr->lr_first_not_zeroed != ngroups &&
From patchwork Thu May 25 11:33:05 2023
X-Patchwork-Submitter: Ojaswin Mujoo
X-Patchwork-Id: 13255098
From: Ojaswin Mujoo
To: linux-ext4@vger.kernel.org, "Theodore Ts'o"
Cc: Ritesh Harjani, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Jan Kara, Kemeng Shi, Ritesh Harjani
Subject: [PATCH 11/13] ext4: Abstract out logic to search average fragment list
Date: Thu, 25 May 2023 17:03:05 +0530
Message-Id: <0ad6ab5b9000b6ce11cbfc64ef0f73684ad85710.1685009579.git.ojaswin@linux.ibm.com>

Make the logic of searching the average fragment list of a given order reusable by abstracting it out into a different function. This will also avoid code duplication in upcoming patches.

No functional changes.
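The helper factored out in the diff below follows a common lock-avoidance pattern: peek at the list without the lock, then re-check emptiness under a read lock before walking it. A generic user-space sketch of that pattern (pthreads standing in for the kernel's rwlock_t; the unlocked peek is a deliberately racy optimization, which is why the check is repeated under the lock):

    #include <pthread.h>
    #include <stdbool.h>
    #include <stddef.h>

    struct node { struct node *next; int value; };

    struct locked_list {
            struct node *head;
            pthread_rwlock_t lock;
    };

    /* Find the first node matching pred, or NULL. Most lists are empty
     * most of the time, so the cheap unlocked peek skips the lock in
     * the common case; correctness comes from the locked re-walk. */
    static struct node *find_first(struct locked_list *l, bool (*pred)(int))
    {
            struct node *n, *found = NULL;

            if (l->head == NULL)            /* cheap, racy peek */
                    return NULL;

            pthread_rwlock_rdlock(&l->lock);
            for (n = l->head; n != NULL; n = n->next) {
                    if (pred(n->value)) {
                            found = n;
                            break;
                    }
            }
            pthread_rwlock_unlock(&l->lock);
            return found;
    }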
Signed-off-by: Ojaswin Mujoo Reviewed-by: Ritesh Harjani (IBM) Reviewed-by: Jan Kara --- fs/ext4/mballoc.c | 51 ++++++++++++++++++++++++++++++----------------- 1 file changed, 33 insertions(+), 18 deletions(-) diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index b6bb314d778e..fd29ee02685d 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -904,6 +904,37 @@ static void ext4_mb_choose_next_group_cr0(struct ext4_allocation_context *ac, } } +/* + * Find a suitable group of given order from the average fragments list. + */ +static struct ext4_group_info * +ext4_mb_find_good_group_avg_frag_lists(struct ext4_allocation_context *ac, int order) +{ + struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb); + struct list_head *frag_list = &sbi->s_mb_avg_fragment_size[order]; + rwlock_t *frag_list_lock = &sbi->s_mb_avg_fragment_size_locks[order]; + struct ext4_group_info *grp = NULL, *iter; + enum criteria cr = ac->ac_criteria; + + if (list_empty(frag_list)) + return NULL; + read_lock(frag_list_lock); + if (list_empty(frag_list)) { + read_unlock(frag_list_lock); + return NULL; + } + list_for_each_entry(iter, frag_list, bb_avg_fragment_size_node) { + if (sbi->s_mb_stats) + atomic64_inc(&sbi->s_bal_cX_groups_considered[cr]); + if (likely(ext4_mb_good_group(ac, iter->bb_group, cr))) { + grp = iter; + break; + } + } + read_unlock(frag_list_lock); + return grp; +} + /* * Choose next group by traversing average fragment size list of suitable * order. Updates *new_cr if cr level needs an update. @@ -912,7 +943,7 @@ static void ext4_mb_choose_next_group_cr1(struct ext4_allocation_context *ac, enum criteria *new_cr, ext4_group_t *group, ext4_group_t ngroups) { struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb); - struct ext4_group_info *grp = NULL, *iter; + struct ext4_group_info *grp = NULL; int i; if (unlikely(ac->ac_flags & EXT4_MB_CR1_OPTIMIZED)) { @@ -922,23 +953,7 @@ static void ext4_mb_choose_next_group_cr1(struct ext4_allocation_context *ac, for (i = mb_avg_fragment_size_order(ac->ac_sb, ac->ac_g_ex.fe_len); i < MB_NUM_ORDERS(ac->ac_sb); i++) { - if (list_empty(&sbi->s_mb_avg_fragment_size[i])) - continue; - read_lock(&sbi->s_mb_avg_fragment_size_locks[i]); - if (list_empty(&sbi->s_mb_avg_fragment_size[i])) { - read_unlock(&sbi->s_mb_avg_fragment_size_locks[i]); - continue; - } - list_for_each_entry(iter, &sbi->s_mb_avg_fragment_size[i], - bb_avg_fragment_size_node) { - if (sbi->s_mb_stats) - atomic64_inc(&sbi->s_bal_cX_groups_considered[CR1]); - if (likely(ext4_mb_good_group(ac, iter->bb_group, CR1))) { - grp = iter; - break; - } - } - read_unlock(&sbi->s_mb_avg_fragment_size_locks[i]); + grp = ext4_mb_find_good_group_avg_frag_lists(ac, i); if (grp) break; }
From patchwork Thu May 25 11:33:06 2023
X-Patchwork-Submitter: Ojaswin Mujoo
X-Patchwork-Id: 13255099
From: Ojaswin Mujoo
To: linux-ext4@vger.kernel.org, "Theodore Ts'o"
Cc: Ritesh Harjani, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Jan Kara, Kemeng Shi, Ritesh Harjani
Subject: [PATCH 12/13] ext4: Add allocation criteria 1.5 (CR1_5)
Date: Thu, 25 May 2023 17:03:06 +0530
Message-Id: <9460de03128d7aa802e6e211777383caa4a57a7d.1685009579.git.ojaswin@linux.ibm.com>
CR1_5 aims to optimize allocations which can't be satisfied in CR1. The fact that we couldn't find a group in CR1 suggests that it would be difficult to find a continuous extent to completely satisfy our allocations. So before falling to the slower CR2, in CR1.5 we proactively trim the preallocations so we can find a group with (free / fragments) big enough. This speeds up our allocation at the cost of slightly reduced preallocation.

The patch also adds a new sysfs tunable:

* /sys/fs/ext4/<dev>/mb_cr1_5_max_trim_order

This controls how much CR1.5 can trim a request before falling to CR2. For example, for a request of order 7 and max trim order 2, CR1.5 can trim this up to order 5.
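A small user-space C sketch of the trim arithmetic described above (illustrative only: fls_() mimics the kernel's fls(), stripe handling is omitted, and the numbers are hypothetical):

    #include <stdio.h>

    /* fls_() mimics the kernel's fls(): index of the highest set bit,
     * counting from 1; 0 when no bit is set. */
    static int fls_(unsigned int x)
    {
            int r = 0;
            while (x) { r++; x >>= 1; }
            return r;
    }

    int main(void)
    {
            unsigned int goal_len = 100;    /* goal extent, an order-7 request */
            unsigned int orig_len = 5;      /* original request length */
            int max_trim = 2;               /* mb_cr1_5_max_trim_order */

            int order = fls_(goal_len);             /* 7 */
            int min_order = order - max_trim;       /* 5 */

            if (min_order < 0)
                    min_order = 0;
            /* never trim below the originally requested length */
            if ((1u << min_order) < orig_len)
                    min_order = fls_(orig_len) + 1;

            /* prints 128, 64, 32: the goal is first rounded up to a
             * power of two, then trimmed down to order 5 at most */
            for (int i = order; i >= min_order; i--)
                    printf("CR1.5 would try a goal length of %u\n", 1u << i);
            return 0;
    }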
Suggested-by: Ritesh Harjani (IBM) Signed-off-by: Ojaswin Mujoo Reviewed-by: Ritesh Harjani (IBM) --- fs/ext4/ext4.h | 8 ++- fs/ext4/mballoc.c | 137 +++++++++++++++++++++++++++++++++--- fs/ext4/mballoc.h | 13 ++++ fs/ext4/sysfs.c | 2 + include/trace/events/ext4.h | 2 + 5 files changed, 151 insertions(+), 11 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 8bb1edcd2dda..0d30255cca2b 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -133,13 +133,14 @@ enum SHIFT_DIRECTION { * criteria the slower the allocation. We start at lower criterias and keep * falling back to higher ones if we are not able to find any blocks. */ -#define EXT4_MB_NUM_CRS 4 +#define EXT4_MB_NUM_CRS 5 /* * All possible allocation criterias for mballoc */ enum criteria { CR0, CR1, + CR1_5, CR2, CR3, }; @@ -185,6 +186,9 @@ enum criteria { #define EXT4_MB_CR0_OPTIMIZED 0x8000 /* Avg fragment size rb tree lookup succeeded at least once for cr = 1 */ #define EXT4_MB_CR1_OPTIMIZED 0x00010000 +/* Avg fragment size rb tree lookup succeeded at least once for cr = 1.5 */ +#define EXT4_MB_CR1_5_OPTIMIZED 0x00020000 + struct ext4_allocation_request { /* target inode for block we're allocating */ struct inode *inode; @@ -1547,6 +1551,7 @@ struct ext4_sb_info { unsigned long s_mb_last_start; unsigned int s_mb_prefetch; unsigned int s_mb_prefetch_limit; + unsigned int s_mb_cr1_5_max_trim_order; /* stats for buddy allocator */ atomic_t s_bal_reqs; /* number of reqs with len > 1 */ @@ -1561,6 +1566,7 @@ atomic_t s_bal_2orders; /* 2^order hits */ atomic_t s_bal_cr0_bad_suggestions; atomic_t s_bal_cr1_bad_suggestions; + atomic_t s_bal_cr1_5_bad_suggestions; atomic64_t s_bal_cX_groups_considered[EXT4_MB_NUM_CRS]; atomic64_t s_bal_cX_hits[EXT4_MB_NUM_CRS]; atomic64_t s_bal_cX_failed[EXT4_MB_NUM_CRS]; /* cX loop didn't find blocks */ diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index fd29ee02685d..6f48f2fb843c 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -165,6 +165,14 @@ * equal to request size using our average fragment size group lists (data * structure 2) in O(1) time. + * + * At CR1.5 (aka CR1_5), we aim to optimize allocations which can't be satisfied + * in CR1. The fact that we couldn't find a group in CR1 suggests that there is + * no BG that has average fragment size > goal length. So before falling to the + * slower CR2, in CR1.5 we proactively trim goal length and then use the same + * fragment lists as CR1 to find a BG with a big enough average fragment size. + * This increases the chances of finding a suitable block group in O(1) time and + * results in faster allocation at the cost of reduced size of allocation. + * * If "mb_optimize_scan" mount option is not set, mballoc traverses groups in * linear order which requires O(N) search time for each CR0 and CR1 phase. * @@ -962,6 +970,91 @@ static void ext4_mb_choose_next_group_cr1(struct ext4_allocation_context *ac, *group = grp->bb_group; ac->ac_flags |= EXT4_MB_CR1_OPTIMIZED; } else { + *new_cr = CR1_5; + } +} + +/* + * We couldn't find a group in CR1 so try to find the highest free fragment + * order we have and proactively trim the goal request length to that order to + * find a suitable group faster. + * + * This optimizes allocation speed at the cost of slightly reduced + * preallocations. However, we make sure that we don't trim the request too + * much and fall to CR2 in that case. + */ +static void ext4_mb_choose_next_group_cr1_5(struct ext4_allocation_context *ac, + enum criteria *new_cr, ext4_group_t *group, ext4_group_t ngroups) +{ + struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb); + struct ext4_group_info *grp = NULL; + int i, order, min_order; + unsigned long num_stripe_clusters = 0; + + if (unlikely(ac->ac_flags & EXT4_MB_CR1_5_OPTIMIZED)) { + if (sbi->s_mb_stats) + atomic_inc(&sbi->s_bal_cr1_5_bad_suggestions); + } + + /* + * mb_avg_fragment_size_order() returns order in a way that makes + * retrieving back the length using (1 << order) inaccurate. Hence, use + * fls() instead since we need to know the actual length while modifying + * goal length. + */ + order = fls(ac->ac_g_ex.fe_len); + min_order = order - sbi->s_mb_cr1_5_max_trim_order; + if (min_order < 0) + min_order = 0; + + if (1 << min_order < ac->ac_o_ex.fe_len) + min_order = fls(ac->ac_o_ex.fe_len) + 1; + + if (sbi->s_stripe > 0) { + /* + * We are assuming that stripe size is always a multiple of + * cluster ratio otherwise __ext4_fill_super exits early. + */ + num_stripe_clusters = EXT4_NUM_B2C(sbi, sbi->s_stripe); + if (1 << min_order < num_stripe_clusters) + min_order = fls(num_stripe_clusters); + } + + for (i = order; i >= min_order; i--) { + int frag_order; + /* + * Scale down goal len to make sure we find something + * in the free fragments list. Basically, reduce + * preallocations. + */ + ac->ac_g_ex.fe_len = 1 << i; + + if (num_stripe_clusters > 0) { + /* + * Try to round up the adjusted goal to stripe size + * (in cluster units) multiple for efficiency. + * + * XXX: Is s->stripe always a power of 2? In that case + * we can use the faster round_up() variant.
+ */ + ac->ac_g_ex.fe_len = roundup(ac->ac_g_ex.fe_len, + num_stripe_clusters); + } + + frag_order = mb_avg_fragment_size_order(ac->ac_sb, + ac->ac_g_ex.fe_len); + + grp = ext4_mb_find_good_group_avg_frag_lists(ac, frag_order); + if (grp) + break; + } + + if (grp) { + *group = grp->bb_group; + ac->ac_flags |= EXT4_MB_CR1_5_OPTIMIZED; + } else { + /* Reset goal length to original goal length before falling into CR2 */ + ac->ac_g_ex.fe_len = ac->ac_orig_goal_len; *new_cr = CR2; } } @@ -1028,6 +1121,8 @@ static void ext4_mb_choose_next_group(struct ext4_allocation_context *ac, ext4_mb_choose_next_group_cr0(ac, new_cr, group, ngroups); } else if (*new_cr == CR1) { ext4_mb_choose_next_group_cr1(ac, new_cr, group, ngroups); + } else if (*new_cr == CR1_5) { + ext4_mb_choose_next_group_cr1_5(ac, new_cr, group, ngroups); } else { /* * TODO: For CR=2, we can arrange groups in an rb tree sorted by @@ -2351,7 +2446,7 @@ void ext4_mb_complex_scan_group(struct ext4_allocation_context *ac, if (ac->ac_criteria < CR2) { /* - * In CR1, we are sure that this group will + * In CR1 and CR1_5, we are sure that this group will * have a large enough continuous free extent, so skip * over the smaller free extents */ @@ -2481,6 +2576,7 @@ static bool ext4_mb_good_group(struct ext4_allocation_context *ac, return true; case CR1: + case CR1_5: if ((free / fragments) >= ac->ac_g_ex.fe_len) return true; break; @@ -2745,7 +2841,7 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac) * spend a lot of time loading imperfect groups */ if ((prefetch_grp == group) && - (cr > CR1 || + (cr > CR1_5 || prefetch_ios < sbi->s_mb_prefetch_limit)) { nr = sbi->s_mb_prefetch; if (ext4_has_feature_flex_bg(sb)) { @@ -2785,8 +2881,8 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac) ac->ac_groups_scanned++; if (cr == CR0) ext4_mb_simple_scan_group(ac, &e4b); - else if (cr == CR1 && sbi->s_stripe && - !(ac->ac_g_ex.fe_len % sbi->s_stripe)) + else if ((cr == CR1 || cr == CR1_5) && sbi->s_stripe && + !(ac->ac_g_ex.fe_len % sbi->s_stripe)) ext4_mb_scan_aligned(ac, &e4b); else ext4_mb_complex_scan_group(ac, &e4b); @@ -2800,6 +2896,11 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac) /* Processed all groups and haven't found blocks */ if (sbi->s_mb_stats && i == ngroups) atomic64_inc(&sbi->s_bal_cX_failed[cr]); + + if (i == ngroups && ac->ac_criteria == CR1_5) + /* Reset goal length to original goal length before + * falling into CR2 */ + ac->ac_g_ex.fe_len = ac->ac_orig_goal_len; } if (ac->ac_b_ex.fe_len > 0 && ac->ac_status != AC_STATUS_FOUND && @@ -2969,6 +3070,16 @@ int ext4_seq_mb_stats_show(struct seq_file *seq, void *offset) seq_printf(seq, "\t\tbad_suggestions: %u\n", atomic_read(&sbi->s_bal_cr1_bad_suggestions)); + seq_puts(seq, "\tcr1.5_stats:\n"); + seq_printf(seq, "\t\thits: %llu\n", atomic64_read(&sbi->s_bal_cX_hits[CR1_5])); + seq_printf(seq, "\t\tgroups_considered: %llu\n", + atomic64_read(&sbi->s_bal_cX_groups_considered[CR1_5])); + seq_printf(seq, "\t\textents_scanned: %u\n", atomic_read(&sbi->s_bal_cX_ex_scanned[CR1_5])); + seq_printf(seq, "\t\tuseless_loops: %llu\n", + atomic64_read(&sbi->s_bal_cX_failed[CR1_5])); + seq_printf(seq, "\t\tbad_suggestions: %u\n", + atomic_read(&sbi->s_bal_cr1_5_bad_suggestions)); + seq_puts(seq, "\tcr2_stats:\n"); seq_printf(seq, "\t\thits: %llu\n", atomic64_read(&sbi->s_bal_cX_hits[CR2])); seq_printf(seq, "\t\tgroups_considered: %llu\n", @@ -3486,6 +3597,8 @@ int ext4_mb_init(struct super_block *sb) sbi->s_mb_stats = MB_DEFAULT_STATS; 
sbi->s_mb_stream_request = MB_DEFAULT_STREAM_THRESHOLD; sbi->s_mb_order2_reqs = MB_DEFAULT_ORDER2_REQS; + sbi->s_mb_cr1_5_max_trim_order = MB_DEFAULT_CR1_5_TRIM_ORDER; + /* * The default group preallocation is 512, which for 4k block * sizes translates to 2 megabytes. However for bigalloc file @@ -4389,6 +4502,7 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac, * placement or satisfy big request as is */ ac->ac_g_ex.fe_logical = start; ac->ac_g_ex.fe_len = EXT4_NUM_B2C(sbi, size); + ac->ac_orig_goal_len = ac->ac_g_ex.fe_len; /* define goal start in order to merge */ if (ar->pright && (ar->lright == (start + size)) && @@ -4432,8 +4546,10 @@ static void ext4_mb_collect_stats(struct ext4_allocation_context *ac) if (ac->ac_g_ex.fe_start == ac->ac_b_ex.fe_start && ac->ac_g_ex.fe_group == ac->ac_b_ex.fe_group) atomic_inc(&sbi->s_bal_goals); - if (ac->ac_f_ex.fe_len == ac->ac_g_ex.fe_len) + /* did we allocate as much as normalizer originally wanted? */ + if (ac->ac_f_ex.fe_len == ac->ac_orig_goal_len) atomic_inc(&sbi->s_bal_len_goals); + if (ac->ac_found > sbi->s_mb_max_to_scan) atomic_inc(&sbi->s_bal_breaks); } @@ -4886,7 +5002,7 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac) pa = ac->ac_pa; - if (ac->ac_b_ex.fe_len < ac->ac_g_ex.fe_len) { + if (ac->ac_b_ex.fe_len < ac->ac_orig_goal_len) { int new_bex_start; int new_bex_end; @@ -4901,14 +5017,14 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac) * fragmentation in check while ensuring logical range of best * extent doesn't overflow out of goal extent: * - * 1. Check if best ex can be kept at end of goal and still - * cover original start + * 1. Check if best ex can be kept at end of goal (before + * cr_best_avail trimmed it) and still cover original start * 2. Else, check if best ex can be kept at start of goal and * still cover original start * 3. Else, keep the best ex at start of original request. */ new_bex_end = ac->ac_g_ex.fe_logical + - EXT4_C2B(sbi, ac->ac_g_ex.fe_len); + EXT4_C2B(sbi, ac->ac_orig_goal_len); new_bex_start = new_bex_end - EXT4_C2B(sbi, ac->ac_b_ex.fe_len); if (ac->ac_o_ex.fe_logical >= new_bex_start) goto adjust_bex; @@ -4929,7 +5045,7 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac) BUG_ON(ac->ac_o_ex.fe_logical < ac->ac_b_ex.fe_logical); BUG_ON(ac->ac_o_ex.fe_len > ac->ac_b_ex.fe_len); BUG_ON(new_bex_end > (ac->ac_g_ex.fe_logical + - EXT4_C2B(sbi, ac->ac_g_ex.fe_len))); + EXT4_C2B(sbi, ac->ac_orig_goal_len))); } pa->pa_lstart = ac->ac_b_ex.fe_logical; @@ -5557,6 +5673,7 @@ ext4_mb_initialize_context(struct ext4_allocation_context *ac, ac->ac_o_ex.fe_start = block; ac->ac_o_ex.fe_len = len; ac->ac_g_ex = ac->ac_o_ex; + ac->ac_orig_goal_len = ac->ac_g_ex.fe_len; ac->ac_flags = ar->flags; /* we have to define context: we'll work with a file or diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h index acfdc204e15d..bddc0335c261 100644 --- a/fs/ext4/mballoc.h +++ b/fs/ext4/mballoc.h @@ -85,6 +85,13 @@ */ #define MB_DEFAULT_LINEAR_SCAN_THRESHOLD 16 +/* + * The maximum order up to which CR1.5 can trim a particular allocation request. + * For example, if we have an order 7 request and max trim order of 3, CR1.5 can + * trim this up to order 4. + */ +#define MB_DEFAULT_CR1_5_TRIM_ORDER 3 + /* * Number of valid buddy orders */ @@ -179,6 +186,12 @@ struct ext4_allocation_context { /* copy of the best found extent taken before preallocation efforts */ struct ext4_free_extent ac_f_ex; + /* + * goal len can change in CR1.5, so save the original len.
This is + * used while adjusting the PA window and for accounting. + */ + ext4_grpblk_t ac_orig_goal_len; + __u32 ac_groups_considered; __u32 ac_flags; /* allocation hints */ __u16 ac_groups_scanned; diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c index 3042bc605bbf..4a5c08c8dddb 100644 --- a/fs/ext4/sysfs.c +++ b/fs/ext4/sysfs.c @@ -223,6 +223,7 @@ EXT4_RW_ATTR_SBI_UI(warning_ratelimit_interval_ms, s_warning_ratelimit_state.interval); EXT4_RW_ATTR_SBI_UI(warning_ratelimit_burst, s_warning_ratelimit_state.burst); EXT4_RW_ATTR_SBI_UI(msg_ratelimit_interval_ms, s_msg_ratelimit_state.interval); EXT4_RW_ATTR_SBI_UI(msg_ratelimit_burst, s_msg_ratelimit_state.burst); +EXT4_RW_ATTR_SBI_UI(mb_cr1_5_max_trim_order, s_mb_cr1_5_max_trim_order); #ifdef CONFIG_EXT4_DEBUG EXT4_RW_ATTR_SBI_UL(simulate_fail, s_simulate_fail); #endif @@ -273,6 +274,7 @@ static struct attribute *ext4_attrs[] = { ATTR_LIST(warning_ratelimit_burst), ATTR_LIST(msg_ratelimit_interval_ms), ATTR_LIST(msg_ratelimit_burst), + ATTR_LIST(mb_cr1_5_max_trim_order), ATTR_LIST(errors_count), ATTR_LIST(warning_count), ATTR_LIST(msg_count), diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h index f062147ca32b..7ea9b4fcb21f 100644 --- a/include/trace/events/ext4.h +++ b/include/trace/events/ext4.h @@ -122,6 +122,7 @@ TRACE_DEFINE_ENUM(EXT4_FC_REASON_MAX); TRACE_DEFINE_ENUM(CR0); TRACE_DEFINE_ENUM(CR1); +TRACE_DEFINE_ENUM(CR1_5); TRACE_DEFINE_ENUM(CR2); TRACE_DEFINE_ENUM(CR3); @@ -129,6 +130,7 @@ TRACE_DEFINE_ENUM(CR3); __print_symbolic(cr, \ { CR0, "CR0" }, \ { CR1, "CR1" }, \ + { CR1_5, "CR1.5" }, \ { CR2, "CR2" }, \ { CR3, "CR3" })
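As a usage note, once a filesystem is mounted the new knob can be inspected and adjusted per device. A minimal sketch, assuming a hypothetical device name "sda1" and the sysfs attribute name added by this patch:

    #include <stdio.h>

    int main(void)
    {
            /* "sda1" is a hypothetical device; substitute your own. */
            const char *path = "/sys/fs/ext4/sda1/mb_cr1_5_max_trim_order";
            unsigned int order;
            FILE *f = fopen(path, "r");

            if (f && fscanf(f, "%u", &order) == 1)
                    printf("max trim order: %u\n", order);  /* default 3 */
            if (f)
                    fclose(f);

            /* lower it to make CR1.5 trim requests less aggressively */
            f = fopen(path, "w");
            if (f) {
                    fprintf(f, "1\n");
                    fclose(f);
            }
            return 0;
    }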
From patchwork Thu May 25 11:33:07 2023
X-Patchwork-Submitter: Ojaswin Mujoo
X-Patchwork-Id: 13255100
From: Ojaswin Mujoo
To: linux-ext4@vger.kernel.org, "Theodore Ts'o"
Cc: Ritesh Harjani, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Jan Kara, Kemeng Shi
Subject: [PATCH 13/13] ext4: Give symbolic names to mballoc criterias
Date: Thu, 25 May 2023 17:03:07 +0530
Message-Id: <00a2f84d749368bf4d7e6fb192aaa80e8f8b77b2.1685009579.git.ojaswin@linux.ibm.com>

mballoc criteria have historically been referred to by numbers like CR0, CR1...; however, this makes it confusing to understand what each criterion is about. Change these criteria from numbers to symbolic names and add relevant comments. While we are at it, also reformat and add some comments to ext4_seq_mb_stats_show() for better readability.

Additionally, define CR_FAST, which signifies the criteria below which we can make quicker decisions like:

* quitting early if (free blocks < requested len)
* avoiding scanning free extents smaller than the required len
* avoiding initializing the buddy cache and working with the existing cache * limiting prefetches Suggested-by: Jan Kara Signed-off-by: Ojaswin Mujoo --- fs/ext4/ext4.h | 55 ++++++-- fs/ext4/mballoc.c | 269 ++++++++++++++++++++---------- fs/ext4/mballoc.h | 8 +- fs/ext4/sysfs.c | 4 +- include/trace/events/ext4.h | 26 ++-- 5 files changed, 212 insertions(+), 150 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 0d30255cca2b..cd41591ddb22 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -135,16 +135,45 @@ enum SHIFT_DIRECTION { */ #define EXT4_MB_NUM_CRS 5 /* - * All possible allocation criterias for mballoc + * All possible allocation criteria for mballoc. Lower ones are faster. */ enum criteria { - CR0, - CR1, - CR1_5, - CR2, - CR3, + /* + * Used when number of blocks needed is a power of 2. This doesn't + * trigger any disk IO except prefetch and is the fastest criterion. + */ + CR_POWER2_ALIGNED, + + /* + * Tries to look up in-memory data structures to find the most suitable + * group that satisfies the goal request. No disk IO except block prefetch. + */ + CR_GOAL_LEN_FAST, + + /* + * Same as CR_GOAL_LEN_FAST but is allowed to reduce the goal length to + * the best available length for faster allocation. + */ + CR_BEST_AVAIL_LEN, + + /* + * Reads each block group sequentially, performing disk IO if necessary, to + * find a suitable block group. Tries to allocate the goal length but might trim + * the request if nothing is found after enough tries. + */ + CR_GOAL_LEN_SLOW, + + /* + * Finds the first free set of blocks and allocates those. This is only + * used in rare cases when CR_GOAL_LEN_SLOW also fails to allocate + * anything. + */ + CR_ANY_FREE, }; +/* criteria below which we use fast block scanning and avoid unnecessary IO */ +#define CR_FAST CR_GOAL_LEN_SLOW + /* * Flags used in mballoc's allocation_context flags field.
* @@ -183,11 +212,11 @@ enum criteria { */ /* Do strict check for free blocks while retrying block allocation */ #define EXT4_MB_STRICT_CHECK 0x4000 /* Large fragment size list lookup succeeded at least once for cr = 0 */ -#define EXT4_MB_CR0_OPTIMIZED 0x8000 +#define EXT4_MB_CR_POWER2_ALIGNED_OPTIMIZED 0x8000 /* Avg fragment size rb tree lookup succeeded at least once for cr = 1 */ -#define EXT4_MB_CR1_OPTIMIZED 0x00010000 +#define EXT4_MB_CR_GOAL_LEN_FAST_OPTIMIZED 0x00010000 /* Avg fragment size rb tree lookup succeeded at least once for cr = 1.5 */ -#define EXT4_MB_CR1_5_OPTIMIZED 0x00020000 +#define EXT4_MB_CR_BEST_AVAIL_LEN_OPTIMIZED 0x00020000 struct ext4_allocation_request { /* target inode for block we're allocating */ @@ -1551,7 +1580,7 @@ struct ext4_sb_info { unsigned long s_mb_last_start; unsigned int s_mb_prefetch; unsigned int s_mb_prefetch_limit; - unsigned int s_mb_cr1_5_max_trim_order; + unsigned int s_mb_best_avail_max_trim_order; /* stats for buddy allocator */ atomic_t s_bal_reqs; /* number of reqs with len > 1 */ @@ -1564,9 +1593,9 @@ struct ext4_sb_info { atomic_t s_bal_len_goals; /* len goal hits */ atomic_t s_bal_breaks; /* too long searches */ atomic_t s_bal_2orders; /* 2^order hits */ - atomic_t s_bal_cr0_bad_suggestions; - atomic_t s_bal_cr1_bad_suggestions; - atomic_t s_bal_cr1_5_bad_suggestions; + atomic_t s_bal_p2_aligned_bad_suggestions; + atomic_t s_bal_goal_fast_bad_suggestions; + atomic_t s_bal_best_avail_bad_suggestions; atomic64_t s_bal_cX_groups_considered[EXT4_MB_NUM_CRS]; atomic64_t s_bal_cX_hits[EXT4_MB_NUM_CRS]; atomic64_t s_bal_cX_failed[EXT4_MB_NUM_CRS]; /* cX loop didn't find blocks */
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 6f48f2fb843c..c5f7a86553e0 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -154,27 +154,31 @@ * structures to decide the order in which groups are to be traversed for * fulfilling an allocation request. * - * At CR0 , we look for groups which have the largest_free_order >= the order - * of the request. We directly look at the largest free order list in the data - * structure (1) above where largest_free_order = order of the request. If that - * list is empty, we look at remaining list in the increasing order of - * largest_free_order. This allows us to perform CR0 lookup in O(1) time. + * At CR_POWER2_ALIGNED, we look for groups which have the largest_free_order + * >= the order of the request. We directly look at the largest free order list + * in the data structure (1) above where largest_free_order = order of the + * request. If that list is empty, we look at the remaining lists in increasing + * order of largest_free_order. This allows us to perform CR_POWER2_ALIGNED + * lookup in O(1) time. * - * At CR1, we only consider groups where average fragment size > request - * size. So, we lookup a group which has average fragment size just above or - * equal to request size using our average fragment size group lists (data - * structure 2) in O(1) time. + * At CR_GOAL_LEN_FAST, we only consider groups where + * average fragment size > request size. So, we look up a group which has average + * fragment size just above or equal to request size using our average fragment + * size group lists (data structure 2) in O(1) time. + * - * At CR1.5 (aka CR1_5), we aim to optimize allocations which can't be satisfied - * in CR1. The fact that we couldn't find a group in CR1 suggests that there is - * no BG that has average fragment size > goal length.
So before falling to the - * slower CR2, in CR1.5 we proactively trim goal length and then use the same - * fragment lists as CR1 to find a BG with a big enough average fragment size. - * This increases the chances of finding a suitable block group in O(1) time and - * results * in faster allocation at the cost of reduced size of allocation. + * At CR_BEST_AVAIL_LEN, we aim to optimize allocations which can't be satisfied + * in CR_GOAL_LEN_FAST. The fact that we couldn't find a group in + * CR_GOAL_LEN_FAST suggests that there is no BG that has avg + * fragment size > goal length. So before falling to the slower + * CR_GOAL_LEN_SLOW, in CR_BEST_AVAIL_LEN we proactively trim the goal length and + * then use the same fragment lists as CR_GOAL_LEN_FAST to find a BG with a big + * enough average fragment size. This increases the chances of finding a + * suitable block group in O(1) time and results in faster allocation at the + * cost of a reduced allocation size. * * If "mb_optimize_scan" mount option is not set, mballoc traverses groups in - * linear order which requires O(N) search time for each CR0 and CR1 phase. + * linear order which requires O(N) search time for each CR_POWER2_ALIGNED and + * CR_GOAL_LEN_FAST phase. * * The regular allocator (using the buddy cache) supports a few tunables. * @@ -359,8 +363,8 @@ * - bitlock on a group (group) * - object (inode/locality) (object) * - per-pa lock (pa) - * - cr0 lists lock (cr0) - * - cr1 tree lock (cr1) + * - cr_power2_aligned lists lock (cr_power2_aligned) + * - cr_goal_len_fast lists lock (cr_goal_len_fast) * * Paths: * - new pa @@ -392,7 +396,7 @@ * * - allocation path (ext4_mb_regular_allocator) * group - * cr0/cr1 + * cr_power2_aligned/cr_goal_len_fast */ static struct kmem_cache *ext4_pspace_cachep; static struct kmem_cache *ext4_ac_cachep; @@ -866,7 +870,7 @@ mb_update_avg_fragment_size(struct super_block *sb, struct ext4_group_info *grp) * Choose next group by traversing largest_free_order lists. Updates *new_cr if * cr level needs an update.
*/ -static void ext4_mb_choose_next_group_cr0(struct ext4_allocation_context *ac, +static void ext4_mb_choose_next_group_p2_aligned(struct ext4_allocation_context *ac, enum criteria *new_cr, ext4_group_t *group, ext4_group_t ngroups) { struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb); @@ -876,8 +880,8 @@ static void ext4_mb_choose_next_group_cr0(struct ext4_allocation_context *ac, if (ac->ac_status == AC_STATUS_FOUND) return; - if (unlikely(sbi->s_mb_stats && ac->ac_flags & EXT4_MB_CR0_OPTIMIZED)) - atomic_inc(&sbi->s_bal_cr0_bad_suggestions); + if (unlikely(sbi->s_mb_stats && ac->ac_flags & EXT4_MB_CR_POWER2_ALIGNED_OPTIMIZED)) + atomic_inc(&sbi->s_bal_p2_aligned_bad_suggestions); grp = NULL; for (i = ac->ac_2order; i < MB_NUM_ORDERS(ac->ac_sb); i++) { @@ -892,8 +896,8 @@ static void ext4_mb_choose_next_group_cr0(struct ext4_allocation_context *ac, list_for_each_entry(iter, &sbi->s_mb_largest_free_orders[i], bb_largest_free_order_node) { if (sbi->s_mb_stats) - atomic64_inc(&sbi->s_bal_cX_groups_considered[CR0]); - if (likely(ext4_mb_good_group(ac, iter->bb_group, CR0))) { + atomic64_inc(&sbi->s_bal_cX_groups_considered[CR_POWER2_ALIGNED]); + if (likely(ext4_mb_good_group(ac, iter->bb_group, CR_POWER2_ALIGNED))) { grp = iter; break; } @@ -905,10 +909,10 @@ static void ext4_mb_choose_next_group_cr0(struct ext4_allocation_context *ac, if (!grp) { /* Increment cr and search again */ - *new_cr = CR1; + *new_cr = CR_GOAL_LEN_FAST; } else { *group = grp->bb_group; - ac->ac_flags |= EXT4_MB_CR0_OPTIMIZED; + ac->ac_flags |= EXT4_MB_CR_POWER2_ALIGNED_OPTIMIZED; } } @@ -947,16 +951,16 @@ ext4_mb_find_good_group_avg_frag_lists(struct ext4_allocation_context *ac, int o * Choose next group by traversing average fragment size list of suitable * order. Updates *new_cr if cr level needs an update. */ -static void ext4_mb_choose_next_group_cr1(struct ext4_allocation_context *ac, +static void ext4_mb_choose_next_group_goal_fast(struct ext4_allocation_context *ac, enum criteria *new_cr, ext4_group_t *group, ext4_group_t ngroups) { struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb); struct ext4_group_info *grp = NULL; int i; - if (unlikely(ac->ac_flags & EXT4_MB_CR1_OPTIMIZED)) { + if (unlikely(ac->ac_flags & EXT4_MB_CR_GOAL_LEN_FAST_OPTIMIZED)) { if (sbi->s_mb_stats) - atomic_inc(&sbi->s_bal_cr1_bad_suggestions); + atomic_inc(&sbi->s_bal_goal_fast_bad_suggestions); } for (i = mb_avg_fragment_size_order(ac->ac_sb, ac->ac_g_ex.fe_len); @@ -968,22 +972,22 @@ static void ext4_mb_choose_next_group_cr1(struct ext4_allocation_context *ac, if (grp) { *group = grp->bb_group; - ac->ac_flags |= EXT4_MB_CR1_OPTIMIZED; + ac->ac_flags |= EXT4_MB_CR_GOAL_LEN_FAST_OPTIMIZED; } else { - *new_cr = CR1_5; + *new_cr = CR_BEST_AVAIL_LEN; } } /* - * We couldn't find a group in CR1 so try to find the highest free fragment + * We couldn't find a group in CR_GOAL_LEN_FAST so try to find the highest free fragment * order we have and proactively trim the goal request length to that order to * find a suitable group faster. * * This optimizes allocation speed at the cost of slightly reduced * preallocations. However, we make sure that we don't trim the request too - * much and fall to CR2 in that case. + * much and fall to CR_GOAL_LEN_SLOW in that case. 
*/ -static void ext4_mb_choose_next_group_cr1_5(struct ext4_allocation_context *ac, +static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context *ac, enum criteria *new_cr, ext4_group_t *group, ext4_group_t ngroups) { struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb); @@ -991,9 +995,9 @@ static void ext4_mb_choose_next_group_cr1_5(struct ext4_allocation_context *ac, int i, order, min_order; unsigned long num_stripe_clusters = 0; - if (unlikely(ac->ac_flags & EXT4_MB_CR1_5_OPTIMIZED)) { + if (unlikely(ac->ac_flags & EXT4_MB_CR_BEST_AVAIL_LEN_OPTIMIZED)) { if (sbi->s_mb_stats) - atomic_inc(&sbi->s_bal_cr1_5_bad_suggestions); + atomic_inc(&sbi->s_bal_best_avail_bad_suggestions); } /* @@ -1003,7 +1007,7 @@ static void ext4_mb_choose_next_group_cr1_5(struct ext4_allocation_context *ac, * goal length. */ order = fls(ac->ac_g_ex.fe_len); - min_order = order - sbi->s_mb_cr1_5_max_trim_order; + min_order = order - sbi->s_mb_best_avail_max_trim_order; if (min_order < 0) min_order = 0; @@ -1051,11 +1055,11 @@ static void ext4_mb_choose_next_group_cr1_5(struct ext4_allocation_context *ac, if (grp) { *group = grp->bb_group; - ac->ac_flags |= EXT4_MB_CR1_5_OPTIMIZED; + ac->ac_flags |= EXT4_MB_CR_BEST_AVAIL_LEN_OPTIMIZED; } else { - /* Reset goal length to original goal length before falling into CR2 */ + /* Reset goal length to original goal length before falling into CR_GOAL_LEN_SLOW */ ac->ac_g_ex.fe_len = ac->ac_orig_goal_len; - *new_cr = CR2; + *new_cr = CR_GOAL_LEN_SLOW; } } @@ -1063,7 +1067,7 @@ static inline int should_optimize_scan(struct ext4_allocation_context *ac) { if (unlikely(!test_opt2(ac->ac_sb, MB_OPTIMIZE_SCAN))) return 0; - if (ac->ac_criteria >= CR2) + if (ac->ac_criteria >= CR_GOAL_LEN_SLOW) return 0; if (!ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS)) return 0; @@ -1117,12 +1121,12 @@ static void ext4_mb_choose_next_group(struct ext4_allocation_context *ac, return; } - if (*new_cr == CR0) { - ext4_mb_choose_next_group_cr0(ac, new_cr, group, ngroups); - } else if (*new_cr == CR1) { - ext4_mb_choose_next_group_cr1(ac, new_cr, group, ngroups); - } else if (*new_cr == CR1_5) { - ext4_mb_choose_next_group_cr1_5(ac, new_cr, group, ngroups); + if (*new_cr == CR_POWER2_ALIGNED) { + ext4_mb_choose_next_group_p2_aligned(ac, new_cr, group, ngroups); + } else if (*new_cr == CR_GOAL_LEN_FAST) { + ext4_mb_choose_next_group_goal_fast(ac, new_cr, group, ngroups); + } else if (*new_cr == CR_BEST_AVAIL_LEN) { + ext4_mb_choose_next_group_best_avail(ac, new_cr, group, ngroups); } else { /* * TODO: For CR=2, we can arrange groups in an rb tree sorted by @@ -2444,11 +2448,12 @@ void ext4_mb_complex_scan_group(struct ext4_allocation_context *ac, break; } - if (ac->ac_criteria < CR2) { + if (ac->ac_criteria < CR_FAST) { /* - * In CR1 and CR1_5, we are sure that this group will - * have a large enough continuous free extent, so skip - * over the smaller free extents + * In CR_GOAL_LEN_FAST and CR_BEST_AVAIL_LEN, we are + * sure that this group will have a large enough + * continuous free extent, so skip over the smaller free + * extents */ j = mb_find_next_bit(bitmap, EXT4_CLUSTERS_PER_GROUP(sb), i); @@ -2542,7 +2547,7 @@ static bool ext4_mb_good_group(struct ext4_allocation_context *ac, int flex_size = ext4_flex_bg_size(EXT4_SB(ac->ac_sb)); struct ext4_group_info *grp = ext4_get_group_info(ac->ac_sb, group); - BUG_ON(cr < CR0 || cr >= EXT4_MB_NUM_CRS); + BUG_ON(cr < CR_POWER2_ALIGNED || cr >= EXT4_MB_NUM_CRS); if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(grp) || !grp)) return false; 
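(To make the trimming policy above concrete, here is a minimal userspace sketch; it is not part of the patch. fls_u32() stands in for the kernel's fls(), and the trim bound mirrors the default value of s_mb_best_avail_max_trim_order.)

#include <stdio.h>

/* Userspace stand-in for the kernel's fls(): position of the highest set bit */
static int fls_u32(unsigned int x)
{
	int r = 0;

	while (x) {
		x >>= 1;
		r++;
	}
	return r;
}

int main(void)
{
	unsigned int goal_len = 100;	/* hypothetical goal length, in clusters */
	int max_trim_order = 3;		/* default s_mb_best_avail_max_trim_order */
	int order = fls_u32(goal_len);
	int min_order = order - max_trim_order;
	int i;

	if (min_order < 0)
		min_order = 0;

	/* Orders tried, largest first, before falling back to CR_GOAL_LEN_SLOW */
	for (i = order; i >= min_order; i--)
		printf("CR_BEST_AVAIL_LEN: retry with goal trimmed to order %d\n", i);
	return 0;
}

With goal_len = 100 (order 7), the goal is retried at orders 7 down to 4 and never below, which is exactly the bound the mb_best_avail_max_trim_order sysfs knob controls.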
@@ -2556,7 +2561,7 @@ static bool ext4_mb_good_group(struct ext4_allocation_context *ac, return false; switch (cr) { - case CR0: + case CR_POWER2_ALIGNED: BUG_ON(ac->ac_2order == 0); /* Avoid using the first bg of a flexgroup for data files */ @@ -2575,16 +2580,16 @@ static bool ext4_mb_good_group(struct ext4_allocation_context *ac, return false; return true; - case CR1: - case CR1_5: + case CR_GOAL_LEN_FAST: + case CR_BEST_AVAIL_LEN: if ((free / fragments) >= ac->ac_g_ex.fe_len) return true; break; - case CR2: + case CR_GOAL_LEN_SLOW: if (free >= ac->ac_g_ex.fe_len) return true; break; - case CR3: + case CR_ANY_FREE: return true; default: BUG(); @@ -2625,7 +2630,7 @@ static int ext4_mb_good_group_nolock(struct ext4_allocation_context *ac, free = grp->bb_free; if (free == 0) goto out; - if (cr <= CR2 && free < ac->ac_g_ex.fe_len) + if (cr <= CR_FAST && free < ac->ac_g_ex.fe_len) goto out; if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(grp))) goto out; @@ -2640,15 +2645,16 @@ static int ext4_mb_good_group_nolock(struct ext4_allocation_context *ac, ext4_get_group_desc(sb, group, NULL); int ret; - /* cr=CR0/CR1 is a very optimistic search to find large - * good chunks almost for free. If buddy data is not - * ready, then this optimization makes no sense. But - * we never skip the first block group in a flex_bg, - * since this gets used for metadata block allocation, - * and we want to make sure we locate metadata blocks - * in the first block group in the flex_bg if possible. + /* + * cr=CR_POWER2_ALIGNED/CR_GOAL_LEN_FAST is a very optimistic + * search to find large good chunks almost for free. If buddy + * data is not ready, then this optimization makes no sense. But + * we never skip the first block group in a flex_bg, since this + * gets used for metadata block allocation, and we want to make + * sure we locate metadata blocks in the first block group in + * the flex_bg if possible. */ - if (cr < CR2 && + if (cr < CR_FAST && (!sbi->s_log_groups_per_flex || ((group & ((1 << sbi->s_log_groups_per_flex) - 1)) != 0)) && !(ext4_has_group_desc_csum(sb) && @@ -2808,10 +2814,10 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac) } /* Let's just scan groups to find more-less suitable blocks */ - cr = ac->ac_2order ? CR0 : CR1; + cr = ac->ac_2order ? 
CR_POWER2_ALIGNED : CR_GOAL_LEN_FAST; /* - * cr == CR0 try to get exact allocation, - * cr == CR3 try to get anything + * cr == CR_POWER2_ALIGNED try to get exact allocation, + * cr == CR_ANY_FREE try to get anything */ repeat: for (; cr < EXT4_MB_NUM_CRS && ac->ac_status == AC_STATUS_CONTINUE; cr++) { @@ -2841,7 +2847,7 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac) * spend a lot of time loading imperfect groups */ if ((prefetch_grp == group) && - (cr > CR1_5 || + (cr >= CR_FAST || prefetch_ios < sbi->s_mb_prefetch_limit)) { nr = sbi->s_mb_prefetch; if (ext4_has_feature_flex_bg(sb)) { @@ -2879,9 +2885,9 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac) } ac->ac_groups_scanned++; - if (cr == CR0) + if (cr == CR_POWER2_ALIGNED) ext4_mb_simple_scan_group(ac, &e4b); - else if ((cr == CR1 || cr == CR1_5) && sbi->s_stripe && + else if ((cr == CR_GOAL_LEN_FAST || cr == CR_BEST_AVAIL_LEN) && sbi->s_stripe && !(ac->ac_g_ex.fe_len % sbi->s_stripe)) ext4_mb_scan_aligned(ac, &e4b); else @@ -2897,9 +2903,9 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac) if (sbi->s_mb_stats && i == ngroups) atomic64_inc(&sbi->s_bal_cX_failed[cr]); - if (i == ngroups && ac->ac_criteria == CR1_5) + if (i == ngroups && ac->ac_criteria == CR_BEST_AVAIL_LEN) /* Reset goal length to original goal length before - * falling into CR2 */ + * falling into CR_GOAL_LEN_SLOW */ ac->ac_g_ex.fe_len = ac->ac_orig_goal_len; } @@ -2926,7 +2932,7 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac) ac->ac_b_ex.fe_len = 0; ac->ac_status = AC_STATUS_CONTINUE; ac->ac_flags |= EXT4_MB_HINT_FIRST; - cr = CR3; + cr = CR_ANY_FREE; goto repeat; } } @@ -3042,66 +3048,94 @@ int ext4_seq_mb_stats_show(struct seq_file *seq, void *offset) seq_puts(seq, "mballoc:\n"); if (!sbi->s_mb_stats) { seq_puts(seq, "\tmb stats collection turned off.\n"); - seq_puts(seq, "\tTo enable, please write \"1\" to sysfs file mb_stats.\n"); + seq_puts( + seq, + "\tTo enable, please write \"1\" to sysfs file mb_stats.\n"); return 0; } seq_printf(seq, "\treqs: %u\n", atomic_read(&sbi->s_bal_reqs)); seq_printf(seq, "\tsuccess: %u\n", atomic_read(&sbi->s_bal_success)); - seq_printf(seq, "\tgroups_scanned: %u\n", atomic_read(&sbi->s_bal_groups_scanned)); - - seq_puts(seq, "\tcr0_stats:\n"); - seq_printf(seq, "\t\thits: %llu\n", atomic64_read(&sbi->s_bal_cX_hits[CR0])); - seq_printf(seq, "\t\tgroups_considered: %llu\n", - atomic64_read(&sbi->s_bal_cX_groups_considered[CR0])); - seq_printf(seq, "\t\textents_scanned: %u\n", atomic_read(&sbi->s_bal_cX_ex_scanned[CR0])); + seq_printf(seq, "\tgroups_scanned: %u\n", + atomic_read(&sbi->s_bal_groups_scanned)); + + /* CR_POWER2_ALIGNED stats */ + seq_puts(seq, "\tcr_p2_aligned_stats:\n"); + seq_printf(seq, "\t\thits: %llu\n", + atomic64_read(&sbi->s_bal_cX_hits[CR_POWER2_ALIGNED])); + seq_printf( + seq, "\t\tgroups_considered: %llu\n", + atomic64_read( + &sbi->s_bal_cX_groups_considered[CR_POWER2_ALIGNED])); + seq_printf(seq, "\t\textents_scanned: %u\n", + atomic_read(&sbi->s_bal_cX_ex_scanned[CR_POWER2_ALIGNED])); seq_printf(seq, "\t\tuseless_loops: %llu\n", - atomic64_read(&sbi->s_bal_cX_failed[CR0])); + atomic64_read(&sbi->s_bal_cX_failed[CR_POWER2_ALIGNED])); seq_printf(seq, "\t\tbad_suggestions: %u\n", - atomic_read(&sbi->s_bal_cr0_bad_suggestions)); + atomic_read(&sbi->s_bal_p2_aligned_bad_suggestions)); - seq_puts(seq, "\tcr1_stats:\n"); - seq_printf(seq, "\t\thits: %llu\n", atomic64_read(&sbi->s_bal_cX_hits[CR1])); + /* CR_GOAL_LEN_FAST stats */ + seq_puts(seq, 
"\tcr_goal_fast_stats:\n"); + seq_printf(seq, "\t\thits: %llu\n", + atomic64_read(&sbi->s_bal_cX_hits[CR_GOAL_LEN_FAST])); seq_printf(seq, "\t\tgroups_considered: %llu\n", - atomic64_read(&sbi->s_bal_cX_groups_considered[CR1])); - seq_printf(seq, "\t\textents_scanned: %u\n", atomic_read(&sbi->s_bal_cX_ex_scanned[CR1])); + atomic64_read( + &sbi->s_bal_cX_groups_considered[CR_GOAL_LEN_FAST])); + seq_printf(seq, "\t\textents_scanned: %u\n", + atomic_read(&sbi->s_bal_cX_ex_scanned[CR_GOAL_LEN_FAST])); seq_printf(seq, "\t\tuseless_loops: %llu\n", - atomic64_read(&sbi->s_bal_cX_failed[CR1])); + atomic64_read(&sbi->s_bal_cX_failed[CR_GOAL_LEN_FAST])); seq_printf(seq, "\t\tbad_suggestions: %u\n", - atomic_read(&sbi->s_bal_cr1_bad_suggestions)); - - seq_puts(seq, "\tcr1.5_stats:\n"); - seq_printf(seq, "\t\thits: %llu\n", atomic64_read(&sbi->s_bal_cX_hits[CR1_5])); - seq_printf(seq, "\t\tgroups_considered: %llu\n", - atomic64_read(&sbi->s_bal_cX_groups_considered[CR1_5])); - seq_printf(seq, "\t\textents_scanned: %u\n", atomic_read(&sbi->s_bal_cX_ex_scanned[CR1_5])); + atomic_read(&sbi->s_bal_goal_fast_bad_suggestions)); + + /* CR_BEST_AVAIL_LEN stats */ + seq_puts(seq, "\tcr_best_avail_stats:\n"); + seq_printf(seq, "\t\thits: %llu\n", + atomic64_read(&sbi->s_bal_cX_hits[CR_BEST_AVAIL_LEN])); + seq_printf( + seq, "\t\tgroups_considered: %llu\n", + atomic64_read( + &sbi->s_bal_cX_groups_considered[CR_BEST_AVAIL_LEN])); + seq_printf(seq, "\t\textents_scanned: %u\n", + atomic_read(&sbi->s_bal_cX_ex_scanned[CR_BEST_AVAIL_LEN])); seq_printf(seq, "\t\tuseless_loops: %llu\n", - atomic64_read(&sbi->s_bal_cX_failed[CR1_5])); + atomic64_read(&sbi->s_bal_cX_failed[CR_BEST_AVAIL_LEN])); seq_printf(seq, "\t\tbad_suggestions: %u\n", - atomic_read(&sbi->s_bal_cr1_5_bad_suggestions)); + atomic_read(&sbi->s_bal_best_avail_bad_suggestions)); - seq_puts(seq, "\tcr2_stats:\n"); - seq_printf(seq, "\t\thits: %llu\n", atomic64_read(&sbi->s_bal_cX_hits[CR2])); + /* CR_GOAL_LEN_SLOW stats */ + seq_puts(seq, "\tcr_goal_slow_stats:\n"); + seq_printf(seq, "\t\thits: %llu\n", + atomic64_read(&sbi->s_bal_cX_hits[CR_GOAL_LEN_SLOW])); seq_printf(seq, "\t\tgroups_considered: %llu\n", - atomic64_read(&sbi->s_bal_cX_groups_considered[CR2])); - seq_printf(seq, "\t\textents_scanned: %u\n", atomic_read(&sbi->s_bal_cX_ex_scanned[CR2])); + atomic64_read( + &sbi->s_bal_cX_groups_considered[CR_GOAL_LEN_SLOW])); + seq_printf(seq, "\t\textents_scanned: %u\n", + atomic_read(&sbi->s_bal_cX_ex_scanned[CR_GOAL_LEN_SLOW])); seq_printf(seq, "\t\tuseless_loops: %llu\n", - atomic64_read(&sbi->s_bal_cX_failed[CR2])); - - seq_puts(seq, "\tcr3_stats:\n"); - seq_printf(seq, "\t\thits: %llu\n", atomic64_read(&sbi->s_bal_cX_hits[CR3])); - seq_printf(seq, "\t\tgroups_considered: %llu\n", - atomic64_read(&sbi->s_bal_cX_groups_considered[CR3])); - seq_printf(seq, "\t\textents_scanned: %u\n", atomic_read(&sbi->s_bal_cX_ex_scanned[CR3])); + atomic64_read(&sbi->s_bal_cX_failed[CR_GOAL_LEN_SLOW])); + + /* CR_ANY_FREE stats */ + seq_puts(seq, "\tcr_any_free_stats:\n"); + seq_printf(seq, "\t\thits: %llu\n", + atomic64_read(&sbi->s_bal_cX_hits[CR_ANY_FREE])); + seq_printf( + seq, "\t\tgroups_considered: %llu\n", + atomic64_read(&sbi->s_bal_cX_groups_considered[CR_ANY_FREE])); + seq_printf(seq, "\t\textents_scanned: %u\n", + atomic_read(&sbi->s_bal_cX_ex_scanned[CR_ANY_FREE])); seq_printf(seq, "\t\tuseless_loops: %llu\n", - atomic64_read(&sbi->s_bal_cX_failed[CR3])); - seq_printf(seq, "\textents_scanned: %u\n", atomic_read(&sbi->s_bal_ex_scanned)); + 
atomic64_read(&sbi->s_bal_cX_failed[CR_ANY_FREE])); + + /* Aggregates */ + seq_printf(seq, "\textents_scanned: %u\n", + atomic_read(&sbi->s_bal_ex_scanned)); seq_printf(seq, "\t\tgoal_hits: %u\n", atomic_read(&sbi->s_bal_goals)); - seq_printf(seq, "\t\tlen_goal_hits: %u\n", atomic_read(&sbi->s_bal_len_goals)); + seq_printf(seq, "\t\tlen_goal_hits: %u\n", + atomic_read(&sbi->s_bal_len_goals)); seq_printf(seq, "\t\t2^n_hits: %u\n", atomic_read(&sbi->s_bal_2orders)); seq_printf(seq, "\t\tbreaks: %u\n", atomic_read(&sbi->s_bal_breaks)); seq_printf(seq, "\t\tlost: %u\n", atomic_read(&sbi->s_mb_lost_chunks)); - seq_printf(seq, "\tbuddies_generated: %u/%u\n", atomic_read(&sbi->s_mb_buddies_generated), ext4_get_groups_count(sb)); @@ -3109,8 +3143,7 @@ int ext4_seq_mb_stats_show(struct seq_file *seq, void *offset) atomic64_read(&sbi->s_mb_generation_time)); seq_printf(seq, "\tpreallocated: %u\n", atomic_read(&sbi->s_mb_preallocated)); - seq_printf(seq, "\tdiscarded: %u\n", - atomic_read(&sbi->s_mb_discarded)); + seq_printf(seq, "\tdiscarded: %u\n", atomic_read(&sbi->s_mb_discarded)); return 0; } @@ -3597,7 +3630,7 @@ int ext4_mb_init(struct super_block *sb) sbi->s_mb_stats = MB_DEFAULT_STATS; sbi->s_mb_stream_request = MB_DEFAULT_STREAM_THRESHOLD; sbi->s_mb_order2_reqs = MB_DEFAULT_ORDER2_REQS; - sbi->s_mb_cr1_5_max_trim_order = MB_DEFAULT_CR1_5_TRIM_ORDER; + sbi->s_mb_best_avail_max_trim_order = MB_DEFAULT_BEST_AVAIL_TRIM_ORDER; /* * The default group preallocation is 512, which for 4k block
diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h index bddc0335c261..df6b5e7c2274 100644 --- a/fs/ext4/mballoc.h +++ b/fs/ext4/mballoc.h @@ -86,11 +86,11 @@ #define MB_DEFAULT_LINEAR_SCAN_THRESHOLD 16 /* - * The maximum order upto which CR1.5 can trim a particular allocation request. - * Example, if we have an order 7 request and max trim order of 3, CR1.5 can - * trim this upto order 4. + * The maximum order up to which CR_BEST_AVAIL_LEN can trim a particular + * allocation request. For example, if we have an order 7 request and max trim + * order of 3, we can trim this request up to order 4.
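+ * E.g. a goal of 100 clusters has order fls(100) = 7, so with the default + * trim order of 3 the goal is never trimmed below order 4, i.e. 16 clusters.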
*/ -#define MB_DEFAULT_CR1_5_TRIM_ORDER 3 +#define MB_DEFAULT_BEST_AVAIL_TRIM_ORDER 3 /* * Number of valid buddy orders
diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c index 4a5c08c8dddb..6d332dff79dd 100644 --- a/fs/ext4/sysfs.c +++ b/fs/ext4/sysfs.c @@ -223,7 +223,7 @@ EXT4_RW_ATTR_SBI_UI(warning_ratelimit_interval_ms, s_warning_ratelimit_state.int EXT4_RW_ATTR_SBI_UI(warning_ratelimit_burst, s_warning_ratelimit_state.burst); EXT4_RW_ATTR_SBI_UI(msg_ratelimit_interval_ms, s_msg_ratelimit_state.interval); EXT4_RW_ATTR_SBI_UI(msg_ratelimit_burst, s_msg_ratelimit_state.burst); -EXT4_RW_ATTR_SBI_UI(mb_cr1_5_max_trim_order, s_mb_cr1_5_max_trim_order); +EXT4_RW_ATTR_SBI_UI(mb_best_avail_max_trim_order, s_mb_best_avail_max_trim_order); #ifdef CONFIG_EXT4_DEBUG EXT4_RW_ATTR_SBI_UL(simulate_fail, s_simulate_fail); #endif @@ -274,7 +274,7 @@ static struct attribute *ext4_attrs[] = { ATTR_LIST(warning_ratelimit_burst), ATTR_LIST(msg_ratelimit_interval_ms), ATTR_LIST(msg_ratelimit_burst), - ATTR_LIST(mb_cr1_5_max_trim_order), + ATTR_LIST(mb_best_avail_max_trim_order), ATTR_LIST(errors_count), ATTR_LIST(warning_count), ATTR_LIST(msg_count),
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h index 7ea9b4fcb21f..bab28121c7a4 100644 --- a/include/trace/events/ext4.h +++ b/include/trace/events/ext4.h @@ -120,19 +120,19 @@ TRACE_DEFINE_ENUM(EXT4_FC_REASON_MAX); { EXT4_FC_REASON_INODE_JOURNAL_DATA, "INODE_JOURNAL_DATA"}, \ { EXT4_FC_REASON_ENCRYPTED_FILENAME, "ENCRYPTED_FILENAME"}) -TRACE_DEFINE_ENUM(CR0); -TRACE_DEFINE_ENUM(CR1); -TRACE_DEFINE_ENUM(CR1_5); -TRACE_DEFINE_ENUM(CR2); -TRACE_DEFINE_ENUM(CR3); - -#define show_criteria(cr) \ - __print_symbolic(cr, \ - { CR0, "CR0" }, \ - { CR1, "CR1" }, \ - { CR1_5, "CR1.5" }, \ - { CR2, "CR2" }, \ - { CR3, "CR3" }) +TRACE_DEFINE_ENUM(CR_POWER2_ALIGNED); +TRACE_DEFINE_ENUM(CR_GOAL_LEN_FAST); +TRACE_DEFINE_ENUM(CR_BEST_AVAIL_LEN); +TRACE_DEFINE_ENUM(CR_GOAL_LEN_SLOW); +TRACE_DEFINE_ENUM(CR_ANY_FREE); + +#define show_criteria(cr) \ + __print_symbolic(cr, \ + { CR_POWER2_ALIGNED, "CR_POWER2_ALIGNED" }, \ + { CR_GOAL_LEN_FAST, "CR_GOAL_LEN_FAST" }, \ + { CR_BEST_AVAIL_LEN, "CR_BEST_AVAIL_LEN" }, \ + { CR_GOAL_LEN_SLOW, "CR_GOAL_LEN_SLOW" }, \ + { CR_ANY_FREE, "CR_ANY_FREE" }) TRACE_EVENT(ext4_other_inode_update_time, TP_PROTO(struct inode *inode, ino_t orig_ino),
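For readers following along outside the kernel tree, here is a minimal standalone sketch of the model this patch ends up with. It is illustrative only, not kernel code; the enum values, names, and the CR_FAST cutoff mirror the definitions in the patch above.

#include <stdio.h>

/* Criteria names as introduced by this patch, in escalation order */
enum criteria {
	CR_POWER2_ALIGNED,
	CR_GOAL_LEN_FAST,
	CR_BEST_AVAIL_LEN,
	CR_GOAL_LEN_SLOW,
	CR_ANY_FREE,
	NUM_CRS			/* stands in for EXT4_MB_NUM_CRS (5) */
};

/* criteria below which we use fast block scanning and avoid unnecessary IO */
#define CR_FAST CR_GOAL_LEN_SLOW

static const char * const cr_name[NUM_CRS] = {
	"CR_POWER2_ALIGNED",
	"CR_GOAL_LEN_FAST",
	"CR_BEST_AVAIL_LEN",
	"CR_GOAL_LEN_SLOW",
	"CR_ANY_FREE",
};

int main(void)
{
	int cr;

	/* Walk the criteria in order, as the allocator's retry loop does */
	for (cr = CR_POWER2_ALIGNED; cr < NUM_CRS; cr++)
		printf("%-18s %s\n", cr_name[cr],
		       cr < CR_FAST ? "fast scan, prefetch only"
				    : "may issue per-group disk IO");
	return 0;
}

Everything strictly below CR_FAST (CR_POWER2_ALIGNED, CR_GOAL_LEN_FAST, CR_BEST_AVAIL_LEN) stays on the cheap, in-memory path; only CR_GOAL_LEN_SLOW and CR_ANY_FREE may do per-group disk IO.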