From patchwork Wed Mar 31 06:01:11 2021
X-Patchwork-Submitter: Gao Xiang
X-Patchwork-Id: 12174345
From: Gao Xiang
To: linux-xfs@vger.kernel.org
Cc: Dave Chinner, "Darrick J. Wong", Gao Xiang
Subject: [PATCH v4 1/7] workqueue: bound maximum queue depth
Date: Wed, 31 Mar 2021 14:01:11 +0800
Message-Id: <20210331060117.28159-2-hsiangkao@aol.com>
In-Reply-To: <20210331060117.28159-1-hsiangkao@aol.com>
References: <20210331060117.28159-1-hsiangkao@aol.com>
List-ID: linux-xfs@vger.kernel.org

From: Dave Chinner

Existing users of workqueues have bound maximum queue depths in their
external algorithms (e.g. prefetch counts). For parallelising work that
doesn't have an external bound, allow workqueues to throttle incoming
requests at a maximum bound. Bounded workqueues also need to distribute
work over all worker threads themselves as there is no external bounding
or worker function throttling provided.

Existing callers are not throttled and retain direct control of worker
threads, only users of the new create interface will be throttled and
concurrency managed.

Reviewed-by: Darrick J. Wong
Signed-off-by: Dave Chinner
Signed-off-by: Gao Xiang
---
 libfrog/workqueue.c | 42 +++++++++++++++++++++++++++++++++++++++---
 libfrog/workqueue.h |  4 ++++
 2 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/libfrog/workqueue.c b/libfrog/workqueue.c
index fe3de4289379..8c1a163e145f 100644
--- a/libfrog/workqueue.c
+++ b/libfrog/workqueue.c
@@ -40,13 +40,21 @@ workqueue_thread(void *arg)
 		}

 		/*
-		 * Dequeue work from the head of the list.
+		 * Dequeue work from the head of the list. If the queue was
+		 * full then send a wakeup if we're configured to do so.
 		 */
 		assert(wq->item_count > 0);
+		if (wq->max_queued)
+			pthread_cond_broadcast(&wq->queue_full);
+
 		wi = wq->next_item;
 		wq->next_item = wi->next;
 		wq->item_count--;

+		if (wq->max_queued && wq->next_item) {
+			/* more work, wake up another worker */
+			pthread_cond_signal(&wq->wakeup);
+		}
 		pthread_mutex_unlock(&wq->lock);

 		(wi->function)(wi->queue, wi->index, wi->arg);
@@ -58,10 +66,11 @@ workqueue_thread(void *arg)

 /* Allocate a work queue and threads. Returns zero or negative error code. */
 int
-workqueue_create(
+workqueue_create_bound(
 	struct workqueue	*wq,
 	void			*wq_ctx,
-	unsigned int		nr_workers)
+	unsigned int		nr_workers,
+	unsigned int		max_queue)
 {
 	unsigned int		i;
 	int			err = 0;
@@ -70,12 +79,16 @@ workqueue_create(
 	err = -pthread_cond_init(&wq->wakeup, NULL);
 	if (err)
 		return err;
+	err = -pthread_cond_init(&wq->queue_full, NULL);
+	if (err)
+		goto out_wake;
 	err = -pthread_mutex_init(&wq->lock, NULL);
 	if (err)
 		goto out_cond;

 	wq->wq_ctx = wq_ctx;
 	wq->thread_count = nr_workers;
+	wq->max_queued = max_queue;
 	wq->threads = malloc(nr_workers * sizeof(pthread_t));
 	if (!wq->threads) {
 		err = -errno;
@@ -102,10 +115,21 @@ workqueue_create(
 out_mutex:
 	pthread_mutex_destroy(&wq->lock);
 out_cond:
+	pthread_cond_destroy(&wq->queue_full);
+out_wake:
 	pthread_cond_destroy(&wq->wakeup);
 	return err;
 }

+int
+workqueue_create(
+	struct workqueue	*wq,
+	void			*wq_ctx,
+	unsigned int		nr_workers)
+{
+	return workqueue_create_bound(wq, wq_ctx, nr_workers, 0);
+}
+
 /*
  * Create a work item consisting of a function and some arguments and schedule
  * the work item to be run via the thread pool. Returns zero or a negative
@@ -140,6 +164,7 @@ workqueue_add(

 	/* Now queue the new work structure to the work queue. */
 	pthread_mutex_lock(&wq->lock);
+restart:
 	if (wq->next_item == NULL) {
 		assert(wq->item_count == 0);
 		ret = -pthread_cond_signal(&wq->wakeup);
@@ -150,6 +175,16 @@ workqueue_add(
 		}
 		wq->next_item = wi;
 	} else {
+		/* throttle on a full queue if configured */
+		if (wq->max_queued && wq->item_count == wq->max_queued) {
+			pthread_cond_wait(&wq->queue_full, &wq->lock);
+			/*
+			 * Queue might be empty or even still full by the time
+			 * we get the lock back, so restart the lookup so we do
+			 * the right thing with the current state of the queue.
+			 */
+			goto restart;
+		}
 		wq->last_item->next = wi;
 	}
 	wq->last_item = wi;
@@ -201,5 +236,6 @@ workqueue_destroy(
 	free(wq->threads);
 	pthread_mutex_destroy(&wq->lock);
 	pthread_cond_destroy(&wq->wakeup);
+	pthread_cond_destroy(&wq->queue_full);
 	memset(wq, 0, sizeof(*wq));
 }
diff --git a/libfrog/workqueue.h b/libfrog/workqueue.h
index a56d1cf14081..a9c108d0e66a 100644
--- a/libfrog/workqueue.h
+++ b/libfrog/workqueue.h
@@ -31,10 +31,14 @@ struct workqueue {
 	unsigned int		thread_count;
 	bool			terminate;
 	bool			terminated;
+	int			max_queued;
+	pthread_cond_t		queue_full;
 };

 int workqueue_create(struct workqueue *wq, void *wq_ctx,
 		unsigned int nr_workers);
+int workqueue_create_bound(struct workqueue *wq, void *wq_ctx,
+		unsigned int nr_workers, unsigned int max_queue);
 int workqueue_add(struct workqueue *wq, workqueue_func_t fn,
 		uint32_t index, void *arg);
 int workqueue_terminate(struct workqueue *wq);
From patchwork Wed Mar 31 06:01:12 2021
X-Patchwork-Submitter: Gao Xiang
X-Patchwork-Id: 12174351
From: Gao Xiang
To: linux-xfs@vger.kernel.org
Cc: Dave Chinner, "Darrick J. Wong", Gao Xiang
Subject: [PATCH v4 2/7] repair: Protect bad inode list with mutex
Date: Wed, 31 Mar 2021 14:01:12 +0800
Message-Id: <20210331060117.28159-3-hsiangkao@aol.com>
In-Reply-To: <20210331060117.28159-1-hsiangkao@aol.com>
References: <20210331060117.28159-1-hsiangkao@aol.com>
List-ID: linux-xfs@vger.kernel.org

From: Dave Chinner

To enable phase 6 parallelisation, we need to protect the bad inode
list from concurrent modification and/or access. Wrap it with a mutex
and clean up the nasty typedefs.

Reviewed-by: Darrick J. Wong
Signed-off-by: Dave Chinner
Signed-off-by: Gao Xiang
---
 repair/dir2.c | 34 ++++++++++++++++++++++------------
 repair/dir2.h |  2 +-
 2 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/repair/dir2.c b/repair/dir2.c
index eabdb4f2d497..fdf915327e2d 100644
--- a/repair/dir2.c
+++ b/repair/dir2.c
@@ -20,40 +20,50 @@
  * Known bad inode list. These are seen when the leaf and node
  * block linkages are incorrect.
  */
-typedef struct dir2_bad {
+struct dir2_bad {
 	xfs_ino_t	ino;
 	struct dir2_bad	*next;
-} dir2_bad_t;
+};

-static dir2_bad_t *dir2_bad_list;
+static struct dir2_bad *dir2_bad_list;
+pthread_mutex_t dir2_bad_list_lock = PTHREAD_MUTEX_INITIALIZER;

 static void
 dir2_add_badlist(
 	xfs_ino_t	ino)
 {
-	dir2_bad_t	*l;
+	struct dir2_bad	*l;

-	if ((l = malloc(sizeof(dir2_bad_t))) == NULL) {
+	l = malloc(sizeof(*l));
+	if (!l) {
 		do_error(
 _("malloc failed (%zu bytes) dir2_add_badlist:ino %" PRIu64 "\n"),
-			sizeof(dir2_bad_t), ino);
+			sizeof(*l), ino);
 		exit(1);
 	}
+	pthread_mutex_lock(&dir2_bad_list_lock);
 	l->next = dir2_bad_list;
 	dir2_bad_list = l;
 	l->ino = ino;
+	pthread_mutex_unlock(&dir2_bad_list_lock);
 }

-int
+bool
 dir2_is_badino(
 	xfs_ino_t	ino)
 {
-	dir2_bad_t	*l;
+	struct dir2_bad	*l;
+	bool		ret = false;

-	for (l = dir2_bad_list; l; l = l->next)
-		if (l->ino == ino)
-			return 1;
-	return 0;
+	pthread_mutex_lock(&dir2_bad_list_lock);
+	for (l = dir2_bad_list; l; l = l->next) {
+		if (l->ino == ino) {
+			ret = true;
+			break;
+		}
+	}
+	pthread_mutex_unlock(&dir2_bad_list_lock);
+	return ret;
 }

 /*
diff --git a/repair/dir2.h b/repair/dir2.h
index 5795aac5eaab..af4cfb1da329 100644
--- a/repair/dir2.h
+++ b/repair/dir2.h
@@ -27,7 +27,7 @@ process_sf_dir2_fixi8(
 	struct xfs_dir2_sf_hdr	*sfp,
 	xfs_dir2_sf_entry_t	**next_sfep);

-int
+bool
 dir2_is_badino(
 	xfs_ino_t	ino);
From patchwork Wed Mar 31 06:01:13 2021
X-Patchwork-Submitter: Gao Xiang
X-Patchwork-Id: 12174355
From: Gao Xiang
To: linux-xfs@vger.kernel.org
Cc: Dave Chinner, "Darrick J. Wong", Gao Xiang
Subject: [PATCH v4 3/7] repair: protect inode chunk tree records with a mutex
Date: Wed, 31 Mar 2021 14:01:13 +0800
Message-Id: <20210331060117.28159-4-hsiangkao@aol.com>
In-Reply-To: <20210331060117.28159-1-hsiangkao@aol.com>
References: <20210331060117.28159-1-hsiangkao@aol.com>
List-ID: linux-xfs@vger.kernel.org

From: Dave Chinner

Phase 6 accesses inode chunk records mostly in an isolated manner.
However, when it finds a corruption in a directory or there are multiple
hardlinks to an inode, there can be concurrent access to the inode chunk
record to update state. Hence the inode record itself needs a mutex.
This protects all state changes within the inode chunk record, as well
as inode link counts and chunk references. That allows us to process
multiple chunks at once, providing concurrency within an AG as well as
across AGs.

The inode chunk tree itself is not modified in the directory scanning
and rebuilding part of phase 6 which we are making concurrent, hence we
do not need to worry about locking for AVL tree lookups to find the
inode chunk records themselves. Therefore internal locking is all we
need here.

Reviewed-by: Darrick J. Wong
Signed-off-by: Dave Chinner
Signed-off-by: Gao Xiang
---
 repair/incore.h     | 23 +++++++++++++++++++++++
 repair/incore_ino.c | 15 +++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/repair/incore.h b/repair/incore.h
index 977e5dd04336..d64315fd2585 100644
--- a/repair/incore.h
+++ b/repair/incore.h
@@ -281,6 +281,7 @@ typedef struct ino_tree_node {
 		parent_list_t	*plist;		/* phases 2-5 */
 	} ino_un;
 	uint8_t			*ftypes;	/* phases 3,6 */
+	pthread_mutex_t		lock;
 } ino_tree_node_t;

 #define INOS_PER_IREC	(sizeof(uint64_t) * NBBY)
@@ -411,7 +412,9 @@ next_free_ino_rec(ino_tree_node_t *ino_rec)
  */
 static inline void add_inode_refchecked(struct ino_tree_node *irec, int offset)
 {
+	pthread_mutex_lock(&irec->lock);
 	irec->ino_un.ex_data->ino_processed |= IREC_MASK(offset);
+	pthread_mutex_unlock(&irec->lock);
 }

 static inline int is_inode_refchecked(struct ino_tree_node *irec, int offset)
@@ -437,12 +440,16 @@ static inline int is_inode_confirmed(struct ino_tree_node *irec, int offset)
  */
 static inline void set_inode_isadir(struct ino_tree_node *irec, int offset)
 {
+	pthread_mutex_lock(&irec->lock);
 	irec->ino_isa_dir |= IREC_MASK(offset);
+	pthread_mutex_unlock(&irec->lock);
 }

 static inline void clear_inode_isadir(struct ino_tree_node *irec, int offset)
 {
+	pthread_mutex_lock(&irec->lock);
 	irec->ino_isa_dir &= ~IREC_MASK(offset);
+	pthread_mutex_unlock(&irec->lock);
 }

 static inline int inode_isadir(struct ino_tree_node *irec, int offset)
@@ -455,15 +462,19 @@ static inline int inode_isadir(struct ino_tree_node *irec, int offset)
  */
 static inline void set_inode_free(struct ino_tree_node *irec, int offset)
 {
+	pthread_mutex_lock(&irec->lock);
 	set_inode_confirmed(irec, offset);
 	irec->ir_free |= XFS_INOBT_MASK(offset);
+	pthread_mutex_unlock(&irec->lock);
 }

 static inline void set_inode_used(struct ino_tree_node *irec, int offset)
 {
+	pthread_mutex_lock(&irec->lock);
 	set_inode_confirmed(irec, offset);
 	irec->ir_free &= ~XFS_INOBT_MASK(offset);
+	pthread_mutex_unlock(&irec->lock);
 }

 static inline int is_inode_free(struct ino_tree_node *irec, int offset)
@@ -476,7 +487,9 @@ static inline int is_inode_free(struct ino_tree_node *irec, int offset)
  */
 static inline void set_inode_sparse(struct ino_tree_node *irec, int offset)
 {
+	pthread_mutex_lock(&irec->lock);
 	irec->ir_sparse |= XFS_INOBT_MASK(offset);
+	pthread_mutex_unlock(&irec->lock);
 }

 static inline bool is_inode_sparse(struct ino_tree_node *irec, int offset)
@@ -489,12 +502,16 @@ static inline bool is_inode_sparse(struct ino_tree_node *irec, int offset)
  */
 static inline void set_inode_was_rl(struct ino_tree_node *irec, int offset)
 {
+	pthread_mutex_lock(&irec->lock);
 	irec->ino_was_rl |= IREC_MASK(offset);
+	pthread_mutex_unlock(&irec->lock);
 }

 static inline void clear_inode_was_rl(struct ino_tree_node *irec, int offset)
 {
+	pthread_mutex_lock(&irec->lock);
 	irec->ino_was_rl &= ~IREC_MASK(offset);
+	pthread_mutex_unlock(&irec->lock);
 }

 static inline int inode_was_rl(struct ino_tree_node *irec, int offset)
@@ -507,12 +524,16 @@ static inline int inode_was_rl(struct ino_tree_node *irec, int offset)
  */
 static inline void set_inode_is_rl(struct ino_tree_node *irec, int offset)
 {
+	pthread_mutex_lock(&irec->lock);
 	irec->ino_is_rl |= IREC_MASK(offset);
+	pthread_mutex_unlock(&irec->lock);
 }

 static inline void clear_inode_is_rl(struct ino_tree_node *irec, int offset)
 {
+	pthread_mutex_lock(&irec->lock);
 	irec->ino_is_rl &= ~IREC_MASK(offset);
+	pthread_mutex_unlock(&irec->lock);
 }

 static inline int inode_is_rl(struct ino_tree_node *irec, int offset)
@@ -545,7 +566,9 @@ static inline int is_inode_reached(struct ino_tree_node *irec, int offset)
 static inline void add_inode_reached(struct ino_tree_node *irec, int offset)
 {
 	add_inode_ref(irec, offset);
+	pthread_mutex_lock(&irec->lock);
 	irec->ino_un.ex_data->ino_reached |= IREC_MASK(offset);
+	pthread_mutex_unlock(&irec->lock);
 }

 /*
diff --git a/repair/incore_ino.c b/repair/incore_ino.c
index 82956ae93005..299e4f949e5e 100644
--- a/repair/incore_ino.c
+++ b/repair/incore_ino.c
@@ -91,6 +91,7 @@ void add_inode_ref(struct ino_tree_node *irec, int ino_offset)
 {
 	ASSERT(irec->ino_un.ex_data != NULL);

+	pthread_mutex_lock(&irec->lock);
 	switch (irec->nlink_size) {
 	case sizeof(uint8_t):
 		if (irec->ino_un.ex_data->counted_nlinks.un8[ino_offset] < 0xff) {
@@ -112,6 +113,7 @@ void add_inode_ref(struct ino_tree_node *irec, int ino_offset)
 	default:
 		ASSERT(0);
 	}
+	pthread_mutex_unlock(&irec->lock);
 }

 void drop_inode_ref(struct ino_tree_node *irec, int ino_offset)
@@ -120,6 +122,7 @@ void drop_inode_ref(struct ino_tree_node *irec, int ino_offset)

 	ASSERT(irec->ino_un.ex_data != NULL);

+	pthread_mutex_lock(&irec->lock);
 	switch (irec->nlink_size) {
 	case sizeof(uint8_t):
 		ASSERT(irec->ino_un.ex_data->counted_nlinks.un8[ino_offset] > 0);
@@ -139,6 +142,7 @@ void drop_inode_ref(struct ino_tree_node *irec, int ino_offset)

 	if (refs == 0)
 		irec->ino_un.ex_data->ino_reached &= ~IREC_MASK(ino_offset);
+	pthread_mutex_unlock(&irec->lock);
 }

 uint32_t num_inode_references(struct ino_tree_node *irec, int ino_offset)
@@ -161,6 +165,7 @@ uint32_t num_inode_references(struct ino_tree_node *irec, int ino_offset)
 void set_inode_disk_nlinks(struct ino_tree_node *irec, int ino_offset,
 		uint32_t nlinks)
 {
+	pthread_mutex_lock(&irec->lock);
 	switch (irec->nlink_size) {
 	case sizeof(uint8_t):
 		if (nlinks < 0xff) {
@@ -182,6 +187,7 @@ void set_inode_disk_nlinks(struct ino_tree_node *irec, int ino_offset,
 	default:
 		ASSERT(0);
 	}
+	pthread_mutex_unlock(&irec->lock);
 }

 uint32_t get_inode_disk_nlinks(struct ino_tree_node *irec, int ino_offset)
@@ -253,6 +259,7 @@ alloc_ino_node(
 	irec->nlink_size = sizeof(uint8_t);
 	irec->disk_nlinks.un8 = alloc_nlink_array(irec->nlink_size);
 	irec->ftypes = alloc_ftypes_array(mp);
+	pthread_mutex_init(&irec->lock, NULL);
 	return irec;
 }

@@ -294,6 +301,7 @@ free_ino_tree_node(
 	}

 	free(irec->ftypes);
+	pthread_mutex_destroy(&irec->lock);
 	free(irec);
 }

@@ -600,6 +608,7 @@ set_inode_parent(
 	uint64_t		bitmask;
 	parent_entry_t		*tmp;

+	pthread_mutex_lock(&irec->lock);
 	if (full_ino_ex_data)
 		ptbl = irec->ino_un.ex_data->parents;
 	else
@@ -625,6 +634,7 @@ set_inode_parent(
 #endif
 		ptbl->pentries[0] = parent;

+		pthread_mutex_unlock(&irec->lock);
 		return;
 	}

@@ -642,6 +652,7 @@ set_inode_parent(
 #endif
 		ptbl->pentries[target] = parent;

+		pthread_mutex_unlock(&irec->lock);
 		return;
 	}

@@ -682,6 +693,7 @@ set_inode_parent(
 #endif
 	ptbl->pentries[target] = parent;
 	ptbl->pmask |= (1ULL << offset);
+	pthread_mutex_unlock(&irec->lock);
 }

 xfs_ino_t
@@ -692,6 +704,7 @@ get_inode_parent(ino_tree_node_t *irec, int offset)
 	int		i;
 	int		target;

+	pthread_mutex_lock(&irec->lock);
 	if (full_ino_ex_data)
 		ptbl = irec->ino_un.ex_data->parents;
 	else
@@ -709,9 +722,11 @@ get_inode_parent(ino_tree_node_t *irec, int offset)
 #ifdef DEBUG
 			ASSERT(target < ptbl->cnt);
 #endif
+			pthread_mutex_unlock(&irec->lock);
 			return(ptbl->pentries[target]);
 		}
 	}
+	pthread_mutex_unlock(&irec->lock);
 	return(0LL);
 }

From patchwork Wed Mar 31 06:01:14 2021
X-Patchwork-Submitter: Gao Xiang
X-Patchwork-Id: 12174349
From: Gao Xiang
To: linux-xfs@vger.kernel.org
Cc: Dave Chinner, "Darrick J. Wong", Gao Xiang
Subject: [PATCH v4 4/7] repair: parallelise phase 6
Date: Wed, 31 Mar 2021 14:01:14 +0800
Message-Id: <20210331060117.28159-5-hsiangkao@aol.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20210331060117.28159-1-hsiangkao@aol.com>
References: <20210331060117.28159-1-hsiangkao@aol.com>
X-Mailing-List: linux-xfs@vger.kernel.org

From: Dave Chinner

A recent metadump provided to us caused repair to take hours in
phase 6. It wasn't IO bound - it was fully CPU bound the entire time.
The only way to speed it up is to make phase 6 run multiple concurrent
processing threads.

The obvious way to do this is to spread the concurrency across AGs,
like the other phases, and while this works it is not optimal. When a
processing thread hits a really large directory, it essentially sits
CPU bound until that directory is processed. If an AG has lots of
large directories, we end up with a really long single threaded tail
that limits concurrency.

Hence we also need to have concurrency /within/ the AG. This is
relatively easy, as the inode chunk records allow for a simple
concurrency mechanism within an AG. We can simply feed each chunk
record to a workqueue, and we get concurrency within the AG for free.
However, this allows prefetch to run way ahead of processing, which
blows out the buffer cache size and can cause OOM. We can use the new
workqueue depth limiting to limit the number of inode chunks queued,
and this then backs up the inode prefetching to its maximum queue
depth. Hence we prevent the prefetch code from queueing the entire
AG's inode chunks on the workqueue and blowing out memory, by
throttling the prefetch consumer.

This takes phase 6 from taking many, many hours down to:

Phase 6:        10/30 21:12:58  10/30 21:40:48  27 minutes, 50 seconds

while burning 20-30 CPUs that entire time on my test rig.

Reviewed-by: Darrick J. Wong
Signed-off-by: Dave Chinner
Signed-off-by: Gao Xiang
---
 repair/phase6.c | 42 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 34 insertions(+), 8 deletions(-)

diff --git a/repair/phase6.c b/repair/phase6.c
index 14464befa8b6..e51784521d28 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -6,6 +6,7 @@
 #include "libxfs.h"
 #include "threads.h"
+#include "threads.h"
 #include "prefetch.h"
 #include "avl.h"
 #include "globals.h"
@@ -3105,20 +3106,44 @@ check_for_orphaned_inodes(
 }
 
 static void
-traverse_function(
+do_dir_inode(
 	struct workqueue	*wq,
-	xfs_agnumber_t		agno,
+	xfs_agnumber_t		agno,
 	void			*arg)
 {
-	ino_tree_node_t		*irec;
+	struct ino_tree_node	*irec = arg;
 	int			i;
+
+	for (i = 0; i < XFS_INODES_PER_CHUNK; i++)  {
+		if (inode_isadir(irec, i))
+			process_dir_inode(wq->wq_ctx, agno, irec, i);
+	}
+}
+
+static void
+traverse_function(
+	struct workqueue	*wq,
+	xfs_agnumber_t		agno,
+	void			*arg)
+{
+	struct ino_tree_node	*irec;
 	prefetch_args_t		*pf_args = arg;
+	struct workqueue	lwq;
+	struct xfs_mount	*mp = wq->wq_ctx;
 
 	wait_for_inode_prefetch(pf_args);
 
 	if (verbose)
 		do_log(_(" - agno = %d\n"), agno);
 
+	/*
+	 * The more AGs we have in flight at once, the fewer processing threads
+	 * per AG. This means we don't overwhelm the machine with hundreds of
+	 * threads when we start acting on lots of AGs at once.
	 * We just want
+	 * enough that we can keep multiple CPUs busy across multiple AGs.
+	 */
+	workqueue_create_bound(&lwq, mp, ag_stride, 1000);
+
 	for (irec = findfirst_inode_rec(agno); irec; irec = next_ino_rec(irec)) {
 		if (irec->ino_isa_dir == 0)
 			continue;
@@ -3126,18 +3151,19 @@ traverse_function(
 		if (pf_args) {
 			sem_post(&pf_args->ra_count);
 #ifdef XR_PF_TRACE
+			{
+			int	i;
 			sem_getvalue(&pf_args->ra_count, &i);
 			pftrace(
 		"processing inode chunk %p in AG %d (sem count = %d)",
 				irec, agno, i);
+			}
 #endif
 		}
 
-		for (i = 0; i < XFS_INODES_PER_CHUNK; i++)  {
-			if (inode_isadir(irec, i))
-				process_dir_inode(wq->wq_ctx, agno, irec, i);
-		}
+		queue_work(&lwq, do_dir_inode, agno, irec);
 	}
+	destroy_work_queue(&lwq);
 	cleanup_inode_prefetch(pf_args);
 }
 
@@ -3165,7 +3191,7 @@ static void
 traverse_ags(
 	struct xfs_mount	*mp)
 {
-	do_inode_prefetch(mp, 0, traverse_function, false, true);
+	do_inode_prefetch(mp, ag_stride, traverse_function, false, true);
 }
 
 void

From patchwork Wed Mar 31 06:01:15 2021
X-Patchwork-Submitter: Gao Xiang
X-Patchwork-Id: 12174353
From: Gao Xiang
To: linux-xfs@vger.kernel.org
Cc: Dave Chinner, "Darrick J. Wong", Gao Xiang
Subject: [PATCH v4 5/7] repair: don't duplicate names in phase 6
Date: Wed, 31 Mar 2021 14:01:15 +0800
Message-Id: <20210331060117.28159-6-hsiangkao@aol.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20210331060117.28159-1-hsiangkao@aol.com>
References: <20210331060117.28159-1-hsiangkao@aol.com>
X-Mailing-List: linux-xfs@vger.kernel.org

From: Dave Chinner

The name hash in phase 6 is constructed by using names that point
directly into the directory buffers. Hence, before the buffers can be
released, the constructed name hash has to duplicate all those names
into memory it owns via dir_hash_dup_names().

Given that the structure that holds the name is dynamically allocated,
it makes no sense to store a pointer to the name in dir_hash_add() and
then later dynamically allocate the name separately. Extend the name
hash allocation to contain space for the name itself, and copy the
name into the name hash structure in dir_hash_add(). This allows us to
get rid of dir_hash_dup_names(), and the directory checking code no
longer needs to hold all the directory buffers in memory until the
entire directory walk is complete and the names have been duplicated.

Reviewed-by: Darrick J. Wong
Signed-off-by: Dave Chinner
Signed-off-by: Gao Xiang
---
 repair/phase6.c | 101 ++++++++++++++----------------------------------
 1 file changed, 29 insertions(+), 72 deletions(-)

diff --git a/repair/phase6.c b/repair/phase6.c
index e51784521d28..df8db146c187 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -72,15 +72,15 @@ typedef struct dir_hash_ent {
 	struct dir_hash_ent	*nextbyorder;	/* next in order added */
 	xfs_dahash_t		hashval;	/* hash value of name */
 	uint32_t		address;	/* offset of data entry */
-	xfs_ino_t 		inum;		/* inode num of entry */
+	xfs_ino_t		inum;		/* inode num of entry */
 	short			junkit;		/* name starts with / */
 	short			seen;		/* have seen leaf entry */
 	struct xfs_name		name;
+	unsigned char		namebuf[];
 } dir_hash_ent_t;
 
 typedef struct dir_hash_tab {
 	int			size;		/* size of hash tables */
-	int			names_duped;	/* 1 = ent names malloced */
 	dir_hash_ent_t		*first;		/* ptr to first added entry */
 	dir_hash_ent_t		*last;		/* ptr to last added entry */
 	dir_hash_ent_t		**byhash;	/* ptr to name hash buckets */
@@ -171,8 +171,6 @@ dir_hash_add(
 	short			junk;
 	struct xfs_name		xname;
 
-	ASSERT(!hashtab->names_duped);
-
 	xname.name = name;
 	xname.len = namelen;
 	xname.type = ftype;
@@ -199,7 +197,12 @@ dir_hash_add(
 		}
 	}
 
-	if ((p = malloc(sizeof(*p))) == NULL)
+	/*
+	 * Allocate enough space for the hash entry and the name in a single
+	 * allocation so we can store our own copy of the name for later use.
+	 */
+	p = calloc(1, sizeof(*p) + namelen + 1);
+	if (!p)
 		do_error(_("malloc failed in dir_hash_add (%zu bytes)\n"),
 			sizeof(*p));
 
@@ -220,8 +223,12 @@ dir_hash_add(
 	p->address = addr;
 	p->inum = inum;
 	p->seen = 0;
-	p->name = xname;
 
+	/* Set up the name in the region trailing the hash entry. */
+	memcpy(p->namebuf, name, namelen);
+	p->name.name = p->namebuf;
+	p->name.len = namelen;
+	p->name.type = ftype;
 	return !dup;
 }
 
@@ -287,8 +294,6 @@ dir_hash_done(
 	for (i = 0; i < hashtab->size; i++) {
 		for (p = hashtab->byaddr[i]; p; p = n) {
 			n = p->nextbyaddr;
-			if (hashtab->names_duped)
-				free((void *)p->name.name);
 			free(p);
 		}
 	}
@@ -385,27 +390,6 @@ dir_hash_see_all(
 	return j == stale ? DIR_HASH_CK_OK : DIR_HASH_CK_BADSTALE;
 }
 
-/*
- * Convert name pointers into locally allocated memory.
- * This must only be done after all the entries have been added.
- */
-static void
-dir_hash_dup_names(dir_hash_tab_t *hashtab)
-{
-	unsigned char		*name;
-	dir_hash_ent_t		*p;
-
-	if (hashtab->names_duped)
-		return;
-
-	for (p = hashtab->first; p; p = p->nextbyorder) {
-		name = malloc(p->name.len);
-		memcpy(name, p->name.name, p->name.len);
-		p->name.name = name;
-	}
-	hashtab->names_duped = 1;
-}
-
 /*
  * Given a block number in a fork, return the next valid block number (not a
  * hole).  If this is the last block number then NULLFILEOFF is returned.
@@ -1383,6 +1367,7 @@ dir2_kill_block(
 		res_failed(error);
 	libxfs_trans_ijoin(tp, ip, 0);
 	libxfs_trans_bjoin(tp, bp);
+	libxfs_trans_bhold(tp, bp);
 	memset(&args, 0, sizeof(args));
 	args.dp = ip;
 	args.trans = tp;
@@ -1414,7 +1399,7 @@ longform_dir2_entry_check_data(
 	int			*need_dot,
 	ino_tree_node_t		*current_irec,
 	int			current_ino_offset,
-	struct xfs_buf		**bpp,
+	struct xfs_buf		*bp,
 	dir_hash_tab_t		*hashtab,
 	freetab_t		**freetabp,
 	xfs_dablk_t		da_bno,
@@ -1422,7 +1407,6 @@ longform_dir2_entry_check_data(
 {
 	xfs_dir2_dataptr_t	addr;
 	xfs_dir2_leaf_entry_t	*blp;
-	struct xfs_buf		*bp;
 	xfs_dir2_block_tail_t	*btp;
 	struct xfs_dir2_data_hdr *d;
 	xfs_dir2_db_t		db;
@@ -1453,7 +1437,6 @@ longform_dir2_entry_check_data(
 	};
 
-	bp = *bpp;
 	d = bp->b_addr;
 	ptr = (char *)d + mp->m_dir_geo->data_entry_offset;
 	nbad = 0;
@@ -1554,10 +1537,8 @@ longform_dir2_entry_check_data(
 			dir2_kill_block(mp, ip, da_bno, bp);
 		} else {
 			do_warn(_("would junk block\n"));
-			libxfs_buf_relse(bp);
 		}
 		freetab->ents[db].v = NULLDATAOFF;
-		*bpp = NULL;
 		return;
 	}
@@ -2215,17 +2196,15 @@ longform_dir2_entry_check(xfs_mount_t *mp,
 	int			ino_offset,
 	dir_hash_tab_t		*hashtab)
 {
-	struct xfs_buf		**bplist;
+	struct xfs_buf		*bp;
 	xfs_dablk_t		da_bno;
 	freetab_t		*freetab;
-	int			num_bps;
 	int			i;
 	int			isblock;
 	int			isleaf;
 	xfs_fileoff_t		next_da_bno;
 	int			seeval;
 	int			fixit = 0;
-	xfs_dir2_db_t		db;
 	struct xfs_da_args	args;
 
 	*need_dot = 1;
@@ -2242,11 +2221,6 @@ longform_dir2_entry_check(xfs_mount_t *mp,
 		freetab->ents[i].v = NULLDATAOFF;
 		freetab->ents[i].s = 0;
 	}
-	num_bps = freetab->naents;
-	bplist = calloc(num_bps, sizeof(struct xfs_buf*));
-	if (!bplist)
-		do_error(_("calloc failed in %s (%zu bytes)\n"),
-			__func__, num_bps * sizeof(struct xfs_buf*));
 
 	/* is this a block, leaf, or node directory? */
 	args.dp = ip;
@@ -2275,28 +2249,12 @@ longform_dir2_entry_check(xfs_mount_t *mp,
 			break;
 		}
 
-		db = xfs_dir2_da_to_db(mp->m_dir_geo, da_bno);
-		if (db >= num_bps) {
-			int last_size = num_bps;
-
-			/* more data blocks than expected */
-			num_bps = db + 1;
-			bplist = realloc(bplist, num_bps * sizeof(struct xfs_buf*));
-			if (!bplist)
-				do_error(_("realloc failed in %s (%zu bytes)\n"),
-					__func__,
-					num_bps * sizeof(struct xfs_buf*));
-			/* Initialize the new elements */
-			for (i = last_size; i < num_bps; i++)
-				bplist[i] = NULL;
-		}
-
 		if (isblock)
 			ops = &xfs_dir3_block_buf_ops;
 		else
 			ops = &xfs_dir3_data_buf_ops;
 
-		error = dir_read_buf(ip, da_bno, &bplist[db], ops, &fixit);
+		error = dir_read_buf(ip, da_bno, &bp, ops, &fixit);
 		if (error) {
 			do_warn(
	_("can't read data block %u for directory inode %" PRIu64 " error %d\n"),
@@ -2316,21 +2274,25 @@ longform_dir2_entry_check(xfs_mount_t *mp,
 		}
 
 		/* check v5 metadata */
-		d = bplist[db]->b_addr;
+		d = bp->b_addr;
 		if (be32_to_cpu(d->magic) == XFS_DIR3_BLOCK_MAGIC ||
 		    be32_to_cpu(d->magic) == XFS_DIR3_DATA_MAGIC) {
-			struct xfs_buf		 *bp = bplist[db];
-
 			error = check_dir3_header(mp, bp, ino);
 			if (error) {
 				fixit++;
+				if (isblock)
+					goto out_fix;
 				continue;
 			}
 		}
 
 		longform_dir2_entry_check_data(mp, ip, num_illegal, need_dot,
-				irec, ino_offset, &bplist[db], hashtab,
+				irec, ino_offset, bp, hashtab,
 				&freetab, da_bno, isblock);
+		if (isblock)
+			break;
+
+		libxfs_buf_relse(bp);
 	}
 	fixit |= (*num_illegal != 0) || dir2_is_badino(ino) || *need_dot;
@@ -2341,7 +2303,7 @@ longform_dir2_entry_check(xfs_mount_t *mp,
 		xfs_dir2_block_tail_t	*btp;
 		xfs_dir2_leaf_entry_t	*blp;
 
-		block = bplist[0]->b_addr;
+		block = bp->b_addr;
 		btp = xfs_dir2_block_tail_p(mp->m_dir_geo, block);
 		blp = xfs_dir2_block_leaf_p(btp);
 		seeval = dir_hash_see_all(hashtab, blp,
@@ -2358,11 +2320,10 @@ longform_dir2_entry_check(xfs_mount_t *mp,
 		}
 	}
 out_fix:
+	if (isblock && bp)
+		libxfs_buf_relse(bp);
+
 	if (!no_modify && (fixit || dotdot_update)) {
-		dir_hash_dup_names(hashtab);
-		for (i = 0; i < num_bps; i++)
-			if (bplist[i])
-				libxfs_buf_relse(bplist[i]);
 		longform_dir2_rebuild(mp, ino, ip, irec, ino_offset, hashtab);
 		*num_illegal = 0;
 		*need_dot = 0;
@@ -2370,12 +2331,8 @@ out_fix:
 		if (fixit || dotdot_update)
 			do_warn(
	_("would rebuild directory inode %" PRIu64 "\n"), ino);
-		for (i = 0; i < num_bps; i++)
-			if (bplist[i])
-				libxfs_buf_relse(bplist[i]);
 	}
 
-	free(bplist);
 	free(freetab);
 }

From patchwork Wed Mar 31 06:01:16 2021
X-Patchwork-Submitter: Gao Xiang
X-Patchwork-Id: 12174357
From: Gao Xiang
To: linux-xfs@vger.kernel.org
Cc: Dave Chinner, "Darrick J. Wong", Gao Xiang
Subject: [PATCH v4 6/7] repair: convert the dir byaddr hash to a radix tree
Date: Wed, 31 Mar 2021 14:01:16 +0800
Message-Id: <20210331060117.28159-7-hsiangkao@aol.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20210331060117.28159-1-hsiangkao@aol.com>
References: <20210331060117.28159-1-hsiangkao@aol.com>
X-Mailing-List: linux-xfs@vger.kernel.org

From: Dave Chinner

Phase 6 uses a hash table to track the data segment addresses of the
entries it has seen in a directory. This is indexed by the offset into
the data segment for the dirent, and is used to check if the entry
exists, is a duplicate or has a bad hash value. The lookup operations
involve walking long hash chains on large directories and they are
done for every entry in the directory. This means certain operations
have O(n^2) scalability (or worse!) and hence hurt on very large
directories.

It is also used to determine if the directory has unseen entries,
which involves a full hash traversal that is very expensive on large
directories. Hence the directory checking for unseen entries ends up
being roughly an O(n^2 + n) algorithm.

Switch the byaddr indexing to a radix tree. While a radix tree will
burn more memory than the linked list, it gives us O(log n) lookup
operations instead of O(n) on large directories, and the use of tags
gives us O(1) determination of whether all entries have been seen or
not. This brings the "entry seen" algorithm scalability back to
O(n log n) and so is a major improvement for processing large
directories.

Given a filesystem with 10M empty files in a single directory, we see:

5.6.0:

  97.56%  xfs_repair      [.] dir_hash_add.lto_priv.0
   0.38%  xfs_repair      [.] avl_ino_start.lto_priv.0
   0.37%  libc-2.31.so    [.] malloc
   0.34%  xfs_repair      [.] longform_dir2_entry_check_data.lto_priv.0

Phase 6:        10/22 12:07:13  10/22 12:10:51  3 minutes, 38 seconds

Patched:

  97.11%  xfs_repair      [.] dir_hash_add
   0.38%  xfs_repair      [.] longform_dir2_entry_check_data
   0.34%  libc-2.31.so    [.] __libc_calloc
   0.32%  xfs_repair      [.] avl_ino_start

Phase 6:        10/22 12:11:40  10/22 12:14:28  2 minutes, 48 seconds

So there's some improvement, but we are clearly still CPU bound due to
the O(n^2) scalability of the duplicate name checking algorithm.

Reviewed-by: Darrick J. Wong
Signed-off-by: Dave Chinner
Signed-off-by: Gao Xiang
---
 libfrog/radix-tree.c |  46 +++++++++
 repair/phase6.c      | 222 ++++++++++++++++++++-----------------------
 2 files changed, 148 insertions(+), 120 deletions(-)

diff --git a/libfrog/radix-tree.c b/libfrog/radix-tree.c
index c1c74876964c..261fc2487de9 100644
--- a/libfrog/radix-tree.c
+++ b/libfrog/radix-tree.c
@@ -312,6 +312,52 @@ void *radix_tree_lookup_first(struct radix_tree_root *root, unsigned long *index
 
 #ifdef RADIX_TREE_TAGS
 
+/**
+ * radix_tree_tag_get - get a tag on a radix tree node
+ * @root:		radix tree root
+ * @index:		index key
+ * @tag:		tag index (< RADIX_TREE_MAX_TAGS)
+ *
+ * Return values:
+ *
+ *  0: tag not present or not set
+ *  1: tag set
+ *
+ * Note that the return value of this function may not be relied on, even if
+ * the RCU lock is held, unless tag modification and node deletion are excluded
+ * from concurrency.
+ */ +int radix_tree_tag_get(struct radix_tree_root *root, + unsigned long index, unsigned int tag) +{ + unsigned int height, shift; + struct radix_tree_node *slot; + + height = root->height; + if (index > radix_tree_maxindex(height)) + return 0; + + shift = (height - 1) * RADIX_TREE_MAP_SHIFT; + slot = root->rnode; + + while (height > 0) { + int offset; + + if (slot == NULL) + return 0; + + offset = (index >> shift) & RADIX_TREE_MAP_MASK; + if (!tag_get(slot, tag, offset)) + return 0; + + slot = slot->slots[offset]; + ASSERT(slot != NULL); + shift -= RADIX_TREE_MAP_SHIFT; + height--; + } + return 1; +} + /** * radix_tree_tag_set - set a tag on a radix tree node * @root: radix tree root diff --git a/repair/phase6.c b/repair/phase6.c index df8db146c187..063329636500 100644 --- a/repair/phase6.c +++ b/repair/phase6.c @@ -66,8 +66,7 @@ add_dotdot_update( * and whether their leaf entry has been seen. Also used for name * duplicate checking and rebuilding step if required. */ -typedef struct dir_hash_ent { - struct dir_hash_ent *nextbyaddr; /* next in addr bucket */ +struct dir_hash_ent { struct dir_hash_ent *nextbyhash; /* next in name bucket */ struct dir_hash_ent *nextbyorder; /* next in order added */ xfs_dahash_t hashval; /* hash value of name */ @@ -77,18 +76,19 @@ typedef struct dir_hash_ent { short seen; /* have seen leaf entry */ struct xfs_name name; unsigned char namebuf[]; -} dir_hash_ent_t; +}; -typedef struct dir_hash_tab { +struct dir_hash_tab { int size; /* size of hash tables */ - dir_hash_ent_t *first; /* ptr to first added entry */ - dir_hash_ent_t *last; /* ptr to last added entry */ - dir_hash_ent_t **byhash; /* ptr to name hash buckets */ - dir_hash_ent_t **byaddr; /* ptr to addr hash buckets */ -} dir_hash_tab_t; + struct dir_hash_ent *first; /* ptr to first added entry */ + struct dir_hash_ent *last; /* ptr to last added entry */ + struct dir_hash_ent **byhash; /* ptr to name hash buckets */ +#define HT_UNSEEN 1 + struct radix_tree_root byaddr; 
+};
 
 #define	DIR_HASH_TAB_SIZE(n)	\
-	(sizeof(dir_hash_tab_t) + (sizeof(dir_hash_ent_t *) * (n) * 2))
+	(sizeof(struct dir_hash_tab) + (sizeof(struct dir_hash_ent *) * (n)))
 #define	DIR_HASH_FUNC(t,a)	((a) % (t)->size)
 
 /*
@@ -155,8 +155,8 @@ dir_read_buf(
  */
 static int
 dir_hash_add(
-	xfs_mount_t		*mp,
-	dir_hash_tab_t		*hashtab,
+	struct xfs_mount	*mp,
+	struct dir_hash_tab	*hashtab,
 	uint32_t		addr,
 	xfs_ino_t		inum,
 	int			namelen,
@@ -164,19 +164,18 @@ dir_hash_add(
 	uint8_t			ftype)
 {
 	xfs_dahash_t		hash = 0;
-	int			byaddr;
 	int			byhash = 0;
-	dir_hash_ent_t		*p;
+	struct dir_hash_ent	*p;
 	int			dup;
 	short			junk;
 	struct xfs_name		xname;
+	int			error;
 
 	xname.name = name;
 	xname.len = namelen;
 	xname.type = ftype;
 
 	junk = name[0] == '/';
-	byaddr = DIR_HASH_FUNC(hashtab, addr);
 	dup = 0;
 
 	if (!junk) {
@@ -206,8 +205,14 @@ dir_hash_add(
 		do_error(_("malloc failed in dir_hash_add (%zu bytes)\n"),
 			sizeof(*p));
 
-	p->nextbyaddr = hashtab->byaddr[byaddr];
-	hashtab->byaddr[byaddr] = p;
+	error = radix_tree_insert(&hashtab->byaddr, addr, p);
+	if (error == EEXIST) {
+		do_warn(_("duplicate addrs %u in directory!\n"), addr);
+		free(p);
+		return 0;
+	}
+	radix_tree_tag_set(&hashtab->byaddr, addr, HT_UNSEEN);
+
 	if (hashtab->last)
 		hashtab->last->nextbyorder = p;
 	else
@@ -232,33 +237,14 @@ dir_hash_add(
 	return !dup;
 }
 
-/*
- * checks to see if any data entries are not in the leaf blocks
- */
-static int
-dir_hash_unseen(
-	dir_hash_tab_t	*hashtab)
-{
-	int		i;
-	dir_hash_ent_t	*p;
-
-	for (i = 0; i < hashtab->size; i++) {
-		for (p = hashtab->byaddr[i]; p; p = p->nextbyaddr) {
-			if (p->seen == 0)
-				return 1;
-		}
-	}
-	return 0;
-}
-
 static int
 dir_hash_check(
-	dir_hash_tab_t	*hashtab,
-	xfs_inode_t	*ip,
-	int		seeval)
+	struct dir_hash_tab	*hashtab,
+	struct xfs_inode	*ip,
+	int			seeval)
 {
-	static char *seevalstr[DIR_HASH_CK_TOTAL];
-	static int done;
+	static char	*seevalstr[DIR_HASH_CK_TOTAL];
+	static int	done;
 
 	if (!done) {
 		seevalstr[DIR_HASH_CK_OK] = _("ok");
@@ -270,7 +256,8 @@ dir_hash_check(
 		done = 1;
 	}
 
-	if (seeval == DIR_HASH_CK_OK && dir_hash_unseen(hashtab))
+	if (seeval == DIR_HASH_CK_OK &&
+	    radix_tree_tagged(&hashtab->byaddr, HT_UNSEEN))
 		seeval = DIR_HASH_CK_NOLEAF;
 	if (seeval == DIR_HASH_CK_OK)
 		return 0;
@@ -285,27 +272,28 @@ dir_hash_check(
 
 static void
 dir_hash_done(
-	dir_hash_tab_t	*hashtab)
+	struct dir_hash_tab	*hashtab)
 {
-	int		i;
-	dir_hash_ent_t	*n;
-	dir_hash_ent_t	*p;
+	int			i;
+	struct dir_hash_ent	*n;
+	struct dir_hash_ent	*p;
 
 	for (i = 0; i < hashtab->size; i++) {
-		for (p = hashtab->byaddr[i]; p; p = n) {
-			n = p->nextbyaddr;
+		for (p = hashtab->byhash[i]; p; p = n) {
+			n = p->nextbyhash;
+			radix_tree_delete(&hashtab->byaddr, p->address);
 			free(p);
 		}
 	}
 	free(hashtab);
 }
 
-static dir_hash_tab_t *
+static struct dir_hash_tab *
 dir_hash_init(
-	xfs_fsize_t	size)
+	xfs_fsize_t		size)
 {
-	dir_hash_tab_t	*hashtab;
-	int		hsize;
+	struct dir_hash_tab	*hashtab;
+	int			hsize;
 
 	hsize = size / (16 * 4);
 	if (hsize > 65536)
@@ -315,51 +303,43 @@ dir_hash_init(
 	if ((hashtab = calloc(DIR_HASH_TAB_SIZE(hsize), 1)) == NULL)
 		do_error(_("calloc failed in dir_hash_init\n"));
 	hashtab->size = hsize;
-	hashtab->byhash = (dir_hash_ent_t**)((char *)hashtab +
-		sizeof(dir_hash_tab_t));
-	hashtab->byaddr = (dir_hash_ent_t**)((char *)hashtab +
-		sizeof(dir_hash_tab_t) + sizeof(dir_hash_ent_t*) * hsize);
+	hashtab->byhash = (struct dir_hash_ent **)((char *)hashtab +
+		sizeof(struct dir_hash_tab));
+	INIT_RADIX_TREE(&hashtab->byaddr, 0);
 	return hashtab;
 }
 
 static int
 dir_hash_see(
-	dir_hash_tab_t		*hashtab,
+	struct dir_hash_tab	*hashtab,
 	xfs_dahash_t		hash,
 	xfs_dir2_dataptr_t	addr)
 {
-	int			i;
-	dir_hash_ent_t		*p;
+	struct dir_hash_ent	*p;
 
-	i = DIR_HASH_FUNC(hashtab, addr);
-	for (p = hashtab->byaddr[i]; p; p = p->nextbyaddr) {
-		if (p->address != addr)
-			continue;
-		if (p->seen)
-			return DIR_HASH_CK_DUPLEAF;
-		if (p->junkit == 0 && p->hashval != hash)
-			return DIR_HASH_CK_BADHASH;
-		p->seen = 1;
-		return DIR_HASH_CK_OK;
-	}
-	return DIR_HASH_CK_NODATA;
+	p = radix_tree_lookup(&hashtab->byaddr, addr);
+	if (!p)
+		return DIR_HASH_CK_NODATA;
+	if (!radix_tree_tag_get(&hashtab->byaddr, addr, HT_UNSEEN))
+		return DIR_HASH_CK_DUPLEAF;
+	if (p->junkit == 0 && p->hashval != hash)
+		return DIR_HASH_CK_BADHASH;
+	radix_tree_tag_clear(&hashtab->byaddr, addr, HT_UNSEEN);
+	return DIR_HASH_CK_OK;
 }
 
 static void
 dir_hash_update_ftype(
-	dir_hash_tab_t		*hashtab,
+	struct dir_hash_tab	*hashtab,
 	xfs_dir2_dataptr_t	addr,
 	uint8_t			ftype)
 {
-	int			i;
-	dir_hash_ent_t		*p;
+	struct dir_hash_ent	*p;
 
-	i = DIR_HASH_FUNC(hashtab, addr);
-	for (p = hashtab->byaddr[i]; p; p = p->nextbyaddr) {
-		if (p->address != addr)
-			continue;
-		p->name.type = ftype;
-	}
+	p = radix_tree_lookup(&hashtab->byaddr, addr);
+	if (!p)
+		return;
+	p->name.type = ftype;
 }
 
 /*
@@ -368,7 +348,7 @@ dir_hash_update_ftype(
  */
 static int
 dir_hash_see_all(
-	dir_hash_tab_t		*hashtab,
+	struct dir_hash_tab	*hashtab,
 	xfs_dir2_leaf_entry_t	*ents,
 	int			count,
 	int			stale)
@@ -1222,19 +1202,19 @@ dir_binval(
 
 static void
 longform_dir2_rebuild(
-	xfs_mount_t		*mp,
+	struct xfs_mount	*mp,
 	xfs_ino_t		ino,
-	xfs_inode_t		*ip,
-	ino_tree_node_t		*irec,
+	struct xfs_inode	*ip,
+	struct ino_tree_node	*irec,
 	int			ino_offset,
-	dir_hash_tab_t		*hashtab)
+	struct dir_hash_tab	*hashtab)
 {
 	int			error;
 	int			nres;
-	xfs_trans_t		*tp;
+	struct xfs_trans	*tp;
 	xfs_fileoff_t		lastblock;
-	xfs_inode_t		pip;
-	dir_hash_ent_t		*p;
+	struct xfs_inode	pip;
+	struct dir_hash_ent	*p;
 	int			done = 0;
 
 	/*
@@ -1393,14 +1373,14 @@ _("directory shrink failed (%d)\n"), error);
  */
 static void
 longform_dir2_entry_check_data(
-	xfs_mount_t		*mp,
-	xfs_inode_t		*ip,
+	struct xfs_mount	*mp,
+	struct xfs_inode	*ip,
 	int			*num_illegal,
 	int			*need_dot,
-	ino_tree_node_t		*current_irec,
+	struct ino_tree_node	*current_irec,
 	int			current_ino_offset,
 	struct xfs_buf		*bp,
-	dir_hash_tab_t		*hashtab,
+	struct dir_hash_tab	*hashtab,
 	freetab_t		**freetabp,
 	xfs_dablk_t		da_bno,
 	int			isblock)
@@ -1927,10 +1907,10 @@ check_dir3_header(
  */
 static int
 longform_dir2_check_leaf(
-	xfs_mount_t		*mp,
-	xfs_inode_t		*ip,
-	dir_hash_tab_t		*hashtab,
-	freetab_t		*freetab)
+	struct xfs_mount	*mp,
+	struct xfs_inode	*ip,
+	struct dir_hash_tab	*hashtab,
+	struct freetab		*freetab)
 {
 	int			badtail;
 	__be16			*bestsp;
@@ -2012,10 +1992,10 @@ longform_dir2_check_leaf(
  */
 static int
 longform_dir2_check_node(
-	xfs_mount_t		*mp,
-	xfs_inode_t		*ip,
-	dir_hash_tab_t		*hashtab,
-	freetab_t		*freetab)
+	struct xfs_mount	*mp,
+	struct xfs_inode	*ip,
+	struct dir_hash_tab	*hashtab,
+	struct freetab		*freetab)
 {
 	struct xfs_buf		*bp;
 	xfs_dablk_t		da_bno;
@@ -2187,14 +2167,15 @@ longform_dir2_check_node(
  * (ie. get libxfs to do all the grunt work)
  */
 static void
-longform_dir2_entry_check(xfs_mount_t	*mp,
-			xfs_ino_t	ino,
-			xfs_inode_t	*ip,
-			int		*num_illegal,
-			int		*need_dot,
-			ino_tree_node_t	*irec,
-			int		ino_offset,
-			dir_hash_tab_t	*hashtab)
+longform_dir2_entry_check(
+	struct xfs_mount	*mp,
+	xfs_ino_t		ino,
+	struct xfs_inode	*ip,
+	int			*num_illegal,
+	int			*need_dot,
+	struct ino_tree_node	*irec,
+	int			ino_offset,
+	struct dir_hash_tab	*hashtab)
 {
 	struct xfs_buf		*bp;
 	xfs_dablk_t		da_bno;
@@ -2397,13 +2378,14 @@ shortform_dir2_junk(
 }
 
 static void
-shortform_dir2_entry_check(xfs_mount_t	*mp,
-			xfs_ino_t	ino,
-			xfs_inode_t	*ip,
-			int		*ino_dirty,
-			ino_tree_node_t	*current_irec,
-			int		current_ino_offset,
-			dir_hash_tab_t	*hashtab)
+shortform_dir2_entry_check(
+	struct xfs_mount	*mp,
+	xfs_ino_t		ino,
+	struct xfs_inode	*ip,
+	int			*ino_dirty,
+	struct ino_tree_node	*current_irec,
+	int			current_ino_offset,
+	struct dir_hash_tab	*hashtab)
 {
 	xfs_ino_t		lino;
 	xfs_ino_t		parent;
@@ -2745,15 +2727,15 @@ _("entry \"%s\" (ino %" PRIu64 ") in dir %" PRIu64 " is a duplicate name"),
  */
 static void
 process_dir_inode(
-	xfs_mount_t		*mp,
+	struct xfs_mount	*mp,
 	xfs_agnumber_t		agno,
-	ino_tree_node_t		*irec,
+	struct ino_tree_node	*irec,
 	int			ino_offset)
 {
 	xfs_ino_t		ino;
-	xfs_inode_t		*ip;
-	xfs_trans_t		*tp;
-	dir_hash_tab_t		*hashtab;
+	struct xfs_inode	*ip;
+	struct xfs_trans	*tp;
+	struct dir_hash_tab	*hashtab;
 	int			need_dot;
 	int			dirty, num_illegal, error, nres;
 

From patchwork Wed Mar 31 06:01:17 2021
X-Patchwork-Submitter: Gao Xiang
X-Patchwork-Id: 12174359
From: Gao Xiang
To: linux-xfs@vger.kernel.org
Cc: Dave Chinner, "Darrick J. Wong", Gao Xiang
Subject: [PATCH v4 7/7] repair: scale duplicate name checking in phase 6.
Date: Wed, 31 Mar 2021 14:01:17 +0800
Message-Id: <20210331060117.28159-8-hsiangkao@aol.com>
In-Reply-To: <20210331060117.28159-1-hsiangkao@aol.com>
References: <20210331060117.28159-1-hsiangkao@aol.com>
X-Mailing-List: linux-xfs@vger.kernel.org

From: Dave Chinner

Phase 6 on large directories is CPU bound on duplicate name checking
due to the algorithm having effectively O(n^2) scalability. Hence when
the duplicate name hash table size is far smaller than the number of
directory entries, we end up with long hash chains that are searched
linearly on every new entry that is found in the directory to do
duplicate detection.

The in-memory hash table size is limited to 64k entries. Hence when we
have millions of entries in a directory, duplicate entry lookups on
the hash table have substantial overhead. Scale this table out to
larger sizes so that we keep the chain lengths short and hence the
O(n^2) scalability impact is limited because N is always small.

For a 10M entry directory consuming 400MB of directory data, the hash
table now sizes at 6.4 million entries instead of ~64k - it is ~100x
larger. While the hash table now consumes ~50MB of RAM, the xfs_repair
footprint barely changes as it's already consuming ~9GB of RAM at this
point in time. IOWs, the incremental memory usage change is noise, but
the directory checking time:

Unpatched:

  97.11%  xfs_repair          [.] dir_hash_add
   0.38%  xfs_repair          [.] longform_dir2_entry_check_data
   0.34%  libc-2.31.so        [.] __libc_calloc
   0.32%  xfs_repair          [.] avl_ino_start

Phase 6:        10/22 12:11:40  10/22 12:14:28  2 minutes, 48 seconds

Patched:

  46.74%  xfs_repair          [.] radix_tree_lookup
  32.13%  xfs_repair          [.] dir_hash_see_all
   7.70%  xfs_repair          [.] radix_tree_tag_get
   3.92%  xfs_repair          [.] dir_hash_add
   3.52%  xfs_repair          [.] radix_tree_tag_clear
   2.43%  xfs_repair          [.] crc32c_le

Phase 6:        10/22 13:11:01  10/22 13:11:18  17 seconds

has been reduced by an order of magnitude.
Reviewed-by: Darrick J. Wong
Signed-off-by: Dave Chinner
Signed-off-by: Gao Xiang
---
 repair/phase6.c | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/repair/phase6.c b/repair/phase6.c
index 063329636500..72287b5c66ca 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -288,19 +288,37 @@ dir_hash_done(
 	free(hashtab);
 }
 
+/*
+ * Create a directory hash index structure based on the size of the directory we
+ * are about to try to repair. The size passed in is the size of the data
+ * segment of the directory in bytes, so we don't really know exactly how many
+ * entries are in it. Hence assume an entry size of around 64 bytes - that's a
+ * name length of 40+ bytes so should cover most situations with really large
+ * directories.
+ */
 static struct dir_hash_tab *
 dir_hash_init(
 	xfs_fsize_t		size)
 {
-	struct dir_hash_tab	*hashtab;
+	struct dir_hash_tab	*hashtab = NULL;
 	int			hsize;
 
-	hsize = size / (16 * 4);
-	if (hsize > 65536)
-		hsize = 63336;
-	else if (hsize < 16)
+	hsize = size / 64;
+	if (hsize < 16)
 		hsize = 16;
-	if ((hashtab = calloc(DIR_HASH_TAB_SIZE(hsize), 1)) == NULL)
+
+	/*
+	 * Try to allocate as large a hash table as possible. Failure to
+	 * allocate isn't fatal, it will just result in slower performance as we
+	 * reduce the size of the table.
+	 */
+	while (hsize >= 16) {
+		hashtab = calloc(DIR_HASH_TAB_SIZE(hsize), 1);
+		if (hashtab)
+			break;
+		hsize /= 2;
+	}
+	if (!hashtab)
 		do_error(_("calloc failed in dir_hash_init\n"));
 	hashtab->size = hsize;
 	hashtab->byhash = (struct dir_hash_ent **)((char *)hashtab +