From patchwork Thu Sep 8 16:15:14 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 12970332 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EA456C54EE9 for ; Thu, 8 Sep 2022 16:15:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231501AbiIHQPk (ORCPT ); Thu, 8 Sep 2022 12:15:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40416 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231237AbiIHQPa (ORCPT ); Thu, 8 Sep 2022 12:15:30 -0400 Received: from ale.deltatee.com (ale.deltatee.com [204.191.154.188]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 39AD35FE3; Thu, 8 Sep 2022 09:15:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=deltatee.com; s=20200525; h=Subject:MIME-Version:References:In-Reply-To: Message-Id:Date:Cc:To:From:content-disposition; bh=UeuW3y4mcZK68L2IDICGGzPskyQWuMEDeNPUgMXd7i0=; b=txnX3wmwBVF6pn5dOC3TIBAM3e ESAzIGaY3nXB51Q1DSf4BYMsqh00GtapzNjYiCxChVmIVA1FWAhM76F3FrRVFABCd+udkOspoAV9P BMCm8MwUsc2wnSt1Pu0a8i+rM9aOynBAAUOFg9Om4t5RzhneoJttbOmLbfjHFFoiNcTeDLN74me3Y n0FRTW+lNraD1V1qYmeCG466UNfTqwEH/CTEdNHIhUErCyrkysVutXTdMLk0h31lIq8mo85vtM6my Ilpk5vu0TxmNlDOB+V9gh8OAfLHgjr3r/UI12mhlbtIO2a7DaWnjVmylbJYSwg1fBsfYxi+uWHTxS pxNaU7ug==; Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1oWKBW-001jJz-RD; Thu, 08 Sep 2022 10:15:24 -0600 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.94.2) (envelope-from ) id 1oWKBU-00019D-GF; Thu, 08 Sep 2022 10:15:20 -0600 From: Logan Gunthorpe To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu Cc: Christoph Hellwig , Guoqing Jiang , Stephen Bates , Martin Oliveira , David Sloan , David Sloan , Logan Gunthorpe , Christoph Hellwig , Guoqing Jiang Date: Thu, 8 Sep 2022 10:15:14 -0600 Message-Id: <20220908161516.4361-2-logang@deltatee.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220908161516.4361-1-logang@deltatee.com> References: <20220908161516.4361-1-logang@deltatee.com> MIME-Version: 1.0 X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, song@kernel.org, hch@infradead.org, sbates@raithlin.com, Martin.Oliveira@eideticom.com, David.Sloan@eideticom.com, david.sloan@eideticom.com, logang@deltatee.com, hch@lst.de, guoqing.jiang@linux.dev, Guoqing.jiang@linux.dev X-SA-Exim-Mail-From: gunthorp@deltatee.com Subject: [PATCH v2 1/3] md/raid5: Remove unnecessary bio_put() in raid5_read_one_chunk() X-SA-Exim-Version: 4.2.1 (built Sat, 13 Feb 2021 17:57:42 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Precedence: bulk List-ID: X-Mailing-List: linux-raid@vger.kernel.org From: David Sloan When running chunk-sized reads on disks with badblocks duplicate bio free/puts are observed: ============================================================================= BUG bio-200 (Not tainted): Object already free ----------------------------------------------------------------------------- Allocated in mempool_alloc_slab+0x17/0x20 age=3 cpu=2 pid=7504 __slab_alloc.constprop.0+0x5a/0xb0 kmem_cache_alloc+0x31e/0x330 mempool_alloc_slab+0x17/0x20 mempool_alloc+0x100/0x2b0 bio_alloc_bioset+0x181/0x460 do_mpage_readpage+0x776/0xd00 mpage_readahead+0x166/0x320 blkdev_readahead+0x15/0x20 read_pages+0x13f/0x5f0 page_cache_ra_unbounded+0x18d/0x220 force_page_cache_ra+0x181/0x1c0 page_cache_sync_ra+0x65/0xb0 filemap_get_pages+0x1df/0xaf0 filemap_read+0x1e1/0x700 blkdev_read_iter+0x1e5/0x330 vfs_read+0x42a/0x570 Freed in mempool_free_slab+0x17/0x20 age=3 cpu=2 pid=7504 kmem_cache_free+0x46d/0x490 mempool_free_slab+0x17/0x20 mempool_free+0x66/0x190 bio_free+0x78/0x90 bio_put+0x100/0x1a0 raid5_make_request+0x2259/0x2450 md_handle_request+0x402/0x600 md_submit_bio+0xd9/0x120 __submit_bio+0x11f/0x1b0 submit_bio_noacct_nocheck+0x204/0x480 submit_bio_noacct+0x32e/0xc70 submit_bio+0x98/0x1a0 mpage_readahead+0x250/0x320 blkdev_readahead+0x15/0x20 read_pages+0x13f/0x5f0 page_cache_ra_unbounded+0x18d/0x220 Slab 0xffffea000481b600 objects=21 used=0 fp=0xffff8881206d8940 flags=0x17ffffc0010201(locked|slab|head|node=0|zone=2|lastcpupid=0x1fffff) CPU: 0 PID: 34525 Comm: kworker/u24:2 Not tainted 6.0.0-rc2-localyes-265166-gf11c5343fa3f #143 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Workqueue: raid5wq raid5_do_work Call Trace: dump_stack_lvl+0x5a/0x78 dump_stack+0x10/0x16 print_trailer+0x158/0x165 object_err+0x35/0x50 free_debug_processing.cold+0xb7/0xbe __slab_free+0x1ae/0x330 kmem_cache_free+0x46d/0x490 mempool_free_slab+0x17/0x20 mempool_free+0x66/0x190 bio_free+0x78/0x90 bio_put+0x100/0x1a0 mpage_end_io+0x36/0x150 bio_endio+0x2fd/0x360 md_end_io_acct+0x7e/0x90 bio_endio+0x2fd/0x360 handle_failed_stripe+0x960/0xb80 handle_stripe+0x1348/0x3760 handle_active_stripes.constprop.0+0x72a/0xaf0 raid5_do_work+0x177/0x330 process_one_work+0x616/0xb20 worker_thread+0x2bd/0x6f0 kthread+0x179/0x1b0 ret_from_fork+0x22/0x30 The double free is caused by an unnecessary bio_put() in the if(is_badblock(...)) error path in raid5_read_one_chunk(). The error path was moved ahead of bio_alloc_clone() in c82aa1b76787c ("md/raid5: move checking badblock before clone bio in raid5_read_one_chunk"). The previous code checked and freed align_bio which required a bio_put. After the move that is no longer needed as raid_bio is returned to the control of the common io path which performs its own endio resulting in a double free on bad device blocks. Fixes: c82aa1b76787c ("md/raid5: move checking badblock before clone bio in raid5_read_one_chunk") Signed-off-by: David Sloan Signed-off-by: Logan Gunthorpe Reviewed-by: Christoph Hellwig Acked-by: Guoqing Jiang --- drivers/md/raid5.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 4e6d865a6456..734f92e75f85 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -5538,7 +5538,6 @@ static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio) if (is_badblock(rdev, sector, bio_sectors(raid_bio), &first_bad, &bad_sectors)) { - bio_put(raid_bio); rdev_dec_pending(rdev, mddev); return 0; } From patchwork Thu Sep 8 16:15:15 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 12970335 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 98516C54EE9 for ; Thu, 8 Sep 2022 16:16:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231582AbiIHQQW (ORCPT ); Thu, 8 Sep 2022 12:16:22 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39350 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232021AbiIHQPn (ORCPT ); Thu, 8 Sep 2022 12:15:43 -0400 Received: from ale.deltatee.com (ale.deltatee.com [204.191.154.188]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DD78A14D0F; Thu, 8 Sep 2022 09:15:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=deltatee.com; s=20200525; h=Subject:MIME-Version:References:In-Reply-To: Message-Id:Date:Cc:To:From:content-disposition; bh=yypPU1IxV0SG3MDtollNVAvP34loykQUhcCgU32lefk=; b=jNuOJ57jy3LdCqgpX/HqPM68qz OT/YXMtTucTfBSMypX3X93FURdCt+XB6gcHRnw92V1kGrbcsgU2Z5UDNS1frydXzc0g1g0Q6TMhrf pJXzPuv5lTtRH9NSIg1DwHMJky7eGpqtuyGwDdAxuR3SBW1kpJOABAiGihqpNZKvxGlZZ+fbBINiK LFcDJSObcr42lXEdEvaRLsO7ZOx8rI2PQ8GU6t3arCyHd91zBwnsXCdUzSSVDq7Un+JydCyAJ3oo9 8NWa1C8zZCZ0UIqHjFzn2SoVnV2vNdYNTqe5vvK1BEhZrqtxspRWFeQ4GK25Uz+W7AV/QPWNlRkqk Hz2ZtriA==; Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1oWKBW-001jK0-RD; Thu, 08 Sep 2022 10:15:24 -0600 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.94.2) (envelope-from ) id 1oWKBU-00019G-L9; Thu, 08 Sep 2022 10:15:20 -0600 From: Logan Gunthorpe To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu Cc: Christoph Hellwig , Guoqing Jiang , Stephen Bates , Martin Oliveira , David Sloan , Logan Gunthorpe , Christoph Hellwig , Guoqing Jiang Date: Thu, 8 Sep 2022 10:15:15 -0600 Message-Id: <20220908161516.4361-3-logang@deltatee.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220908161516.4361-1-logang@deltatee.com> References: <20220908161516.4361-1-logang@deltatee.com> MIME-Version: 1.0 X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, song@kernel.org, hch@infradead.org, sbates@raithlin.com, Martin.Oliveira@eideticom.com, David.Sloan@eideticom.com, logang@deltatee.com, hch@lst.de, guoqing.jiang@linux.dev, Guoqing.jiang@linux.dev X-SA-Exim-Mail-From: gunthorp@deltatee.com Subject: [PATCH v2 2/3] md: Remove extra mddev_get() in md_seq_start() X-SA-Exim-Version: 4.2.1 (built Sat, 13 Feb 2021 17:57:42 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Precedence: bulk List-ID: X-Mailing-List: linux-raid@vger.kernel.org A regression is seen where mddev devices stay permanently after they are stopped due to an elevated reference count. This was tracked down to an extra mddev_get() in md_seq_start(). It only happened rarely because most of the time the md_seq_start() is called with a zero offset. The path with an extra mddev_get() only happens when it starts with a non-zero offset. The commit noted below changed an mddev_get() to check its success but inadevrtantly left the original call in. Remove the extra call. Fixes: 12a6caf27324 ("md: only delete entries from all_mddevs when the disk is freed") Signed-off-by: Logan Gunthorpe Reviewed-by: Christoph Hellwig Acked-by: Guoqing Jiang --- drivers/md/md.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/md/md.c b/drivers/md/md.c index afaf36b2f6ab..9dc0175280b4 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -8154,7 +8154,6 @@ static void *md_seq_start(struct seq_file *seq, loff_t *pos) list_for_each(tmp,&all_mddevs) if (!l--) { mddev = list_entry(tmp, struct mddev, all_mddevs); - mddev_get(mddev); if (!mddev_get(mddev)) continue; spin_unlock(&all_mddevs_lock); From patchwork Thu Sep 8 16:15:16 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 12970333 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 28939C38145 for ; Thu, 8 Sep 2022 16:15:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231963AbiIHQP4 (ORCPT ); Thu, 8 Sep 2022 12:15:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36766 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231804AbiIHQPf (ORCPT ); Thu, 8 Sep 2022 12:15:35 -0400 Received: from ale.deltatee.com (ale.deltatee.com [204.191.154.188]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 20AD64F397; Thu, 8 Sep 2022 09:15:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=deltatee.com; s=20200525; h=Subject:MIME-Version:References:In-Reply-To: Message-Id:Date:Cc:To:From:content-disposition; bh=4voS2AeqdwOOI85+fb4G1DXxw0S1Povq/w9IvgPjUE4=; b=l5bHewt+rRkB9AlL0kS0R9HjKa GE5WA68uxwkv99W2B/apVsVSY25WQLHa2Ko1WK8lCQ89RLHuqXLwUIGZbcqAxrXSvNGhjocG6GmkI DQ7xnL1xlDEWwpnMx52jydii23Y5ICvHaRVKuiwy7/oRWJ8EyrWkFdREUFwY49b70cU7g8e3Mdoz+ THAGuQqvMNndgcyKK+v9ajgYtzHjf581bLsDTx+PZRoJvcHSlJWYOoOKICuXA8OzMkA/vRiUA0WcS jB2jyEdmbps/8YtsiBU+j5Ffy3Riq0SnEb/6BZCYZUFhx8LLcLX6fAF2ssZkkN1Y+FbELOS/dPGCB OVNhwHYA==; Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1oWKBW-001jK1-2M; Thu, 08 Sep 2022 10:15:23 -0600 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.94.2) (envelope-from ) id 1oWKBU-00019J-Od; Thu, 08 Sep 2022 10:15:20 -0600 From: Logan Gunthorpe To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu Cc: Christoph Hellwig , Guoqing Jiang , Stephen Bates , Martin Oliveira , David Sloan , Logan Gunthorpe Date: Thu, 8 Sep 2022 10:15:16 -0600 Message-Id: <20220908161516.4361-4-logang@deltatee.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220908161516.4361-1-logang@deltatee.com> References: <20220908161516.4361-1-logang@deltatee.com> MIME-Version: 1.0 X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, song@kernel.org, hch@infradead.org, guoqing.jiang@linux.dev, sbates@raithlin.com, Martin.Oliveira@eideticom.com, David.Sloan@eideticom.com, logang@deltatee.com X-SA-Exim-Mail-From: gunthorp@deltatee.com Subject: [PATCH v2 3/3] md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d X-SA-Exim-Version: 4.2.1 (built Sat, 13 Feb 2021 17:57:42 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Precedence: bulk List-ID: X-Mailing-List: linux-raid@vger.kernel.org A complicated deadlock exists when using the journal and an elevated group_thrtead_cnt. It was found with loop devices, but its not clear whether it can be seen with real disks. The deadlock can occur simply by writing data with an fio script. When the deadlock occurs, multiple threads will hang in different ways: 1) The group threads will hang in the blk-wbt code with bios waiting to be submitted to the block layer: io_schedule+0x70/0xb0 rq_qos_wait+0x153/0x210 wbt_wait+0x115/0x1b0 io_schedule+0x70/0xb0 rq_qos_wait+0x153/0x210 wbt_wait+0x115/0x1b0 __rq_qos_throttle+0x38/0x60 blk_mq_submit_bio+0x589/0xcd0 wbt_wait+0x115/0x1b0 __rq_qos_throttle+0x38/0x60 blk_mq_submit_bio+0x589/0xcd0 __submit_bio+0xe6/0x100 submit_bio_noacct_nocheck+0x42e/0x470 submit_bio_noacct+0x4c2/0xbb0 ops_run_io+0x46b/0x1a30 handle_stripe+0xcd3/0x36b0 handle_active_stripes.constprop.0+0x6f6/0xa60 raid5_do_work+0x177/0x330 Or: io_schedule+0x70/0xb0 rq_qos_wait+0x153/0x210 wbt_wait+0x115/0x1b0 __rq_qos_throttle+0x38/0x60 blk_mq_submit_bio+0x589/0xcd0 __submit_bio+0xe6/0x100 submit_bio_noacct_nocheck+0x42e/0x470 submit_bio_noacct+0x4c2/0xbb0 flush_deferred_bios+0x136/0x170 raid5_do_work+0x262/0x330 2) The r5l_reclaim thread will hang in the same way, submitting a bio to the block layer: io_schedule+0x70/0xb0 rq_qos_wait+0x153/0x210 wbt_wait+0x115/0x1b0 __rq_qos_throttle+0x38/0x60 blk_mq_submit_bio+0x589/0xcd0 __submit_bio+0xe6/0x100 submit_bio_noacct_nocheck+0x42e/0x470 submit_bio_noacct+0x4c2/0xbb0 submit_bio+0x3f/0xf0 md_super_write+0x12f/0x1b0 md_update_sb.part.0+0x7c6/0xff0 md_update_sb+0x30/0x60 r5l_do_reclaim+0x4f9/0x5e0 r5l_reclaim_thread+0x69/0x30b However, before hanging, the MD_SB_CHANGE_PENDING flag will be set for sb_flags in r5l_write_super_and_discard_space(). This flag will never be cleared because the submit_bio() call never returns. 3) Due to the MD_SB_CHANGE_PENDING flag being set, handle_stripe() will do no processing on any pending stripes and re-set STRIPE_HANDLE. This will cause the raid5d thread to enter an infinite loop, constantly trying to handle the same stripes stuck in the queue. The raid5d thread has a blk_plug that holds a number of bios that are also stuck waiting seeing the thread is in a loop that never schedules. These bios have been accounted for by blk-wbt thus preventing the other threads above from continuing when they try to submit bios. --Deadlock. To fix this, add the same wait_event() that is used in raid5_do_work() to raid5d() such that if MD_SB_CHANGE_PENDING is set, the thread will schedule and wait until the flag is cleared. The schedule action will flush the plug which will allow the r5l_reclaim thread to continue, thus preventing the deadlock. It's not clear when the deadlock was introduced, but the similar wait_event() call in raid5_do_work() was added in 2017 by this commit: 16d997b78b15 ("md/raid5: simplfy delaying of writes while metadata is updated.") Link: https://lore.kernel.org/r/7f3b87b6-b52a-f737-51d7-a4eec5c44112@deltatee.com Signed-off-by: Logan Gunthorpe --- drivers/md/raid5.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 734f92e75f85..7d1b7e361ef2 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -6783,6 +6783,10 @@ static void raid5d(struct md_thread *thread) md_check_recovery(mddev); spin_lock_irq(&conf->device_lock); } + + wait_event_lock_irq(mddev->sb_wait, + !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags), + conf->device_lock); } pr_debug("%d stripes handled\n", handled);