From patchwork Thu Jun 2 18:18:16 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 12868059 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 190CAC43334 for ; Thu, 2 Jun 2022 18:18:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238011AbiFBSSu (ORCPT ); Thu, 2 Jun 2022 14:18:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43594 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237943AbiFBSSg (ORCPT ); Thu, 2 Jun 2022 14:18:36 -0400 Received: from ale.deltatee.com (ale.deltatee.com [204.191.154.188]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2AFD24DF78; Thu, 2 Jun 2022 11:18:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=deltatee.com; s=20200525; h=Subject:MIME-Version:References:In-Reply-To: Message-Id:Date:Cc:To:From:content-disposition; bh=tiDb5N7DtJWgf5i+viE/JdrxQoObOCh/lk7c62dUGdI=; b=lDBClrE1qYUVkdV0xcNhnldQ2t jPvw8vp0i7h+F3C+hopk3B4kOMbXYsyGchjPjy/Y/D0twbGxPPKi/DqWt4TRbwhAyvfm+zO+wWA3r htmDTXbPBCPxKur8QbIKJ++mwrgvvL3xL0cVC23u8doodvwzoSOFNpN2mIOXeoeSX9TGvGzzm9aFk HQoUVe9kPlo5+hwTP6Lok7dO8s49c3XHnrugZ/ustYNGS9UUS6aO0T2ekLcpAsJU1SFzfeg/HG9OG +hV841Tymz1sOhVnMSo9BFxI/g4x52siQNeWJDE9CikEcXhtOM/DbY/xZeWJlZ1kQf7HmN9Q6gsJN wP9YKWaQ==; Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1nwpOo-00EPmX-8q; Thu, 02 Jun 2022 12:18:22 -0600 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.94.2) (envelope-from ) id 1nwpOn-000DDi-7J; Thu, 02 Jun 2022 12:18:21 -0600 From: Logan Gunthorpe To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu Cc: Christoph Hellwig , Donald Buczek , Guoqing Jiang , Xiao Ni , Stephen Bates , Martin Oliveira , David Sloan , Logan Gunthorpe , Christoph Hellwig Date: Thu, 2 Jun 2022 12:18:16 -0600 Message-Id: <20220602181818.50729-11-logang@deltatee.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220602181818.50729-1-logang@deltatee.com> References: <20220602181818.50729-1-logang@deltatee.com> MIME-Version: 1.0 X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, song@kernel.org, hch@infradead.org, buczek@molgen.mpg.de, guoqing.jiang@linux.dev, xni@redhat.com, sbates@raithlin.com, Martin.Oliveira@eideticom.com, David.Sloan@eideticom.com, logang@deltatee.com, hch@lst.de X-SA-Exim-Mail-From: gunthorp@deltatee.com Subject: [PATCH v3 10/11] md: Notify sysfs sync_completed in md_reap_sync_thread() X-SA-Exim-Version: 4.2.1 (built Sat, 13 Feb 2021 17:57:42 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Precedence: bulk List-ID: X-Mailing-List: linux-raid@vger.kernel.org The mdadm test 07layouts randomly produces a kernel hung task deadlock. The deadlock is caused by the suspend_lo/suspend_hi files being set by the mdadm background process during reshape and not being cleared because the process hangs. (Leaving aside the issue of the fragility of freezing kernel tasks by buggy userspace processes...) When the background mdadm process hangs it, is waiting (without a timeout) on a change to the sync_completed file signalling that the reshape has completed. The process is woken up a couple times when the reshape finishes but it is woken up before MD_RECOVERY_RUNNING is cleared so sync_completed_show() reports 0 instead of "none". To fix this, notify the sysfs file in md_reap_sync_thread() after MD_RECOVERY_RUNNING has been cleared. This wakes up mdadm and causes it to continue and write to suspend_lo/suspend_hi to allow IO to continue. Signed-off-by: Logan Gunthorpe Reviewed-by: Christoph Hellwig --- drivers/md/md.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/md/md.c b/drivers/md/md.c index 2be429874d18..2c07c9508222 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -9476,6 +9476,7 @@ void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held) wake_up(&resync_wait); /* flag recovery needed just to double check */ set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); + sysfs_notify_dirent_safe(mddev->sysfs_completed); sysfs_notify_dirent_safe(mddev->sysfs_action); md_new_event(); if (mddev->event_work.func)