From patchwork Sat Jan 25 01:26:57 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Doug V Johnson
X-Patchwork-Id: 13950099
From: Doug V Johnson
Cc: Doug Johnson, Doug V Johnson, Song Liu, Yu Kuai,
 linux-raid@vger.kernel.org (open list:SOFTWARE RAID (Multiple Disks) SUPPORT),
 linux-kernel@vger.kernel.org (open list)
Subject: [PATCH 1/2] md/raid5: skip stripes with bad reads during reshape to
 avoid stall
Date: Fri, 24 Jan 2025 18:26:57 -0700
Message-ID: <20250125012702.18773-1-dougvj@dougvj.net>
X-Mailing-List: linux-raid@vger.kernel.org

While adding an additional drive to a raid6 array, the reshape stalled
at about 13% complete and any I/O operations on the array hung,
creating an effective soft lock. The kernel reported a hung task in the
mdXX_reshape thread and I had to use magic sysrq to recover, as systemd
hung as well.

I first suspected an issue with one of the underlying block devices and
as a precaution I recovered the data in read-only mode to a new array,
but the problem turned out to be in the RAID layer, as I was able to
recreate the issue from a superblock dump in sparse files.

After poking around some I discovered that I had somehow propagated the
bad block list to several devices in the array such that a few blocks
were unreadable. The bad read was reported correctly to userspace
during recovery, but it wasn't obvious at the time that it came from
bad block list metadata; instead it confirmed my bias toward suspecting
hardware issues.

I was able to reproduce the issue with a minimal test case using small
loopback devices.
I put a script for this in a GitHub repository:

https://github.com/dougvj/md_badblock_reshape_stall_test

This patch handles bad reads during a reshape by clearing the
STRIPE_EXPANDING and STRIPE_EXPAND_READY bits, effectively skipping the
stripe, and then reports the issue in dmesg.

Signed-off-by: Doug V Johnson
---
 drivers/md/raid5.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 5c79429acc64..0ae9ac695d8e 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4987,6 +4987,14 @@ static void handle_stripe(struct stripe_head *sh)
 			handle_failed_stripe(conf, sh, &s, disks);
 		if (s.syncing + s.replacing)
 			handle_failed_sync(conf, sh, &s);
+		if (test_bit(STRIPE_EXPANDING, &sh->state)) {
+			pr_warn_ratelimited("md/raid:%s: read error during reshape at %lu",
+					    mdname(conf->mddev),
+					    (unsigned long)sh->sector);
+			/* Abort the current stripe */
+			clear_bit(STRIPE_EXPANDING, &sh->state);
+			clear_bit(STRIPE_EXPAND_READY, &sh->state);
+		}
 	}
 
 	/* Now we check to see if any write operations have recently

From patchwork Sat Jan 25 01:26:58 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Doug V Johnson
X-Patchwork-Id: 13950098
From: Doug V Johnson
Cc: Doug Johnson, Doug V Johnson, Song Liu, Yu Kuai,
 linux-raid@vger.kernel.org (open list:SOFTWARE RAID (Multiple Disks) SUPPORT),
 linux-kernel@vger.kernel.org (open list)
Subject: [PATCH 2/2] md/raid5: warn when failing a read due to bad blocks
 metadata
Date: Fri, 24 Jan 2025 18:26:58 -0700
Message-ID: <20250125012702.18773-2-dougvj@dougvj.net>
In-Reply-To: <20250125012702.18773-1-dougvj@dougvj.net>
References: <20250125012702.18773-1-dougvj@dougvj.net>
X-Mailing-List: linux-raid@vger.kernel.org

It's easy to suspect underlying hardware failures or similar issues
when userspace receives a Buffer I/O error from a raid device. To help
put more sysadmins on the right track, let's report that a read failed
at least in part due to bad blocks in the bad block list in the device
metadata.

There are real-world examples where bad block lists accidentally get
propagated or copied around, so having this warning helps mitigate the
consequences.

Signed-off-by: Doug V Johnson
---
 drivers/md/raid5.c | 10 +++++++++-
 drivers/md/raid5.h |  2 +-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 0ae9ac695d8e..5d80e9bcbd6f 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3671,7 +3671,14 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh,
 		       sh->dev[i].sector + RAID5_STRIPE_SECTORS(conf)) {
 			struct bio *nextbi =
 				r5_next_bio(conf, bi, sh->dev[i].sector);
-
+			/* If we recorded bad blocks from the metadata
+			 * on any of the devices then report this to
+			 * userspace in case anyone might suspect
+			 * something more fundamental instead
+			 */
+			if (s->bad_blocks)
+				pr_warn_ratelimited("%s: read encountered block in device bad block list.",
+						    mdname(conf->mddev));
 			bio_io_error(bi);
 			bi = nextbi;
 		}
@@ -4682,6 +4689,7 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
 		if (rdev) {
 			is_bad = rdev_has_badblock(rdev, sh->sector,
 						   RAID5_STRIPE_SECTORS(conf));
+			s->bad_blocks++;
 			if (s->blocked_rdev == NULL) {
 				if (is_bad < 0)
 					set_bit(BlockedBadBlocks, &rdev->flags);
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index eafc6e9ed6ee..c755c321ae36 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -282,7 +282,7 @@ struct stripe_head_state {
 	 * read all devices, just the replacement targets.
 	 */
 	int syncing, expanding, expanded, replacing;
-	int locked, uptodate, to_read, to_write, failed, written;
+	int locked, uptodate, to_read, to_write, failed, written, bad_blocks;
 	int to_fill, compute, req_compute, non_overwrite;
 	int injournal, just_cached;
 	int failed_num[2];
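
For illustration only, the bad_blocks counter from patch 2/2 can be
modelled in userspace. This is a rough sketch, not kernel code: bb_range,
model_dev, model_state, model_has_badblock(), model_analyse() and
model_should_warn() are hypothetical names that do not exist in
drivers/md. It assumes the intent is to count only the devices whose bad
block list overlaps the stripe, i.e. a nonzero result from
rdev_has_badblock(). As posted, the analyse_stripe() hunk increments the
counter for every present rdev, so the sketch shows the guarded variant
that the comment in handle_failed_stripe() seems to describe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one bad block list entry: sectors [start, start + len). */
struct bb_range {
	unsigned long long start;
	unsigned long long len;
};

/* Hypothetical model of a member device; present == false models a NULL rdev. */
struct model_dev {
	bool present;
	const struct bb_range *bb;
	size_t nr_bb;
};

/* Only the one field of struct stripe_head_state that this sketch needs. */
struct model_state {
	int bad_blocks;
};

/* Return true if [sector, sector + nr) overlaps any recorded bad range. */
static bool model_has_badblock(const struct model_dev *dev,
			       unsigned long long sector,
			       unsigned long long nr)
{
	for (size_t i = 0; i < dev->nr_bb; i++) {
		const struct bb_range *r = &dev->bb[i];

		if (sector < r->start + r->len && r->start < sector + nr)
			return true;
	}
	return false;
}

/* Count the present devices whose bad block list covers this stripe. */
static void model_analyse(const struct model_dev *devs, size_t nr_devs,
			  unsigned long long sector, unsigned long long nr,
			  struct model_state *s)
{
	assert(devs != NULL && s != NULL);
	s->bad_blocks = 0;
	for (size_t i = 0; i < nr_devs; i++) {
		if (!devs[i].present)
			continue;
		/* Assumption: count only on a hit, not for every device. */
		if (model_has_badblock(&devs[i], sector, nr))
			s->bad_blocks++;
	}
}

/* The failed-read path would emit the warning only when this is true. */
static bool model_should_warn(const struct model_state *s)
{
	return s->bad_blocks > 0;
}
```

With the guard in place, a stripe that fails for an unrelated reason on
an array with clean bad block lists would not trigger the new warning.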