From patchwork Wed Jan 28 02:37:54 2015
X-Patchwork-Submitter: NeilBrown
X-Patchwork-Id: 5723851
X-Patchwork-Delegate: snitzer@redhat.com
Date: Wed, 28 Jan 2015 13:37:54 +1100
From: NeilBrown
To: Heinz Mauelshagen
Cc: linux-raid@vger.kernel.org,
 "dm-devel >> device-mapper development"
Message-ID: <20150128133754.25835582@notabene.brown>
In-Reply-To: <54C54CBC.50101@redhat.com>
References: <54C54CBC.50101@redhat.com>
Subject: Re: [dm-devel] [PATCH] md: fix raid5 livelock

On Sun, 25 Jan 2015 21:06:20 +0100 Heinz Mauelshagen wrote:

> From: Heinz Mauelshagen
>
> Hi Neil,
>
> the reconstruct-write optimization in the raid5 function fetch_block()
> causes livelocks in LVM raid4/5 tests.
>
> Test scenarios:
> the tests wait for full initial array resynchronization before making a
> filesystem on the raid4/5 logical volume, mounting it, writing to the
> filesystem, and failing one physical volume holding a raiddev.
>
> In short, we're seeing livelocks on fully synchronized raid4/5 arrays
> with a failed device.
>
> This patch fixes the issue, but likely in a suboptimal way.
>
> Do you think there is a better solution to avoid livelocks on
> reconstruct writes?
>
> Regards,
> Heinz
>
> Signed-off-by: Heinz Mauelshagen
> Tested-by: Jon Brassow
> Tested-by: Heinz Mauelshagen
>
> ---
>  drivers/md/raid5.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index c1b0d52..0fc8737 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -2915,7 +2915,7 @@ static int fetch_block(struct stripe_head *sh, struct stripe_head_state *s,
>  	    (s->failed >= 1 && fdev[0]->toread) ||
>  	    (s->failed >= 2 && fdev[1]->toread) ||
>  	    (sh->raid_conf->level <= 5 && s->failed && fdev[0]->towrite &&
> -	     (!test_bit(R5_Insync, &dev->flags) || test_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) &&
> +	     (!test_bit(R5_Insync, &dev->flags) || test_bit(STRIPE_PREREAD_ACTIVE, &sh->state) || s->non_overwrite) &&
>  	     !test_bit(R5_OVERWRITE, &fdev[0]->flags)) ||
>  	    ((sh->raid_conf->level == 6 ||
>  	      sh->sector >= sh->raid_conf->mddev->recovery_cp)

That is a bit heavy-handed, but knowing that it fixes the problem helps a
lot.

I think the problem happens when processing a non-overwrite write to a
failed device.  fetch_block() should, in that case, pre-read all of the
working devices, but since

	(!test_bit(R5_Insync, &dev->flags) ||
	 test_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) &&

was added, it sometimes doesn't.

The root problem is that handle_stripe_dirtying() is getting confused
because neither rmw nor rcw seems to work, so it doesn't start the chain
of events that sets STRIPE_PREREAD_ACTIVE.

The following (which is against mainline) might fix it.  Can you test?

This code really, really needs to be tidied up and commented better!!!

Thanks,
NeilBrown

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index c1b0d52bfcb0..793cf2861e97 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3195,6 +3195,10 @@ static void handle_stripe_dirtying(struct r5conf *conf,
 			(unsigned long long)sh->sector, rcw, qread,
 			test_bit(STRIPE_DELAYED, &sh->state));
 	}
+	if (rcw > disks && rmw > disks &&
+	    !test_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
+		set_bit(STRIPE_DELAYED, &sh->state);
+
 	/* now if nothing is locked, and if we have enough data,
 	 * we can start a write request
 	 */
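
For context on the rmw/rcw counting that the fix above keys off:
handle_stripe_dirtying() weighs a read-modify-write (pre-read the old
contents of the blocks being written, plus the old parity) against a
reconstruct-write (pre-read every data block that is not fully
overwritten), and it marks a strategy as impossible by bumping its counter
past the number of disks whenever a required block sits on a failed
device.  A non-overwrite write to a failed device makes both counters
overflow, and without the STRIPE_DELAYED fallback the stripe is simply
re-handled unchanged, forever.  What follows is a minimal userspace sketch
of that counting trick (struct dev_model, count_rmw_rcw() and PD_IDX are
invented for illustration and are not kernel symbols; the real code also
checks R5_LOCKED and compute state, which this model omits):

/* Hypothetical userspace model, NOT kernel code: it only mirrors the
 * rmw/rcw counting discussed above. */
#include <stdbool.h>
#include <stdio.h>

#define NDISKS 5		/* assume 4 data disks + 1 parity (raid5) */
#define PD_IDX (NDISKS - 1)	/* parity disk index in this model */

struct dev_model {
	bool uptodate;		/* like R5_UPTODATE: block already in cache */
	bool insync;		/* like R5_Insync: device is working */
	bool towrite;		/* a write is pending for this block */
	bool overwrite;		/* like R5_OVERWRITE: write covers it fully */
};

static void count_rmw_rcw(const struct dev_model *devs, int *rmw, int *rcw)
{
	*rmw = *rcw = 0;
	for (int i = 0; i < NDISKS; i++) {
		/* rmw needs the old contents of every block being
		 * written, plus the old parity */
		if ((devs[i].towrite || i == PD_IDX) && !devs[i].uptodate)
			*rmw += devs[i].insync ? 1 : 2 * NDISKS;
		/* rcw needs every data block that is not fully
		 * overwritten, to recompute parity from scratch */
		if (i != PD_IDX && !devs[i].overwrite && !devs[i].uptodate)
			*rcw += devs[i].insync ? 1 : 2 * NDISKS;
	}
}

int main(void)
{
	/* The livelock case: a non-overwrite write aimed at failed
	 * device 0, whose old data neither strategy can read */
	struct dev_model devs[NDISKS] = {
		[0] = { .towrite = true },	/* failed target device */
		[1] = { .insync = true },
		[2] = { .insync = true },
		[3] = { .insync = true },
		[PD_IDX] = { .insync = true },	/* parity device */
	};
	bool preread_active = false;	/* STRIPE_PREREAD_ACTIVE not set */
	int rmw, rcw;

	count_rmw_rcw(devs, &rmw, &rcw);
	printf("rmw=%d rcw=%d disks=%d\n", rmw, rcw, NDISKS);

	if (rmw > NDISKS && rcw > NDISKS && !preread_active)
		/* the fix above: park the stripe on the delayed list
		 * instead of re-handling it unchanged forever */
		printf("neither strategy viable -> set STRIPE_DELAYED\n");
	return 0;
}

Setting STRIPE_DELAYED breaks the livelock because raid5_activate_delayed()
later re-queues delayed stripes with STRIPE_PREREAD_ACTIVE set, and that
bit is exactly what the fetch_block() condition quoted above needs to see
before it will pre-read the surviving devices.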