From patchwork Wed Jan 28 02:37:54 2015
X-Patchwork-Submitter: NeilBrown
X-Patchwork-Id: 5723851
X-Patchwork-Delegate: snitzer@redhat.com
Date: Wed, 28 Jan 2015 13:37:54 +1100
From: NeilBrown
To: Heinz Mauelshagen
Cc: linux-raid@vger.kernel.org,
 "dm-devel >> device-mapper development"
Message-ID: <20150128133754.25835582@notabene.brown>
In-Reply-To: <54C54CBC.50101@redhat.com>
References: <54C54CBC.50101@redhat.com>
Subject: Re: [dm-devel] [PATCH] md: fix raid5 livelock

On Sun, 25 Jan 2015 21:06:20 +0100 Heinz Mauelshagen wrote:

> From: Heinz Mauelshagen
>
> Hi Neil,
>
> the reconstruct-write optimization in the raid5 function fetch_block()
> causes livelocks in LVM raid4/5 tests.
>
> Test scenarios:
> the tests wait for full initial array resynchronization before making a
> filesystem on the raid4/5 logical volume, mounting it, writing to the
> filesystem, and failing one physical volume holding a raiddev.
>
> In short, we're seeing livelocks on fully synchronized raid4/5 arrays
> with a failed device.
>
> This patch fixes the issue, but likely in a suboptimal way.
>
> Do you think there is a better solution to avoid livelocks on
> reconstruct writes?
>
> Regards,
> Heinz
>
> Signed-off-by: Heinz Mauelshagen
> Tested-by: Jon Brassow
> Tested-by: Heinz Mauelshagen
>
> ---
>  drivers/md/raid5.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index c1b0d52..0fc8737 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -2915,7 +2915,7 @@ static int fetch_block(struct stripe_head *sh, struct stripe_head_state *s,
>  	    (s->failed >= 1 && fdev[0]->toread) ||
>  	    (s->failed >= 2 && fdev[1]->toread) ||
>  	    (sh->raid_conf->level <= 5 && s->failed && fdev[0]->towrite &&
> -	     (!test_bit(R5_Insync, &dev->flags) || test_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) &&
> +	     (!test_bit(R5_Insync, &dev->flags) || test_bit(STRIPE_PREREAD_ACTIVE, &sh->state) || s->non_overwrite) &&
>  	     !test_bit(R5_OVERWRITE, &fdev[0]->flags)) ||
>  	    ((sh->raid_conf->level == 6 ||
>  	      sh->sector >= sh->raid_conf->mddev->recovery_cp)

That is a bit heavy-handed, but knowing that it fixes the problem helps a
lot.

I think the problem happens when processing a non-overwrite write to a
failed device.  fetch_block() should, in that case, pre-read all of the
working devices, but since

	(!test_bit(R5_Insync, &dev->flags) ||
	 test_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) &&

was added, it sometimes doesn't.

The root problem is that handle_stripe_dirtying() is getting confused
because neither rmw nor rcw seems to work, so it doesn't start the chain
of events that sets STRIPE_PREREAD_ACTIVE.

The following (which is against mainline) might fix it.  Can you test?

This code really, really needs to be tidied up and commented better!!!

Thanks,
NeilBrown

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index c1b0d52bfcb0..793cf2861e97 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3195,6 +3195,10 @@ static void handle_stripe_dirtying(struct r5conf *conf,
 			(unsigned long long)sh->sector, rcw, qread,
 			test_bit(STRIPE_DELAYED, &sh->state));
 	}
+	if (rcw > disks && rmw > disks &&
+	    !test_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
+		set_bit(STRIPE_DELAYED, &sh->state);
+
 	/* now if nothing is locked, and if we have enough data,
 	 * we can start a write request
 	 */
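
For context on the rmw/rcw counting that the fix above keys off:
handle_stripe_dirtying() weighs a read-modify-write (pre-read the old
contents of the blocks being written, plus the old parity) against a
reconstruct-write (pre-read every data block that is not fully
overwritten), and it marks a strategy as impossible by bumping its counter
past the number of disks whenever a required block sits on a failed
device.  A non-overwrite write to a failed device makes both counters
overflow, and without the STRIPE_DELAYED fallback the stripe is simply
re-handled unchanged, forever.  What follows is a minimal userspace sketch
of that counting trick (struct dev_model, count_rmw_rcw() and PD_IDX are
invented for illustration and are not kernel symbols; the real code also
checks R5_LOCKED and compute state, which this model omits):

/* Hypothetical userspace model, NOT kernel code: it only mirrors the
 * rmw/rcw counting discussed above. */
#include <stdbool.h>
#include <stdio.h>

#define NDISKS 5		/* assume 4 data disks + 1 parity (raid5) */
#define PD_IDX (NDISKS - 1)	/* parity disk index in this model */

struct dev_model {
	bool uptodate;		/* like R5_UPTODATE: block already in cache */
	bool insync;		/* like R5_Insync: device is working */
	bool towrite;		/* a write is pending for this block */
	bool overwrite;		/* like R5_OVERWRITE: write covers it fully */
};

static void count_rmw_rcw(const struct dev_model *devs, int *rmw, int *rcw)
{
	*rmw = *rcw = 0;
	for (int i = 0; i < NDISKS; i++) {
		/* rmw needs the old contents of every block being
		 * written, plus the old parity */
		if ((devs[i].towrite || i == PD_IDX) && !devs[i].uptodate)
			*rmw += devs[i].insync ? 1 : 2 * NDISKS;
		/* rcw needs every data block that is not fully
		 * overwritten, to recompute parity from scratch */
		if (i != PD_IDX && !devs[i].overwrite && !devs[i].uptodate)
			*rcw += devs[i].insync ? 1 : 2 * NDISKS;
	}
}

int main(void)
{
	/* The livelock case: a non-overwrite write aimed at failed
	 * device 0, whose old data neither strategy can read */
	struct dev_model devs[NDISKS] = {
		[0] = { .towrite = true },	/* failed target device */
		[1] = { .insync = true },
		[2] = { .insync = true },
		[3] = { .insync = true },
		[PD_IDX] = { .insync = true },	/* parity device */
	};
	bool preread_active = false;	/* STRIPE_PREREAD_ACTIVE not set */
	int rmw, rcw;

	count_rmw_rcw(devs, &rmw, &rcw);
	printf("rmw=%d rcw=%d disks=%d\n", rmw, rcw, NDISKS);

	if (rmw > NDISKS && rcw > NDISKS && !preread_active)
		/* the fix above: park the stripe on the delayed list
		 * instead of re-handling it unchanged forever */
		printf("neither strategy viable -> set STRIPE_DELAYED\n");
	return 0;
}

Setting STRIPE_DELAYED breaks the livelock because raid5_activate_delayed()
later re-queues delayed stripes with STRIPE_PREREAD_ACTIVE set, and that
bit is exactly what the fetch_block() condition quoted above needs to see
before it will pre-read the surviving devices.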