From patchwork Wed Aug 12 01:46:31 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 6995321 X-Patchwork-Delegate: snitzer@redhat.com Return-Path: X-Original-To: patchwork-dm-devel@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id A32609F344 for ; Wed, 12 Aug 2015 01:51:15 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id B93D2205BA for ; Wed, 12 Aug 2015 01:51:14 +0000 (UTC) Received: from mx3-phx2.redhat.com (mx3-phx2.redhat.com [209.132.183.24]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A79BD20522 for ; Wed, 12 Aug 2015 01:51:13 +0000 (UTC) Received: from lists01.pubmisc.prod.ext.phx2.redhat.com (lists01.pubmisc.prod.ext.phx2.redhat.com [10.5.19.33]) by mx3-phx2.redhat.com (8.13.8/8.13.8) with ESMTP id t7C1kgpk015472; Tue, 11 Aug 2015 21:46:43 -0400 Received: from int-mx09.intmail.prod.int.phx2.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22]) by lists01.pubmisc.prod.ext.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id t7C1kf0Q008468 for ; Tue, 11 Aug 2015 21:46:41 -0400 Received: from mx1.redhat.com (ext-mx03.extmail.prod.ext.phx2.redhat.com [10.5.110.27]) by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id t7C1kfVD000455 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Tue, 11 Aug 2015 21:46:41 -0400 Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) by mx1.redhat.com (Postfix) with ESMTPS id 0BE5491370; Wed, 12 Aug 2015 01:46:40 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (charybdis-ext.suse.de [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 95C4EAC48; Wed, 12 Aug 2015 01:46:38 +0000 (UTC) Date: Wed, 12 Aug 2015 11:46:31 +1000 From: NeilBrown To: Mikulas Patocka Message-ID: <20150812114631.15268065@noble> In-Reply-To: References: <20150810135551.64d7dbac@noble> MIME-Version: 1.0 X-RedHat-Spam-Score: -4.601 (BAYES_50, DCC_REPUT_00_12, RCVD_IN_DNSWL_HI, SPF_PASS) 195.135.220.15 mx2.suse.de 195.135.220.15 mx2.suse.de X-Scanned-By: MIMEDefang 2.68 on 10.5.11.22 X-Scanned-By: MIMEDefang 2.76 on 10.5.110.27 X-loop: dm-devel@redhat.com Cc: dm-devel@redhat.com Subject: Re: [dm-devel] dm-snap deadlock in pending_complete() X-BeenThere: dm-devel@redhat.com X-Mailman-Version: 2.1.12 Precedence: junk Reply-To: device-mapper development List-Id: device-mapper development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: dm-devel-bounces@redhat.com Errors-To: dm-devel-bounces@redhat.com X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On Tue, 11 Aug 2015 05:14:33 -0400 (EDT) Mikulas Patocka wrote: > Hi > > On Mon, 10 Aug 2015, NeilBrown wrote: > > > > > Hi Mikulas, > > I have a customer hitting the deadlock you described over a year ago > > in: > > > > Subject: [dm-devel] [PATCH] block: flush queued bios when the process > > blocks > > Ask block layer maintainers to accept that patch. Unfortunately I don't really like the patch ... or the bioset rescue workqueues that it is based on. Sorry. So I might keep looking for a more localised approach.... > > > I notice that patch never went upstream. > > > > Has anything else been done to fix this deadlock? > > > > My thought was that something like the below would be sufficient. Do > > you see any problem with that? It avoids the deadlock by dropping the > > lock while sleeping. > > The patch below introduces a bug - if the user is continuously reading the > same chunk (for example, by issuing multiple direct i/o requests to the > same chunk), the function pending_complete never finishes. > > We need to hold the lock while waiting to make sure that no new read > requests are submitted. Ahh - thanks for the explanation. That makes sense. Presumably you need to wait for old reads that you don't have a read returning old data after a write has confirmed the new data is written? Is there some other reason? As soon as the new "proper" exception is installed, no new reads will use the old address, so it should be safe to wait without holding the lock. So what about this? Thanks for your time, NeilBrown --- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c index 7c82d3ccce87..c7a7ed2cfd6d 100644 --- a/drivers/md/dm-snap.c +++ b/drivers/md/dm-snap.c @@ -1461,14 +1461,22 @@ static void pending_complete(struct dm_snap_pending_exception *pe, int success) goto out; } - /* Check for conflicting reads */ - __check_for_conflicting_io(s, pe->e.old_chunk); + /* Add proper exception. Now all reads will be + * redirected, so no new reads can be started on + * 'old_chunk'. + */ + dm_insert_exception(&s->complete, e); + + /* Drain conflicting reads */ + if (__chunk_is_tracked(s, pe->e.old_chunk)) { + up_write(&s->lock); + __check_for_conflicting_io(s, pe->e.old_chunk); + down_write(&s->lock); + } /* - * Add a proper exception, and remove the - * in-flight exception from the list. + * And now we can remove the temporary exception. */ - dm_insert_exception(&s->complete, e); out: dm_remove_exception(&pe->e);