From patchwork Thu Aug 13 01:23:16 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 7005381 X-Patchwork-Delegate: snitzer@redhat.com Return-Path: X-Original-To: patchwork-dm-devel@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 143F8C05AC for ; Thu, 13 Aug 2015 01:27:57 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 3B660202EB for ; Thu, 13 Aug 2015 01:27:56 +0000 (UTC) Received: from mx5-phx2.redhat.com (mx5-phx2.redhat.com [209.132.183.37]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E88E82075A for ; Thu, 13 Aug 2015 01:27:54 +0000 (UTC) Received: from lists01.pubmisc.prod.ext.phx2.redhat.com (lists01.pubmisc.prod.ext.phx2.redhat.com [10.5.19.33]) by mx5-phx2.redhat.com (8.14.4/8.14.4) with ESMTP id t7D1NRpr020100; Wed, 12 Aug 2015 21:23:28 -0400 Received: from int-mx11.intmail.prod.int.phx2.redhat.com (int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24]) by lists01.pubmisc.prod.ext.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id t7D1NQMJ004696 for ; Wed, 12 Aug 2015 21:23:26 -0400 Received: from mx1.redhat.com (ext-mx05.extmail.prod.ext.phx2.redhat.com [10.5.110.29]) by int-mx11.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id t7D1NQKH027602 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Wed, 12 Aug 2015 21:23:26 -0400 Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) by mx1.redhat.com (Postfix) with ESMTPS id 5C96E32B0BE; Thu, 13 Aug 2015 01:23:25 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (charybdis-ext.suse.de [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id F2722ACCA; Thu, 13 Aug 2015 01:23:23 +0000 (UTC) Date: Thu, 13 Aug 2015 11:23:16 +1000 From: NeilBrown To: Mikulas Patocka Message-ID: <20150813112316.628f0de6@noble> In-Reply-To: References: <20150810135551.64d7dbac@noble> <20150812114631.15268065@noble> MIME-Version: 1.0 X-RedHat-Spam-Score: -4.601 (BAYES_50, DCC_REPUT_00_12, RCVD_IN_DNSWL_HI, SPF_PASS) 195.135.220.15 mx2.suse.de 195.135.220.15 mx2.suse.de X-Scanned-By: MIMEDefang 2.68 on 10.5.11.24 X-Scanned-By: MIMEDefang 2.78 on 10.5.110.29 X-loop: dm-devel@redhat.com Cc: dm-devel@redhat.com Subject: Re: [dm-devel] dm-snap deadlock in pending_complete() X-BeenThere: dm-devel@redhat.com X-Mailman-Version: 2.1.12 Precedence: junk Reply-To: device-mapper development List-Id: device-mapper development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: dm-devel-bounces@redhat.com Errors-To: dm-devel-bounces@redhat.com X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On Wed, 12 Aug 2015 10:16:17 -0400 (EDT) Mikulas Patocka wrote: > > > On Wed, 12 Aug 2015, NeilBrown wrote: > > > Ahh - thanks for the explanation. That makes sense. > > Presumably you need to wait for old reads that you don't have a read > > returning old data after a write has confirmed the new data is written? > > Is there some other reason? > > > > As soon as the new "proper" exception is installed, no new reads will > > use the old address, so it should be safe to wait without holding the > > lock. > > > > So what about this? > > This is wrong too - when you add the exception to the complete hash table > and drop the lock, writes to the chunk in the origin target are can modify > the origin volume. These writes race with pending reads to the snapshot > (__chunk_is_tracked tests for those reads). If the I/O scheduler schedules > the write to the origin before the read from the snapshot, the read from > the snapshot returns corrupted data. > > There can be more problems with snapshot merging - if the chunk is on the > complete hash table, the merging code assumes that it can be safely > merged. > Thanks again. I came up with the following which should address the origin-write issue, but I'm not so sure about snapshot merging, and it is becoming a lot less simple that I had hoped. I'll see if I can come up with code for a more general solution that is still localised to dm - along the lines of my bio_split suggestion. Thanks, NeilBrown --- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c index 7c82d3ccce87..7f9e1b0429c8 100644 --- a/drivers/md/dm-snap.c +++ b/drivers/md/dm-snap.c @@ -1461,14 +1461,22 @@ static void pending_complete(struct dm_snap_pending_exception *pe, int success) goto out; } - /* Check for conflicting reads */ - __check_for_conflicting_io(s, pe->e.old_chunk); + /* Add proper exception. Now all reads will be + * redirected, so no new reads can be started on + * 'old_chunk'. + */ + dm_insert_exception(&s->complete, e); + + /* Drain conflicting reads */ + if (__chunk_is_tracked(s, pe->e.old_chunk)) { + up_write(&s->lock); + __check_for_conflicting_io(s, pe->e.old_chunk); + down_write(&s->lock); + } /* - * Add a proper exception, and remove the - * in-flight exception from the list. + * And now we can remove the temporary exception. */ - dm_insert_exception(&s->complete, e); out: dm_remove_exception(&pe->e); @@ -2089,17 +2097,17 @@ static int __origin_write(struct list_head *snapshots, sector_t sector, */ chunk = sector_to_chunk(snap->store, sector); - /* - * Check exception table to see if block - * is already remapped in this snapshot - * and trigger an exception if not. - */ - e = dm_lookup_exception(&snap->complete, chunk); - if (e) - goto next_snapshot; - pe = __lookup_pending_exception(snap, chunk); if (!pe) { + /* + * Check exception table to see if block + * is already remapped in this snapshot + * and trigger an exception if not. + */ + e = dm_lookup_exception(&snap->complete, chunk); + if (e) + goto next_snapshot; + up_write(&snap->lock); pe = alloc_pending_exception(snap); down_write(&snap->lock);