From patchwork Sun Mar 17 12:22:55 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikos Tsironis X-Patchwork-Id: 10856299 X-Patchwork-Delegate: snitzer@redhat.com Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B042417EF for ; Sun, 17 Mar 2019 12:23:20 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 910A129506 for ; Sun, 17 Mar 2019 12:23:20 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 859B02950B; Sun, 17 Mar 2019 12:23:20 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.7 required=2.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 4CF292950F for ; Sun, 17 Mar 2019 12:23:19 +0000 (UTC) Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 5E46D31028EB; Sun, 17 Mar 2019 12:23:18 +0000 (UTC) Received: from colo-mx.corp.redhat.com (colo-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.20]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 43EEC60863; Sun, 17 Mar 2019 12:23:18 +0000 (UTC) Received: from lists01.pubmisc.prod.ext.phx2.redhat.com (lists01.pubmisc.prod.ext.phx2.redhat.com [10.5.19.33]) by colo-mx.corp.redhat.com (Postfix) with ESMTP id ED6C9181A13D; Sun, 17 Mar 2019 12:23:17 +0000 (UTC) Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) by lists01.pubmisc.prod.ext.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id x2HCN90t021949 for ; Sun, 17 Mar 2019 08:23:09 -0400 Received: by smtp.corp.redhat.com (Postfix) id 3B1BB5C298; Sun, 17 Mar 2019 12:23:09 +0000 (UTC) Delivered-To: dm-devel@redhat.com Received: from mx1.redhat.com (ext-mx07.extmail.prod.ext.phx2.redhat.com [10.5.110.31]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 3588D5C6C1 for ; Sun, 17 Mar 2019 12:23:07 +0000 (UTC) Received: from mail-wr1-f66.google.com (mail-wr1-f66.google.com [209.85.221.66]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id E4FCCC0495BB for ; Sun, 17 Mar 2019 12:23:05 +0000 (UTC) Received: by mail-wr1-f66.google.com with SMTP id y13so9496353wrd.3 for ; Sun, 17 Mar 2019 05:23:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=arrikto-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=uyqI+apM2JxxBmHnyL3ek6dv57ZYo4J43XloC+rdkpI=; b=LaTAvkO5F0dwQ8dJm5prlthxIXCp6/D/dPI//CSsneefbxnjYQDZYXDcLi3L+a2Ftj Tcnda1ebSSlHL0foItjzAJ0DDw8hjezi/cTkjNGuDlcXR1B4EH526GY2AWr3l2idDcTF gM4FxNoTFIu+JQ9MXeNjxFN2/ymvXRM6wFVRd2orMOKMb61ScYXHuQY4oGWIg/XqC7E/ vMuXAPkOxLICszdNTJabO3m5Te8Mm3mcv/y1J8visuNb1T5k9tgDKKssJLwablnL/sXQ rJv66DIvyzRoYzdjWRWud+UP47CMcAX0dsWqnOBZGqsxvV+MVGDRlk0hAmiq3MAEkfft /5+A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=uyqI+apM2JxxBmHnyL3ek6dv57ZYo4J43XloC+rdkpI=; b=ErCd/wGyt2g2e0fCdseW6KsLniX/AHJF10kJKxk0SHsX1Wwx1qOHCcQ+SHlG+mjLTE vycX1K4X+V46f2SBE5nUvkkZeEJVTOPm/W/VNicBxn7ak3mGJPV7Pkb0SesxAaVcn1RM u1n0J1tvMs1x/OdOLgZoZDTVedfN/Fw2M1gEjK1SrsrQdqCNoOUBTaCL0strwrTX8VUg LNYhUcwpcSuN2i7vbB4DxIzq1Jk+lWCDFRt4wJI0jKHABTzoXlnlnAk2rJQjnezI7wvi fDA+MqfpYuGoGa5tWclFfL+EDimmLadFRADXJLabItydM09PnYGFjUudxMMdhqJhWzOc +3Rg== X-Gm-Message-State: APjAAAXacnNA85hOBmdQC1mxGV2HbiTkYR8pWrYYSPGwsCtx/Fewx2jO vXSVUDxCI/MktfGI65dFxbyFZQ== X-Google-Smtp-Source: APXvYqzWGxQ6ihh3n7IC1Vh3iqQL2H9hyHgBKEloemecYczSCrOfKd8SPUluTe8nUk2KEZPa9vS2Mg== X-Received: by 2002:adf:ec41:: with SMTP id w1mr9046026wrn.184.1552825384625; Sun, 17 Mar 2019 05:23:04 -0700 (PDT) Received: from snf-864.vm.snf.arr ([31.177.62.212]) by smtp.gmail.com with ESMTPSA id z10sm5453292wrs.11.2019.03.17.05.23.03 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 17 Mar 2019 05:23:04 -0700 (PDT) From: Nikos Tsironis To: snitzer@redhat.com, agk@redhat.com, dm-devel@redhat.com Date: Sun, 17 Mar 2019 14:22:55 +0200 Message-Id: <20190317122258.21760-4-ntsironis@arrikto.com> In-Reply-To: <20190317122258.21760-1-ntsironis@arrikto.com> References: <20190317122258.21760-1-ntsironis@arrikto.com> X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.31]); Sun, 17 Mar 2019 12:23:06 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.31]); Sun, 17 Mar 2019 12:23:06 +0000 (UTC) for IP:'209.85.221.66' DOMAIN:'mail-wr1-f66.google.com' HELO:'mail-wr1-f66.google.com' FROM:'ntsironis@arrikto.com' RCPT:'' X-RedHat-Spam-Score: -0.013 (DKIMWL_WL_MED, DKIM_SIGNED, DKIM_VALID, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_PASS) 209.85.221.66 mail-wr1-f66.google.com 209.85.221.66 mail-wr1-f66.google.com X-Scanned-By: MIMEDefang 2.78 on 10.5.110.31 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-loop: dm-devel@redhat.com Cc: hch@infradead.org, paulmck@linux.ibm.com, mpatocka@redhat.com, linux-kernel@vger.kernel.org, iliastsi@arrikto.com Subject: [dm-devel] [PATCH v3 3/6] dm snapshot: Don't sleep holding the snapshot lock X-BeenThere: dm-devel@redhat.com X-Mailman-Version: 2.1.12 Precedence: junk List-Id: device-mapper development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: dm-devel-bounces@redhat.com Errors-To: dm-devel-bounces@redhat.com X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.42]); Sun, 17 Mar 2019 12:23:18 +0000 (UTC) X-Virus-Scanned: ClamAV using ClamSMTP When completing a pending exception, pending_complete() waits for all conflicting reads to drain, before inserting the final, completed exception. Conflicting reads are snapshot reads redirected to the origin, because the relevant chunk is not remapped to the COW device the moment we receive the read. The completed exception must be inserted into the exception table after all conflicting reads drain to ensure snapshot reads don't return corrupted data. This is required because inserting the completed exception into the exception table signals that the relevant chunk is remapped and both origin writes and snapshot merging will now overwrite the chunk in origin. This wait is done holding the snapshot lock to ensure that pending_complete() doesn't starve if new snapshot reads keep coming for this chunk. In preparation for the next commit, where we use a spinlock instead of a mutex to protect the exception tables, we remove the need for holding the lock while waiting for conflicting reads to drain. We achieve this in two steps: 1. pending_complete() inserts the completed exception before waiting for conflicting reads to drain and removes the pending exception after all conflicting reads drain. This ensures that new snapshot reads will be redirected to the COW device, instead of the origin, and thus pending_complete() will not starve. Moreover, we use the existence of both a completed and a pending exception to signify that the COW is done but there are conflicting reads in flight. 2. In __origin_write() we check first if there is a pending exception and then if there is a completed exception. If there is a pending exception any submitted BIO is delayed on the pe->origin_bios list and DM_MAPIO_SUBMITTED is returned. This ensures that neither writes to the origin nor snapshot merging can overwrite the origin chunk, until all conflicting reads drain, and thus snapshot reads will not return corrupted data. Summarizing, we now have the following possible combinations of pending and completed exceptions for a chunk, along with their meaning: A. No exceptions exist: The chunk has not been remapped yet. B. Only a pending exception exists: The chunk is currently being copied to the COW device. C. Both a pending and a completed exception exist: COW for this chunk has completed but there are snapshot reads in flight which had been redirected to the origin before the chunk was remapped. D. Only the completed exception exists: COW has been completed and there are no conflicting reads in flight. Signed-off-by: Nikos Tsironis Signed-off-by: Ilias Tsitsimpis --- drivers/md/dm-snap.c | 102 ++++++++++++++++++++++++++++++++------------------- 1 file changed, 65 insertions(+), 37 deletions(-) diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c index 36805b12661e..4b34bfa0900a 100644 --- a/drivers/md/dm-snap.c +++ b/drivers/md/dm-snap.c @@ -1501,16 +1501,24 @@ static void pending_complete(void *context, int success) goto out; } - /* Check for conflicting reads */ - __check_for_conflicting_io(s, pe->e.old_chunk); - /* - * Add a proper exception, and remove the - * in-flight exception from the list. + * Add a proper exception. After inserting the completed exception all + * subsequent snapshot reads to this chunk will be redirected to the + * COW device. This ensures that we do not starve. Moreover, as long + * as the pending exception exists, neither origin writes nor snapshot + * merging can overwrite the chunk in origin. */ dm_insert_exception(&s->complete, e); + /* Wait for conflicting reads to drain */ + if (__chunk_is_tracked(s, pe->e.old_chunk)) { + mutex_unlock(&s->lock); + __check_for_conflicting_io(s, pe->e.old_chunk); + mutex_lock(&s->lock); + } + out: + /* Remove the in-flight exception from the list */ dm_remove_exception(&pe->e); snapshot_bios = bio_list_get(&pe->snapshot_bios); origin_bios = bio_list_get(&pe->origin_bios); @@ -1660,25 +1668,15 @@ __lookup_pending_exception(struct dm_snapshot *s, chunk_t chunk) } /* - * Looks to see if this snapshot already has a pending exception - * for this chunk, otherwise it allocates a new one and inserts - * it into the pending table. + * Inserts a pending exception into the pending table. * * NOTE: a write lock must be held on snap->lock before calling * this. */ static struct dm_snap_pending_exception * -__find_pending_exception(struct dm_snapshot *s, - struct dm_snap_pending_exception *pe, chunk_t chunk) +__insert_pending_exception(struct dm_snapshot *s, + struct dm_snap_pending_exception *pe, chunk_t chunk) { - struct dm_snap_pending_exception *pe2; - - pe2 = __lookup_pending_exception(s, chunk); - if (pe2) { - free_pending_exception(pe); - return pe2; - } - pe->e.old_chunk = chunk; bio_list_init(&pe->origin_bios); bio_list_init(&pe->snapshot_bios); @@ -1697,6 +1695,29 @@ __find_pending_exception(struct dm_snapshot *s, return pe; } +/* + * Looks to see if this snapshot already has a pending exception + * for this chunk, otherwise it allocates a new one and inserts + * it into the pending table. + * + * NOTE: a write lock must be held on snap->lock before calling + * this. + */ +static struct dm_snap_pending_exception * +__find_pending_exception(struct dm_snapshot *s, + struct dm_snap_pending_exception *pe, chunk_t chunk) +{ + struct dm_snap_pending_exception *pe2; + + pe2 = __lookup_pending_exception(s, chunk); + if (pe2) { + free_pending_exception(pe); + return pe2; + } + + return __insert_pending_exception(s, pe, chunk); +} + static void remap_exception(struct dm_snapshot *s, struct dm_exception *e, struct bio *bio, chunk_t chunk) { @@ -2107,7 +2128,7 @@ static int __origin_write(struct list_head *snapshots, sector_t sector, int r = DM_MAPIO_REMAPPED; struct dm_snapshot *snap; struct dm_exception *e; - struct dm_snap_pending_exception *pe; + struct dm_snap_pending_exception *pe, *pe2; struct dm_snap_pending_exception *pe_to_start_now = NULL; struct dm_snap_pending_exception *pe_to_start_last = NULL; chunk_t chunk; @@ -2137,17 +2158,17 @@ static int __origin_write(struct list_head *snapshots, sector_t sector, */ chunk = sector_to_chunk(snap->store, sector); - /* - * Check exception table to see if block - * is already remapped in this snapshot - * and trigger an exception if not. - */ - e = dm_lookup_exception(&snap->complete, chunk); - if (e) - goto next_snapshot; - pe = __lookup_pending_exception(snap, chunk); if (!pe) { + /* + * Check exception table to see if block is already + * remapped in this snapshot and trigger an exception + * if not. + */ + e = dm_lookup_exception(&snap->complete, chunk); + if (e) + goto next_snapshot; + mutex_unlock(&snap->lock); pe = alloc_pending_exception(snap); mutex_lock(&snap->lock); @@ -2157,16 +2178,23 @@ static int __origin_write(struct list_head *snapshots, sector_t sector, goto next_snapshot; } - e = dm_lookup_exception(&snap->complete, chunk); - if (e) { + pe2 = __lookup_pending_exception(snap, chunk); + + if (!pe2) { + e = dm_lookup_exception(&snap->complete, chunk); + if (e) { + free_pending_exception(pe); + goto next_snapshot; + } + + pe = __insert_pending_exception(snap, pe, chunk); + if (!pe) { + __invalidate_snapshot(snap, -ENOMEM); + goto next_snapshot; + } + } else { free_pending_exception(pe); - goto next_snapshot; - } - - pe = __find_pending_exception(snap, pe, chunk); - if (!pe) { - __invalidate_snapshot(snap, -ENOMEM); - goto next_snapshot; + pe = pe2; } }