From patchwork Sun Mar 17 12:22:57 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikos Tsironis X-Patchwork-Id: 10856297 X-Patchwork-Delegate: snitzer@redhat.com Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5B02D1515 for ; Sun, 17 Mar 2019 12:23:20 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3732529506 for ; Sun, 17 Mar 2019 12:23:20 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2B8792950E; Sun, 17 Mar 2019 12:23:20 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.7 required=2.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 418242950B for ; Sun, 17 Mar 2019 12:23:19 +0000 (UTC) Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 6610E11B776; Sun, 17 Mar 2019 12:23:18 +0000 (UTC) Received: from colo-mx.corp.redhat.com (colo-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.20]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 479441001E7C; Sun, 17 Mar 2019 12:23:18 +0000 (UTC) Received: from lists01.pubmisc.prod.ext.phx2.redhat.com (lists01.pubmisc.prod.ext.phx2.redhat.com [10.5.19.33]) by colo-mx.corp.redhat.com (Postfix) with ESMTP id 0A560181A13C; Sun, 17 Mar 2019 12:23:18 +0000 (UTC) Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) by lists01.pubmisc.prod.ext.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id x2HCNCw3021981 for ; Sun, 17 Mar 2019 08:23:12 -0400 Received: by smtp.corp.redhat.com (Postfix) id D14B65D704; Sun, 17 Mar 2019 12:23:12 +0000 (UTC) Delivered-To: dm-devel@redhat.com Received: from mx1.redhat.com (ext-mx01.extmail.prod.ext.phx2.redhat.com [10.5.110.25]) by smtp.corp.redhat.com (Postfix) with ESMTPS id CA9C85D70D for ; Sun, 17 Mar 2019 12:23:10 +0000 (UTC) Received: from mail-wr1-f65.google.com (mail-wr1-f65.google.com [209.85.221.65]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 6164C81F22 for ; Sun, 17 Mar 2019 12:23:08 +0000 (UTC) Received: by mail-wr1-f65.google.com with SMTP id p1so14011616wrs.8 for ; Sun, 17 Mar 2019 05:23:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=arrikto-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=CpdoALOVEW0/p10Olg9aJ0e2cRGSvGq4LVyy2sp/g/g=; b=L34WvEe3txWEUXCHW6bCWvFe7we99rkUQ+T3VwZdrNqg6p+JiIz53+wgdVuQ6A1eHX MHnZh+iUabntqBw1GNCOT+gYRr93AYXVputuRUqWsn7HNcuoDHEAKout5xQj9xemI3T7 oZUZaGx7ozXrFHwh3WmQ9FEdiPWhC/HZJ29vN9TyLnjRYsDih4v1FHX82pHGf7Mj7OeN 9TZe76rVug8eFyLk8u6D5ffronEebyUNcXvF63BnfDbUaDWrGXfO8MKwBifrT/ZJ9TGH yj3fxLJwMhMw6dDThVkZOD2u75k2tWS8macoNKQbvx6tWFx0XPCrrFiAXfXEPVnSyRmr Z/gw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=CpdoALOVEW0/p10Olg9aJ0e2cRGSvGq4LVyy2sp/g/g=; b=Boyi2ghFWLW8nlFs6xqp3nJa7IRy5dYJ3B7h1FJKhvasn7yUxXXxDle0B+ZS4xw9+3 co50aHKxhTAwhcISfLowA+1zS3TPYolJRH9XriNkkcq+nMizSaxtIBTy9dTwKJh2SRsA XyteWzFFXcPk0JnYouQIr177zyvSF3OkRdfGW0u3r44cmO31FL/IP2Sb3sZE5aesdHiG 2cuehsE9H0/wzcQTREIMFkNgFXzHlrp2eU2OgK5IIkefDy3bjR8TwJX4IPo3PmjLM9Pg vjV5OAINx8xBnMJwhcQBe6i1XM0KW4yRkztqPZuiTvtt61ey+5Rk3zVXRlxED18GBXmf Jykw== X-Gm-Message-State: APjAAAXZPhsLuT68ixL2/FHVcSdUBh+mdWnIBvtIIdqWYtXqatQj4glk nIUJrz69EuILDpr6tZGB1RYbgA== X-Google-Smtp-Source: APXvYqwzdsgRtjmGNCjQK9mOrmAzetL/58zJ4JgpuNBBsnBlmlALfJkQqb10Ky53cIqWsAIufwwr5g== X-Received: by 2002:adf:ebce:: with SMTP id v14mr8398242wrn.69.1552825386982; Sun, 17 Mar 2019 05:23:06 -0700 (PDT) Received: from snf-864.vm.snf.arr ([31.177.62.212]) by smtp.gmail.com with ESMTPSA id z10sm5453292wrs.11.2019.03.17.05.23.05 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 17 Mar 2019 05:23:06 -0700 (PDT) From: Nikos Tsironis To: snitzer@redhat.com, agk@redhat.com, dm-devel@redhat.com Date: Sun, 17 Mar 2019 14:22:57 +0200 Message-Id: <20190317122258.21760-6-ntsironis@arrikto.com> In-Reply-To: <20190317122258.21760-1-ntsironis@arrikto.com> References: <20190317122258.21760-1-ntsironis@arrikto.com> X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.25]); Sun, 17 Mar 2019 12:23:08 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.25]); Sun, 17 Mar 2019 12:23:08 +0000 (UTC) for IP:'209.85.221.65' DOMAIN:'mail-wr1-f65.google.com' HELO:'mail-wr1-f65.google.com' FROM:'ntsironis@arrikto.com' RCPT:'' X-RedHat-Spam-Score: -0.012 (DKIMWL_WL_MED, DKIM_SIGNED, DKIM_VALID, RCVD_IN_DNSWL_NONE, SPF_PASS) 209.85.221.65 mail-wr1-f65.google.com 209.85.221.65 mail-wr1-f65.google.com X-Scanned-By: MIMEDefang 2.83 on 10.5.110.25 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 X-loop: dm-devel@redhat.com Cc: hch@infradead.org, paulmck@linux.ibm.com, mpatocka@redhat.com, linux-kernel@vger.kernel.org, iliastsi@arrikto.com Subject: [dm-devel] [PATCH v3 5/6] dm snapshot: Make exception tables scalable X-BeenThere: dm-devel@redhat.com X-Mailman-Version: 2.1.12 Precedence: junk List-Id: device-mapper development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: dm-devel-bounces@redhat.com Errors-To: dm-devel-bounces@redhat.com X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.38]); Sun, 17 Mar 2019 12:23:18 +0000 (UTC) X-Virus-Scanned: ClamAV using ClamSMTP Use list_bl to implement the exception hash tables' buckets. This change permits concurrent access, to distinct buckets, by multiple threads. Also, implement helper functions to lock and unlock the exception tables based on the chunk number of the exception at hand. We retain the global locking, by means of down_write(), which is replaced by the next commit. Still, we must acquire the per-bucket spinlocks when accessing the hash tables, since list_bl does not allow modification on unlocked lists. Signed-off-by: Nikos Tsironis Signed-off-by: Ilias Tsitsimpis --- drivers/md/dm-exception-store.h | 3 +- drivers/md/dm-snap.c | 137 +++++++++++++++++++++++++++++++++------- 2 files changed, 116 insertions(+), 24 deletions(-) diff --git a/drivers/md/dm-exception-store.h b/drivers/md/dm-exception-store.h index 12b5216c2cfe..5a3c696c057f 100644 --- a/drivers/md/dm-exception-store.h +++ b/drivers/md/dm-exception-store.h @@ -11,6 +11,7 @@ #define _LINUX_DM_EXCEPTION_STORE #include +#include #include /* @@ -27,7 +28,7 @@ typedef sector_t chunk_t; * chunk within the device. */ struct dm_exception { - struct list_head hash_list; + struct hlist_bl_node hash_list; chunk_t old_chunk; chunk_t new_chunk; diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c index 209da5dd0ba6..ab75857309aa 100644 --- a/drivers/md/dm-snap.c +++ b/drivers/md/dm-snap.c @@ -13,6 +13,7 @@ #include #include #include +#include #include #include #include @@ -44,7 +45,7 @@ static const char dm_snapshot_merge_target_name[] = "snapshot-merge"; struct dm_exception_table { uint32_t hash_mask; unsigned hash_shift; - struct list_head *table; + struct hlist_bl_head *table; }; struct dm_snapshot { @@ -618,6 +619,36 @@ static void unregister_snapshot(struct dm_snapshot *s) * The lowest hash_shift bits of the chunk number are ignored, allowing * some consecutive chunks to be grouped together. */ +static uint32_t exception_hash(struct dm_exception_table *et, chunk_t chunk); + +/* Lock to protect access to the completed and pending exception hash tables. */ +struct dm_exception_table_lock { + struct hlist_bl_head *complete_slot; + struct hlist_bl_head *pending_slot; +}; + +static void dm_exception_table_lock_init(struct dm_snapshot *s, chunk_t chunk, + struct dm_exception_table_lock *lock) +{ + struct dm_exception_table *complete = &s->complete; + struct dm_exception_table *pending = &s->pending; + + lock->complete_slot = &complete->table[exception_hash(complete, chunk)]; + lock->pending_slot = &pending->table[exception_hash(pending, chunk)]; +} + +static void dm_exception_table_lock(struct dm_exception_table_lock *lock) +{ + hlist_bl_lock(lock->complete_slot); + hlist_bl_lock(lock->pending_slot); +} + +static void dm_exception_table_unlock(struct dm_exception_table_lock *lock) +{ + hlist_bl_unlock(lock->pending_slot); + hlist_bl_unlock(lock->complete_slot); +} + static int dm_exception_table_init(struct dm_exception_table *et, uint32_t size, unsigned hash_shift) { @@ -625,12 +656,12 @@ static int dm_exception_table_init(struct dm_exception_table *et, et->hash_shift = hash_shift; et->hash_mask = size - 1; - et->table = dm_vcalloc(size, sizeof(struct list_head)); + et->table = dm_vcalloc(size, sizeof(struct hlist_bl_head)); if (!et->table) return -ENOMEM; for (i = 0; i < size; i++) - INIT_LIST_HEAD(et->table + i); + INIT_HLIST_BL_HEAD(et->table + i); return 0; } @@ -638,15 +669,16 @@ static int dm_exception_table_init(struct dm_exception_table *et, static void dm_exception_table_exit(struct dm_exception_table *et, struct kmem_cache *mem) { - struct list_head *slot; - struct dm_exception *ex, *next; + struct hlist_bl_head *slot; + struct dm_exception *ex; + struct hlist_bl_node *pos, *n; int i, size; size = et->hash_mask + 1; for (i = 0; i < size; i++) { slot = et->table + i; - list_for_each_entry_safe (ex, next, slot, hash_list) + hlist_bl_for_each_entry_safe(ex, pos, n, slot, hash_list) kmem_cache_free(mem, ex); } @@ -660,7 +692,7 @@ static uint32_t exception_hash(struct dm_exception_table *et, chunk_t chunk) static void dm_remove_exception(struct dm_exception *e) { - list_del(&e->hash_list); + hlist_bl_del(&e->hash_list); } /* @@ -670,11 +702,12 @@ static void dm_remove_exception(struct dm_exception *e) static struct dm_exception *dm_lookup_exception(struct dm_exception_table *et, chunk_t chunk) { - struct list_head *slot; + struct hlist_bl_head *slot; + struct hlist_bl_node *pos; struct dm_exception *e; slot = &et->table[exception_hash(et, chunk)]; - list_for_each_entry (e, slot, hash_list) + hlist_bl_for_each_entry(e, pos, slot, hash_list) if (chunk >= e->old_chunk && chunk <= e->old_chunk + dm_consecutive_chunk_count(e)) return e; @@ -721,7 +754,8 @@ static void free_pending_exception(struct dm_snap_pending_exception *pe) static void dm_insert_exception(struct dm_exception_table *eh, struct dm_exception *new_e) { - struct list_head *l; + struct hlist_bl_head *l; + struct hlist_bl_node *pos; struct dm_exception *e = NULL; l = &eh->table[exception_hash(eh, new_e->old_chunk)]; @@ -731,7 +765,7 @@ static void dm_insert_exception(struct dm_exception_table *eh, goto out; /* List is ordered by old_chunk */ - list_for_each_entry_reverse(e, l, hash_list) { + hlist_bl_for_each_entry(e, pos, l, hash_list) { /* Insert after an existing chunk? */ if (new_e->old_chunk == (e->old_chunk + dm_consecutive_chunk_count(e) + 1) && @@ -752,12 +786,24 @@ static void dm_insert_exception(struct dm_exception_table *eh, return; } - if (new_e->old_chunk > e->old_chunk) + if (new_e->old_chunk < e->old_chunk) break; } out: - list_add(&new_e->hash_list, e ? &e->hash_list : l); + if (!e) { + /* + * Either the table doesn't support consecutive chunks or slot + * l is empty. + */ + hlist_bl_add_head(&new_e->hash_list, l); + } else if (new_e->old_chunk < e->old_chunk) { + /* Add before an existing exception */ + hlist_bl_add_before(&new_e->hash_list, &e->hash_list); + } else { + /* Add to l's tail: e is the last exception in this slot */ + hlist_bl_add_behind(&new_e->hash_list, &e->hash_list); + } } /* @@ -766,6 +812,7 @@ static void dm_insert_exception(struct dm_exception_table *eh, */ static int dm_add_exception(void *context, chunk_t old, chunk_t new) { + struct dm_exception_table_lock lock; struct dm_snapshot *s = context; struct dm_exception *e; @@ -778,7 +825,17 @@ static int dm_add_exception(void *context, chunk_t old, chunk_t new) /* Consecutive_count is implicitly initialised to zero */ e->new_chunk = new; + /* + * Although there is no need to lock access to the exception tables + * here, if we don't then hlist_bl_add_head(), called by + * dm_insert_exception(), will complain about accessing the + * corresponding list without locking it first. + */ + dm_exception_table_lock_init(s, old, &lock); + + dm_exception_table_lock(&lock); dm_insert_exception(&s->complete, e); + dm_exception_table_unlock(&lock); return 0; } @@ -807,7 +864,7 @@ static int calc_max_buckets(void) { /* use a fixed size of 2MB */ unsigned long mem = 2 * 1024 * 1024; - mem /= sizeof(struct list_head); + mem /= sizeof(struct hlist_bl_head); return mem; } @@ -1473,13 +1530,18 @@ static void pending_complete(void *context, int success) struct bio *origin_bios = NULL; struct bio *snapshot_bios = NULL; struct bio *full_bio = NULL; + struct dm_exception_table_lock lock; int error = 0; + dm_exception_table_lock_init(s, pe->e.old_chunk, &lock); + if (!success) { /* Read/write error - snapshot is unusable */ down_write(&s->lock); __invalidate_snapshot(s, -EIO); error = 1; + + dm_exception_table_lock(&lock); goto out; } @@ -1488,11 +1550,14 @@ static void pending_complete(void *context, int success) down_write(&s->lock); __invalidate_snapshot(s, -ENOMEM); error = 1; + + dm_exception_table_lock(&lock); goto out; } *e = pe->e; down_write(&s->lock); + dm_exception_table_lock(&lock); if (!s->valid) { free_completed_exception(e); error = 1; @@ -1510,14 +1575,19 @@ static void pending_complete(void *context, int success) /* Wait for conflicting reads to drain */ if (__chunk_is_tracked(s, pe->e.old_chunk)) { + dm_exception_table_unlock(&lock); up_write(&s->lock); __check_for_conflicting_io(s, pe->e.old_chunk); down_write(&s->lock); + dm_exception_table_lock(&lock); } out: /* Remove the in-flight exception from the list */ dm_remove_exception(&pe->e); + + dm_exception_table_unlock(&lock); + snapshot_bios = bio_list_get(&pe->snapshot_bios); origin_bios = bio_list_get(&pe->origin_bios); full_bio = pe->full_bio; @@ -1733,6 +1803,7 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio) int r = DM_MAPIO_REMAPPED; chunk_t chunk; struct dm_snap_pending_exception *pe = NULL; + struct dm_exception_table_lock lock; init_tracked_chunk(bio); @@ -1742,6 +1813,7 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio) } chunk = sector_to_chunk(s->store, bio->bi_iter.bi_sector); + dm_exception_table_lock_init(s, chunk, &lock); /* Full snapshots are not usable */ /* To get here the table must be live so s->active is always set. */ @@ -1749,6 +1821,7 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio) return DM_MAPIO_KILL; down_write(&s->lock); + dm_exception_table_lock(&lock); if (!s->valid || (unlikely(s->snapshot_overflowed) && bio_data_dir(bio) == WRITE)) { @@ -1771,9 +1844,11 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio) if (bio_data_dir(bio) == WRITE) { pe = __lookup_pending_exception(s, chunk); if (!pe) { + dm_exception_table_unlock(&lock); up_write(&s->lock); pe = alloc_pending_exception(s); down_write(&s->lock); + dm_exception_table_lock(&lock); if (!s->valid || s->snapshot_overflowed) { free_pending_exception(pe); @@ -1790,13 +1865,17 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio) pe = __find_pending_exception(s, pe, chunk); if (!pe) { + dm_exception_table_unlock(&lock); + if (s->store->userspace_supports_overflow) { s->snapshot_overflowed = 1; DMERR("Snapshot overflowed: Unable to allocate exception."); } else __invalidate_snapshot(s, -ENOMEM); + up_write(&s->lock); + r = DM_MAPIO_KILL; - goto out_unlock; + goto out; } } @@ -1808,6 +1887,7 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio) bio->bi_iter.bi_size == (s->store->chunk_size << SECTOR_SHIFT)) { pe->started = 1; + dm_exception_table_unlock(&lock); up_write(&s->lock); start_full_bio(pe, bio); goto out; @@ -1818,6 +1898,7 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio) if (!pe->started) { /* this is protected by snap->lock */ pe->started = 1; + dm_exception_table_unlock(&lock); up_write(&s->lock); start_copy(pe); goto out; @@ -1828,6 +1909,7 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio) } out_unlock: + dm_exception_table_unlock(&lock); up_write(&s->lock); out: return r; @@ -2129,6 +2211,7 @@ static int __origin_write(struct list_head *snapshots, sector_t sector, struct dm_snap_pending_exception *pe, *pe2; struct dm_snap_pending_exception *pe_to_start_now = NULL; struct dm_snap_pending_exception *pe_to_start_last = NULL; + struct dm_exception_table_lock lock; chunk_t chunk; /* Do all the snapshots on this origin */ @@ -2140,21 +2223,23 @@ static int __origin_write(struct list_head *snapshots, sector_t sector, if (dm_target_is_snapshot_merge(snap->ti)) continue; - down_write(&snap->lock); - - /* Only deal with valid and active snapshots */ - if (!snap->valid || !snap->active) - goto next_snapshot; - /* Nothing to do if writing beyond end of snapshot */ if (sector >= dm_table_get_size(snap->ti->table)) - goto next_snapshot; + continue; /* * Remember, different snapshots can have * different chunk sizes. */ chunk = sector_to_chunk(snap->store, sector); + dm_exception_table_lock_init(snap, chunk, &lock); + + down_write(&snap->lock); + dm_exception_table_lock(&lock); + + /* Only deal with valid and active snapshots */ + if (!snap->valid || !snap->active) + goto next_snapshot; pe = __lookup_pending_exception(snap, chunk); if (!pe) { @@ -2167,9 +2252,11 @@ static int __origin_write(struct list_head *snapshots, sector_t sector, if (e) goto next_snapshot; + dm_exception_table_unlock(&lock); up_write(&snap->lock); pe = alloc_pending_exception(snap); down_write(&snap->lock); + dm_exception_table_lock(&lock); if (!snap->valid) { free_pending_exception(pe); @@ -2187,8 +2274,11 @@ static int __origin_write(struct list_head *snapshots, sector_t sector, pe = __insert_pending_exception(snap, pe, chunk); if (!pe) { + dm_exception_table_unlock(&lock); __invalidate_snapshot(snap, -ENOMEM); - goto next_snapshot; + up_write(&snap->lock); + + continue; } } else { free_pending_exception(pe); @@ -2219,6 +2309,7 @@ static int __origin_write(struct list_head *snapshots, sector_t sector, } next_snapshot: + dm_exception_table_unlock(&lock); up_write(&snap->lock); if (pe_to_start_now) {