From patchwork Tue Feb 27 15:06:59 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Patrick Steinhardt X-Patchwork-Id: 13573997 Received: from wfhigh2-smtp.messagingengine.com (wfhigh2-smtp.messagingengine.com [64.147.123.153]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1FCE01487DF for ; Tue, 27 Feb 2024 15:07:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=64.147.123.153 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709046424; cv=none; b=VItoEsB+4ZznDirtbux8U5dSb6qc/notCRTtv/J/blCcRRH6jl9TTd8H8+Tc33TN/mIlbxkKXqcq9dcJzMr0s15YQCCJRglGleoBgTcWeRjkVxceafMMO2c4CxzMfyfP5CiY+W9dgzFLICyV8EAhA/7ZcB1ybXldbFdC1eIfOVY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709046424; c=relaxed/simple; bh=bwxy3NdcIreWWmerl7X5jwA6Uv97oCQeHhUpLFiUpew=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=RVw/7zsS0HqNJJUIx4W0EJuTd9ydbUtJCrzyS2wZCarX8ARBpWTsn5sgVBQjnIZ7RWciTibGLlvcGVFZwLeBYCnZrVSDTtEPPuy7KncGk6Fwpdx/i9Du9GQdFqTn2OQVRRsCvI+PpzNcto56qFraSeWcsme8tMUo0wfbmhJt+KU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=pks.im; spf=pass smtp.mailfrom=pks.im; dkim=pass (2048-bit key) header.d=pks.im header.i=@pks.im header.b=MrBcw///; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=kwCmgCIf; arc=none smtp.client-ip=64.147.123.153 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=pks.im Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=pks.im Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=pks.im header.i=@pks.im header.b="MrBcw///"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="kwCmgCIf" Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailfhigh.west.internal (Postfix) with ESMTP id 1E43318000B0; Tue, 27 Feb 2024 10:07:02 -0500 (EST) Received: from mailfrontend1 ([10.202.2.162]) by compute5.internal (MEProxy); Tue, 27 Feb 2024 10:07:02 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pks.im; h=cc:cc :content-type:content-type:date:date:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:subject :subject:to:to; s=fm1; t=1709046421; x=1709132821; bh=JQ6Pf1c2wY Hysh3NOP66HAv0d4sUnT0xw/at/bnxjKc=; b=MrBcw///dSkVKt4nz43aGpM/2q Qsgj4HpS1B44JKEVI4Jbt1OIkBe9KyZ4csqgby/6jWZ8fUEi8hVQloVqwvyGseLd DVe3XBvBpEjzs5tpoUG+CSTBWzBnx5fbcbNmQO/hizBbTtxDJi0gd38HQzueX1Rh o+911Z2P89Fk5WMJG2gdOe+1V5omAixSqLmE3vIBR91ks73FGDGn+ibwQASGBtl4 qY2HGTm22cB2ZTtKD82AkrY//t3yHZzN71ooXxwTpSeRRFypiuXhvw4vii7lmTiY 2WO7LUsd1/LvQHUPVGYsK8XcMBObnXorr3rM2WpxAq5D8VMagMcULnKj7ZGQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-type:content-type:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:subject:subject:to :to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm1; t=1709046421; x=1709132821; bh=JQ6Pf1c2wYHysh3NOP66HAv0d4sU nT0xw/at/bnxjKc=; b=kwCmgCIf54YHL3F63vaDPn38WQUcDQK4/jwwlrBkV3LS MCy81zGrEDGHd0Iep19FbOtZZ0356tKhJ8+j1Edd7ALmqFojewusrHTVONmJLxG7 G+s+MX4C3nnj+N/stlRwuoC8oycFapMqO+pDCJMZStt8uNMC3dUWD4qm5fK5425f j5Eod3daRsEaNxQCK5s2VCaB9/pOA09LITMO93OM2SA3ELCI2ODLKx7oHwjKemY+ RBF5Yg0GbnzjOU5p76aHlH2DJDuBTJHyc7Fj0wmSU6jPyeFoQZYQKzToVeHfX9kC NAOFQRuLrWe8knuIzM1Nxx2fV0MUJ4bQqNGa8dxMBg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvledrgeehgdegkecutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpeffhffvvefukfhfgggtuggjsehgtderredttdejnecuhfhrohhmpefrrghtrhhi tghkucfuthgvihhnhhgrrhguthcuoehpshesphhkshdrihhmqeenucggtffrrghtthgvrh hnpeetueevhffhudefvdegieeuieelgedthfegfedtueevjeejtdfgjeehudejuedtuden ucevlhhushhtvghrufhiiigvpedunecurfgrrhgrmhepmhgrihhlfhhrohhmpehpshesph hkshdrihhm X-ME-Proxy: Feedback-ID: i197146af:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 27 Feb 2024 10:07:00 -0500 (EST) Received: by vm-mail (OpenSMTPD) with ESMTPSA id ddfef1fd (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO); Tue, 27 Feb 2024 15:02:43 +0000 (UTC) Date: Tue, 27 Feb 2024 16:06:59 +0100 From: Patrick Steinhardt To: git@vger.kernel.org Cc: Justin Tobler Subject: [PATCH v2 11/13] reftable/record: decode keys in place Message-ID: References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: When reading a record from a block, we need to decode the record's key. As reftable keys are prefix-compressed, meaning they reuse a prefix from the preceding record's key, this is a bit more involved than just having to copy the relevant bytes: we need to figure out the prefix and suffix lengths, copy the prefix from the preceding record and finally copy the suffix from the current record. This is done by passing three buffers to `reftable_decode_key()`: one buffer that holds the result, one buffer that holds the last key, and one buffer that points to the current record. The final key is then assembled by calling `strbuf_add()` twice to copy over the prefix and suffix. Performing two memory copies is inefficient though. And we can indeed do better by decoding keys in place. Instead of providing two buffers, the caller may only call a single buffer that is already pre-populated with the last key. Like this, we only have to call `strbuf_setlen()` to trim the record to its prefix and then `strbuf_add()` to add the suffix. This refactoring leads to a noticeable performance bump when iterating over 1 million refs: Benchmark 1: show-ref: single matching ref (revision = HEAD~) Time (mean ± σ): 112.2 ms ± 3.9 ms [User: 109.3 ms, System: 2.8 ms] Range (min … max): 109.2 ms … 149.6 ms 1000 runs Benchmark 2: show-ref: single matching ref (revision = HEAD) Time (mean ± σ): 106.0 ms ± 3.5 ms [User: 103.2 ms, System: 2.7 ms] Range (min … max): 103.2 ms … 133.7 ms 1000 runs Summary show-ref: single matching ref (revision = HEAD) ran 1.06 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~) Signed-off-by: Patrick Steinhardt --- reftable/block.c | 25 +++++++++++-------------- reftable/block.h | 2 -- reftable/record.c | 19 +++++++++---------- reftable/record.h | 9 ++++++--- reftable/record_test.c | 3 ++- 5 files changed, 28 insertions(+), 30 deletions(-) diff --git a/reftable/block.c b/reftable/block.c index 72eb73b380..ad9074dba6 100644 --- a/reftable/block.c +++ b/reftable/block.c @@ -291,9 +291,8 @@ static int restart_key_less(size_t idx, void *args) /* the restart key is verbatim in the block, so this could avoid the alloc for decoding the key */ struct strbuf rkey = STRBUF_INIT; - struct strbuf last_key = STRBUF_INIT; uint8_t unused_extra; - int n = reftable_decode_key(&rkey, &unused_extra, last_key, in); + int n = reftable_decode_key(&rkey, &unused_extra, in); int result; if (n < 0) { a->error = 1; @@ -326,35 +325,34 @@ int block_iter_next(struct block_iter *it, struct reftable_record *rec) if (it->next_off >= it->br->block_len) return 1; - n = reftable_decode_key(&it->key, &extra, it->last_key, in); + n = reftable_decode_key(&it->last_key, &extra, in); if (n < 0) return -1; - - if (!it->key.len) + if (!it->last_key.len) return REFTABLE_FORMAT_ERROR; string_view_consume(&in, n); - n = reftable_record_decode(rec, it->key, extra, in, it->br->hash_size); + n = reftable_record_decode(rec, it->last_key, extra, in, it->br->hash_size); if (n < 0) return -1; string_view_consume(&in, n); - strbuf_swap(&it->last_key, &it->key); it->next_off += start.len - in.len; return 0; } int block_reader_first_key(struct block_reader *br, struct strbuf *key) { - struct strbuf empty = STRBUF_INIT; - int off = br->header_off + 4; + int off = br->header_off + 4, n; struct string_view in = { .buf = br->block.data + off, .len = br->block_len - off, }; - uint8_t extra = 0; - int n = reftable_decode_key(key, &extra, empty, in); + + strbuf_reset(key); + + n = reftable_decode_key(key, &extra, in); if (n < 0) return n; if (!key->len) @@ -371,7 +369,6 @@ int block_iter_seek(struct block_iter *it, struct strbuf *want) void block_iter_close(struct block_iter *it) { strbuf_release(&it->last_key); - strbuf_release(&it->key); } int block_reader_seek(struct block_reader *br, struct block_iter *it, @@ -408,8 +405,8 @@ int block_reader_seek(struct block_reader *br, struct block_iter *it, if (err < 0) goto done; - reftable_record_key(&rec, &it->key); - if (err > 0 || strbuf_cmp(&it->key, want) >= 0) { + reftable_record_key(&rec, &it->last_key); + if (err > 0 || strbuf_cmp(&it->last_key, want) >= 0) { err = 0; goto done; } diff --git a/reftable/block.h b/reftable/block.h index 17481e6331..51699af233 100644 --- a/reftable/block.h +++ b/reftable/block.h @@ -84,12 +84,10 @@ struct block_iter { /* key for last entry we read. */ struct strbuf last_key; - struct strbuf key; }; #define BLOCK_ITER_INIT { \ .last_key = STRBUF_INIT, \ - .key = STRBUF_INIT, \ } /* initializes a block reader. */ diff --git a/reftable/record.c b/reftable/record.c index 3f2a639036..37682cc7d0 100644 --- a/reftable/record.c +++ b/reftable/record.c @@ -159,20 +159,19 @@ int reftable_encode_key(int *restart, struct string_view dest, return start.len - dest.len; } -int reftable_decode_key(struct strbuf *key, uint8_t *extra, - struct strbuf last_key, struct string_view in) +int reftable_decode_key(struct strbuf *last_key, uint8_t *extra, + struct string_view in) { int start_len = in.len; uint64_t prefix_len = 0; uint64_t suffix_len = 0; - int n = get_var_int(&prefix_len, &in); + int n; + + n = get_var_int(&prefix_len, &in); if (n < 0) return -1; string_view_consume(&in, n); - if (prefix_len > last_key.len) - return -1; - n = get_var_int(&suffix_len, &in); if (n <= 0) return -1; @@ -181,12 +180,12 @@ int reftable_decode_key(struct strbuf *key, uint8_t *extra, *extra = (uint8_t)(suffix_len & 0x7); suffix_len >>= 3; - if (in.len < suffix_len) + if (in.len < suffix_len || + prefix_len > last_key->len) return -1; - strbuf_reset(key); - strbuf_add(key, last_key.buf, prefix_len); - strbuf_add(key, in.buf, suffix_len); + strbuf_setlen(last_key, prefix_len); + strbuf_add(last_key, in.buf, suffix_len); string_view_consume(&in, suffix_len); return start_len - in.len; diff --git a/reftable/record.h b/reftable/record.h index a05e2be179..91c9c6ebfd 100644 --- a/reftable/record.h +++ b/reftable/record.h @@ -81,9 +81,12 @@ int reftable_encode_key(int *is_restart, struct string_view dest, struct strbuf prev_key, struct strbuf key, uint8_t extra); -/* Decode into `key` and `extra` from `in` */ -int reftable_decode_key(struct strbuf *key, uint8_t *extra, - struct strbuf last_key, struct string_view in); +/* + * Decode into `last_key` and `extra` from `in`. `last_key` is expected to + * contain the decoded key of the preceding record, if any. + */ +int reftable_decode_key(struct strbuf *last_key, uint8_t *extra, + struct string_view in); /* reftable_index_record are used internally to speed up lookups. */ struct reftable_index_record { diff --git a/reftable/record_test.c b/reftable/record_test.c index a86cff5526..89209894d8 100644 --- a/reftable/record_test.c +++ b/reftable/record_test.c @@ -295,7 +295,8 @@ static void test_key_roundtrip(void) EXPECT(!restart); EXPECT(n > 0); - m = reftable_decode_key(&roundtrip, &rt_extra, last_key, dest); + strbuf_addstr(&roundtrip, "refs/heads/master"); + m = reftable_decode_key(&roundtrip, &rt_extra, dest); EXPECT(n == m); EXPECT(0 == strbuf_cmp(&key, &roundtrip)); EXPECT(rt_extra == extra);