From patchwork Wed Dec 20 09:17:05 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Patrick Steinhardt X-Patchwork-Id: 13499698 Received: from out5-smtp.messagingengine.com (out5-smtp.messagingengine.com [66.111.4.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CEF381F5F5 for ; Wed, 20 Dec 2023 09:17:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=pks.im Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=pks.im Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=pks.im header.i=@pks.im header.b="VCgfdUqE"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="ONkXRfTy" Received: from compute3.internal (compute3.nyi.internal [10.202.2.43]) by mailout.nyi.internal (Postfix) with ESMTP id 6BC935C01DF for ; Wed, 20 Dec 2023 04:17:08 -0500 (EST) Received: from mailfrontend2 ([10.202.2.163]) by compute3.internal (MEProxy); Wed, 20 Dec 2023 04:17:08 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pks.im; h=cc :content-type:content-type:date:date:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:subject :subject:to:to; s=fm2; t=1703063828; x=1703150228; bh=2RrOEu2ZE8 ZgV2dcYO0vKb6laIFK3q5kveNwGJTZaWo=; b=VCgfdUqEPLERYTJv1T+BQ1t88j Xpa7nRAnvHmi2mzANlpvEp3Rw5HlvdqWxmzTMrFlWx+OqHmdFUaDmhdKzxqvwgZW sd480p/Nc+PX0FrdmGbMqkWTrxqeDCV52IhJ+aOFC8rCLakH2o+hDOSnw64D4UzR Lb0wilopmKCX/mWD2+9JxVP46WZbGjshozEiakZsolUlt14MUH+UvRPz8zntEylt iWbrQI4/A9/v1+AEUdE1YGtxqVjuZKalhCS+Eh9Yw8vA5JiXIZRXElZ1Nyof25D2 uiXTGUeqwBO8rfkNy1JURhDErdp5YYdfdkqqE43vVw7cF16M3VbH2XvZWOVg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-type:content-type:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:subject:subject:to :to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm2; t=1703063828; x=1703150228; bh=2RrOEu2ZE8ZgV2dcYO0vKb6laIFK 3q5kveNwGJTZaWo=; b=ONkXRfTy5PD5xTPR54FuqQv2+Vn5K1WlM7mRc0KoE0I0 UX55cXWrzXoW5s/ovC5HWnmjcH8IWtxLMUJbPOAaLznZ29v/Q7ZTcZ1YYWdnL0D7 qgw/ICnwIVM0M+r1rAShQk8V3Vnpq0yCUhKcuyd+2eOAtkbw6Er2yN4cGWLzT95P pyTmacEw6ozPpiCBmaNdAMwVnRj5J0msFlz1egp1svmQs01bp9uk1u6Ox961l5cD BfRyOPkhe6Dv1kzN55PX5aeQoWVKp6m5Jq4SkEnh3+8o8aIeExbOYnbdpaK2laKf mGHGrOKrCvCkqs7dmjS/KxdEYVXBiKx/9qzEbNoSVw== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvkedrvdduvddgtdefucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucenucfjughrpeffhffvuffkfhggtggujgesghdtre ertddtvdenucfhrhhomheprfgrthhrihgtkhcuufhtvghinhhhrghrughtuceophhssehp khhsrdhimheqnecuggftrfgrthhtvghrnhepheeghfdtfeeuffehkefgffduleffjedthf dvjeektdfhhedvlefgtefgvdettdfhnecuvehluhhsthgvrhfuihiivgeptdenucfrrghr rghmpehmrghilhhfrhhomhepphhssehpkhhsrdhimh X-ME-Proxy: Feedback-ID: i197146af:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA for ; Wed, 20 Dec 2023 04:17:07 -0500 (EST) Received: by vm-mail (OpenSMTPD) with ESMTPSA id 09cd5230 (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO) for ; Wed, 20 Dec 2023 09:15:12 +0000 (UTC) Date: Wed, 20 Dec 2023 10:17:05 +0100 From: Patrick Steinhardt To: git@vger.kernel.org Subject: [PATCH 1/7] reftable/stack: do not overwrite errors when compacting Message-ID: <573fb2c4fb6c97d4279058cce6aab877c27e77bc.1703063544.git.ps@pks.im> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: In order to compact multiple stacks we iterate through the merged ref and log records. When there is any error either when reading the records from the old merged table or when writing the records to the new table then we break out of the respective loops. When breaking out of the loop for the ref records though the error code will be overwritten, which may cause us to inadvertently skip over bad ref records. In the worst case, this can lead to a compacted stack that is missing records. Fix the code by using `goto done` instead so that any potential error codes are properly returned to the caller. Signed-off-by: Patrick Steinhardt --- reftable/stack.c | 20 ++++++++------------ 1 file changed, 8 insertions(+), 12 deletions(-) diff --git a/reftable/stack.c b/reftable/stack.c index 16bab82063..8729508dc3 100644 --- a/reftable/stack.c +++ b/reftable/stack.c @@ -801,18 +801,16 @@ static int stack_write_compact(struct reftable_stack *st, err = 0; break; } - if (err < 0) { - break; - } + if (err < 0) + goto done; if (first == 0 && reftable_ref_record_is_deletion(&ref)) { continue; } err = reftable_writer_add_ref(wr, &ref); - if (err < 0) { - break; - } + if (err < 0) + goto done; entries++; } reftable_iterator_destroy(&it); @@ -827,9 +825,8 @@ static int stack_write_compact(struct reftable_stack *st, err = 0; break; } - if (err < 0) { - break; - } + if (err < 0) + goto done; if (first == 0 && reftable_log_record_is_deletion(&log)) { continue; } @@ -845,9 +842,8 @@ static int stack_write_compact(struct reftable_stack *st, } err = reftable_writer_add_log(wr, &log); - if (err < 0) { - break; - } + if (err < 0) + goto done; entries++; } From patchwork Wed Dec 20 09:17:09 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Patrick Steinhardt X-Patchwork-Id: 13499699 Received: from out5-smtp.messagingengine.com (out5-smtp.messagingengine.com [66.111.4.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 59CAD1F92D for ; Wed, 20 Dec 2023 09:17:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=pks.im Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=pks.im Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=pks.im header.i=@pks.im header.b="IzZu8hpr"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="HNeAKkh1" Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.nyi.internal (Postfix) with ESMTP id 51CDE5C01DA for ; Wed, 20 Dec 2023 04:17:12 -0500 (EST) Received: from mailfrontend2 ([10.202.2.163]) by compute5.internal (MEProxy); Wed, 20 Dec 2023 04:17:12 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pks.im; h=cc :content-type:content-type:date:date:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:subject :subject:to:to; s=fm2; t=1703063832; x=1703150232; bh=OwTrUkx6cu yCXOZbOedIuYeuZU8XX5iKo9cFe0omCTM=; b=IzZu8hprIcwwPTTrNIqKhUjhD7 z3OhnByca6AUiWh8cp1jxV8ThVDMMDURQYjl/xitrRVSmDHGGMMwVBPAN9P9j6H/ Y/b2r6YJGhTqBchtYeFPauEE/F2xYXBId+4jowBszjofNCPc/fB9S1mozpxwhR1B sQ6cZsAXErUcuR2Xko2Gbqv7qPPYuEeBTd2pHOfPbhg9pqyxhpmdI654MNcmpGXJ lnzsGE+4UDrCpAVP1gkIJS4CkMqlKKKNivheX++S8KUNhASL1etAdEbBsaNryLMw GzNVXeAoO7HIHgIo1MuMp2/nlIs/UfFT56atfG0HOpnS9Bo6S5VDuVm0+92g== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-type:content-type:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:subject:subject:to :to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm2; t=1703063832; x=1703150232; bh=OwTrUkx6cuyCXOZbOedIuYeuZU8X X5iKo9cFe0omCTM=; b=HNeAKkh1jmhaj08LB6rUK8uFJK5ycq6JYD4hCbfniMjJ t5MD3GUtr6DsTNLgxCo+E6BmgbRH4dhtQ+cnLLyLQPpwlY2OzaXNVLnHcI8wj5zJ GQbXXK6z0yftUy3rBFZAwfB3EaTz525UivI/WaKfELiN684FERsjW0YwV/3B5MUR iTl6R17sSJyoWN8KdO9hlTDOqMEqcC8B9YOEB14aBwOoNpuyztHCxNi9lfKkin03 tD5MQw9eFbhtbfw3FfqPChnCdSKrqShM3vHgkMxqaogL4FrDF2IkG7xtD7BMtQJG KRZVWAnlr9AI86Q6y29ZEHOLyeDzVNG398TXgpETsw== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvkedrvdduvddgtdefucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucenucfjughrpeffhffvuffkfhggtggujgesghdtre ertddtvdenucfhrhhomheprfgrthhrihgtkhcuufhtvghinhhhrghrughtuceophhssehp khhsrdhimheqnecuggftrfgrthhtvghrnhepheeghfdtfeeuffehkefgffduleffjedthf dvjeektdfhhedvlefgtefgvdettdfhnecuvehluhhsthgvrhfuihiivgeptdenucfrrghr rghmpehmrghilhhfrhhomhepphhssehpkhhsrdhimh X-ME-Proxy: Feedback-ID: i197146af:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA for ; Wed, 20 Dec 2023 04:17:11 -0500 (EST) Received: by vm-mail (OpenSMTPD) with ESMTPSA id c884e674 (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO) for ; Wed, 20 Dec 2023 09:15:16 +0000 (UTC) Date: Wed, 20 Dec 2023 10:17:09 +0100 From: Patrick Steinhardt To: git@vger.kernel.org Subject: [PATCH 2/7] reftable/writer: fix index corruption when writing multiple indices Message-ID: <86ee79c48d599fee5e85bf88a17a59929ae1ecab.1703063544.git.ps@pks.im> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Each reftable may contain multiple types of blocks for refs, objects and reflog records, where each of these may have an index that makes it more efficient to find the records. It was observed that the index for log records can become corrupted under certain circumstances, where the first entry of the index points into the object index instead of to the log records. As it turns out, this corruption can occur whenever we write a log index as well as at least one additional index. Writing records and their index is basically a two-step process: 1. We write all blocks for the corresponding record. Each block that gets written is added to a list of blocks to index. 2. Once all blocks were written we finish the section. If at least two blocks have been added to the list of blocks to index then we will now write the index for those blocks and flush it, as well. When we have a very large number of blocks then we may decide to write a multi-level index, which is why we also keep track of the list of the index blocks in the same way as we previously kept track of the blocks to index. Now when we have finished writing all index blocks we clear the index and flush the last block to disk. This is done in the wrong order though because flushing the block to disk will re-add it to the list of blocks to be indexed. The result is that the next section we are about to write will have an entry in the list of blocks to index that points to the last block of the preceding section's index, which will corrupt the log index. Fix this corruption by clearing the index after having written the last block. Signed-off-by: Patrick Steinhardt --- reftable/readwrite_test.c | 80 +++++++++++++++++++++++++++++++++++++++ reftable/writer.c | 4 +- 2 files changed, 82 insertions(+), 2 deletions(-) diff --git a/reftable/readwrite_test.c b/reftable/readwrite_test.c index 278663f22d..9c16e0504e 100644 --- a/reftable/readwrite_test.c +++ b/reftable/readwrite_test.c @@ -798,6 +798,85 @@ static void test_write_key_order(void) strbuf_release(&buf); } +static void test_write_multiple_indices(void) +{ + struct reftable_write_options opts = { + .block_size = 100, + }; + struct strbuf writer_buf = STRBUF_INIT, buf = STRBUF_INIT; + struct reftable_block_source source = { 0 }; + struct reftable_iterator it = { 0 }; + const struct reftable_stats *stats; + struct reftable_writer *writer; + struct reftable_reader *reader; + int err, i; + + writer = reftable_new_writer(&strbuf_add_void, &writer_buf, &opts); + reftable_writer_set_limits(writer, 1, 1); + for (i = 0; i < 100; i++) { + unsigned char hash[GIT_SHA1_RAWSZ] = {i}; + struct reftable_ref_record ref = { + .update_index = 1, + .value_type = REFTABLE_REF_VAL1, + .value.val1 = hash, + }; + + strbuf_reset(&buf); + strbuf_addf(&buf, "refs/heads/%04d", i); + ref.refname = buf.buf, + + err = reftable_writer_add_ref(writer, &ref); + EXPECT_ERR(err); + } + + for (i = 0; i < 100; i++) { + unsigned char hash[GIT_SHA1_RAWSZ] = {i}; + struct reftable_log_record log = { + .update_index = 1, + .value_type = REFTABLE_LOG_UPDATE, + .value.update = { + .old_hash = hash, + .new_hash = hash, + }, + }; + + strbuf_reset(&buf); + strbuf_addf(&buf, "refs/heads/%04d", i); + log.refname = buf.buf, + + err = reftable_writer_add_log(writer, &log); + EXPECT_ERR(err); + } + + reftable_writer_close(writer); + + /* + * The written data should be sufficiently large to result in indices + * for each of the block types. + */ + stats = reftable_writer_stats(writer); + EXPECT(stats->ref_stats.index_offset > 0); + EXPECT(stats->obj_stats.index_offset > 0); + EXPECT(stats->log_stats.index_offset > 0); + + block_source_from_strbuf(&source, &writer_buf); + err = reftable_new_reader(&reader, &source, "filename"); + EXPECT_ERR(err); + + /* + * Seeking the log uses the log index now. In case there is any + * confusion regarding indices we would notice here. + */ + err = reftable_reader_seek_log(reader, &it, ""); + EXPECT_ERR(err); + + reftable_iterator_destroy(&it); + reftable_writer_free(writer); + reftable_reader_free(reader); + strbuf_release(&writer_buf); + strbuf_release(&buf); +} + static void test_corrupt_table_empty(void) { struct strbuf buf = STRBUF_INIT; @@ -847,5 +926,6 @@ int readwrite_test_main(int argc, const char *argv[]) RUN_TEST(test_log_overflow); RUN_TEST(test_write_object_id_length); RUN_TEST(test_write_object_id_min_length); + RUN_TEST(test_write_multiple_indices); return 0; } diff --git a/reftable/writer.c b/reftable/writer.c index 2e322a5683..ee4590e20f 100644 --- a/reftable/writer.c +++ b/reftable/writer.c @@ -432,12 +432,12 @@ static int writer_finish_section(struct reftable_writer *w) reftable_free(idx); } - writer_clear_index(w); - err = writer_flush_block(w); if (err < 0) return err; + writer_clear_index(w); + bstats = writer_reftable_block_stats(w, typ); bstats->index_blocks = w->stats.idx_stats.blocks - before_blocks; bstats->index_offset = index_start; From patchwork Wed Dec 20 09:17:13 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Patrick Steinhardt X-Patchwork-Id: 13499700 Received: from out5-smtp.messagingengine.com (out5-smtp.messagingengine.com [66.111.4.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0A52C200CD for ; Wed, 20 Dec 2023 09:17:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=pks.im Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=pks.im Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=pks.im header.i=@pks.im header.b="tfOUe7EY"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="ARoPZ730" Received: from compute1.internal (compute1.nyi.internal [10.202.2.41]) by mailout.nyi.internal (Postfix) with ESMTP id 3A4B95C01E2 for ; Wed, 20 Dec 2023 04:17:16 -0500 (EST) Received: from mailfrontend2 ([10.202.2.163]) by compute1.internal (MEProxy); Wed, 20 Dec 2023 04:17:16 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pks.im; h=cc :content-type:content-type:date:date:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:subject :subject:to:to; s=fm2; t=1703063836; x=1703150236; bh=sTI8hiXSak S5gsZAiu+ASBpkk2qb2jMOHHdNmIf5/HM=; b=tfOUe7EY4cgcdfcHzlGYi40bHS VQWqh7qDjgTvOrB6/qgBUnRR44TDQRO3IBukWSAHS1ZL9qRAglkBtHmR0yZj8HjA w4LSHxIYrHD/vXcvKQodqHeMYJZnp+TIzmaw3864ofdPfXkOOKZuPA2L9zywUk7v t4DMvOcqR60KKwqt17AB+1sihYpm7MU+2OEqu/u88VYUlsNZ+bba7Y/94QlvbG3K PWO9oYeakZ503BqixIy8rABA8khVkt2E9ujuEA5p8CvOQ+RAMIkwuPfCLEOyUZdV 5b/leWi9ayWOUMqR4BwxJQMoP5xe7SoI1bZhGkAHbsJPuF4/PaMf3Osa4+Cg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-type:content-type:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:subject:subject:to :to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm2; t=1703063836; x=1703150236; bh=sTI8hiXSakS5gsZAiu+ASBpkk2qb 2jMOHHdNmIf5/HM=; b=ARoPZ730EWlMX6HDMmYuVu7vAIDh8So5Vg4MmRn1CrZG 7jHdzpFhlzp1mXcmqVGDaUZrH/WOTOIR6V2SeiiIt754T5wVPBoAcwqoj+Ef17n7 x91J/MMjN8a6i4P913pRPvx9CW8envtnBfOBxZiszr2S6bEocV1GRmx/BePyzV2z QukN8v04UYnP0WhONiPEACKDW8TLh6JtpCb2uxAebHc+N12dorPnRpTbi3X8jwIK v72Dh5CZQAJ+SRYWG5eH/RSGntwRZhDtSe4gty+8L2YXbSwRIAIDJbY8JWeUET/o w7QcevNx/SOMSvdkEuZSNqFZvBBt8Hkdwj9nKXtagg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvkedrvdduvddgtdefucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucenucfjughrpeffhffvuffkfhggtggujgesghdtre ertddtvdenucfhrhhomheprfgrthhrihgtkhcuufhtvghinhhhrghrughtuceophhssehp khhsrdhimheqnecuggftrfgrthhtvghrnhepheeghfdtfeeuffehkefgffduleffjedthf dvjeektdfhhedvlefgtefgvdettdfhnecuvehluhhsthgvrhfuihiivgeptdenucfrrghr rghmpehmrghilhhfrhhomhepphhssehpkhhsrdhimh X-ME-Proxy: Feedback-ID: i197146af:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA for ; Wed, 20 Dec 2023 04:17:15 -0500 (EST) Received: by vm-mail (OpenSMTPD) with ESMTPSA id 896d9411 (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO) for ; Wed, 20 Dec 2023 09:15:20 +0000 (UTC) Date: Wed, 20 Dec 2023 10:17:13 +0100 From: Patrick Steinhardt To: git@vger.kernel.org Subject: [PATCH 3/7] reftable/record: constify some parts of the interface Message-ID: <3ad4a0e5b9afd7ef841c2671ac4a8dd23b315e4b.1703063544.git.ps@pks.im> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: We're about to convert reftable records to stop storing their object IDs as allocated hashes. Prepare for this refactoring by constifying some parts of the interface that will be impacted by this. Signed-off-by: Patrick Steinhardt --- reftable/record.c | 8 ++++---- reftable/reftable-record.h | 4 ++-- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/reftable/record.c b/reftable/record.c index fbaa1fbef5..5e258c734b 100644 --- a/reftable/record.c +++ b/reftable/record.c @@ -76,7 +76,7 @@ int reftable_is_block_type(uint8_t typ) return 0; } -uint8_t *reftable_ref_record_val1(const struct reftable_ref_record *rec) +const unsigned char *reftable_ref_record_val1(const struct reftable_ref_record *rec) { switch (rec->value_type) { case REFTABLE_REF_VAL1: @@ -88,7 +88,7 @@ uint8_t *reftable_ref_record_val1(const struct reftable_ref_record *rec) } } -uint8_t *reftable_ref_record_val2(const struct reftable_ref_record *rec) +const unsigned char *reftable_ref_record_val2(const struct reftable_ref_record *rec) { switch (rec->value_type) { case REFTABLE_REF_VAL2: @@ -242,7 +242,7 @@ static char hexdigit(int c) return 'a' + (c - 10); } -static void hex_format(char *dest, uint8_t *src, int hash_size) +static void hex_format(char *dest, const unsigned char *src, int hash_size) { assert(hash_size > 0); if (src) { @@ -1164,7 +1164,7 @@ int reftable_record_equal(struct reftable_record *a, struct reftable_record *b, reftable_record_data(a), reftable_record_data(b), hash_size); } -static int hash_equal(uint8_t *a, uint8_t *b, int hash_size) +static int hash_equal(const unsigned char *a, const unsigned char *b, int hash_size) { if (a && b) return !memcmp(a, b, hash_size); diff --git a/reftable/reftable-record.h b/reftable/reftable-record.h index 67104f8fbf..f7eb2d6015 100644 --- a/reftable/reftable-record.h +++ b/reftable/reftable-record.h @@ -49,11 +49,11 @@ struct reftable_ref_record { /* Returns the first hash, or NULL if `rec` is not of type * REFTABLE_REF_VAL1 or REFTABLE_REF_VAL2. */ -uint8_t *reftable_ref_record_val1(const struct reftable_ref_record *rec); +const unsigned char *reftable_ref_record_val1(const struct reftable_ref_record *rec); /* Returns the second hash, or NULL if `rec` is not of type * REFTABLE_REF_VAL2. */ -uint8_t *reftable_ref_record_val2(const struct reftable_ref_record *rec); +const unsigned char *reftable_ref_record_val2(const struct reftable_ref_record *rec); /* returns whether 'ref' represents a deletion */ int reftable_ref_record_is_deletion(const struct reftable_ref_record *ref); From patchwork Wed Dec 20 09:17:17 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Patrick Steinhardt X-Patchwork-Id: 13499701 Received: from out5-smtp.messagingengine.com (out5-smtp.messagingengine.com [66.111.4.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C255220313 for ; Wed, 20 Dec 2023 09:17:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=pks.im Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=pks.im Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=pks.im header.i=@pks.im header.b="vW2M2PSf"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="zNf160MB" Received: from compute2.internal (compute2.nyi.internal [10.202.2.46]) by mailout.nyi.internal (Postfix) with ESMTP id 229945C01E5 for ; Wed, 20 Dec 2023 04:17:20 -0500 (EST) Received: from mailfrontend2 ([10.202.2.163]) by compute2.internal (MEProxy); Wed, 20 Dec 2023 04:17:20 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pks.im; h=cc :content-type:content-type:date:date:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:subject :subject:to:to; s=fm2; t=1703063840; x=1703150240; bh=nxEVCiPimo C99N6wAyN+qy66ywNRNCzXda2GpyTZNQw=; b=vW2M2PSfI16+IcpR9cHFZqbhYp sUe0O5261vdzz8+vgKS+GXth5ijbpIJ2+pC2zNMqDsiIJsFF1QNt6QjCKAOvn+TE yZaH5TyoClza6RrBBy98n+ieL1PLwqMw12AWOq/3NIipDcYGEqE5niSVzAsLqLMu mCUgUwb18QqziwuGoE+f5+kmr8bn4CLySN5ZmbSFyrNJZj/b8QPvkUo5G1F9l3GK b7I6zMfdszh3Nkm7PbgOFVX5PddAF7kDU1ZH/Bys+2kuuKROcKpbzvZnSUeBJu86 lZVUwAutdda/BuSWt/3pTNisMq1UgTuqL6NqKVfVGID3/mzIMcR0Ki1fjDIQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-type:content-type:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:subject:subject:to :to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm2; t=1703063840; x=1703150240; bh=nxEVCiPimoC99N6wAyN+qy66ywNR NCzXda2GpyTZNQw=; b=zNf160MBYdQytL+tmV4GygJh1gq87hpvMLHgsOV/zMW7 9xEMiYx+q/hUYAzlo1VCvFzbUTkIVAucRAV1YhVMep3h8pcZP5tYYp+i/OKYI2qZ rDLxlnsecWZKtqUGZSXjjU54QKHcpUwrip2zOmb5sFfHdDE55H8tiQYe5W3aECo9 hwxY9whOriF3QPMUsW02v6Wn8cxVIqziRB2MofGmihAmKv5W8AEVEBUmkBTr0ekG +SYwiblKXpRyQApGmCZKfXaPHrDSyTR856r0OGiNExOyzPCuHZAzJvbxKhqoEdi+ 2ZPEAovml2oLzvFyGSdm6cTz5lARPRt76ma1BCA+Yw== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvkedrvdduvddgtdefucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucenucfjughrpeffhffvuffkfhggtggujgesghdtre ertddtvdenucfhrhhomheprfgrthhrihgtkhcuufhtvghinhhhrghrughtuceophhssehp khhsrdhimheqnecuggftrfgrthhtvghrnhepheeghfdtfeeuffehkefgffduleffjedthf dvjeektdfhhedvlefgtefgvdettdfhnecuvehluhhsthgvrhfuihiivgeptdenucfrrghr rghmpehmrghilhhfrhhomhepphhssehpkhhsrdhimh X-ME-Proxy: Feedback-ID: i197146af:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA for ; Wed, 20 Dec 2023 04:17:19 -0500 (EST) Received: by vm-mail (OpenSMTPD) with ESMTPSA id b7aeaf71 (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO) for ; Wed, 20 Dec 2023 09:15:24 +0000 (UTC) Date: Wed, 20 Dec 2023 10:17:17 +0100 From: Patrick Steinhardt To: git@vger.kernel.org Subject: [PATCH 4/7] reftable/record: store "val1" hashes as static arrays Message-ID: <06c9eab67802ba933b44d32f5c8d11fddc216c26.1703063544.git.ps@pks.im> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: When reading ref records of type "val1" we store its object ID in an allocated array. This results in an additional allocation for every single ref record we read, which is rather inefficient especially when iterating over refs. Refactor the code to instead use a static array of `GIT_MAX_RAWSZ` bytes. While this means that `struct ref_record` is bigger now, we typically do not store all refs in an array anyway and instead only handle a limited number of records at the same point in time. Using `git show-ref --quiet` in a repository with ~350k refs this leads to a significant drop in allocations. Before: HEAP SUMMARY: in use at exit: 21,098 bytes in 192 blocks total heap usage: 2,116,683 allocs, 2,116,491 frees, 76,098,060 bytes allocated After: HEAP SUMMARY: in use at exit: 21,098 bytes in 192 blocks total heap usage: 1,419,031 allocs, 1,418,839 frees, 62,145,036 bytes allocated Signed-off-by: Patrick Steinhardt --- reftable/block_test.c | 4 +--- reftable/merged_test.c | 16 ++++++---------- reftable/readwrite_test.c | 11 +++-------- reftable/record.c | 3 --- reftable/record_test.c | 1 - reftable/reftable-record.h | 2 +- reftable/stack_test.c | 2 -- 7 files changed, 11 insertions(+), 28 deletions(-) diff --git a/reftable/block_test.c b/reftable/block_test.c index c00bbc8aed..dedb05c7d8 100644 --- a/reftable/block_test.c +++ b/reftable/block_test.c @@ -49,13 +49,11 @@ static void test_block_read_write(void) for (i = 0; i < N; i++) { char name[100]; - uint8_t hash[GIT_SHA1_RAWSZ]; snprintf(name, sizeof(name), "branch%02d", i); - memset(hash, i, sizeof(hash)); rec.u.ref.refname = name; rec.u.ref.value_type = REFTABLE_REF_VAL1; - rec.u.ref.value.val1 = hash; + memset(rec.u.ref.value.val1, i, GIT_SHA1_RAWSZ); names[i] = xstrdup(name); n = block_writer_add(&bw, &rec); diff --git a/reftable/merged_test.c b/reftable/merged_test.c index d08c16abef..b3927a5d73 100644 --- a/reftable/merged_test.c +++ b/reftable/merged_test.c @@ -123,13 +123,11 @@ static void readers_destroy(struct reftable_reader **readers, size_t n) static void test_merged_between(void) { - uint8_t hash1[GIT_SHA1_RAWSZ] = { 1, 2, 3, 0 }; - struct reftable_ref_record r1[] = { { .refname = "b", .update_index = 1, .value_type = REFTABLE_REF_VAL1, - .value.val1 = hash1, + .value.val1 = { 1, 2, 3, 0 }, } }; struct reftable_ref_record r2[] = { { .refname = "a", @@ -165,26 +163,24 @@ static void test_merged_between(void) static void test_merged(void) { - uint8_t hash1[GIT_SHA1_RAWSZ] = { 1 }; - uint8_t hash2[GIT_SHA1_RAWSZ] = { 2 }; struct reftable_ref_record r1[] = { { .refname = "a", .update_index = 1, .value_type = REFTABLE_REF_VAL1, - .value.val1 = hash1, + .value.val1 = { 1 }, }, { .refname = "b", .update_index = 1, .value_type = REFTABLE_REF_VAL1, - .value.val1 = hash1, + .value.val1 = { 1 }, }, { .refname = "c", .update_index = 1, .value_type = REFTABLE_REF_VAL1, - .value.val1 = hash1, + .value.val1 = { 1 }, } }; struct reftable_ref_record r2[] = { { @@ -197,13 +193,13 @@ static void test_merged(void) .refname = "c", .update_index = 3, .value_type = REFTABLE_REF_VAL1, - .value.val1 = hash2, + .value.val1 = { 2 }, }, { .refname = "d", .update_index = 3, .value_type = REFTABLE_REF_VAL1, - .value.val1 = hash1, + .value.val1 = { 1 }, }, }; diff --git a/reftable/readwrite_test.c b/reftable/readwrite_test.c index 9c16e0504e..56c0b4db5d 100644 --- a/reftable/readwrite_test.c +++ b/reftable/readwrite_test.c @@ -60,18 +60,15 @@ static void write_table(char ***names, struct strbuf *buf, int N, *names = reftable_calloc(sizeof(char *) * (N + 1)); reftable_writer_set_limits(w, update_index, update_index); for (i = 0; i < N; i++) { - uint8_t hash[GIT_SHA256_RAWSZ] = { 0 }; char name[100]; int n; - set_test_hash(hash, i); - snprintf(name, sizeof(name), "refs/heads/branch%02d", i); ref.refname = name; ref.update_index = update_index; ref.value_type = REFTABLE_REF_VAL1; - ref.value.val1 = hash; + set_test_hash(ref.value.val1, i); (*names)[i] = xstrdup(name); n = reftable_writer_add_ref(w, &ref); @@ -675,11 +672,10 @@ static void test_write_object_id_min_length(void) struct strbuf buf = STRBUF_INIT; struct reftable_writer *w = reftable_new_writer(&strbuf_add_void, &buf, &opts); - uint8_t hash[GIT_SHA1_RAWSZ] = {42}; struct reftable_ref_record ref = { .update_index = 1, .value_type = REFTABLE_REF_VAL1, - .value.val1 = hash, + .value.val1 = {42}, }; int err; int i; @@ -711,11 +707,10 @@ static void test_write_object_id_length(void) struct strbuf buf = STRBUF_INIT; struct reftable_writer *w = reftable_new_writer(&strbuf_add_void, &buf, &opts); - uint8_t hash[GIT_SHA1_RAWSZ] = {42}; struct reftable_ref_record ref = { .update_index = 1, .value_type = REFTABLE_REF_VAL1, - .value.val1 = hash, + .value.val1 = {42}, }; int err; int i; diff --git a/reftable/record.c b/reftable/record.c index 5e258c734b..a67a6b4d8a 100644 --- a/reftable/record.c +++ b/reftable/record.c @@ -219,7 +219,6 @@ static void reftable_ref_record_copy_from(void *rec, const void *src_rec, case REFTABLE_REF_DELETION: break; case REFTABLE_REF_VAL1: - ref->value.val1 = reftable_malloc(hash_size); memcpy(ref->value.val1, src->value.val1, hash_size); break; case REFTABLE_REF_VAL2: @@ -303,7 +302,6 @@ void reftable_ref_record_release(struct reftable_ref_record *ref) reftable_free(ref->value.val2.value); break; case REFTABLE_REF_VAL1: - reftable_free(ref->value.val1); break; case REFTABLE_REF_DELETION: break; @@ -394,7 +392,6 @@ static int reftable_ref_record_decode(void *rec, struct strbuf key, return -1; } - r->value.val1 = reftable_malloc(hash_size); memcpy(r->value.val1, in.buf, hash_size); string_view_consume(&in, hash_size); break; diff --git a/reftable/record_test.c b/reftable/record_test.c index 70ae78feca..5c94d26e35 100644 --- a/reftable/record_test.c +++ b/reftable/record_test.c @@ -119,7 +119,6 @@ static void test_reftable_ref_record_roundtrip(void) case REFTABLE_REF_DELETION: break; case REFTABLE_REF_VAL1: - in.u.ref.value.val1 = reftable_malloc(GIT_SHA1_RAWSZ); set_hash(in.u.ref.value.val1, 1); break; case REFTABLE_REF_VAL2: diff --git a/reftable/reftable-record.h b/reftable/reftable-record.h index f7eb2d6015..44b5125445 100644 --- a/reftable/reftable-record.h +++ b/reftable/reftable-record.h @@ -38,7 +38,7 @@ struct reftable_ref_record { #define REFTABLE_NR_REF_VALUETYPES 4 } value_type; union { - uint8_t *val1; /* malloced hash. */ + unsigned char val1[GIT_MAX_RAWSZ]; struct { uint8_t *value; /* first value, malloced hash */ uint8_t *target_value; /* second value, malloced hash */ diff --git a/reftable/stack_test.c b/reftable/stack_test.c index 14a3fc11ee..feab49d7f7 100644 --- a/reftable/stack_test.c +++ b/reftable/stack_test.c @@ -463,7 +463,6 @@ static void test_reftable_stack_add(void) refs[i].refname = xstrdup(buf); refs[i].update_index = i + 1; refs[i].value_type = REFTABLE_REF_VAL1; - refs[i].value.val1 = reftable_malloc(GIT_SHA1_RAWSZ); set_test_hash(refs[i].value.val1, i); logs[i].refname = xstrdup(buf); @@ -600,7 +599,6 @@ static void test_reftable_stack_tombstone(void) refs[i].update_index = i + 1; if (i % 2 == 0) { refs[i].value_type = REFTABLE_REF_VAL1; - refs[i].value.val1 = reftable_malloc(GIT_SHA1_RAWSZ); set_test_hash(refs[i].value.val1, i); } From patchwork Wed Dec 20 09:17:22 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Patrick Steinhardt X-Patchwork-Id: 13499702 Received: from out5-smtp.messagingengine.com (out5-smtp.messagingengine.com [66.111.4.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E91761D55C for ; Wed, 20 Dec 2023 09:17:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=pks.im Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=pks.im Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=pks.im header.i=@pks.im header.b="OpiyJ3Lf"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="uJ+HU86k" Received: from compute1.internal (compute1.nyi.internal [10.202.2.41]) by mailout.nyi.internal (Postfix) with ESMTP id 08BA25C00D5 for ; Wed, 20 Dec 2023 04:17:25 -0500 (EST) Received: from mailfrontend2 ([10.202.2.163]) by compute1.internal (MEProxy); Wed, 20 Dec 2023 04:17:25 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pks.im; h=cc :content-type:content-type:date:date:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:subject :subject:to:to; s=fm2; t=1703063845; x=1703150245; bh=OPzbFCLw49 w3k19YqVHfuomKD/o7Hx5kBYXQPZinhjU=; b=OpiyJ3LfZ5fClrONzLN2WjTVaM yBFXG90CTo4rimRtFaHWatbsVOzNnkOJK31GsJgst7LPeBw8BCakOadYC+tLN4Sn Eh+LxJfhR8K+S4+2+AVZTQy+9xgS1f77N4T3xpf3RjL3fdUmq8NM4cdUxcjaGfCf 8yj84fEwqC+Qdwop1VYv3Z0qclO4yElR60TgwdOwItfd7q8CdImpORigWu68G8fN boqU99yhki5uaJsBn3IZfqsfL3PbOCgbr3PIA+uek5+fjsw9hYSGS26mg6yeVNNU 5uBUHIztCrzmmXDA6e7YjgyaUSGqjh3dymHNYJW4Yl+aT0+YOMYSQwHTK04A== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-type:content-type:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:subject:subject:to :to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm2; t=1703063845; x=1703150245; bh=OPzbFCLw49w3k19YqVHfuomKD/o7 Hx5kBYXQPZinhjU=; b=uJ+HU86kLbpaUAeFDHn/4OUTE1TLm8MzOanPOfk9xyqG DtKeUDjUhxhZ1F0oqnhJXmyj1crJZJdp7rP5N81EIShnmRx/T40dOK8XwBrLwL6e yWEXIH/KszFjJKATjnGrklf2aTr1l6zt+MozLnVYUt08OEds5G8nL2bf1vTWNIBM ezqqLowVqpsDmh8PyVRWXIZetpz7xUAP2btvGG20PkiCla/6cyE9FR3V9bfTEvOL 7c7XqNF2ZFHajMn+aPyhZ7EZk/lyxpLHhJGPiE2PNiIISFWx4i3xO9j+4tRHy8Mb cyLQKPSVJvP9uqExS2HGS3fzPuJkGxgU2IPZ6PxHEA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvkedrvdduvddgtdefucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucenucfjughrpeffhffvuffkfhggtggujgesghdtre ertddtvdenucfhrhhomheprfgrthhrihgtkhcuufhtvghinhhhrghrughtuceophhssehp khhsrdhimheqnecuggftrfgrthhtvghrnhepfeelkeetleetteelkeehteekieduueehhf eljeetkeehgfevvdevledtjedtkefhnecuffhomhgrihhnpehvrghlvddrthgrrhhgvght necuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepphhsse hpkhhsrdhimh X-ME-Proxy: Feedback-ID: i197146af:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA for ; Wed, 20 Dec 2023 04:17:24 -0500 (EST) Received: by vm-mail (OpenSMTPD) with ESMTPSA id 49fa591f (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO) for ; Wed, 20 Dec 2023 09:15:29 +0000 (UTC) Date: Wed, 20 Dec 2023 10:17:22 +0100 From: Patrick Steinhardt To: git@vger.kernel.org Subject: [PATCH 5/7] reftable/record: store "val2" hashes as static arrays Message-ID: <49f13c123f211c0cf4b22316a326930ee5852c38.1703063544.git.ps@pks.im> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Similar to the preceding commit, convert ref records of type "val2" to store their object IDs in static arrays instead of allocating them for every single record. We're using the same benchmark as in the preceding commit, with `git show-ref --quiet` in a repository with ~350k refs. This time around though the effects aren't this huge. Before: HEAP SUMMARY: in use at exit: 21,163 bytes in 193 blocks total heap usage: 1,419,040 allocs, 1,418,847 frees, 62,153,868 bytes allocated After: HEAP SUMMARY: in use at exit: 21,163 bytes in 193 blocks total heap usage: 1,410,148 allocs, 1,409,955 frees, 61,976,068 bytes allocated This is because "val2"-type records are typically only stored for peeled tags, and the number of annotated tags in the benchmark repository is rather low. Still, it can be seen that this change leads to a reduction of allocations overall, even if only a small one. Signed-off-by: Patrick Steinhardt --- reftable/readwrite_test.c | 12 ++++-------- reftable/record.c | 6 ------ reftable/record_test.c | 4 ---- reftable/reftable-record.h | 4 ++-- 4 files changed, 6 insertions(+), 20 deletions(-) diff --git a/reftable/readwrite_test.c b/reftable/readwrite_test.c index 56c0b4db5d..900afc1b70 100644 --- a/reftable/readwrite_test.c +++ b/reftable/readwrite_test.c @@ -547,8 +547,6 @@ static void test_table_refs_for(int indexed) uint8_t hash[GIT_SHA1_RAWSZ]; char fill[51] = { 0 }; char name[100]; - uint8_t hash1[GIT_SHA1_RAWSZ]; - uint8_t hash2[GIT_SHA1_RAWSZ]; struct reftable_ref_record ref = { NULL }; memset(hash, i, sizeof(hash)); @@ -558,11 +556,9 @@ static void test_table_refs_for(int indexed) name[40] = 0; ref.refname = name; - set_test_hash(hash1, i / 4); - set_test_hash(hash2, 3 + i / 4); ref.value_type = REFTABLE_REF_VAL2; - ref.value.val2.value = hash1; - ref.value.val2.target_value = hash2; + set_test_hash(ref.value.val2.value, i / 4); + set_test_hash(ref.value.val2.target_value, 3 + i / 4); /* 80 bytes / entry, so 3 entries per block. Yields 17 */ @@ -570,8 +566,8 @@ static void test_table_refs_for(int indexed) n = reftable_writer_add_ref(w, &ref); EXPECT(n == 0); - if (!memcmp(hash1, want_hash, GIT_SHA1_RAWSZ) || - !memcmp(hash2, want_hash, GIT_SHA1_RAWSZ)) { + if (!memcmp(ref.value.val2.value, want_hash, GIT_SHA1_RAWSZ) || + !memcmp(ref.value.val2.target_value, want_hash, GIT_SHA1_RAWSZ)) { want_names[want_names_len++] = xstrdup(name); } } diff --git a/reftable/record.c b/reftable/record.c index a67a6b4d8a..5c3fbb7b2a 100644 --- a/reftable/record.c +++ b/reftable/record.c @@ -222,9 +222,7 @@ static void reftable_ref_record_copy_from(void *rec, const void *src_rec, memcpy(ref->value.val1, src->value.val1, hash_size); break; case REFTABLE_REF_VAL2: - ref->value.val2.value = reftable_malloc(hash_size); memcpy(ref->value.val2.value, src->value.val2.value, hash_size); - ref->value.val2.target_value = reftable_malloc(hash_size); memcpy(ref->value.val2.target_value, src->value.val2.target_value, hash_size); break; @@ -298,8 +296,6 @@ void reftable_ref_record_release(struct reftable_ref_record *ref) reftable_free(ref->value.symref); break; case REFTABLE_REF_VAL2: - reftable_free(ref->value.val2.target_value); - reftable_free(ref->value.val2.value); break; case REFTABLE_REF_VAL1: break; @@ -401,11 +397,9 @@ static int reftable_ref_record_decode(void *rec, struct strbuf key, return -1; } - r->value.val2.value = reftable_malloc(hash_size); memcpy(r->value.val2.value, in.buf, hash_size); string_view_consume(&in, hash_size); - r->value.val2.target_value = reftable_malloc(hash_size); memcpy(r->value.val2.target_value, in.buf, hash_size); string_view_consume(&in, hash_size); break; diff --git a/reftable/record_test.c b/reftable/record_test.c index 5c94d26e35..2876db7d27 100644 --- a/reftable/record_test.c +++ b/reftable/record_test.c @@ -122,11 +122,7 @@ static void test_reftable_ref_record_roundtrip(void) set_hash(in.u.ref.value.val1, 1); break; case REFTABLE_REF_VAL2: - in.u.ref.value.val2.value = - reftable_malloc(GIT_SHA1_RAWSZ); set_hash(in.u.ref.value.val2.value, 1); - in.u.ref.value.val2.target_value = - reftable_malloc(GIT_SHA1_RAWSZ); set_hash(in.u.ref.value.val2.target_value, 2); break; case REFTABLE_REF_SYMREF: diff --git a/reftable/reftable-record.h b/reftable/reftable-record.h index 44b5125445..83d252ec2c 100644 --- a/reftable/reftable-record.h +++ b/reftable/reftable-record.h @@ -40,8 +40,8 @@ struct reftable_ref_record { union { unsigned char val1[GIT_MAX_RAWSZ]; struct { - uint8_t *value; /* first value, malloced hash */ - uint8_t *target_value; /* second value, malloced hash */ + unsigned char value[GIT_MAX_RAWSZ]; /* first hash */ + unsigned char target_value[GIT_MAX_RAWSZ]; /* second hash */ } val2; char *symref; /* referent, malloced 0-terminated string */ } value; From patchwork Wed Dec 20 09:17:26 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Patrick Steinhardt X-Patchwork-Id: 13499703 Received: from out5-smtp.messagingengine.com (out5-smtp.messagingengine.com [66.111.4.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CA2381DA50 for ; Wed, 20 Dec 2023 09:17:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=pks.im Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=pks.im Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=pks.im header.i=@pks.im header.b="mAo/jkhG"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="BltLfGTB" Received: from compute2.internal (compute2.nyi.internal [10.202.2.46]) by mailout.nyi.internal (Postfix) with ESMTP id EABF35C00D5 for ; Wed, 20 Dec 2023 04:17:28 -0500 (EST) Received: from mailfrontend2 ([10.202.2.163]) by compute2.internal (MEProxy); Wed, 20 Dec 2023 04:17:28 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pks.im; h=cc :content-type:content-type:date:date:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:subject :subject:to:to; s=fm2; t=1703063848; x=1703150248; bh=c+bkSn9K8f 55TeF00jgUIPh4PUQKYah/DXHNF+bB8o4=; b=mAo/jkhG8qV2hY8KIvuO87Uszk 4eK3rLdBX+OeSo1P+Vt9PvBkezg4XaFIDy1VTqQwcLZT09K2EWHWqqgbgiwgK/g8 6k3212f+85f+VgZ0nT3DVCrOrruvgbBbZnl7ZKWs+EeFmpFw6UwJaySYsgj9tVe9 LyuhHrgSvoMt8bI6HVMRizp4ENsCujYJih/C5TyPBQK1s4zk7tYquzzKrnfpWsAI bCXLcChiX8BKm4zr+L1HRTPUHkP6L7y8StNWT26U35DNa9pTx2BRN4fH9pmDdov9 9Ni39N+5pbXxWjt6ATC0YdHNOFWrk1msO7ANyld5kAoL641MHUD3cr9PNT7Q== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-type:content-type:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:subject:subject:to :to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm2; t=1703063848; x=1703150248; bh=c+bkSn9K8f55TeF00jgUIPh4PUQK Yah/DXHNF+bB8o4=; b=BltLfGTBS8J2B0ko2uH7U9maFDtKsstxVEkDGueFxqVL rk5ORou4cBp4/I2GPYY5SnyrTM5rGVsNa9Vx8Ts4JpDAcAjmYeLE/5/furBRQFsu 1nuVIGxyJ1MEhtahdiilZNgWVXv2Aw0H0RIsu6aSgE6fIRiyoQ33ZINJh9igZGCp N1sKGxeVZGfvIdFoJ9TbUy+dMbaWwS2dcDEF/di7S3T6H34s5n+6wKhvTQxShd0A nNvNZoFyaP5q8asJinowhuHskJYoa1Acz5UV/JgPA7kiAuwLeMjFinASzbzgRHTq KVttAcHaOSkbdFUzAAdGboX1CMkfVaZ/Tmp4E/pQHw== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvkedrvdduvddgtdefucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucenucfjughrpeffhffvuffkfhggtggujgesghdtre ertddtvdenucfhrhhomheprfgrthhrihgtkhcuufhtvghinhhhrghrughtuceophhssehp khhsrdhimheqnecuggftrfgrthhtvghrnhepheeghfdtfeeuffehkefgffduleffjedthf dvjeektdfhhedvlefgtefgvdettdfhnecuvehluhhsthgvrhfuihiivgepudenucfrrghr rghmpehmrghilhhfrhhomhepphhssehpkhhsrdhimh X-ME-Proxy: Feedback-ID: i197146af:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA for ; Wed, 20 Dec 2023 04:17:28 -0500 (EST) Received: by vm-mail (OpenSMTPD) with ESMTPSA id 52a6f5aa (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO) for ; Wed, 20 Dec 2023 09:15:33 +0000 (UTC) Date: Wed, 20 Dec 2023 10:17:26 +0100 From: Patrick Steinhardt To: git@vger.kernel.org Subject: [PATCH 6/7] reftable/merged: really reuse buffers to compute record keys Message-ID: <32b7ddee28243922ae2754ec69252ff87635202a.1703063544.git.ps@pks.im> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: In 829231dc20 (reftable/merged: reuse buffer to compute record keys, 2023-12-11), we have refactored the merged iterator to reuse a set of buffers that it would otherwise have to reallocate on every single iteration. Unfortunately, there was a brown-paper-bag-style bug here as we continue to release these buffers after the iteration, and thus we have essentially gained nothing. Fix this performance issue by not releasing those buffers on iteration anymore, where we instead rely on `merged_iter_close()` to release the buffers for us. Using `git show-ref --quiet` in a repository with ~350k refs this leads to a significant drop in allocations. Before: HEAP SUMMARY: in use at exit: 21,163 bytes in 193 blocks total heap usage: 1,410,148 allocs, 1,409,955 frees, 61,976,068 bytes allocated After: HEAP SUMMARY: in use at exit: 21,163 bytes in 193 blocks total heap usage: 708,058 allocs, 707,865 frees, 36,783,255 bytes allocated Signed-off-by: Patrick Steinhardt --- reftable/merged.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/reftable/merged.c b/reftable/merged.c index 556bb5c556..a28bb99aaf 100644 --- a/reftable/merged.c +++ b/reftable/merged.c @@ -128,8 +128,6 @@ static int merged_iter_next_entry(struct merged_iter *mi, done: reftable_record_release(&entry.rec); - strbuf_release(&mi->entry_key); - strbuf_release(&mi->key); return err; } From patchwork Wed Dec 20 09:17:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Patrick Steinhardt X-Patchwork-Id: 13499704 Received: from out5-smtp.messagingengine.com (out5-smtp.messagingengine.com [66.111.4.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 80B4E1F5FC for ; Wed, 20 Dec 2023 09:17:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=pks.im Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=pks.im Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=pks.im header.i=@pks.im header.b="NpqVfQNe"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="0VHwbGVh" Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.nyi.internal (Postfix) with ESMTP id D65EF5C01DA for ; Wed, 20 Dec 2023 04:17:33 -0500 (EST) Received: from mailfrontend2 ([10.202.2.163]) by compute5.internal (MEProxy); Wed, 20 Dec 2023 04:17:33 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pks.im; h=cc :content-type:content-type:date:date:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:subject :subject:to:to; s=fm2; t=1703063853; x=1703150253; bh=IJYBXV6OpK uF0WD8tK0pUJwLaDBpQsm7Nuqo0O4A5+8=; b=NpqVfQNeUro+wkKK/25h1d3JKs O1fOy1I7HUXts1OV2BMzVz4GKGOGSpEesRWMJHM/5y5w8oRoI68Q5jZsn8cBgFZC hY/3xYv5VRUPnDf4raVVlhqvvC90mdgf+7e7+vDZJr48Ma3sPW6R4S+lCv6I/NP9 REAGy3aFiyMvoJ9WiXGsPbBe4H98kFiQgJUANqCxo0tguAQ5D1TsCU2k6VGOvjem Q+ITqMfIY0Hz/w0ogkSREBnFpH7vkphlmP+n6yBpQfOjhR+e6y4+QuWink9GAoAi tYcRt+6UfMtKTtbBGlZY0NKXkzl7fcTVhz1gk2RpZ9c5Snf3yJ9Rm1MMmbWQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-type:content-type:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:subject:subject:to :to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm2; t=1703063853; x=1703150253; bh=IJYBXV6OpKuF0WD8tK0pUJwLaDBp Qsm7Nuqo0O4A5+8=; b=0VHwbGVhwQyynuA8Xt+tLXxqoNlAKPOw4SGJr1vtS/ZP dZjE8xP8lcsZIgPhx+3JY579nxNcz7LKR0tQLiepanbUTBueijLgDT6MMz+asQWw oCjusmSz8eOXvt5D/KB7qCW4enXabWSk7yiHlJz0nPle4QbbBJAuGFAw45eYXjyp cqSdP0kqwaU2pyC/pHyMu21EJe0ZpKnAWz0Wy+L5Vk8TpZbDpSmjG7xaBm3jAs3V 5pYL9zUGHW3WU8iRcaFu7q7iuTNUHVGEVgpGS++DGJZzlNum4dJhh7mDxxnLSW2B nkqU57xBzZLZq27ly73vTkpSRMak0EcqXwd7WI+Oag== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvkedrvdduvddgtdefucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucenucfjughrpeffhffvuffkfhggtggujgesghdtre ertddtvdenucfhrhhomheprfgrthhrihgtkhcuufhtvghinhhhrghrughtuceophhssehp khhsrdhimheqnecuggftrfgrthhtvghrnhepheeghfdtfeeuffehkefgffduleffjedthf dvjeektdfhhedvlefgtefgvdettdfhnecuvehluhhsthgvrhfuihiivgepudenucfrrghr rghmpehmrghilhhfrhhomhepphhssehpkhhsrdhimh X-ME-Proxy: Feedback-ID: i197146af:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA for ; Wed, 20 Dec 2023 04:17:33 -0500 (EST) Received: by vm-mail (OpenSMTPD) with ESMTPSA id 5eec481f (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO) for ; Wed, 20 Dec 2023 09:15:37 +0000 (UTC) Date: Wed, 20 Dec 2023 10:17:31 +0100 From: Patrick Steinhardt To: git@vger.kernel.org Subject: [PATCH 7/7] reftable/merged: transfer ownership of records when iterating Message-ID: <3dbabea22a2fca6b9eeddde26da664e50ccfddab.1703063544.git.ps@pks.im> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: When iterating over recods with the merged iterator we put the records into a priority queue before yielding them to the caller. This means that we need to allocate the contents of these records before we can pass them over to the caller. The handover to the caller is quite inefficient though because we first deallocate the record passed in by the caller and then copy over the new record, which requires us to reallocate memory. Refactor the code to instead transfer ownership of the new record to the caller. So instead of reallocating all contents, we now release the old record and then copy contents of the new record into place. The following benchmark of `git show-ref --quiet` in a repository with around 350k refs shows a clear improvement. Before: HEAP SUMMARY: in use at exit: 21,163 bytes in 193 blocks total heap usage: 708,058 allocs, 707,865 frees, 36,783,255 bytes allocated After: HEAP SUMMARY: in use at exit: 21,163 bytes in 193 blocks total heap usage: 357,007 allocs, 356,814 frees, 24,193,602 bytes allocated This shows that we now have roundabout a single allocation per record that we're yielding from the iterator. Ideally, we'd also get rid of this allocation so that the number of allocations doesn't scale with the number of refs anymore. This would require some larger surgery though because the memory is owned by the priority queue before transferring it over to the caller. Signed-off-by: Patrick Steinhardt --- reftable/merged.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/reftable/merged.c b/reftable/merged.c index a28bb99aaf..a52914d667 100644 --- a/reftable/merged.c +++ b/reftable/merged.c @@ -124,10 +124,12 @@ static int merged_iter_next_entry(struct merged_iter *mi, reftable_record_release(&top.rec); } - reftable_record_copy_from(rec, &entry.rec, hash_size(mi->hash_id)); + reftable_record_release(rec); + *rec = entry.rec; done: - reftable_record_release(&entry.rec); + if (err) + reftable_record_release(&entry.rec); return err; }