diff mbox series

[v3,08/11] refs/reftable: allow configuring restart interval

Message ID 2b1579570798a325f8a5c97438b1fdeef65aa049.1715587849.git.ps@pks.im (mailing list archive)
State Accepted
Commit 90db611c2a1f4ca2123ef5a8d7e592fa348bb23b
Headers show
Series reftable: expose write options as config | expand

Commit Message

Patrick Steinhardt May 13, 2024, 8:18 a.m. UTC
Add a new option `reftable.restartInterval` that allows the user to
control the restart interval when writing reftable records used by the
reftable library.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/config/reftable.txt | 18 +++++++++++++
 refs/reftable-backend.c           |  5 ++++
 t/t0613-reftable-write-options.sh | 43 +++++++++++++++++++++++++++++++
 3 files changed, 66 insertions(+)

Comments

Justin Tobler May 21, 2024, 11:50 p.m. UTC | #1
On 24/05/13 10:18AM, Patrick Steinhardt wrote:
> +reftable.restartInterval::
> +	The interval at which to create restart points. The reftable backend
> +	determines the restart points at file creation. Every 16 may be
> +	more suitable for smaller block sizes (4k or 8k), every 64 for larger
> +	block sizes (64k).
> ++
> +More frequent restart points reduces prefix compression and increases
> +space consumed by the restart table, both of which increase file size.
> ++
> +Less frequent restart points makes prefix compression more effective,
> +decreasing overall file size, with increased penalties for readers
> +walking through more records after the binary search step.
> ++
> +A maximum of `65535` restart points per block is supported.
> ++
> +The default value is to create restart points every 16 records. A value of `0`
> +will use the default value.

Out of curiousity, if for some reason we didn't want any prefix
compression, would the best way to do this be via setting the restart
interval to 1? I guess this means the number of references would also be
limited by the maximum number of restart points.

-Justin
Patrick Steinhardt May 22, 2024, 7:19 a.m. UTC | #2
On Tue, May 21, 2024 at 06:50:58PM -0500, Justin Tobler wrote:
> On 24/05/13 10:18AM, Patrick Steinhardt wrote:
> > +reftable.restartInterval::
> > +	The interval at which to create restart points. The reftable backend
> > +	determines the restart points at file creation. Every 16 may be
> > +	more suitable for smaller block sizes (4k or 8k), every 64 for larger
> > +	block sizes (64k).
> > ++
> > +More frequent restart points reduces prefix compression and increases
> > +space consumed by the restart table, both of which increase file size.
> > ++
> > +Less frequent restart points makes prefix compression more effective,
> > +decreasing overall file size, with increased penalties for readers
> > +walking through more records after the binary search step.
> > ++
> > +A maximum of `65535` restart points per block is supported.
> > ++
> > +The default value is to create restart points every 16 records. A value of `0`
> > +will use the default value.
> 
> Out of curiousity, if for some reason we didn't want any prefix
> compression, would the best way to do this be via setting the restart
> interval to 1? I guess this means the number of references would also be
> limited by the maximum number of restart points.

Yup, exactly.

Patrick
diff mbox series

Patch

diff --git a/Documentation/config/reftable.txt b/Documentation/config/reftable.txt
index fa7c4be014..2374be71d7 100644
--- a/Documentation/config/reftable.txt
+++ b/Documentation/config/reftable.txt
@@ -12,3 +12,21 @@  readers during access.
 +
 The largest block size is `16777215` bytes (15.99 MiB). The default value is
 `4096` bytes (4kB). A value of `0` will use the default value.
+
+reftable.restartInterval::
+	The interval at which to create restart points. The reftable backend
+	determines the restart points at file creation. Every 16 may be
+	more suitable for smaller block sizes (4k or 8k), every 64 for larger
+	block sizes (64k).
++
+More frequent restart points reduces prefix compression and increases
+space consumed by the restart table, both of which increase file size.
++
+Less frequent restart points makes prefix compression more effective,
+decreasing overall file size, with increased penalties for readers
+walking through more records after the binary search step.
++
+A maximum of `65535` restart points per block is supported.
++
+The default value is to create restart points every 16 records. A value of `0`
+will use the default value.
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 8d0ae9e285..a2880aabce 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -240,6 +240,11 @@  static int reftable_be_config(const char *var, const char *value,
 		if (block_size > 16777215)
 			die("reftable block size cannot exceed 16MB");
 		opts->block_size = block_size;
+	} else if (!strcmp(var, "reftable.restartinterval")) {
+		unsigned long restart_interval = git_config_ulong(var, value, ctx->kvi);
+		if (restart_interval > UINT16_MAX)
+			die("reftable block size cannot exceed %u", (unsigned)UINT16_MAX);
+		opts->restart_interval = restart_interval;
 	}
 
 	return 0;
diff --git a/t/t0613-reftable-write-options.sh b/t/t0613-reftable-write-options.sh
index 8bdbc6ec70..e0a5b26f58 100755
--- a/t/t0613-reftable-write-options.sh
+++ b/t/t0613-reftable-write-options.sh
@@ -171,4 +171,47 @@  test_expect_success 'block size exceeding maximum supported size' '
 	)
 '
 
+test_expect_success 'restart interval at every single record' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit initial &&
+		for i in $(test_seq 10)
+		do
+			printf "update refs/heads/branch-%d HEAD\n" "$i" ||
+			return 1
+		done >input &&
+		git update-ref --stdin <input &&
+		git -c reftable.restartInterval=1 pack-refs &&
+
+		cat >expect <<-EOF &&
+		header:
+		  block_size: 4096
+		ref:
+		  - length: 566
+		    restarts: 13
+		log:
+		  - length: 1393
+		    restarts: 12
+		EOF
+		test-tool dump-reftable -b .git/reftable/*.ref >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'restart interval exceeding maximum supported interval' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit initial &&
+		cat >expect <<-EOF &&
+		fatal: reftable block size cannot exceed 65535
+		EOF
+		test_must_fail git -c reftable.restartInterval=65536 pack-refs 2>err &&
+		test_cmp expect err
+	)
+'
+
 test_done