diff mbox series

[v2,08/11] refs/reftable: allow configuring restart interval

Message ID bc0bf65553c8dd89bf5fcaa592fc3427507f1993.1715336798.git.ps@pks.im (mailing list archive)
State Superseded
Headers show
Series reftable: expose write options as config | expand

Commit Message

Patrick Steinhardt May 10, 2024, 10:29 a.m. UTC
Add a new option `reftable.restartInterval` that allows the user to
control the restart interval when writing reftable records used by the
reftable library.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/config/reftable.txt | 19 ++++++++++++++
 refs/reftable-backend.c           |  5 ++++
 t/t0613-reftable-write-options.sh | 43 +++++++++++++++++++++++++++++++
 3 files changed, 67 insertions(+)

Comments

Junio C Hamano May 10, 2024, 9:57 p.m. UTC | #1
Patrick Steinhardt <ps@pks.im> writes:

> +
> +reftable.restartInterval::
> +	The interval at which to create restart points. The reftable backend
> +	determines the restart points at file creation. The process is
> +	arbitrary, but every 16 or 64 records is recommended. Every 16 may be

It is unclear what exactly "The process is arbitrary, but" wants to
say, especially the use of the noun "process".  The process the user
uses to choose the inteval value is?  The default value chosen by us
was arbitrary and out of thin air?

Just striking the whole sentence (or removing up to ", but" part and
starting the sentence with "Every 16 or 64") may make the resulting
paragraph easier to follow, I suspect.

> +	} else if (!strcmp(var, "reftable.restartinterval")) {
> +		unsigned long restart_interval = git_config_ulong(var, value, ctx->kvi);
> +		if (restart_interval > UINT16_MAX)
> +			die("reftable block size cannot exceed %u", (unsigned)UINT16_MAX);

OK.
Patrick Steinhardt May 13, 2024, 7:54 a.m. UTC | #2
On Fri, May 10, 2024 at 02:57:46PM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > +
> > +reftable.restartInterval::
> > +	The interval at which to create restart points. The reftable backend
> > +	determines the restart points at file creation. The process is
> > +	arbitrary, but every 16 or 64 records is recommended. Every 16 may be
> 
> It is unclear what exactly "The process is arbitrary, but" wants to
> say, especially the use of the noun "process".  The process the user
> uses to choose the inteval value is?  The default value chosen by us
> was arbitrary and out of thin air?

The latter is what I wanted to say, but I agree that it's hard to parse.
And honestly, I don't even know how arbitrary it is, so I should
probably not claim something like this in the first place.

> Just striking the whole sentence (or removing up to ", but" part and
> starting the sentence with "Every 16 or 64") may make the resulting
> paragraph easier to follow, I suspect.

Will do.

Patrick
diff mbox series

Patch

diff --git a/Documentation/config/reftable.txt b/Documentation/config/reftable.txt
index fa7c4be014..16b915c75e 100644
--- a/Documentation/config/reftable.txt
+++ b/Documentation/config/reftable.txt
@@ -12,3 +12,22 @@  readers during access.
 +
 The largest block size is `16777215` bytes (15.99 MiB). The default value is
 `4096` bytes (4kB). A value of `0` will use the default value.
+
+reftable.restartInterval::
+	The interval at which to create restart points. The reftable backend
+	determines the restart points at file creation. The process is
+	arbitrary, but every 16 or 64 records is recommended. Every 16 may be
+	more suitable for smaller block sizes (4k or 8k), every 64 for larger
+	block sizes (64k).
++
+More frequent restart points reduces prefix compression and increases
+space consumed by the restart table, both of which increase file size.
++
+Less frequent restart points makes prefix compression more effective,
+decreasing overall file size, with increased penalties for readers
+walking through more records after the binary search step.
++
+A maximum of `65535` restart points per block is supported.
++
+The default value is to create restart points every 16 records. A value of `0`
+will use the default value.
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index bd9999cefc..9972dfc1a3 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -242,6 +242,11 @@  static int reftable_be_config(const char *var, const char *value,
 		if (block_size > 16777215)
 			die("reftable block size cannot exceed 16MB");
 		opts->block_size = block_size;
+	} else if (!strcmp(var, "reftable.restartinterval")) {
+		unsigned long restart_interval = git_config_ulong(var, value, ctx->kvi);
+		if (restart_interval > UINT16_MAX)
+			die("reftable block size cannot exceed %u", (unsigned)UINT16_MAX);
+		opts->restart_interval = restart_interval;
 	}
 
 	return 0;
diff --git a/t/t0613-reftable-write-options.sh b/t/t0613-reftable-write-options.sh
index 8bdbc6ec70..e0a5b26f58 100755
--- a/t/t0613-reftable-write-options.sh
+++ b/t/t0613-reftable-write-options.sh
@@ -171,4 +171,47 @@  test_expect_success 'block size exceeding maximum supported size' '
 	)
 '
 
+test_expect_success 'restart interval at every single record' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit initial &&
+		for i in $(test_seq 10)
+		do
+			printf "update refs/heads/branch-%d HEAD\n" "$i" ||
+			return 1
+		done >input &&
+		git update-ref --stdin <input &&
+		git -c reftable.restartInterval=1 pack-refs &&
+
+		cat >expect <<-EOF &&
+		header:
+		  block_size: 4096
+		ref:
+		  - length: 566
+		    restarts: 13
+		log:
+		  - length: 1393
+		    restarts: 12
+		EOF
+		test-tool dump-reftable -b .git/reftable/*.ref >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'restart interval exceeding maximum supported interval' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit initial &&
+		cat >expect <<-EOF &&
+		fatal: reftable block size cannot exceed 65535
+		EOF
+		test_must_fail git -c reftable.restartInterval=65536 pack-refs 2>err &&
+		test_cmp expect err
+	)
+'
+
 test_done