diff mbox series

[2/2] maintenance: add loose-objects.batchSize config

Message ID 6ed537862fa7585928fde33d21b77367a7fb708d.1742777512.git.gitgitgadget@gmail.com (mailing list archive)
State New
Headers show
Series Allow configuration for loose-objects maintenance task | expand

Commit Message

Derrick Stolee March 24, 2025, 12:51 a.m. UTC
From: Derrick Stolee <stolee@gmail.com>

The 'loose-objects' task of 'git maintenance run' first deletes loose
objects that exit within packfiles and then collects loose objects into
a packfile. This second step uses an implicit limit of fifty thousand
that cannot be modified by users.

Add a new config option that allows this limit to be adjusted or ignored
entirely.

While creating tests for this option, I noticed that actually there was
an off-by-one error due to the strict comparison in the limit check. I
considered making the limit check turn true on equality, but instead I
thought to use INT_MAX as a "no limit" barrier which should mean it's
never possible to hit the limit. Thus, a new decrement to the limit is
provided if the value is positive. (The restriction to positive values
is to avoid underflow if INT_MIN is configured.)

Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
 Documentation/config/maintenance.adoc |  5 +++++
 Documentation/git-maintenance.adoc    | 18 ++++++++++-------
 builtin/gc.c                          | 10 ++++++++++
 t/t7900-maintenance.sh                | 28 +++++++++++++++++++++++++++
 4 files changed, 54 insertions(+), 7 deletions(-)

Comments

Junio C Hamano March 24, 2025, 6:11 a.m. UTC | #1
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <stolee@gmail.com>
>
> The 'loose-objects' task of 'git maintenance run' first deletes loose
> objects that exit within packfiles and then collects loose objects into

"exit" -> "exist"?  It may read better to also do "collects" ->
"collects remaining".

> a packfile. This second step uses an implicit limit of fifty thousand
> that cannot be modified by users.
>
> Add a new config option that allows this limit to be adjusted or ignored
> entirely.
>
> While creating tests for this option, I noticed that actually there was
> an off-by-one error due to the strict comparison in the limit check.

Ahh, I, contrary to my usual routine, started reading from the code
change before reading the proposed log message and was wondering
about this exact point.  

> +	/* If batch_size is INT_MAX, then this will return 0 always. */

Cute ;-).

>  	return ++(d->count) > d->batch_size;
>  }
diff mbox series

Patch

diff --git a/Documentation/config/maintenance.adoc b/Documentation/config/maintenance.adoc
index 72a9d6cf816..42f9545da0e 100644
--- a/Documentation/config/maintenance.adoc
+++ b/Documentation/config/maintenance.adoc
@@ -61,6 +61,11 @@  maintenance.loose-objects.auto::
 	loose objects is at least the value of `maintenance.loose-objects.auto`.
 	The default value is 100.
 
+maintenance.loose-objects.batchSize::
+	This integer config option controls the maximum number of loose objects
+	written into a packfile during the `loose-objects` task. The default is
+	fifty thousand. Use value `0` to indicate no limit.
+
 maintenance.incremental-repack.auto::
 	This integer config option controls how often the `incremental-repack`
 	task should be run as part of `git maintenance run --auto`. If zero,
diff --git a/Documentation/git-maintenance.adoc b/Documentation/git-maintenance.adoc
index 0450d74aff1..c90b370b1fc 100644
--- a/Documentation/git-maintenance.adoc
+++ b/Documentation/git-maintenance.adoc
@@ -126,13 +126,17 @@  loose-objects::
 	objects that already exist in a pack-file; concurrent Git processes
 	will examine the pack-file for the object data instead of the loose
 	object. Second, it creates a new pack-file (starting with "loose-")
-	containing a batch of loose objects. The batch size is limited to 50
-	thousand objects to prevent the job from taking too long on a
-	repository with many loose objects. The `gc` task writes unreachable
-	objects as loose objects to be cleaned up by a later step only if
-	they are not re-added to a pack-file; for this reason it is not
-	advisable to enable both the `loose-objects` and `gc` tasks at the
-	same time.
+	containing a batch of loose objects.
++
+The batch size defaults to fifty thousand objects to prevent the job from
+taking too long on a repository with many loose objects. Use the
+`maintenance.loose-objects.batchSize` config option to adjust this size,
+including a value of `0` to remove the limit.
++
+The `gc` task writes unreachable objects as loose objects to be cleaned up
+by a later step only if they are not re-added to a pack-file; for this
+reason it is not advisable to enable both the `loose-objects` and `gc`
+tasks at the same time.
 
 incremental-repack::
 	The `incremental-repack` job repacks the object directory
diff --git a/builtin/gc.c b/builtin/gc.c
index 6672f165bda..817081e1a50 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1163,6 +1163,7 @@  static int write_loose_object_to_stdin(const struct object_id *oid,
 
 	fprintf(d->in, "%s\n", oid_to_hex(oid));
 
+	/* If batch_size is INT_MAX, then this will return 0 always. */
 	return ++(d->count) > d->batch_size;
 }
 
@@ -1208,6 +1209,15 @@  static int pack_loose(struct maintenance_run_opts *opts)
 	data.count = 0;
 	data.batch_size = 50000;
 
+	repo_config_get_int(r, "maintenance.loose-objects.batchSize",
+			    &data.batch_size);
+
+	/* If configured as 0, then remove limit. */
+	if (!data.batch_size)
+		data.batch_size = INT_MAX;
+	else if (data.batch_size > 0)
+		data.batch_size--; /* Decrease for equality on limit. */
+
 	for_each_loose_file_in_objdir(r->objects->odb->path,
 				      write_loose_object_to_stdin,
 				      NULL,
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 1909aed95e0..834ddb5ad68 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -306,6 +306,34 @@  test_expect_success 'maintenance.loose-objects.auto' '
 	test_subcommand git prune-packed --quiet <trace-loC
 '
 
+test_expect_success 'maintenance.loose-objects.batchSize' '
+	git init loose-batch &&
+
+	# This creates three objects per commit.
+	test_commit_bulk -C loose-batch 34 &&
+	pack=$(ls loose-batch/.git/objects/pack/pack-*.pack) &&
+	index="${pack%pack}idx" &&
+	rm "$index" &&
+	git -C loose-batch unpack-objects <"$pack" &&
+	git -C loose-batch config maintenance.loose-objects.batchSize 50 &&
+
+	GIT_PROGRESS_DELAY=0 \
+	git -C loose-batch maintenance run --no-quiet --task=loose-objects 2>err &&
+	grep "Enumerating objects: 50, done." err &&
+
+	GIT_PROGRESS_DELAY=0 \
+	git -C loose-batch maintenance run --no-quiet --task=loose-objects 2>err &&
+	grep "Enumerating objects: 50, done." err &&
+
+	GIT_PROGRESS_DELAY=0 \
+	git -C loose-batch maintenance run --no-quiet --task=loose-objects 2>err &&
+	grep "Enumerating objects: 2, done." err &&
+
+	GIT_PROGRESS_DELAY=0 \
+	git -C loose-batch maintenance run --no-quiet --task=loose-objects 2>err &&
+	test_must_be_empty err
+'
+
 test_expect_success 'incremental-repack task' '
 	packDir=.git/objects/pack &&
 	for i in $(test_seq 1 5)