diff mbox series

[v4,3/8] maintenance: create auto condition for loose-objects

Message ID 931fff4883a3da26a296af69a9c5ccc3a5037d71.1601037218.git.gitgitgadget@gmail.com (mailing list archive)
State Accepted
Commit 3e220e60696ebc27c719b83adc6f734d6857521f
Headers show
Series Maintenance II: prefetch, loose-objects, incremental-repack tasks | expand

Commit Message

John Passaro via GitGitGadget Sept. 25, 2020, 12:33 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

The loose-objects task deletes loose objects that already exist in a
pack-file, then place the remaining loose objects into a new pack-file.
If this step runs all the time, then we risk creating pack-files with
very few objects with every 'git commit' process. To prevent
overwhelming the packs directory with small pack-files, place a minimum
number of objects to justify the task.

The 'maintenance.loose-objects.auto' config option specifies a minimum
number of loose objects to justify the task to run under the '--auto'
option. This defaults to 100 loose objects. Setting the value to zero
will prevent the step from running under '--auto' while a negative value
will force it to run every time.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config/maintenance.txt |  9 +++++++++
 builtin/gc.c                         | 30 ++++++++++++++++++++++++++++
 t/t7900-maintenance.sh               | 22 ++++++++++++++++++++
 3 files changed, 61 insertions(+)

Comments

Junio C Hamano Sept. 25, 2020, 6 p.m. UTC | #1
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
> index b3fc7c8670..27565c55a2 100755
> --- a/t/t7900-maintenance.sh
> +++ b/t/t7900-maintenance.sh
> @@ -127,4 +127,26 @@ test_expect_success 'loose-objects task' '
>  	test_cmp packs-between packs-after
>  '
>  
> +test_expect_success 'maintenance.loose-objects.auto' '
> +	git repack -adk &&
> +	GIT_TRACE2_EVENT="$(pwd)/trace-lo1.txt" \
> +		git -c maintenance.loose-objects.auto=1 maintenance \
> +		run --auto --task=loose-objects 2>/dev/null &&
> +	test_subcommand ! git prune-packed --quiet <trace-lo1.txt &&
> +	printf data-A | git hash-object -t blob --stdin -w &&
> +	GIT_TRACE2_EVENT="$(pwd)/trace-loA" \
> +		git -c maintenance.loose-objects.auto=2 \
> +		maintenance run --auto --task=loose-objects 2>/dev/null &&
> +	test_subcommand ! git prune-packed --quiet <trace-loA &&
> +	printf data-B | git hash-object -t blob --stdin -w &&

Is it essential for the purpose of the test somehow that the data
used for the test are all incomplete files that lack the end-of-line
at the end of the file?  Use of 'printf' sends such a signal to
confuse readers.

Use of test_write_lines to write a single line may feel overkill,
but it may be less cryptic, as newer parts of testsuite are
encouraged to use it over 'echo' and raw 'printf'.

> +	GIT_TRACE2_EVENT="$(pwd)/trace-loB" \
> +		git -c maintenance.loose-objects.auto=2 \
> +		maintenance run --auto --task=loose-objects 2>/dev/null &&
> +	test_subcommand git prune-packed --quiet <trace-loB &&
> +	GIT_TRACE2_EVENT="$(pwd)/trace-loC" \
> +		git -c maintenance.loose-objects.auto=2 \
> +		maintenance run --auto --task=loose-objects 2>/dev/null &&
> +	test_subcommand git prune-packed --quiet <trace-loC
> +'
Derrick Stolee Sept. 25, 2020, 6:43 p.m. UTC | #2
On 9/25/2020 2:00 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>> +	printf data-B | git hash-object -t blob --stdin -w &&
> 
> Is it essential for the purpose of the test somehow that the data
> used for the test are all incomplete files that lack the end-of-line
> at the end of the file?  Use of 'printf' sends such a signal to
> confuse readers.
> 
> Use of test_write_lines to write a single line may feel overkill,
> but it may be less cryptic, as newer parts of testsuite are
> encouraged to use it over 'echo' and raw 'printf'.

I suppose it could be more standard. It's not particularly
important what the data is here, so lacking a newline seems
innocuous enough to me. I'm happy to agree with standards
elsewhere to avoid being a bad example.

A simple s/printf/test_write_lines/ appears to do the trick:

diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index f259485990..a1bd0029c6 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -147,12 +147,12 @@ test_expect_success 'maintenance.loose-objects.auto' '
                git -c maintenance.loose-objects.auto=1 maintenance \
                run --auto --task=loose-objects 2>/dev/null &&
        test_subcommand ! git prune-packed --quiet <trace-lo1.txt &&
-       printf data-A | git hash-object -t blob --stdin -w &&
+       test_write_lines data-A | git hash-object -t blob --stdin -w &&
        GIT_TRACE2_EVENT="$(pwd)/trace-loA" \
                git -c maintenance.loose-objects.auto=2 \
                maintenance run --auto --task=loose-objects 2>/dev/null &&
        test_subcommand ! git prune-packed --quiet <trace-loA &&
-       printf data-B | git hash-object -t blob --stdin -w &&
+       test_write_lines data-B | git hash-object -t blob --stdin -w &&
        GIT_TRACE2_EVENT="$(pwd)/trace-loB" \
                git -c maintenance.loose-objects.auto=2 \
                maintenance run --auto --task=loose-objects 2>/dev/null &&

Thanks,
-Stolee
diff mbox series

Patch

diff --git a/Documentation/config/maintenance.txt b/Documentation/config/maintenance.txt
index 7cc6700d57..c31613be62 100644
--- a/Documentation/config/maintenance.txt
+++ b/Documentation/config/maintenance.txt
@@ -14,3 +14,12 @@  maintenance.commit-graph.auto::
 	reachable commits that are not in the commit-graph file is at least
 	the value of `maintenance.commit-graph.auto`. The default value is
 	100.
+
+maintenance.loose-objects.auto::
+	This integer config option controls how often the `loose-objects` task
+	should be run as part of `git maintenance run --auto`. If zero, then
+	the `loose-objects` task will not run with the `--auto` option. A
+	negative value will force the task to run every time. Otherwise, a
+	positive value implies the command should run when the number of
+	loose objects is at least the value of `maintenance.loose-objects.auto`.
+	The default value is 100.
diff --git a/builtin/gc.c b/builtin/gc.c
index c9db8555b9..4403827481 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -899,6 +899,35 @@  struct write_loose_object_data {
 	int batch_size;
 };
 
+static int loose_object_auto_limit = 100;
+
+static int loose_object_count(const struct object_id *oid,
+			       const char *path,
+			       void *data)
+{
+	int *count = (int*)data;
+	if (++(*count) >= loose_object_auto_limit)
+		return 1;
+	return 0;
+}
+
+static int loose_object_auto_condition(void)
+{
+	int count = 0;
+
+	git_config_get_int("maintenance.loose-objects.auto",
+			   &loose_object_auto_limit);
+
+	if (!loose_object_auto_limit)
+		return 0;
+	if (loose_object_auto_limit < 0)
+		return 1;
+
+	return for_each_loose_file_in_objdir(the_repository->objects->odb->path,
+					     loose_object_count,
+					     NULL, NULL, &count);
+}
+
 static int bail_on_loose(const struct object_id *oid,
 			 const char *path,
 			 void *data)
@@ -1009,6 +1038,7 @@  static struct maintenance_task tasks[] = {
 	[TASK_LOOSE_OBJECTS] = {
 		"loose-objects",
 		maintenance_task_loose_objects,
+		loose_object_auto_condition,
 	},
 	[TASK_GC] = {
 		"gc",
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index b3fc7c8670..27565c55a2 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -127,4 +127,26 @@  test_expect_success 'loose-objects task' '
 	test_cmp packs-between packs-after
 '
 
+test_expect_success 'maintenance.loose-objects.auto' '
+	git repack -adk &&
+	GIT_TRACE2_EVENT="$(pwd)/trace-lo1.txt" \
+		git -c maintenance.loose-objects.auto=1 maintenance \
+		run --auto --task=loose-objects 2>/dev/null &&
+	test_subcommand ! git prune-packed --quiet <trace-lo1.txt &&
+	printf data-A | git hash-object -t blob --stdin -w &&
+	GIT_TRACE2_EVENT="$(pwd)/trace-loA" \
+		git -c maintenance.loose-objects.auto=2 \
+		maintenance run --auto --task=loose-objects 2>/dev/null &&
+	test_subcommand ! git prune-packed --quiet <trace-loA &&
+	printf data-B | git hash-object -t blob --stdin -w &&
+	GIT_TRACE2_EVENT="$(pwd)/trace-loB" \
+		git -c maintenance.loose-objects.auto=2 \
+		maintenance run --auto --task=loose-objects 2>/dev/null &&
+	test_subcommand git prune-packed --quiet <trace-loB &&
+	GIT_TRACE2_EVENT="$(pwd)/trace-loC" \
+		git -c maintenance.loose-objects.auto=2 \
+		maintenance run --auto --task=loose-objects 2>/dev/null &&
+	test_subcommand git prune-packed --quiet <trace-loC
+'
+
 test_done