[v3,2/3] multi-pack-index: respect repack.packKeptObjects=false

Message ID 988697dd5121430cd3ddfa60b1ebcf26027566ef.1589034270.git.gitgitgadget@gmail.com (mailing list archive)
State: New, archived
Series: midx: apply gitconfig to midx repack

Commit Message

Johannes Schindelin via GitGitGadget May 9, 2020, 2:24 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

When selecting a batch of pack-files to repack in the "git
multi-pack-index repack" command, Git should respect the
repack.packKeptObjects config option. When false, this option says that
the pack-files with an associated ".keep" file should not be repacked.
This config value is "false" by default.

There are two cases for selecting a batch of objects. The first is the
case where the input batch-size is zero, which specifies "repack
everything". The second is with a non-zero batch size, which selects
pack-files using a greedy selection criterion. Both of these cases are
updated and tested.

Reported-by: Son Luong Ngoc <sluongng@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-multi-pack-index.txt |  3 +++
 midx.c                                 | 26 +++++++++++++++++++++-----
 t/t5319-multi-pack-index.sh            | 26 ++++++++++++++++++++++++++
 3 files changed, 50 insertions(+), 5 deletions(-)
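
The "repack everything" case from the commit message can be modeled in a few lines of shell. This is only a rough sketch of the selection rule, not Git's C code: the scratch directory, the pack names, and the `pack_kept_objects` variable are all hypothetical stand-ins, and the `.keep` check assumes the marker file sits next to its pack.

```shell
# Sketch only: a shell model of the batch-size=0 selection, not Git's C code.
scratch=$(mktemp -d) &&
cd "$scratch" || exit 1
mkdir pack &&
: >pack/pack-1.pack &&
: >pack/pack-2.pack &&
: >pack/pack-3.pack &&
: >pack/pack-2.keep || exit 1

# Stand-in for the repack.packKeptObjects setting (false by default).
pack_kept_objects=false

count=0
included=
for p in pack/*.pack
do
	# Skip packs with a sibling .keep file unless kept packs are allowed.
	if test "$pack_kept_objects" = false && test -e "${p%.pack}.keep"
	then
		continue
	fi
	included="$included $p"
	count=$((count + 1))
done

# As in the patch, fewer than two eligible packs means nothing to repack.
if test "$count" -lt 2
then
	echo "nothing to repack"
else
	echo "would repack:$included"
fi
```

Counting only the packs that survive the filter mirrors the change from `m->num_packs < 2` to `count < 2` in the patch.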

Comments

Đoàn Trần Công Danh May 9, 2020, 4:11 p.m. UTC | #1
On 2020-05-09 14:24:29+0000, Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
> 
> +test_expect_success 'repack respects repack.packKeptObjects=false' '
> +	test_when_finished rm -f dup/.git/objects/pack/*keep &&
> +	(
> +		cd dup &&
> +		ls .git/objects/pack/*idx >idx-list &&

I think ls(1) is an overkill.
I think:

	echo .git/objects/pack/*idx

is more efficient.

> +		test_line_count = 5 idx-list &&
> +		ls .git/objects/pack/*.pack | sed "s/\.pack/.keep/" >keep-list &&

Likewise.

> +		for keep in $(cat keep-list)
> +		do
> +			touch $keep || return 1

Is this intended?
Since touch(1) accepts multiple files as argument.

> +		done &&
> +		git multi-pack-index repack --batch-size=0 &&
> +		ls .git/objects/pack/*idx >idx-list &&
> +		test_line_count = 5 idx-list &&
> +		test-tool read-midx .git/objects | grep idx >midx-list &&
> +		test_line_count = 5 midx-list &&
> +		THIRD_SMALLEST_SIZE=$(test-tool path-utils file-size .git/objects/pack/*pack | sort -n | head -n 3 | tail -n 1) &&

This line is overly long.
Should we write test-tool's output to temp file and process it?

And I think either

	sed -n '3{p;q}'

or:

	sed -n 3p

is cleaner than

	head -n 3 | tail -n 1

> +		BATCH_SIZE=$(($THIRD_SMALLEST_SIZE + 1)) &&

I think it would be better to get this right in this patch, rather than
add the dollar sign here and then remove it again in the next patch.

> +		git multi-pack-index repack --batch-size=$BATCH_SIZE &&
> +		ls .git/objects/pack/*idx >idx-list &&

Junio C Hamano May 9, 2020, 5:33 p.m. UTC | #2
Đoàn Trần Công Danh  <congdanhqx@gmail.com> writes:

> On 2020-05-09 14:24:29+0000, Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote:
>> From: Derrick Stolee <dstolee@microsoft.com>
>> 
>> +test_expect_success 'repack respects repack.packKeptObjects=false' '
>> +	test_when_finished rm -f dup/.git/objects/pack/*keep &&
>> +	(
>> +		cd dup &&
>> +		ls .git/objects/pack/*idx >idx-list &&
>
> I think ls(1) is an overkill.
> I think:
>
> 	echo .git/objects/pack/*idx
>
> is more efficient.

When there is no file whose name ends with idx, what happens?

    $ ls *idx && echo OK
    ls: cannot access '*idx': No such file or directory
    $ echo *idx && echo OK
    *idx
    OK
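
The difference shown above can be reproduced in an empty scratch directory. This is a minimal sketch, assuming a POSIX shell with default globbing (no `nullglob` or `failglob`); the directory and the `ls-out`/`echo-out` file names are made up for the illustration.

```shell
# Sketch only: run in an empty scratch directory so that no *idx file exists.
scratch=$(mktemp -d) &&
cd "$scratch" || exit 1

# ls receives the unexpanded pattern, cannot find it, and exits non-zero.
if ls *idx >ls-out 2>/dev/null
then
	ls_status=ok
else
	ls_status=failed
fi

# echo prints the unexpanded pattern as a single line and exits zero.
echo *idx >echo-out
echo_status=$?

echo "ls: $ls_status, echo exit: $echo_status, echo output: $(cat echo-out)"
```

In an &&-chain, only the `ls` form stops the test at the point where the files are missing.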

>> +		test_line_count = 5 idx-list &&
>> +		ls .git/objects/pack/*.pack | sed "s/\.pack/.keep/" >keep-list &&
>
> Likewise.

Likewise.

>> +		for keep in $(cat keep-list)
>> +		do
>> +			touch $keep || return 1
>
> Is this intended?
> Since touch(1) accepts multiple files as argument.

Good suggestion, but doesn't .keep file record why the pack is kept
in real life (i.e. not an empty file)?

>> +		done &&
>> +		git multi-pack-index repack --batch-size=0 &&
>> +		ls .git/objects/pack/*idx >idx-list &&
>> +		test_line_count = 5 idx-list &&
>> +		test-tool read-midx .git/objects | grep idx >midx-list &&
>> +		test_line_count = 5 midx-list &&
>> +		THIRD_SMALLEST_SIZE=$(test-tool path-utils file-size .git/objects/pack/*pack | sort -n | head -n 3 | tail -n 1) &&
>
> This line is overly long.
> Should we write test-tool's output to temp file and process it?
>
> And I think either
>
> 	sed -n '3{p;q}'
>
> or:
>
> 	sed -n 3p
>
> is cleaner than
>
> 	head -n 3 | tail -n 1

"sed -n 3p" is the only valid way to write it ;-)

Đoàn Trần Công Danh May 10, 2020, 6:38 a.m. UTC | #3
On 2020-05-09 10:33:30-0700, Junio C Hamano <gitster@pobox.com> wrote:
> Đoàn Trần Công Danh  <congdanhqx@gmail.com> writes:
> 
> > On 2020-05-09 14:24:29+0000, Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote:
> >> From: Derrick Stolee <dstolee@microsoft.com>
> >> 
> >> +test_expect_success 'repack respects repack.packKeptObjects=false' '
> >> +	test_when_finished rm -f dup/.git/objects/pack/*keep &&
> >> +	(
> >> +		cd dup &&
> >> +		ls .git/objects/pack/*idx >idx-list &&
> >
> > I think ls(1) is an overkill.
> > I think:
> >
> > 	echo .git/objects/pack/*idx
> >
> > is more efficient.
> 
> When there is no file whose name ends with idx, what happens?
> 
>     $ ls *idx && echo OK
>     ls: cannot access '*idx': No such file or directory
>     $ echo *idx && echo OK
>     *idx
>     OK

Yes, but I think the next line checks the number of lines.
Still, it is better to fail faster.

(My suggestion was wrong anyway; it should be: printf "%s\\n" *idx)

> >> +		test_line_count = 5 idx-list &&
> >> +		for keep in $(cat keep-list)
> >> +		do
> >> +			touch $keep || return 1
> >
> > Is this intended?
> > Since touch(1) accepts multiple files as argument.
> 
> Good suggestion, but doesn't .keep file record why the pack is kept
> in real life (i.e. not an empty file)?

Yes, in real life, we usually provide a reason in this .keep file.
But we also allow an empty file with git-index-pack --keep.
I think a simple touch is fine for this test.

A missing piece from my previous comment:
if `keep-list` is empty, we may want to fail fast, since
touch with an empty list will error out (at least on my system).
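
A rough sketch of that fail-fast variant, outside the test suite: the scratch directory and pack names are hypothetical, `test -s` stands in for the `test_line_count` helper that only exists inside Git's tests, and the unquoted `$(cat keep-list)` assumes the paths contain no whitespace.

```shell
# Sketch only: hypothetical pack files in a scratch directory.
scratch=$(mktemp -d) &&
cd "$scratch" || exit 1
mkdir pack &&
: >pack/pack-a.pack &&
: >pack/pack-b.pack || exit 1

# Derive one .keep path per pack, as the test under review does.
ls pack/*.pack | sed "s/\.pack$/.keep/" >keep-list

# Fail fast if nothing was listed, before touch ever sees an empty list.
test -s keep-list || exit 1

# touch accepts several files at once, so no loop is needed.
touch $(cat keep-list) || exit 1

# Arithmetic expansion strips any padding that wc -l may add.
kept=$(($(ls pack/*.keep | wc -l)))
echo "created $kept keep files"
```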

Son Luong Ngoc May 10, 2020, 3:52 p.m. UTC | #4
Hi,

Thanks Danh and Junio for the testing improvement suggestions.
I think these are the points I will adopt in the next version:

- Remove the 3rd patch and keep the removal of the dollar sign local
  to `repack respects repack.packKeptObjects=false`.

- Change `head -n 3 | tail -n 1` to `sed -n 3p`

- Apply test_line_count on keep-list for failing fast (before touch)

Cheers,
Son Luong.
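
For reference, the `sed -n 3p` replacement discussed in this thread can be checked against the original pipeline. This is a sketch under stated assumptions: the numbers in `sizes` are invented and stand in for the output of `test-tool path-utils file-size`, and writing them to a temporary file follows the suggestion to shorten the long line.

```shell
# Sketch only: hypothetical sizes standing in for
# "test-tool path-utils file-size .git/objects/pack/*pack".
scratch=$(mktemp -d) &&
cd "$scratch" || exit 1
printf "%s\n" 400 120 9000 35 780 >sizes

# Original form: keep the first three sorted lines, then the last of those.
via_head_tail=$(sort -n sizes | head -n 3 | tail -n 1)

# Suggested form: print only line 3 of the sorted list.
via_sed=$(sort -n sizes | sed -n 3p)

# Both select the third smallest value; add one byte as the test does.
batch_size=$((via_sed + 1))
echo "$via_head_tail $via_sed $batch_size"
```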

Patch

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index 642d9ac5b72..0c6619493c1 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -56,6 +56,9 @@  repack::
 	file is created, rewrite the multi-pack-index to reference the
 	new pack-file. A later run of 'git multi-pack-index expire' will
 	delete the pack-files that were part of this batch.
++
+If `repack.packKeptObjects` is `false`, then any pack-files with an
+associated `.keep` file will not be selected for the batch to repack.
 
 
 EXAMPLES
diff --git a/midx.c b/midx.c
index 1e76be56826..9b14d915db1 100644
--- a/midx.c
+++ b/midx.c
@@ -1293,15 +1293,26 @@  static int compare_by_mtime(const void *a_, const void *b_)
 	return 0;
 }
 
-static int fill_included_packs_all(struct multi_pack_index *m,
+static int fill_included_packs_all(struct repository *r,
+				   struct multi_pack_index *m,
 				   unsigned char *include_pack)
 {
-	uint32_t i;
+	uint32_t i, count = 0;
+	int pack_kept_objects = 0;
+
+	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
+
+	for (i = 0; i < m->num_packs; i++) {
+		if (prepare_midx_pack(r, m, i))
+			continue;
+		if (!pack_kept_objects && m->packs[i]->pack_keep)
+			continue;
 
-	for (i = 0; i < m->num_packs; i++)
 		include_pack[i] = 1;
+		count++;
+	}
 
-	return m->num_packs < 2;
+	return count < 2;
 }
 
 static int fill_included_packs_batch(struct repository *r,
@@ -1312,6 +1323,9 @@  static int fill_included_packs_batch(struct repository *r,
 	uint32_t i, packs_to_repack;
 	size_t total_size;
 	struct repack_info *pack_info = xcalloc(m->num_packs, sizeof(struct repack_info));
+	int pack_kept_objects = 0;
+
+	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
 
 	for (i = 0; i < m->num_packs; i++) {
 		pack_info[i].pack_int_id = i;
@@ -1338,6 +1352,8 @@  static int fill_included_packs_batch(struct repository *r,
 
 		if (!p)
 			continue;
+		if (!pack_kept_objects && p->pack_keep)
+			continue;
 		if (open_pack_index(p) || !p->num_objects)
 			continue;
 
@@ -1380,7 +1396,7 @@  int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 	if (batch_size) {
 		if (fill_included_packs_batch(r, m, include_pack, batch_size))
 			goto cleanup;
-	} else if (fill_included_packs_all(m, include_pack))
+	} else if (fill_included_packs_all(r, m, include_pack))
 		goto cleanup;
 
 	repo_config_get_bool(r, "repack.usedeltabaseoffset", &delta_base_offset);
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 030a7222b2a..67afe1bb8d9 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -538,6 +538,32 @@  test_expect_success 'repack with minimum size does not alter existing packs' '
 	)
 '
 
+test_expect_success 'repack respects repack.packKeptObjects=false' '
+	test_when_finished rm -f dup/.git/objects/pack/*keep &&
+	(
+		cd dup &&
+		ls .git/objects/pack/*idx >idx-list &&
+		test_line_count = 5 idx-list &&
+		ls .git/objects/pack/*.pack | sed "s/\.pack/.keep/" >keep-list &&
+		for keep in $(cat keep-list)
+		do
+			touch $keep || return 1
+		done &&
+		git multi-pack-index repack --batch-size=0 &&
+		ls .git/objects/pack/*idx >idx-list &&
+		test_line_count = 5 idx-list &&
+		test-tool read-midx .git/objects | grep idx >midx-list &&
+		test_line_count = 5 midx-list &&
+		THIRD_SMALLEST_SIZE=$(test-tool path-utils file-size .git/objects/pack/*pack | sort -n | head -n 3 | tail -n 1) &&
+		BATCH_SIZE=$(($THIRD_SMALLEST_SIZE + 1)) &&
+		git multi-pack-index repack --batch-size=$BATCH_SIZE &&
+		ls .git/objects/pack/*idx >idx-list &&
+		test_line_count = 5 idx-list &&
+		test-tool read-midx .git/objects | grep idx >midx-list &&
+		test_line_count = 5 midx-list
+	)
+'
+
 test_expect_success 'repack creates a new pack' '
 	(
 		cd dup &&