diff mbox series

[v2,10/18] maintenance: add incremental-repack task

Message ID b6328c210625e1ba98e2065208a2a478c2c64f94.1595527000.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Maintenance builtin, allowing 'gc --auto' customization | expand

Commit Message

Johannes Schindelin via GitGitGadget July 23, 2020, 5:56 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

The previous change cleaned up loose objects using the
'loose-objects' that can be run safely in the background. Add a
similar job that performs similar cleanups for pack-files.

One issue with running 'git repack' is that it is designed to
repack all pack-files into a single pack-file. While this is the
most space-efficient way to store object data, it is not time or
memory efficient. This becomes extremely important if the repo is
so large that a user struggles to store two copies of the pack on
their disk.

Instead, perform an "incremental" repack by collecting a few small
pack-files into a new pack-file. The multi-pack-index facilitates
this process ever since 'git multi-pack-index expire' was added in
19575c7 (multi-pack-index: implement 'expire' subcommand,
2019-06-10) and 'git multi-pack-index repack' was added in ce1e4a1
(midx: implement midx_repack(), 2019-06-10).

The 'incremental-repack' task runs the following steps:

1. 'git multi-pack-index write' creates a multi-pack-index file if
   one did not exist, and otherwise will update the multi-pack-index
   with any new pack-files that appeared since the last write. This
   is particularly relevant with the background fetch job.

   When the multi-pack-index sees two copies of the same object, it
   stores the offset data into the newer pack-file. This means that
   some old pack-files could become "unreferenced" which I will use
   to mean "a pack-file that is in the pack-file list of the
   multi-pack-index but none of the objects in the multi-pack-index
   reference a location inside that pack-file."

2. 'git multi-pack-index expire' deletes any unreferenced pack-files
   and updaes the multi-pack-index to drop those pack-files from the
   list. This is safe to do as concurrent Git processes will see the
   multi-pack-index and not open those packs when looking for object
   contents. (Similar to the 'loose-objects' job, there are some Git
   commands that open pack-files regardless of the multi-pack-index,
   but they are rarely used. Further, a user that self-selects to
   use background operations would likely refrain from using those
   commands.)

3. 'git multi-pack-index repack --bacth-size=<size>' collects a set
   of pack-files that are listed in the multi-pack-index and creates
   a new pack-file containing the objects whose offsets are listed
   by the multi-pack-index to be in those objects. The set of pack-
   files is selected greedily by sorting the pack-files by modified
   time and adding a pack-file to the set if its "expected size" is
   smaller than the batch size until the total expected size of the
   selected pack-files is at least the batch size. The "expected
   size" is calculated by taking the size of the pack-file divided
   by the number of objects in the pack-file and multiplied by the
   number of objects from the multi-pack-index with offset in that
   pack-file. The expected size approximats how much data from that
   pack-file will contribute to the resulting pack-file size. The
   intention is that the resulting pack-file will be close in size
   to the provided batch size.

   The next run of the incremental-repack task will delete these
   repacked pack-files during the 'expire' step.

   In this version, the batch size is set to "0" which ignores the
   size restrictions when selecting the pack-files. It instead
   selects all pack-files and repacks all packed objects into a
   single pack-file. This will be updated in the next change, but
   it requires doing some calculations that are better isolated to
   a separate change.

Each of the above steps update the multi-pack-index file. After
each step, we verify the new multi-pack-index. If the new
multi-pack-index is corrupt, then delete the multi-pack-index,
rewrite it from scratch, and stop doing the later steps of the
job. This is intended to be an extra-safe check without leaving
a repo with many pack-files without a multi-pack-index.

These steps are based on a similar background maintenance step in
Scalar (and VFS for Git) [1]. This was incredibly effective for
users of the Windows OS repository. After using the same VFS for Git
repository for over a year, some users had _thousands_ of pack-files
that combined to up to 250 GB of data. We noticed a few users were
running into the open file descriptor limits (due in part to a bug
in the multi-pack-index fixed by af96fe3 (midx: add packs to
packed_git linked list, 2019-04-29).

These pack-files were mostly small since they contained the commits
and trees that were pushed to the origin in a given hour. The GVFS
protocol includes a "prefetch" step that asks for pre-computed pack-
files containing commits and trees by timestamp. These pack-files
were grouped into "daily" pack-files once a day for up to 30 days.
If a user did not request prefetch packs for over 30 days, then they
would get the entire history of commits and trees in a new, large
pack-file. This led to a large number of pack-files that had poor
delta compression.

By running this pack-file maintenance step once per day, these repos
with thousands of packs spanning 200+ GB dropped to dozens of pack-
files spanning 30-50 GB. This was done all without removing objects
from the system and using a constant batch size of two gigabytes.
Once the work was done to reduce the pack-files to small sizes, the
batch size of two gigabytes means that not every run triggers a
repack operation, so the following run will not expire a pack-file.
This has kept these repos in a "clean" state.

[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/PackfileMaintenanceStep.cs

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-maintenance.txt |  15 ++++
 builtin/gc.c                      | 121 +++++++++++++++++++++++++++++-
 midx.c                            |   2 +-
 midx.h                            |   1 +
 t/t7900-maintenance.sh            |  37 +++++++++
 5 files changed, 174 insertions(+), 2 deletions(-)

Comments

Junio C Hamano July 23, 2020, 10 p.m. UTC | #1
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> 1. 'git multi-pack-index write' creates a multi-pack-index file if
>    one did not exist, and otherwise will update the multi-pack-index
>    with any new pack-files that appeared since the last write. This
>    is particularly relevant with the background fetch job.
>
>    When the multi-pack-index sees two copies of the same object, it
>    stores the offset data into the newer pack-file. This means that
>    some old pack-files could become "unreferenced" which I will use
>    to mean "a pack-file that is in the pack-file list of the
>    multi-pack-index but none of the objects in the multi-pack-index
>    reference a location inside that pack-file."

An obvious alternative is to favor the copy in the older pack,
right?  Is the expectation that over time, most of the objects that
are relevant would reappear in newer packs, so that eventually by
favoring the copies in the newer packs, we can retire and remove the
old pack, keeping only the newer ones?

But would that assumption hold?  The old packs hold objects that are
necessary for the older parts of the history, so unless you are
cauterizing away the old history, these objects in the older packs
are likely to stay with us longer than those used by the newer parts
of the history, some of which may not even have been pushed out yet
and can be rebased away?

> 2. 'git multi-pack-index expire' deletes any unreferenced pack-files
>    and updaes the multi-pack-index to drop those pack-files from the
>    list. This is safe to do as concurrent Git processes will see the
>    multi-pack-index and not open those packs when looking for object
>    contents. (Similar to the 'loose-objects' job, there are some Git
>    commands that open pack-files regardless of the multi-pack-index,
>    but they are rarely used. Further, a user that self-selects to
>    use background operations would likely refrain from using those
>    commands.)

OK.

> 3. 'git multi-pack-index repack --bacth-size=<size>' collects a set
>    of pack-files that are listed in the multi-pack-index and creates
>    a new pack-file containing the objects whose offsets are listed
>    by the multi-pack-index to be in those objects. The set of pack-
>    files is selected greedily by sorting the pack-files by modified
>    time and adding a pack-file to the set if its "expected size" is
>    smaller than the batch size until the total expected size of the
>    selected pack-files is at least the batch size. The "expected
>    size" is calculated by taking the size of the pack-file divided
>    by the number of objects in the pack-file and multiplied by the
>    number of objects from the multi-pack-index with offset in that
>    pack-file. The expected size approximats how much data from that

approximates.

>    pack-file will contribute to the resulting pack-file size. The
>    intention is that the resulting pack-file will be close in size
>    to the provided batch size.

> +static int maintenance_task_incremental_repack(void)
> +{
> +	if (multi_pack_index_write()) {
> +		error(_("failed to write multi-pack-index"));
> +		return 1;
> +	}
> +
> +	if (multi_pack_index_verify()) {
> +		warning(_("multi-pack-index verify failed after initial write"));
> +		return rewrite_multi_pack_index();
> +	}
> +
> +	if (multi_pack_index_expire()) {
> +		error(_("multi-pack-index expire failed"));
> +		return 1;
> +	}
> +
> +	if (multi_pack_index_verify()) {
> +		warning(_("multi-pack-index verify failed after expire"));
> +		return rewrite_multi_pack_index();
> +	}
> +	if (multi_pack_index_repack()) {
> +		error(_("multi-pack-index repack failed"));
> +		return 1;
> +	}

Hmph, I wonder if these warning should come from each helper
functions that are static to this function anyway.

It also makes it easier to reason about this function by eliminating
the need for having a different pattern only for the verify helper.
Instead, verify could call rewrite internally when it notices a
breakage.  I.e.

	if (multi_pack_index_write())
		return 1;
	if (multi_pack_index_verify("after initial write"))
		return 1;
	if (multi_pack_index_exire())
		return 1;
	...

Also, it feels odd, compared to our internal API convention, that
positive non-zero is used as an error here.

> +	return 0;
> +}
> +
>  typedef int maintenance_task_fn(void);
>  
>  struct maintenance_task {
> @@ -1037,6 +1152,10 @@ static void initialize_tasks(void)
>  	tasks[num_tasks]->fn = maintenance_task_loose_objects;
>  	num_tasks++;
>  
> +	tasks[num_tasks]->name = "incremental-repack";
> +	tasks[num_tasks]->fn = maintenance_task_incremental_repack;
> +	num_tasks++;
> +
>  	tasks[num_tasks]->name = "gc";
>  	tasks[num_tasks]->fn = maintenance_task_gc;
>  	tasks[num_tasks]->enabled = 1;

Exactly the same comment as 08/18 about natural/inherent ordering
applies here as well.

Thanks.
Derrick Stolee July 24, 2020, 3:03 p.m. UTC | #2
On 7/23/2020 6:00 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> 1. 'git multi-pack-index write' creates a multi-pack-index file if
>>    one did not exist, and otherwise will update the multi-pack-index
>>    with any new pack-files that appeared since the last write. This
>>    is particularly relevant with the background fetch job.
>>
>>    When the multi-pack-index sees two copies of the same object, it
>>    stores the offset data into the newer pack-file. This means that
>>    some old pack-files could become "unreferenced" which I will use
>>    to mean "a pack-file that is in the pack-file list of the
>>    multi-pack-index but none of the objects in the multi-pack-index
>>    reference a location inside that pack-file."
> 
> An obvious alternative is to favor the copy in the older pack,
> right?  Is the expectation that over time, most of the objects that
> are relevant would reappear in newer packs, so that eventually by
> favoring the copies in the newer packs, we can retire and remove the
> old pack, keeping only the newer ones?
> 
> But would that assumption hold?  The old packs hold objects that are
> necessary for the older parts of the history, so unless you are
> cauterizing away the old history, these objects in the older packs
> are likely to stay with us longer than those used by the newer parts
> of the history, some of which may not even have been pushed out yet
> and can be rebased away?

If we created a new pack-file containing an already-packed object,
then shouldn't we assume that the new pack-file does a _better_ job
of compressing that object? Or at least, doesn't make it worse?

For example, if we use 'git multi-pack-index repack --batch-size=0',
then this creates a new pack-file containing every previously-packed
object. This new pack-file should have better delta compression than
the previous setup across multiple pack-files. We want this new
pack-file to be used, not the old ones.

This "pick the newest pack" strategy is also what allows us to safely
use the 'expire' option to drop old pack-files. If we always keep the
old copies, then when we try to switch the new pack-files, we cannot
delete the old packs safely because a concurrent Git process could
be trying to reference it. One race is as follows:

 * Process A opens the multi-pack-index referring to old pack P. It
   doesn't open the pack-file as it hasn't tried to parse objects yet.

 * Process B is 'git multi-pack-index expire'. It sees that pack P can
   be dropped because all objects appear in newer pack-files. It deletes
   P.

 * Process A tries to read from P, and this fails. A must reprepare its
   representation of the pack-files.

This is the less disruptive race since A can recover with a small cost
to its performance. The worse race (on Windows) is this:

 * Process A loads the multi-pack-index and tries to parse an object by
   loading "old" pack P.

 * Process B tries to delete P. However, on Windows the handle to P by
   A prevents the deletion.

At this point, there could be two resolutions. The first is to have
the 'expire' fail because we can't delete A. This means we might never
delete A in a busy repository. The second is that the 'expire' command
continues and drops A from the list in the multi-pack-index. However,
now all Git processes add A to the packed_git list because it isn't
referenced by the multi-pack-index.

In summary: the decision to pick the newer copies is a fundamental
part of how the write->expire->repack loop was designed.

>> 2. 'git multi-pack-index expire' deletes any unreferenced pack-files
>>    and updaes the multi-pack-index to drop those pack-files from the
>>    list. This is safe to do as concurrent Git processes will see the
>>    multi-pack-index and not open those packs when looking for object
>>    contents. (Similar to the 'loose-objects' job, there are some Git
>>    commands that open pack-files regardless of the multi-pack-index,
>>    but they are rarely used. Further, a user that self-selects to
>>    use background operations would likely refrain from using those
>>    commands.)
> 
> OK.
> 
>> 3. 'git multi-pack-index repack --bacth-size=<size>' collects a set
>>    of pack-files that are listed in the multi-pack-index and creates
>>    a new pack-file containing the objects whose offsets are listed
>>    by the multi-pack-index to be in those objects. The set of pack-
>>    files is selected greedily by sorting the pack-files by modified
>>    time and adding a pack-file to the set if its "expected size" is
>>    smaller than the batch size until the total expected size of the
>>    selected pack-files is at least the batch size. The "expected
>>    size" is calculated by taking the size of the pack-file divided
>>    by the number of objects in the pack-file and multiplied by the
>>    number of objects from the multi-pack-index with offset in that
>>    pack-file. The expected size approximats how much data from that
> 
> approximates.
> 
>>    pack-file will contribute to the resulting pack-file size. The
>>    intention is that the resulting pack-file will be close in size
>>    to the provided batch size.
> 
>> +static int maintenance_task_incremental_repack(void)
>> +{
>> +	if (multi_pack_index_write()) {
>> +		error(_("failed to write multi-pack-index"));
>> +		return 1;
>> +	}
>> +
>> +	if (multi_pack_index_verify()) {
>> +		warning(_("multi-pack-index verify failed after initial write"));
>> +		return rewrite_multi_pack_index();
>> +	}
>> +
>> +	if (multi_pack_index_expire()) {
>> +		error(_("multi-pack-index expire failed"));
>> +		return 1;
>> +	}
>> +
>> +	if (multi_pack_index_verify()) {
>> +		warning(_("multi-pack-index verify failed after expire"));
>> +		return rewrite_multi_pack_index();
>> +	}
>> +	if (multi_pack_index_repack()) {
>> +		error(_("multi-pack-index repack failed"));
>> +		return 1;
>> +	}
> 
> Hmph, I wonder if these warning should come from each helper
> functions that are static to this function anyway.
> 
> It also makes it easier to reason about this function by eliminating
> the need for having a different pattern only for the verify helper.
> Instead, verify could call rewrite internally when it notices a
> breakage.  I.e.
> 
> 	if (multi_pack_index_write())
> 		return 1;
> 	if (multi_pack_index_verify("after initial write"))
> 		return 1;
> 	if (multi_pack_index_exire())
> 		return 1;
> 	...

This is a cleaner model. I'll work on that.

> Also, it feels odd, compared to our internal API convention, that
> positive non-zero is used as an error here.
...
> Exactly the same comment as 08/18 about natural/inherent ordering
> applies here as well.

I'll leave these to be resolved in the earlier messages.

Thanks,
-Stolee
Emily Shaffer July 29, 2020, 10:22 p.m. UTC | #3
On Thu, Jul 23, 2020 at 05:56:32PM +0000, Derrick Stolee via GitGitGadget wrote:
> 1. 'git multi-pack-index write' creates a multi-pack-index file if
>    one did not exist, and otherwise will update the multi-pack-index
>    with any new pack-files that appeared since the last write. This
>    is particularly relevant with the background fetch job.

[emily shaffer] Will this use midx even if the user has disabled it in
their config?

> diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
> index 94bb493733..3ec813979a 100755
> --- a/t/t7900-maintenance.sh
> +++ b/t/t7900-maintenance.sh
> @@ -103,4 +103,41 @@ test_expect_success 'loose-objects task' '
>  	test_cmp packs-between packs-after
>  '

[emily shaffer] Can we include a test to prove that this task is or is
not run if core.multipackindex is set or unset? That behavior is hard to
deduce from the code... we might want to be cautious.

>  
> +test_expect_success 'incremental-repack task' '
> +	packDir=.git/objects/pack &&
> +	for i in $(test_seq 1 5)
> +	do
> +		test_commit $i || return 1
> +	done &&
> +
> +	# Create three disjoint pack-files with size BIG, small, small.
> +	echo HEAD~2 | git pack-objects --revs $packDir/test-1 &&
> +	test_tick &&
> +	git pack-objects --revs $packDir/test-2 <<-\EOF &&
> +	HEAD~1
> +	^HEAD~2
> +	EOF
> +	test_tick &&
> +	git pack-objects --revs $packDir/test-3 <<-\EOF &&
> +	HEAD
> +	^HEAD~1
> +	EOF
> +	rm -f $packDir/pack-* &&
> +	rm -f $packDir/loose-* &&
> +	ls $packDir/*.pack >packs-before &&
> +	test_line_count = 3 packs-before &&
> +
> +	# the job repacks the two into a new pack, but does not
> +	# delete the old ones.
> +	git maintenance run --task=incremental-repack &&
> +	ls $packDir/*.pack >packs-between &&
> +	test_line_count = 4 packs-between &&
> +
> +	# the job deletes the two old packs, and does not write
> +	# a new one because only one pack remains.
> +	git maintenance run --task=incremental-repack &&
> +	ls .git/objects/pack/*.pack >packs-after &&
> +	test_line_count = 1 packs-after
> +'
> +
>  test_done
> -- 
> gitgitgadget
>
diff mbox series

Patch

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index 557915a653..bda8df4aaa 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -84,6 +84,21 @@  loose-objects::
 	thousand objects to prevent the job from taking too long on a
 	repository with many loose objects.
 
+incremental-repack::
+	The `incremental-repack` job repacks the object directory
+	using the `multi-pack-index` feature. In order to prevent race
+	conditions with concurrent Git commands, it follows a two-step
+	process. First, it deletes any pack-files included in the
+	`multi-pack-index` where none of the objects in the
+	`multi-pack-index` reference those pack-files; this only happens
+	if all objects in the pack-file are also stored in a newer
+	pack-file. Second, it selects a group of pack-files whose "expected
+	size" is below the batch size until the group has total expected
+	size at least the batch size; see the `--batch-size` option for
+	the `repack` subcommand in linkgit:git-multi-pack-index[1]. The
+	default batch-size is zero, which is a special case that attempts
+	to repack all pack-files into a single pack-file.
+
 OPTIONS
 -------
 --auto::
diff --git a/builtin/gc.c b/builtin/gc.c
index fa65304580..eb4b01c104 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -29,6 +29,7 @@ 
 #include "tree.h"
 #include "promisor-remote.h"
 #include "remote.h"
+#include "midx.h"
 
 #define FAILED_RUN "failed to run %s"
 
@@ -701,7 +702,7 @@  int cmd_gc(int argc, const char **argv, const char *prefix)
 	return 0;
 }
 
-#define MAX_NUM_TASKS 4
+#define MAX_NUM_TASKS 5
 
 static const char * const builtin_maintenance_usage[] = {
 	N_("git maintenance run [<options>]"),
@@ -958,6 +959,120 @@  static int maintenance_task_loose_objects(void)
 	return prune_packed() || pack_loose();
 }
 
+static int multi_pack_index_write(void)
+{
+	int result;
+	struct argv_array cmd = ARGV_ARRAY_INIT;
+	argv_array_pushl(&cmd, "multi-pack-index", "write", NULL);
+
+	if (opts.quiet)
+		argv_array_push(&cmd, "--no-progress");
+
+	result = run_command_v_opt(cmd.argv, RUN_GIT_CMD);
+	argv_array_clear(&cmd);
+
+	return result;
+}
+
+static int rewrite_multi_pack_index(void)
+{
+	struct repository *r = the_repository;
+	char *midx_name = get_midx_filename(r->objects->odb->path);
+
+	unlink(midx_name);
+	free(midx_name);
+
+	if (multi_pack_index_write()) {
+		error(_("failed to rewrite multi-pack-index"));
+		return 1;
+	}
+
+	return 0;
+}
+
+static int multi_pack_index_verify(void)
+{
+	int result;
+	struct argv_array cmd = ARGV_ARRAY_INIT;
+	argv_array_pushl(&cmd, "multi-pack-index", "verify", NULL);
+
+	if (opts.quiet)
+		argv_array_push(&cmd, "--no-progress");
+
+	result = run_command_v_opt(cmd.argv, RUN_GIT_CMD);
+	argv_array_clear(&cmd);
+
+	return result;
+}
+
+static int multi_pack_index_expire(void)
+{
+	int result;
+	struct argv_array cmd = ARGV_ARRAY_INIT;
+	argv_array_pushl(&cmd, "multi-pack-index", "expire", NULL);
+
+	if (opts.quiet)
+		argv_array_push(&cmd, "--no-progress");
+
+	close_object_store(the_repository->objects);
+	result = run_command_v_opt(cmd.argv, RUN_GIT_CMD);
+	argv_array_clear(&cmd);
+
+	return result;
+}
+
+static int multi_pack_index_repack(void)
+{
+	int result;
+	struct argv_array cmd = ARGV_ARRAY_INIT;
+	argv_array_pushl(&cmd, "multi-pack-index", "repack", NULL);
+
+	if (opts.quiet)
+		argv_array_push(&cmd, "--no-progress");
+
+	argv_array_push(&cmd, "--batch-size=0");
+
+	close_object_store(the_repository->objects);
+	result = run_command_v_opt(cmd.argv, RUN_GIT_CMD);
+
+	if (result && multi_pack_index_verify()) {
+		warning(_("multi-pack-index verify failed after repack"));
+		result = rewrite_multi_pack_index();
+	}
+
+	return result;
+}
+
+static int maintenance_task_incremental_repack(void)
+{
+	if (multi_pack_index_write()) {
+		error(_("failed to write multi-pack-index"));
+		return 1;
+	}
+
+	if (multi_pack_index_verify()) {
+		warning(_("multi-pack-index verify failed after initial write"));
+		return rewrite_multi_pack_index();
+	}
+
+	if (multi_pack_index_expire()) {
+		error(_("multi-pack-index expire failed"));
+		return 1;
+	}
+
+	if (multi_pack_index_verify()) {
+		warning(_("multi-pack-index verify failed after expire"));
+		return rewrite_multi_pack_index();
+	}
+
+	if (multi_pack_index_repack()) {
+		error(_("multi-pack-index repack failed"));
+		return 1;
+	}
+
+	return 0;
+}
+
 typedef int maintenance_task_fn(void);
 
 struct maintenance_task {
@@ -1037,6 +1152,10 @@  static void initialize_tasks(void)
 	tasks[num_tasks]->fn = maintenance_task_loose_objects;
 	num_tasks++;
 
+	tasks[num_tasks]->name = "incremental-repack";
+	tasks[num_tasks]->fn = maintenance_task_incremental_repack;
+	num_tasks++;
+
 	tasks[num_tasks]->name = "gc";
 	tasks[num_tasks]->fn = maintenance_task_gc;
 	tasks[num_tasks]->enabled = 1;
diff --git a/midx.c b/midx.c
index 6d1584ca51..57a8a00082 100644
--- a/midx.c
+++ b/midx.c
@@ -36,7 +36,7 @@ 
 
 #define PACK_EXPIRED UINT_MAX
 
-static char *get_midx_filename(const char *object_dir)
+char *get_midx_filename(const char *object_dir)
 {
 	return xstrfmt("%s/pack/multi-pack-index", object_dir);
 }
diff --git a/midx.h b/midx.h
index b18cf53bc4..baeecc70c9 100644
--- a/midx.h
+++ b/midx.h
@@ -37,6 +37,7 @@  struct multi_pack_index {
 
 #define MIDX_PROGRESS     (1 << 0)
 
+char *get_midx_filename(const char *object_dir);
 struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local);
 int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id);
 int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result);
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 94bb493733..3ec813979a 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -103,4 +103,41 @@  test_expect_success 'loose-objects task' '
 	test_cmp packs-between packs-after
 '
 
+test_expect_success 'incremental-repack task' '
+	packDir=.git/objects/pack &&
+	for i in $(test_seq 1 5)
+	do
+		test_commit $i || return 1
+	done &&
+
+	# Create three disjoint pack-files with size BIG, small, small.
+	echo HEAD~2 | git pack-objects --revs $packDir/test-1 &&
+	test_tick &&
+	git pack-objects --revs $packDir/test-2 <<-\EOF &&
+	HEAD~1
+	^HEAD~2
+	EOF
+	test_tick &&
+	git pack-objects --revs $packDir/test-3 <<-\EOF &&
+	HEAD
+	^HEAD~1
+	EOF
+	rm -f $packDir/pack-* &&
+	rm -f $packDir/loose-* &&
+	ls $packDir/*.pack >packs-before &&
+	test_line_count = 3 packs-before &&
+
+	# the job repacks the two into a new pack, but does not
+	# delete the old ones.
+	git maintenance run --task=incremental-repack &&
+	ls $packDir/*.pack >packs-between &&
+	test_line_count = 4 packs-between &&
+
+	# the job deletes the two old packs, and does not write
+	# a new one because only one pack remains.
+	git maintenance run --task=incremental-repack &&
+	ls .git/objects/pack/*.pack >packs-after &&
+	test_line_count = 1 packs-after
+'
+
 test_done