mbox series

[v4,00/10] Create 'expire' and 'repack' verbs for git-multi-pack-index

Message ID pull.92.v4.git.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Create 'expire' and 'repack' verbs for git-multi-pack-index | expand

Message

Derrick Stolee via GitGitGadget Jan. 24, 2019, 9:51 p.m. UTC
The multi-pack-index provides a fast way to find an object among a large
list of pack-files. It stores a single pack-reference for each object id, so
duplicate objects are ignored. Among a list of pack-files storing the same
object, the most-recently modified one is used.

Create new subcommands for the multi-pack-index builtin.

 * 'git multi-pack-index expire': If we have a pack-file indexed by the
   multi-pack-index, but all objects in that pack are duplicated in
   more-recently modified packs, then delete that pack (and any others like
   it). Delete the reference to that pack in the multi-pack-index.
   
   
 * 'git multi-pack-index repack --batch-size=': Starting from the oldest
   pack-files covered by the multi-pack-index, find those whose on-disk size
   is below the batch size until we have a collection of packs whose sizes
   add up to the batch size. Create a new pack containing all objects that
   the multi-pack-index references to those packs.
   
   

This allows us to create a new pattern for repacking objects: run 'repack'.
After enough time has passed that all Git commands that started before the
last 'repack' are finished, run 'expire' again. This approach has some
advantages over the existing "repack everything" model:

 1. Incremental. We can repack a small batch of objects at a time, instead
    of repacking all reachable objects. We can also limit ourselves to the
    objects that do not appear in newer pack-files.
    
    
 2. Highly Available. By adding a new pack-file (and not deleting the old
    pack-files) we do not interrupt concurrent Git commands, and do not
    suffer performance degradation. By expiring only pack-files that have no
    referenced objects, we know that Git commands that are doing normal
    object lookups* will not be interrupted.
    
    
 3. Note: if someone concurrently runs a Git command that uses
    get_all_packs(), then that command could try to read the pack-files and
    pack-indexes that we are deleting during an expire command. Such
    commands are usually related to object maintenance (i.e. fsck, gc,
    pack-objects) or are related to less-often-used features (i.e.
    fast-import, http-backend, server-info).
    
    

We plan to use this approach in VFS for Git to do background maintenance of
the "shared object cache" which is a Git alternate directory filled with
packfiles containing commits and trees. We currently download pack-files on
an hourly basis to keep up-to-date with the central server. The cache
servers supply packs on an hourly and daily basis, so most of the hourly
packs become useless after a new daily pack is downloaded. The 'expire'
command would clear out most of those packs, but many will still remain with
fewer than 100 objects remaining. The 'repack' command (with a batch size of
1-3gb, probably) can condense the remaining packs in commands that run for
1-3 min at a time. Since the daily packs range from 100-250mb, we will also
combine and condense those packs.

Updates in V2:

 * Added a method, unlink_pack_path() to remove packfiles, but with the
   additional check for a .keep file. This borrows logic from 
   builtin/repack.c.
   
   
 * Modified documentation and commit messages to replace 'verb' with
   'subcommand'. Simplified the documentation. (I left 'verbs' in the title
   of the cover letter for consistency.)
   
   

Updates in V3:

 * There was a bug in the expire logic when simultaneously removing packs
   and adding uncovered packs, specifically around the pack permutation.
   This was hard to see during review because I was using the 'pack_perm'
   array for multiple purposes. First, I was reducing its length, and then I
   was adding to it and resorting. In V3, I significantly overhauled the
   logic here, which required some extra commits before implementing
   'expire'. The final commit includes a test that would cover this case.

Updates in V4:

 * More 'verb' and 'command' instances replaced with 'subcommand'. I grepped
   the patch to check these should be fixed everywhere.
   
   
 * Update the tests to check .keep files (in last patch).
   
   
 * Modify the tests to show the terminating condition of --batch-size when
   there are three packs that fit under the size, but the first two are
   large enough to stop adding packs. This required rearranging the packs
   slightly to get different sizes than we had before. Also, I added 'touch
   -t' to set the modified times so we can fix the order in which the packs
   are selected.
   
   
 * Added a comment about the purpose of pack_perm.
   
   

Thanks, -Stolee

Derrick Stolee (10):
  repack: refactor pack deletion for future use
  Docs: rearrange subcommands for multi-pack-index
  multi-pack-index: prepare for 'expire' subcommand
  midx: simplify computation of pack name lengths
  midx: refactor permutation logic and pack sorting
  multi-pack-index: implement 'expire' subcommand
  multi-pack-index: prepare 'repack' subcommand
  midx: implement midx_repack()
  multi-pack-index: test expire while adding packs
  midx: add test that 'expire' respects .keep files

 Documentation/git-multi-pack-index.txt |  26 +-
 builtin/multi-pack-index.c             |  14 +-
 builtin/repack.c                       |  14 +-
 midx.c                                 | 399 ++++++++++++++++++-------
 midx.h                                 |   2 +
 packfile.c                             |  28 ++
 packfile.h                             |   7 +
 t/t5319-multi-pack-index.sh            | 165 ++++++++++
 8 files changed, 536 insertions(+), 119 deletions(-)


base-commit: 26aa9fc81d4c7f6c3b456a29da0b7ec72e5c6595
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-92%2Fderrickstolee%2Fmidx-expire%2Fupstream-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-92/derrickstolee/midx-expire/upstream-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/92

Range-diff vs v3:

  1:  62b393b816 =  1:  62b393b816 repack: refactor pack deletion for future use
  2:  7886785904 =  2:  7886785904 Docs: rearrange subcommands for multi-pack-index
  3:  f06382b4ae !  3:  628ca46036 multi-pack-index: prepare for 'expire' subcommand
     @@ -16,7 +16,9 @@
          Add a test that verifies the 'expire' subcommand is correctly wired,
          but will still be valid when the verb is implemented. Specifically,
          create a set of packs that should all have referenced objects and
     -    should not be removed during an 'expire' operation.
     +    should not be removed during an 'expire' operation. The packs are
     +    created carefully to ensure they have a specific order when sorted
     +    by size. This will be important in a later test.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
     @@ -95,6 +97,8 @@
      +	(
      +		cd dup &&
      +		git init &&
     ++		test-tool genrandom "data" 4096 >large_file.txt &&
     ++		git update-index --add large_file.txt &&
      +		for i in $(test_seq 1 20)
      +		do
      +			test_commit $i
     @@ -104,24 +108,24 @@
      +		git branch C HEAD~13 &&
      +		git branch D HEAD~16 &&
      +		git branch E HEAD~18 &&
     -+		git pack-objects --revs .git/objects/pack/pack-E <<-EOF &&
     -+		refs/heads/E
     ++		git pack-objects --revs .git/objects/pack/pack-A <<-EOF &&
     ++		refs/heads/A
     ++		^refs/heads/B
      +		EOF
     -+		git pack-objects --revs .git/objects/pack/pack-D <<-EOF &&
     -+		refs/heads/D
     -+		^refs/heads/E
     ++		git pack-objects --revs .git/objects/pack/pack-B <<-EOF &&
     ++		refs/heads/B
     ++		^refs/heads/C
      +		EOF
      +		git pack-objects --revs .git/objects/pack/pack-C <<-EOF &&
      +		refs/heads/C
      +		^refs/heads/D
      +		EOF
     -+		git pack-objects --revs .git/objects/pack/pack-B <<-EOF &&
     -+		refs/heads/B
     -+		^refs/heads/C
     ++		git pack-objects --revs .git/objects/pack/pack-D <<-EOF &&
     ++		refs/heads/D
     ++		^refs/heads/E
      +		EOF
     -+		git pack-objects --revs .git/objects/pack/pack-A <<-EOF &&
     -+		refs/heads/A
     -+		^refs/heads/B
     ++		git pack-objects --revs .git/objects/pack/pack-E <<-EOF &&
     ++		refs/heads/E
      +		EOF
      +		git multi-pack-index write
      +	)
  4:  2a763990ae !  4:  d55c1d7ee7 midx: simplify computation of pack name lengths
     @@ -12,7 +12,7 @@
          dir not already covered by the multi-pack-index.
      
          In anticipation of this becoming more complicated with the 'expire'
     -    command, simplify the computation by centralizing it to a single
     +    subcommand, simplify the computation by centralizing it to a single
          loop before writing the file.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
  5:  a0d4cc6cb3 !  5:  3950743b96 midx: refactor permutation logic and pack sorting
     @@ -282,6 +282,12 @@
       
      +	QSORT(packs.info, packs.nr, pack_info_compare);
      +
     ++	/*
     ++	 * pack_perm stores a permutation between pack-int-ids from the
     ++	 * previous multi-pack-index to the new one we are writing:
     ++	 *
     ++	 * pack_perm[old_id] = new_id
     ++	 */
      +	ALLOC_ARRAY(pack_perm, packs.nr);
      +	for (i = 0; i < packs.nr; i++) {
      +		pack_perm[packs.info[i].orig_pack_int_id] = i;
  6:  4dbff40e7a !  6:  6691d97902 multi-pack-index: implement 'expire' verb
     @@ -1,8 +1,8 @@
      Author: Derrick Stolee <dstolee@microsoft.com>
      
     -    multi-pack-index: implement 'expire' verb
     +    multi-pack-index: implement 'expire' subcommand
      
     -    The 'git multi-pack-index expire' command looks at the existing
     +    The 'git multi-pack-index expire' subcommand looks at the existing
          mult-pack-index, counts the number of objects referenced in each
          pack-file, deletes the pack-fils with no referenced objects, and
          rewrites the multi-pack-index to no longer reference those packs.
     @@ -18,7 +18,7 @@
      
          Test that a new pack-file that covers the contents of two other
          pack-files leads to those pack-files being deleted during the
     -    expire command. Be sure to read the multi-pack-index to ensure
     +    expire subcommand. Be sure to read the multi-pack-index to ensure
          it no longer references those packs.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
     @@ -161,6 +161,11 @@
      +		}
      +	}
      +
     + 	/*
     + 	 * pack_perm stores a permutation between pack-int-ids from the
     + 	 * previous multi-pack-index to the new one we are writing:
     +@@
     + 	 */
       	ALLOC_ARRAY(pack_perm, packs.nr);
       	for (i = 0; i < packs.nr; i++) {
      -		pack_perm[packs.info[i].orig_pack_int_id] = i;
     @@ -273,7 +278,9 @@
      +		test_cmp expect actual &&
      +		ls .git/objects/pack/ | grep idx >expect-idx &&
      +		test-tool read-midx .git/objects | grep idx >actual-midx &&
     -+		test_cmp expect-idx actual-midx
     ++		test_cmp expect-idx actual-midx &&
     ++		git multi-pack-index verify &&
     ++		git fsck
      +	)
      +'
      +
  7:  b39f90ad09 !  7:  f5a8ff21dd multi-pack-index: prepare 'repack' subcommand
     @@ -11,7 +11,7 @@
          operation does not interrupt concurrent git commands.
      
          Introduce a 'repack' subcommand to 'git multi-pack-index' that
     -    takes a '--batch-size' option. The verb will inspect the
     +    takes a '--batch-size' option. The subcommand will inspect the
          multi-pack-index for referenced pack-files whose size is smaller
          than the batch size, until collecting a list of pack-files whose
          sizes sum to larger than the batch size. Then, a new pack-file
     @@ -26,6 +26,11 @@
          we specify a small batch size, we will guarantee that future
          implementations do not change the list of pack-files.
      
     +    In addition, we hard-code the modified times of the packs in
     +    the pack directory to ensure the list of packs sorted by modified
     +    time matches the order if sorted by size (ascending). This will
     +    be important in a future test.
     +
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
     @@ -36,15 +41,15 @@
       	afterward to remove all references to these pack-files.
       
      +repack::
     -+	Collect a batch of pack-files whose size are all at most the
     -+	size given by --batch-size, but whose sizes sum to larger
     -+	than --batch-size. The batch is selected by greedily adding
     -+	small pack-files starting with the oldest pack-files that fit
     -+	the size. Create a new pack-file containing the objects the
     -+	multi-pack-index indexes into those pack-files, and rewrite
     -+	the multi-pack-index to contain that pack-file. A later run
     -+	of 'git multi-pack-index expire' will delete the pack-files
     -+	that were part of this batch.
     ++	Create a new pack-file containing objects in small pack-files
     ++	referenced by the multi-pack-index. Select the pack-files by
     ++	examining packs from oldest-to-newest, adding a pack if its
     ++	size is below the batch size. Stop adding packs when the sum
     ++	of sizes of the added packs is above the batch size. If the
     ++	total size does not reach the batch size, then do nothing.
     ++	Rewrite the multi-pack-index to reference the new pack-file.
     ++	A later run of 'git multi-pack-index expire' will delete the
     ++	pack-files that were part of this batch.
      +
       
       EXAMPLES
     @@ -84,11 +89,18 @@
      +	if (!strcmp(argv[0], "repack"))
      +		return midx_repack(opts.object_dir, (size_t)opts.batch_size);
      +	if (opts.batch_size)
     -+		die(_("--batch-size option is only for 'repack' verb"));
     ++		die(_("--batch-size option is only for 'repack' subcommand"));
      +
       	if (!strcmp(argv[0], "write"))
       		return write_midx_file(opts.object_dir);
       	if (!strcmp(argv[0], "verify"))
     +@@
     + 	if (!strcmp(argv[0], "expire"))
     + 		return expire_midx_packs(opts.object_dir);
     + 
     +-	die(_("unrecognized verb: %s"), argv[0]);
     ++	die(_("unrecognized subcommand: %s"), argv[0]);
     + }
      
       diff --git a/midx.c b/midx.c
       --- a/midx.c
     @@ -125,6 +137,12 @@
      +test_expect_success 'repack with minimum size does not alter existing packs' '
      +	(
      +		cd dup &&
     ++		rm -rf .git/objects/pack &&
     ++		mv .git/objects/pack-backup .git/objects/pack &&
     ++		touch -m -t 201901010000 .git/objects/pack/pack-D* &&
     ++		touch -m -t 201901010001 .git/objects/pack/pack-C* &&
     ++		touch -m -t 201901010002 .git/objects/pack/pack-B* &&
     ++		touch -m -t 201901010003 .git/objects/pack/pack-A* &&
      +		ls .git/objects/pack >expect &&
      +		MINSIZE=$(ls -l .git/objects/pack/*pack | awk "{print \$5;}" | sort -n | head -n 1) &&
      +		git multi-pack-index repack --batch-size=$MINSIZE &&
  8:  a4c2d5a8e1 !  8:  ba1a1c7bbb midx: implement midx_repack()
     @@ -149,6 +149,16 @@
       diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
       --- a/t/t5319-multi-pack-index.sh
       +++ b/t/t5319-multi-pack-index.sh
     +@@
     + 		git pack-objects --revs .git/objects/pack/pack-E <<-EOF &&
     + 		refs/heads/E
     + 		EOF
     +-		git multi-pack-index write
     ++		git multi-pack-index write &&
     ++		cp -r .git/objects/pack .git/objects/pack-backup
     + 	)
     + '
     + 
      @@
       	)
       '
     @@ -156,25 +166,28 @@
      +test_expect_success 'repack creates a new pack' '
      +	(
      +		cd dup &&
     -+		SECOND_SMALLEST_SIZE=$(ls -l .git/objects/pack/*pack | awk "{print \$5;}" | sort -n | head -n 2 | tail -n 1) &&
     -+		BATCH_SIZE=$(($SECOND_SMALLEST_SIZE + 1)) &&
     -+		git multi-pack-index repack --batch-size=$BATCH_SIZE &&
      +		ls .git/objects/pack/*idx >idx-list &&
      +		test_line_count = 5 idx-list &&
     ++		THIRD_SMALLEST_SIZE=$(ls -l .git/objects/pack/*pack | awk "{print \$5;}" | sort -n | head -n 3 | tail -n 1) &&
     ++		BATCH_SIZE=$(($THIRD_SMALLEST_SIZE + 1)) &&
     ++		git multi-pack-index repack --batch-size=$BATCH_SIZE &&
     ++		ls .git/objects/pack/*idx >idx-list &&
     ++		test_line_count = 6 idx-list &&
      +		test-tool read-midx .git/objects | grep idx >midx-list &&
     -+		test_line_count = 5 midx-list
     ++		test_line_count = 6 midx-list
      +	)
      +'
      +
      +test_expect_success 'expire removes repacked packs' '
      +	(
      +		cd dup &&
     -+		ls -S .git/objects/pack/*pack | head -n 3 >expect &&
     ++		ls -al .git/objects/pack/*pack &&
     ++		ls -S .git/objects/pack/*pack | head -n 4 >expect &&
      +		git multi-pack-index expire &&
      +		ls -S .git/objects/pack/*pack >actual &&
      +		test_cmp expect actual &&
      +		test-tool read-midx .git/objects | grep idx >midx-list &&
     -+		test_line_count = 3 midx-list
     ++		test_line_count = 4 midx-list
      +	)
      +'
      +
  9:  b97fb35ba9 =  9:  b1c6892417 multi-pack-index: test expire while adding packs
  -:  ---------- > 10:  481b08890f midx: add test that 'expire' respects .keep files

Comments

Jonathan Tan Jan. 24, 2019, 10:14 p.m. UTC | #1
> Updates in V4:
> 
>  * More 'verb' and 'command' instances replaced with 'subcommand'. I grepped
>    the patch to check these should be fixed everywhere.
>    
>    
>  * Update the tests to check .keep files (in last patch).
>    
>    
>  * Modify the tests to show the terminating condition of --batch-size when
>    there are three packs that fit under the size, but the first two are
>    large enough to stop adding packs. This required rearranging the packs
>    slightly to get different sizes than we had before. Also, I added 'touch
>    -t' to set the modified times so we can fix the order in which the packs
>    are selected.
>    
>    
>  * Added a comment about the purpose of pack_perm.

Thanks, the interdiff and patch 10 look good to me (I already reviewed
V3). I also verified that in the last test, if there is no .keep file,
the test fails as expected.
Josh Steadmon Jan. 25, 2019, 11:49 p.m. UTC | #2
On 2019.01.24 13:51, Derrick Stolee via GitGitGadget wrote:
> The multi-pack-index provides a fast way to find an object among a large
> list of pack-files. It stores a single pack-reference for each object id, so
> duplicate objects are ignored. Among a list of pack-files storing the same
> object, the most-recently modified one is used.
> 
> Create new subcommands for the multi-pack-index builtin.
> 
>  * 'git multi-pack-index expire': If we have a pack-file indexed by the
>    multi-pack-index, but all objects in that pack are duplicated in
>    more-recently modified packs, then delete that pack (and any others like
>    it). Delete the reference to that pack in the multi-pack-index.
>    
>    
>  * 'git multi-pack-index repack --batch-size=': Starting from the oldest
>    pack-files covered by the multi-pack-index, find those whose on-disk size
>    is below the batch size until we have a collection of packs whose sizes
>    add up to the batch size. Create a new pack containing all objects that
>    the multi-pack-index references to those packs.
>    
>    
> 
> This allows us to create a new pattern for repacking objects: run 'repack'.
> After enough time has passed that all Git commands that started before the
> last 'repack' are finished, run 'expire' again. This approach has some
> advantages over the existing "repack everything" model:
> 
>  1. Incremental. We can repack a small batch of objects at a time, instead
>     of repacking all reachable objects. We can also limit ourselves to the
>     objects that do not appear in newer pack-files.
>     
>     
>  2. Highly Available. By adding a new pack-file (and not deleting the old
>     pack-files) we do not interrupt concurrent Git commands, and do not
>     suffer performance degradation. By expiring only pack-files that have no
>     referenced objects, we know that Git commands that are doing normal
>     object lookups* will not be interrupted.
>     
>     
>  3. Note: if someone concurrently runs a Git command that uses
>     get_all_packs(), then that command could try to read the pack-files and
>     pack-indexes that we are deleting during an expire command. Such
>     commands are usually related to object maintenance (i.e. fsck, gc,
>     pack-objects) or are related to less-often-used features (i.e.
>     fast-import, http-backend, server-info).
>     
>     
> 
> We plan to use this approach in VFS for Git to do background maintenance of
> the "shared object cache" which is a Git alternate directory filled with
> packfiles containing commits and trees. We currently download pack-files on
> an hourly basis to keep up-to-date with the central server. The cache
> servers supply packs on an hourly and daily basis, so most of the hourly
> packs become useless after a new daily pack is downloaded. The 'expire'
> command would clear out most of those packs, but many will still remain with
> fewer than 100 objects remaining. The 'repack' command (with a batch size of
> 1-3gb, probably) can condense the remaining packs in commands that run for
> 1-3 min at a time. Since the daily packs range from 100-250mb, we will also
> combine and condense those packs.
> 
> Updates in V2:
> 
>  * Added a method, unlink_pack_path() to remove packfiles, but with the
>    additional check for a .keep file. This borrows logic from 
>    builtin/repack.c.
>    
>    
>  * Modified documentation and commit messages to replace 'verb' with
>    'subcommand'. Simplified the documentation. (I left 'verbs' in the title
>    of the cover letter for consistency.)
>    
>    
> 
> Updates in V3:
> 
>  * There was a bug in the expire logic when simultaneously removing packs
>    and adding uncovered packs, specifically around the pack permutation.
>    This was hard to see during review because I was using the 'pack_perm'
>    array for multiple purposes. First, I was reducing its length, and then I
>    was adding to it and resorting. In V3, I significantly overhauled the
>    logic here, which required some extra commits before implementing
>    'expire'. The final commit includes a test that would cover this case.
> 
> Updates in V4:
> 
>  * More 'verb' and 'command' instances replaced with 'subcommand'. I grepped
>    the patch to check these should be fixed everywhere.
>    
>    
>  * Update the tests to check .keep files (in last patch).
>    
>    
>  * Modify the tests to show the terminating condition of --batch-size when
>    there are three packs that fit under the size, but the first two are
>    large enough to stop adding packs. This required rearranging the packs
>    slightly to get different sizes than we had before. Also, I added 'touch
>    -t' to set the modified times so we can fix the order in which the packs
>    are selected.
>    
>    
>  * Added a comment about the purpose of pack_perm.
>    
>    
> 
> Thanks, -Stolee
> 
> Derrick Stolee (10):
>   repack: refactor pack deletion for future use
>   Docs: rearrange subcommands for multi-pack-index
>   multi-pack-index: prepare for 'expire' subcommand
>   midx: simplify computation of pack name lengths
>   midx: refactor permutation logic and pack sorting
>   multi-pack-index: implement 'expire' subcommand
>   multi-pack-index: prepare 'repack' subcommand
>   midx: implement midx_repack()
>   multi-pack-index: test expire while adding packs
>   midx: add test that 'expire' respects .keep files
> 
>  Documentation/git-multi-pack-index.txt |  26 +-
>  builtin/multi-pack-index.c             |  14 +-
>  builtin/repack.c                       |  14 +-
>  midx.c                                 | 399 ++++++++++++++++++-------
>  midx.h                                 |   2 +
>  packfile.c                             |  28 ++
>  packfile.h                             |   7 +
>  t/t5319-multi-pack-index.sh            | 165 ++++++++++
>  8 files changed, 536 insertions(+), 119 deletions(-)
> 
> 
> base-commit: 26aa9fc81d4c7f6c3b456a29da0b7ec72e5c6595
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-92%2Fderrickstolee%2Fmidx-expire%2Fupstream-v4
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-92/derrickstolee/midx-expire/upstream-v4
> Pull-Request: https://github.com/gitgitgadget/git/pull/92
> 
> Range-diff vs v3:
> 
>   1:  62b393b816 =  1:  62b393b816 repack: refactor pack deletion for future use
>   2:  7886785904 =  2:  7886785904 Docs: rearrange subcommands for multi-pack-index
>   3:  f06382b4ae !  3:  628ca46036 multi-pack-index: prepare for 'expire' subcommand
>      @@ -16,7 +16,9 @@
>           Add a test that verifies the 'expire' subcommand is correctly wired,
>           but will still be valid when the verb is implemented. Specifically,
>           create a set of packs that should all have referenced objects and
>      -    should not be removed during an 'expire' operation.
>      +    should not be removed during an 'expire' operation. The packs are
>      +    created carefully to ensure they have a specific order when sorted
>      +    by size. This will be important in a later test.
>       
>           Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>       
>      @@ -95,6 +97,8 @@
>       +	(
>       +		cd dup &&
>       +		git init &&
>      ++		test-tool genrandom "data" 4096 >large_file.txt &&
>      ++		git update-index --add large_file.txt &&
>       +		for i in $(test_seq 1 20)
>       +		do
>       +			test_commit $i
>      @@ -104,24 +108,24 @@
>       +		git branch C HEAD~13 &&
>       +		git branch D HEAD~16 &&
>       +		git branch E HEAD~18 &&
>      -+		git pack-objects --revs .git/objects/pack/pack-E <<-EOF &&
>      -+		refs/heads/E
>      ++		git pack-objects --revs .git/objects/pack/pack-A <<-EOF &&
>      ++		refs/heads/A
>      ++		^refs/heads/B
>       +		EOF
>      -+		git pack-objects --revs .git/objects/pack/pack-D <<-EOF &&
>      -+		refs/heads/D
>      -+		^refs/heads/E
>      ++		git pack-objects --revs .git/objects/pack/pack-B <<-EOF &&
>      ++		refs/heads/B
>      ++		^refs/heads/C
>       +		EOF
>       +		git pack-objects --revs .git/objects/pack/pack-C <<-EOF &&
>       +		refs/heads/C
>       +		^refs/heads/D
>       +		EOF
>      -+		git pack-objects --revs .git/objects/pack/pack-B <<-EOF &&
>      -+		refs/heads/B
>      -+		^refs/heads/C
>      ++		git pack-objects --revs .git/objects/pack/pack-D <<-EOF &&
>      ++		refs/heads/D
>      ++		^refs/heads/E
>       +		EOF
>      -+		git pack-objects --revs .git/objects/pack/pack-A <<-EOF &&
>      -+		refs/heads/A
>      -+		^refs/heads/B
>      ++		git pack-objects --revs .git/objects/pack/pack-E <<-EOF &&
>      ++		refs/heads/E
>       +		EOF
>       +		git multi-pack-index write
>       +	)
>   4:  2a763990ae !  4:  d55c1d7ee7 midx: simplify computation of pack name lengths
>      @@ -12,7 +12,7 @@
>           dir not already covered by the multi-pack-index.
>       
>           In anticipation of this becoming more complicated with the 'expire'
>      -    command, simplify the computation by centralizing it to a single
>      +    subcommand, simplify the computation by centralizing it to a single
>           loop before writing the file.
>       
>           Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>   5:  a0d4cc6cb3 !  5:  3950743b96 midx: refactor permutation logic and pack sorting
>      @@ -282,6 +282,12 @@
>        
>       +	QSORT(packs.info, packs.nr, pack_info_compare);
>       +
>      ++	/*
>      ++	 * pack_perm stores a permutation between pack-int-ids from the
>      ++	 * previous multi-pack-index to the new one we are writing:
>      ++	 *
>      ++	 * pack_perm[old_id] = new_id
>      ++	 */
>       +	ALLOC_ARRAY(pack_perm, packs.nr);
>       +	for (i = 0; i < packs.nr; i++) {
>       +		pack_perm[packs.info[i].orig_pack_int_id] = i;
>   6:  4dbff40e7a !  6:  6691d97902 multi-pack-index: implement 'expire' verb
>      @@ -1,8 +1,8 @@
>       Author: Derrick Stolee <dstolee@microsoft.com>
>       
>      -    multi-pack-index: implement 'expire' verb
>      +    multi-pack-index: implement 'expire' subcommand
>       
>      -    The 'git multi-pack-index expire' command looks at the existing
>      +    The 'git multi-pack-index expire' subcommand looks at the existing
>           mult-pack-index, counts the number of objects referenced in each
>           pack-file, deletes the pack-fils with no referenced objects, and
>           rewrites the multi-pack-index to no longer reference those packs.
>      @@ -18,7 +18,7 @@
>       
>           Test that a new pack-file that covers the contents of two other
>           pack-files leads to those pack-files being deleted during the
>      -    expire command. Be sure to read the multi-pack-index to ensure
>      +    expire subcommand. Be sure to read the multi-pack-index to ensure
>           it no longer references those packs.
>       
>           Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>      @@ -161,6 +161,11 @@
>       +		}
>       +	}
>       +
>      + 	/*
>      + 	 * pack_perm stores a permutation between pack-int-ids from the
>      + 	 * previous multi-pack-index to the new one we are writing:
>      +@@
>      + 	 */
>        	ALLOC_ARRAY(pack_perm, packs.nr);
>        	for (i = 0; i < packs.nr; i++) {
>       -		pack_perm[packs.info[i].orig_pack_int_id] = i;
>      @@ -273,7 +278,9 @@
>       +		test_cmp expect actual &&
>       +		ls .git/objects/pack/ | grep idx >expect-idx &&
>       +		test-tool read-midx .git/objects | grep idx >actual-midx &&
>      -+		test_cmp expect-idx actual-midx
>      ++		test_cmp expect-idx actual-midx &&
>      ++		git multi-pack-index verify &&
>      ++		git fsck
>       +	)
>       +'
>       +
>   7:  b39f90ad09 !  7:  f5a8ff21dd multi-pack-index: prepare 'repack' subcommand
>      @@ -11,7 +11,7 @@
>           operation does not interrupt concurrent git commands.
>       
>           Introduce a 'repack' subcommand to 'git multi-pack-index' that
>      -    takes a '--batch-size' option. The verb will inspect the
>      +    takes a '--batch-size' option. The subcommand will inspect the
>           multi-pack-index for referenced pack-files whose size is smaller
>           than the batch size, until collecting a list of pack-files whose
>           sizes sum to larger than the batch size. Then, a new pack-file
>      @@ -26,6 +26,11 @@
>           we specify a small batch size, we will guarantee that future
>           implementations do not change the list of pack-files.
>       
>      +    In addition, we hard-code the modified times of the packs in
>      +    the pack directory to ensure the list of packs sorted by modified
>      +    time matches the order if sorted by size (ascending). This will
>      +    be important in a future test.
>      +
>           Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>       
>        diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
>      @@ -36,15 +41,15 @@
>        	afterward to remove all references to these pack-files.
>        
>       +repack::
>      -+	Collect a batch of pack-files whose size are all at most the
>      -+	size given by --batch-size, but whose sizes sum to larger
>      -+	than --batch-size. The batch is selected by greedily adding
>      -+	small pack-files starting with the oldest pack-files that fit
>      -+	the size. Create a new pack-file containing the objects the
>      -+	multi-pack-index indexes into those pack-files, and rewrite
>      -+	the multi-pack-index to contain that pack-file. A later run
>      -+	of 'git multi-pack-index expire' will delete the pack-files
>      -+	that were part of this batch.
>      ++	Create a new pack-file containing objects in small pack-files
>      ++	referenced by the multi-pack-index. Select the pack-files by
>      ++	examining packs from oldest-to-newest, adding a pack if its
>      ++	size is below the batch size. Stop adding packs when the sum
>      ++	of sizes of the added packs is above the batch size. If the
>      ++	total size does not reach the batch size, then do nothing.
>      ++	Rewrite the multi-pack-index to reference the new pack-file.
>      ++	A later run of 'git multi-pack-index expire' will delete the
>      ++	pack-files that were part of this batch.
>       +
>        
>        EXAMPLES
>      @@ -84,11 +89,18 @@
>       +	if (!strcmp(argv[0], "repack"))
>       +		return midx_repack(opts.object_dir, (size_t)opts.batch_size);
>       +	if (opts.batch_size)
>      -+		die(_("--batch-size option is only for 'repack' verb"));
>      ++		die(_("--batch-size option is only for 'repack' subcommand"));
>       +
>        	if (!strcmp(argv[0], "write"))
>        		return write_midx_file(opts.object_dir);
>        	if (!strcmp(argv[0], "verify"))
>      +@@
>      + 	if (!strcmp(argv[0], "expire"))
>      + 		return expire_midx_packs(opts.object_dir);
>      + 
>      +-	die(_("unrecognized verb: %s"), argv[0]);
>      ++	die(_("unrecognized subcommand: %s"), argv[0]);
>      + }
>       
>        diff --git a/midx.c b/midx.c
>        --- a/midx.c
>      @@ -125,6 +137,12 @@
>       +test_expect_success 'repack with minimum size does not alter existing packs' '
>       +	(
>       +		cd dup &&
>      ++		rm -rf .git/objects/pack &&
>      ++		mv .git/objects/pack-backup .git/objects/pack &&
>      ++		touch -m -t 201901010000 .git/objects/pack/pack-D* &&
>      ++		touch -m -t 201901010001 .git/objects/pack/pack-C* &&
>      ++		touch -m -t 201901010002 .git/objects/pack/pack-B* &&
>      ++		touch -m -t 201901010003 .git/objects/pack/pack-A* &&
>       +		ls .git/objects/pack >expect &&
>       +		MINSIZE=$(ls -l .git/objects/pack/*pack | awk "{print \$5;}" | sort -n | head -n 1) &&
>       +		git multi-pack-index repack --batch-size=$MINSIZE &&
>   8:  a4c2d5a8e1 !  8:  ba1a1c7bbb midx: implement midx_repack()
>      @@ -149,6 +149,16 @@
>        diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
>        --- a/t/t5319-multi-pack-index.sh
>        +++ b/t/t5319-multi-pack-index.sh
>      +@@
>      + 		git pack-objects --revs .git/objects/pack/pack-E <<-EOF &&
>      + 		refs/heads/E
>      + 		EOF
>      +-		git multi-pack-index write
>      ++		git multi-pack-index write &&
>      ++		cp -r .git/objects/pack .git/objects/pack-backup
>      + 	)
>      + '
>      + 
>       @@
>        	)
>        '
>      @@ -156,25 +166,28 @@
>       +test_expect_success 'repack creates a new pack' '
>       +	(
>       +		cd dup &&
>      -+		SECOND_SMALLEST_SIZE=$(ls -l .git/objects/pack/*pack | awk "{print \$5;}" | sort -n | head -n 2 | tail -n 1) &&
>      -+		BATCH_SIZE=$(($SECOND_SMALLEST_SIZE + 1)) &&
>      -+		git multi-pack-index repack --batch-size=$BATCH_SIZE &&
>       +		ls .git/objects/pack/*idx >idx-list &&
>       +		test_line_count = 5 idx-list &&
>      ++		THIRD_SMALLEST_SIZE=$(ls -l .git/objects/pack/*pack | awk "{print \$5;}" | sort -n | head -n 3 | tail -n 1) &&
>      ++		BATCH_SIZE=$(($THIRD_SMALLEST_SIZE + 1)) &&
>      ++		git multi-pack-index repack --batch-size=$BATCH_SIZE &&
>      ++		ls .git/objects/pack/*idx >idx-list &&
>      ++		test_line_count = 6 idx-list &&
>       +		test-tool read-midx .git/objects | grep idx >midx-list &&
>      -+		test_line_count = 5 midx-list
>      ++		test_line_count = 6 midx-list
>       +	)
>       +'
>       +
>       +test_expect_success 'expire removes repacked packs' '
>       +	(
>       +		cd dup &&
>      -+		ls -S .git/objects/pack/*pack | head -n 3 >expect &&
>      ++		ls -al .git/objects/pack/*pack &&
>      ++		ls -S .git/objects/pack/*pack | head -n 4 >expect &&
>       +		git multi-pack-index expire &&
>       +		ls -S .git/objects/pack/*pack >actual &&
>       +		test_cmp expect actual &&
>       +		test-tool read-midx .git/objects | grep idx >midx-list &&
>      -+		test_line_count = 3 midx-list
>      ++		test_line_count = 4 midx-list
>       +	)
>       +'
>       +
>   9:  b97fb35ba9 =  9:  b1c6892417 multi-pack-index: test expire while adding packs
>   -:  ---------- > 10:  481b08890f midx: add test that 'expire' respects .keep files
> 
> -- 
> gitgitgadget

With the exception of the broken test in patch 7, and some minor style
diffs, this all looks good to me.