diff mbox series

[1/5] multi-pack-index: prepare for 'expire' verb

Message ID 1e34b48a2051089052c0b53b0059576b5ace45de.1544465177.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Create 'expire' and 'repack' verbs for git-multi-pack-index | expand

Commit Message

John Passaro via GitGitGadget Dec. 10, 2018, 6:06 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

The multi-pack-index tracks objects in a collection of pack-files.
Only one copy of each object is indexed, using the modified time
of the pack-files to determine tie-breakers. It is possible to
have a pack-file with no referenced objects because all objects
have a duplicate in a newer pack-file.

Introduce a new 'expire' verb to the multi-pack-index builtin.
This verb will delete these unused pack-files and rewrite the
multi-pack-index to no longer refer to those files. More details
about the specifics will follow as the method is implemented.

Add a test that verifies the 'expire' verb is correctly wired,
but will still be valid when the verb is implemented. Specifically,
create a set of packs that should all have referenced objects and
should not be removed during an 'expire' operation.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-multi-pack-index.txt |  8 +++++
 builtin/multi-pack-index.c             |  4 ++-
 midx.c                                 |  5 +++
 midx.h                                 |  1 +
 t/t5319-multi-pack-index.sh            | 47 ++++++++++++++++++++++++++
 5 files changed, 64 insertions(+), 1 deletion(-)

Comments

Stefan Beller Dec. 11, 2018, 1:35 a.m. UTC | #1
On Mon, Dec 10, 2018 at 10:06 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The multi-pack-index tracks objects in a collection of pack-files.
> Only one copy of each object is indexed, using the modified time
> of the pack-files to determine tie-breakers. It is possible to
> have a pack-file with no referenced objects because all objects
> have a duplicate in a newer pack-file.
>
> Introduce a new 'expire' verb to the multi-pack-index builtin.
> This verb will delete these unused pack-files and rewrite the
> multi-pack-index to no longer refer to those files. More details
> about the specifics will follow as the method is implemented.
>
> Add a test that verifies the 'expire' verb is correctly wired,
> but will still be valid when the verb is implemented. Specifically,
> create a set of packs that should all have referenced objects and
> should not be removed during an 'expire' operation.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/git-multi-pack-index.txt |  8 +++++
>  builtin/multi-pack-index.c             |  4 ++-
>  midx.c                                 |  5 +++
>  midx.h                                 |  1 +
>  t/t5319-multi-pack-index.sh            | 47 ++++++++++++++++++++++++++
>  5 files changed, 64 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
> index f7778a2c85..822d83c845 100644
> --- a/Documentation/git-multi-pack-index.txt
> +++ b/Documentation/git-multi-pack-index.txt
> @@ -31,6 +31,14 @@ verify::
>         When given as the verb, verify the contents of the MIDX file
>         at `<dir>/packs/multi-pack-index`.
>
> +expire::
> +       When given as the verb,

Can it be given in another way? Or rather "if the verb is expire",
then ...
(I just checked the current man page, and both write and verify use
this pattern as well. I find it strange as this first part of the sentence
conveys little information, but is repeated 3 times now (once for
each verb)).

Maybe we can restructure the man page to have it more like

    The following verbs are available:
    +write::
    +    create a new MIDX file, writing to <dir>/packs/multi-pack-index.
    +
    +verify::
    +    verify the contents ...

>               delete the pack-files that are tracked
> +       by the MIDX file at `<dir>/packs/multi-pack-index`

We're mentioning the location a lot. Could we keep a more detailed
note in --object-dir and not go into such detail in the verbs section?
(Then the paragraph here would be more concise. That makes it
easier to understand)

>              but have
> +       no objects referenced by the MIDX. All objects in these pack-
> +       files have another copy in a more-recently modified pack-file.

The second sentence reads like a reason on why the first is a good
thing to have, so maybe use some subordinating conjunction adverb
("because") to make tell the reader

> +       Rewrite the MIDX file afterward to remove all references to
> +       these pack-files.

Makes sense.


>
> diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
> index 70926b5bc0..948effc1ee 100755
> --- a/t/t5319-multi-pack-index.sh
> +++ b/t/t5319-multi-pack-index.sh
> @@ -348,4 +348,51 @@ test_expect_success 'verify incorrect 64-bit offset' '
>                 "incorrect object offset"
>  '
>
> +test_expect_success 'setup expire tests' '
> +       mkdir dup &&
> +       (
> +               cd dup &&
> +               git init &&
> +               for i in $(test_seq 1 20)
> +               do
> +                       test_commit $i
> +               done &&
> +               git branch A HEAD &&
> +               git branch B HEAD~8 &&
> +               git branch C HEAD~13 &&
> +               git branch D HEAD~16 &&
> +               git branch E HEAD~18 &&
> +               git pack-objects --revs .git/objects/pack/pack-E <<-EOF &&
> +               refs/heads/E
> +               EOF
> +               git pack-objects --revs .git/objects/pack/pack-D <<-EOF &&
> +               refs/heads/D
> +               ^refs/heads/E
> +               EOF
> +               git pack-objects --revs .git/objects/pack/pack-C <<-EOF &&
> +               refs/heads/C
> +               ^refs/heads/D
> +               EOF
> +               git pack-objects --revs .git/objects/pack/pack-B <<-EOF &&
> +               refs/heads/B
> +               ^refs/heads/C
> +               EOF
> +               git pack-objects --revs .git/objects/pack/pack-A <<-EOF &&
> +               refs/heads/A
> +               ^refs/heads/B
> +               EOF
> +               git multi-pack-index write
> +       )
> +'
> +
> +test_expect_success 'expire does not remove any packs' '

With the clever setup, this test is already correctly testing
what the docs claims it should do, despite having
no implementation. Nice.
Although the core issue is that the packs are disjunct sets
of objects, so maybe /s/any packs/required packs/ or such?
SZEDER Gábor Dec. 11, 2018, 1:59 a.m. UTC | #2
On Mon, Dec 10, 2018 at 05:35:28PM -0800, Stefan Beller wrote:
> On Mon, Dec 10, 2018 at 10:06 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > From: Derrick Stolee <dstolee@microsoft.com>
> >
> > The multi-pack-index tracks objects in a collection of pack-files.
> > Only one copy of each object is indexed, using the modified time
> > of the pack-files to determine tie-breakers. It is possible to
> > have a pack-file with no referenced objects because all objects
> > have a duplicate in a newer pack-file.
> >
> > Introduce a new 'expire' verb to the multi-pack-index builtin.
> > This verb will delete these unused pack-files and rewrite the
> > multi-pack-index to no longer refer to those files. More details
> > about the specifics will follow as the method is implemented.
> >
> > Add a test that verifies the 'expire' verb is correctly wired,
> > but will still be valid when the verb is implemented. Specifically,
> > create a set of packs that should all have referenced objects and
> > should not be removed during an 'expire' operation.
> >
> > Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> > ---
> >  Documentation/git-multi-pack-index.txt |  8 +++++
> >  builtin/multi-pack-index.c             |  4 ++-
> >  midx.c                                 |  5 +++
> >  midx.h                                 |  1 +
> >  t/t5319-multi-pack-index.sh            | 47 ++++++++++++++++++++++++++
> >  5 files changed, 64 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
> > index f7778a2c85..822d83c845 100644
> > --- a/Documentation/git-multi-pack-index.txt
> > +++ b/Documentation/git-multi-pack-index.txt
> > @@ -31,6 +31,14 @@ verify::
> >         When given as the verb, verify the contents of the MIDX file
> >         at `<dir>/packs/multi-pack-index`.
> >
> > +expire::
> > +       When given as the verb,
> 
> Can it be given in another way? Or rather "if the verb is expire",
> then ...
> (I just checked the current man page, and both write and verify use
> this pattern as well. I find it strange as this first part of the sentence
> conveys little information, but is repeated 3 times now (once for
> each verb)).
> 
> Maybe we can restructure the man page to have it more like
> 
>     The following verbs are available:
>     +write::
>     +    create a new MIDX file, writing to <dir>/packs/multi-pack-index.
>     +
>     +verify::
>     +    verify the contents ...

I think a s/verb/subcommand/ would help a lot, too, because that's
what we call it everywhere else.
Derrick Stolee Dec. 11, 2018, 12:32 p.m. UTC | #3
On 12/10/2018 8:59 PM, SZEDER Gábor wrote:
> On Mon, Dec 10, 2018 at 05:35:28PM -0800, Stefan Beller wrote:
>> On Mon, Dec 10, 2018 at 10:06 AM Derrick Stolee via GitGitGadget
>> <gitgitgadget@gmail.com> wrote:
>>> +expire::
>>> +       When given as the verb,
>> Can it be given in another way? Or rather "if the verb is expire",
>> then ...
>> (I just checked the current man page, and both write and verify use
>> this pattern as well. I find it strange as this first part of the sentence
>> conveys little information, but is repeated 3 times now (once for
>> each verb)).
>>
>> Maybe we can restructure the man page to have it more like
>>
>>      The following verbs are available:
>>      +write::
>>      +    create a new MIDX file, writing to <dir>/packs/multi-pack-index.
>>      +
>>      +verify::
>>      +    verify the contents ...
> I think a s/verb/subcommand/ would help a lot, too, because that's
> what we call it everywhere else.

Thanks, both. V2 will include a new patch that reformats the doc to use 
these suggestions, then extend it for the new subcommand.

-Stolee
diff mbox series

Patch

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index f7778a2c85..822d83c845 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -31,6 +31,14 @@  verify::
 	When given as the verb, verify the contents of the MIDX file
 	at `<dir>/packs/multi-pack-index`.
 
+expire::
+	When given as the verb, delete the pack-files that are tracked
+	by the MIDX file at `<dir>/packs/multi-pack-index` but have
+	no objects referenced by the MIDX. All objects in these pack-
+	files have another copy in a more-recently modified pack-file.
+	Rewrite the MIDX file afterward to remove all references to
+	these pack-files.
+
 
 EXAMPLES
 --------
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index fca70f8e4f..145de3a46c 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -5,7 +5,7 @@ 
 #include "midx.h"
 
 static char const * const builtin_multi_pack_index_usage[] = {
-	N_("git multi-pack-index [--object-dir=<dir>] (write|verify)"),
+	N_("git multi-pack-index [--object-dir=<dir>] (write|verify|expire)"),
 	NULL
 };
 
@@ -44,6 +44,8 @@  int cmd_multi_pack_index(int argc, const char **argv,
 		return write_midx_file(opts.object_dir);
 	if (!strcmp(argv[0], "verify"))
 		return verify_midx_file(opts.object_dir);
+	if (!strcmp(argv[0], "expire"))
+		return expire_midx_packs(opts.object_dir);
 
 	die(_("unrecognized verb: %s"), argv[0]);
 }
diff --git a/midx.c b/midx.c
index 730ff84dff..bb825ef816 100644
--- a/midx.c
+++ b/midx.c
@@ -1025,3 +1025,8 @@  int verify_midx_file(const char *object_dir)
 
 	return verify_midx_error;
 }
+
+int expire_midx_packs(const char *object_dir)
+{
+	return 0;
+}
diff --git a/midx.h b/midx.h
index 774f652530..e3a2b740b5 100644
--- a/midx.h
+++ b/midx.h
@@ -49,6 +49,7 @@  int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, i
 int write_midx_file(const char *object_dir);
 void clear_midx_file(struct repository *r);
 int verify_midx_file(const char *object_dir);
+int expire_midx_packs(const char *object_dir);
 
 void close_midx(struct multi_pack_index *m);
 
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 70926b5bc0..948effc1ee 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -348,4 +348,51 @@  test_expect_success 'verify incorrect 64-bit offset' '
 		"incorrect object offset"
 '
 
+test_expect_success 'setup expire tests' '
+	mkdir dup &&
+	(
+		cd dup &&
+		git init &&
+		for i in $(test_seq 1 20)
+		do
+			test_commit $i
+		done &&
+		git branch A HEAD &&
+		git branch B HEAD~8 &&
+		git branch C HEAD~13 &&
+		git branch D HEAD~16 &&
+		git branch E HEAD~18 &&
+		git pack-objects --revs .git/objects/pack/pack-E <<-EOF &&
+		refs/heads/E
+		EOF
+		git pack-objects --revs .git/objects/pack/pack-D <<-EOF &&
+		refs/heads/D
+		^refs/heads/E
+		EOF
+		git pack-objects --revs .git/objects/pack/pack-C <<-EOF &&
+		refs/heads/C
+		^refs/heads/D
+		EOF
+		git pack-objects --revs .git/objects/pack/pack-B <<-EOF &&
+		refs/heads/B
+		^refs/heads/C
+		EOF
+		git pack-objects --revs .git/objects/pack/pack-A <<-EOF &&
+		refs/heads/A
+		^refs/heads/B
+		EOF
+		git multi-pack-index write
+	)
+'
+
+test_expect_success 'expire does not remove any packs' '
+	(
+		cd dup &&
+		ls .git/objects/pack >expect &&
+		git multi-pack-index expire &&
+		ls .git/objects/pack >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_done