Message ID | ec51d0a50e6e64ae37795d77f7d33204b9b71ecd.1645638911.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Partial bundles | expand |
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Derrick Stolee <derrickstolee@github.com> > > In order to have a valid pack-file after unbundling a bundle that has > the 'filter' capability, we need to generate a .promisor file. The > bundle does not promise _where_ the objects can be found, but we can > expect that these bundles will be unbundled in repositories with > appropriate promisor remotes that can find those missing objects. That sounds like a lot of wishful thinking, but I do not think of a better way to phrase the idea. Taking a bundle out of a repository and unbundling it elsewhere is "git fetch" that could be done to send objects from the former to the latter repository, so I am OK with the assumption that the original repository will stay available for such users who took its contents over sneaker-net instead of over the wire. > Use the 'git index-pack --promisor=<message>' option to create this > .promisor file. Add "from-bundle" as the message to help anyone diagnose > issues with these promisor packs. > > Signed-off-by: Derrick Stolee <derrickstolee@github.com> > --- > bundle.c | 4 ++++ > t/t6020-bundle-misc.sh | 8 +++++++- > 2 files changed, 11 insertions(+), 1 deletion(-) > > diff --git a/bundle.c b/bundle.c > index e284ef63062..3d97de40ef0 100644 > --- a/bundle.c > +++ b/bundle.c > @@ -631,6 +631,10 @@ int unbundle(struct repository *r, struct bundle_header *header, > struct child_process ip = CHILD_PROCESS_INIT; > strvec_pushl(&ip.args, "index-pack", "--fix-thin", "--stdin", NULL); > > + /* If there is a filter, then we need to create the promisor pack. */ > + if (header->filter) > + strvec_push(&ip.args, "--promisor=from-bundle"); > + > if (extra_index_pack_args) { > strvec_pushv(&ip.args, extra_index_pack_args->v); > strvec_clear(extra_index_pack_args); > diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh > index 39cfefafb65..344af34db1e 100755 > --- a/t/t6020-bundle-misc.sh > +++ b/t/t6020-bundle-misc.sh > @@ -513,7 +513,13 @@ do > The bundle uses this filter: $filter > The bundle records a complete history. > EOF > - test_cmp expect actual > + test_cmp expect actual && > + > + # This creates the first pack-file in the > + # .git/objects/pack directory. Look for a .promisor. > + git bundle unbundle partial.bdl && > + ls .git/objects/pack/pack-*.promisor >promisor && > + test_line_count = 1 promisor OK. Do we also want to inspect the contents of the resulting repository to make sure that the bundle had the right contents? One idea to do so would probably be - prepare a test repository (you already have it) - prepare a partial.bdl (you already do this) - clone the test repository into a new repository, with the same filter - create an empty repository, unbundle the partial.bdl - take "for-each-ref" and list of objects available in these two "partial copies" from the test repository, and compare
On 3/4/2022 6:43 PM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> From: Derrick Stolee <derrickstolee@github.com> >> >> In order to have a valid pack-file after unbundling a bundle that has >> the 'filter' capability, we need to generate a .promisor file. The >> bundle does not promise _where_ the objects can be found, but we can >> expect that these bundles will be unbundled in repositories with >> appropriate promisor remotes that can find those missing objects. > > That sounds like a lot of wishful thinking, but I do not think of a > better way to phrase the idea. Taking a bundle out of a repository > and unbundling it elsewhere is "git fetch" that could be done to > send objects from the former to the latter repository, so I am OK > with the assumption that the original repository will stay available > for such users who took its contents over sneaker-net instead of > over the wire. As an aside, I'm also concerned about the existing model of promisor remotes where it depends on each remote, and isn't a repository-wide state. In particular, if I do a blobless partial clone of git/git and then add git-for-windows/git as a remote and fetch it, it will break because git-for-windows/git isn't set up as a promisor remote and we expect to have every blob reachable from its pack-file (even though it was not sent because we advertised a commit that can reach it). I've been thinking about adjusting the config parsing around promisors to say "I see one promisor remote, so I will assume all remotes are promisors." It seems to me that this will fix cases like the above without further breaking any cases (that are not already broken). But that's a tangent for another time. :) >> Use the 'git index-pack --promisor=<message>' option to create this >> .promisor file. Add "from-bundle" as the message to help anyone diagnose >> issues with these promisor packs. >> >> Signed-off-by: Derrick Stolee <derrickstolee@github.com> >> --- >> bundle.c | 4 ++++ >> t/t6020-bundle-misc.sh | 8 +++++++- >> 2 files changed, 11 insertions(+), 1 deletion(-) >> >> diff --git a/bundle.c b/bundle.c >> index e284ef63062..3d97de40ef0 100644 >> --- a/bundle.c >> +++ b/bundle.c >> @@ -631,6 +631,10 @@ int unbundle(struct repository *r, struct bundle_header *header, >> struct child_process ip = CHILD_PROCESS_INIT; >> strvec_pushl(&ip.args, "index-pack", "--fix-thin", "--stdin", NULL); >> >> + /* If there is a filter, then we need to create the promisor pack. */ >> + if (header->filter) >> + strvec_push(&ip.args, "--promisor=from-bundle"); >> + >> if (extra_index_pack_args) { >> strvec_pushv(&ip.args, extra_index_pack_args->v); >> strvec_clear(extra_index_pack_args); >> diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh >> index 39cfefafb65..344af34db1e 100755 >> --- a/t/t6020-bundle-misc.sh >> +++ b/t/t6020-bundle-misc.sh >> @@ -513,7 +513,13 @@ do >> The bundle uses this filter: $filter >> The bundle records a complete history. >> EOF >> - test_cmp expect actual >> + test_cmp expect actual && >> + >> + # This creates the first pack-file in the >> + # .git/objects/pack directory. Look for a .promisor. >> + git bundle unbundle partial.bdl && >> + ls .git/objects/pack/pack-*.promisor >promisor && >> + test_line_count = 1 promisor > > OK. Do we also want to inspect the contents of the resulting > repository to make sure that the bundle had the right contents? > > One idea to do so would probably be > > - prepare a test repository (you already have it) > - prepare a partial.bdl (you already do this) > > - clone the test repository into a new repository, with the same > filter > - create an empty repository, unbundle the partial.bdl > > - take "for-each-ref" and list of objects available in these two > "partial copies" from the test repository, and compare Good idea. Thanks! Of course, looking closer at it... "git bundle unbundle" doesn't actually store the refs directly in the refspace, but instead only outputs the refs that it used. Here is an attempt to verify the refs that are reported match those in a mirror clone. --- >8 --- diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh index 344af34db1e..a228cbfc4e3 100755 --- a/t/t6020-bundle-misc.sh +++ b/t/t6020-bundle-misc.sh @@ -490,7 +490,7 @@ test_expect_success 'unfiltered bundle with --objects' ' for filter in "blob:none" "tree:0" "tree:1" "blob:limit=100" do test_expect_success 'filtered bundle: $filter' ' - test_when_finished rm -rf .git/objects/pack && + test_when_finished rm -rf .git/objects/pack cloned unbundled && git bundle create partial.bdl \ --all \ --filter=$filter && @@ -515,11 +515,22 @@ do EOF test_cmp expect actual && - # This creates the first pack-file in the - # .git/objects/pack directory. Look for a .promisor. - git bundle unbundle partial.bdl && - ls .git/objects/pack/pack-*.promisor >promisor && - test_line_count = 1 promisor + git init unbundled && + ( + cd unbundled && + # This creates the first pack-file in the + # .git/objects/pack directory. Look for a .promisor. + git bundle unbundle ../partial.bdl >ref-list.txt && + ls .git/objects/pack/pack-*.promisor >promisor && + test_line_count = 1 promisor + ) && + + git clone --filter=blob:none --mirror "file://$(pwd)" cloned && + git -C cloned for-each-ref \ + --format="%(objectname) %(refname)" >cloned-refs.txt && + echo "$(git -C cloned rev-parse HEAD) HEAD" >>cloned-refs.txt && + test_cmp cloned-refs.txt unbundled/ref-list.txt ' done --- >8 --- I also attempted doing a "git clone --bare partial.bdl unbundled.git" to get the 'git clone' command to actually place the refs. However, 'git clone' does not set up the repository filter based on the bundle, so it reports missing blobs (even though there is no checkout). Making this work would require that "repository global promisor config" idea that I mentioned in another reply. I'll make note of this as a potential application of that idea. Thanks, -Stolee
On Wed, Feb 23 2022, Derrick Stolee via GitGitGadget wrote: > From: Derrick Stolee <derrickstolee@github.com> > > In order to have a valid pack-file after unbundling a bundle that has > the 'filter' capability, we need to generate a .promisor file. The > bundle does not promise _where_ the objects can be found, but we can > expect that these bundles will be unbundled in repositories with > appropriate promisor remotes that can find those missing objects. > > Use the 'git index-pack --promisor=<message>' option to create this > .promisor file. Add "from-bundle" as the message to help anyone diagnose > issues with these promisor packs. > > Signed-off-by: Derrick Stolee <derrickstolee@github.com> > --- > bundle.c | 4 ++++ > t/t6020-bundle-misc.sh | 8 +++++++- > 2 files changed, 11 insertions(+), 1 deletion(-) > > diff --git a/bundle.c b/bundle.c > index e284ef63062..3d97de40ef0 100644 > --- a/bundle.c > +++ b/bundle.c > @@ -631,6 +631,10 @@ int unbundle(struct repository *r, struct bundle_header *header, > struct child_process ip = CHILD_PROCESS_INIT; > strvec_pushl(&ip.args, "index-pack", "--fix-thin", "--stdin", NULL); > > + /* If there is a filter, then we need to create the promisor pack. */ > + if (header->filter) > + strvec_push(&ip.args, "--promisor=from-bundle"); > + > if (extra_index_pack_args) { > strvec_pushv(&ip.args, extra_index_pack_args->v); > strvec_clear(extra_index_pack_args); > diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh > index 39cfefafb65..344af34db1e 100755 > --- a/t/t6020-bundle-misc.sh > +++ b/t/t6020-bundle-misc.sh > @@ -513,7 +513,13 @@ do > The bundle uses this filter: $filter > The bundle records a complete history. > EOF > - test_cmp expect actual > + test_cmp expect actual && > + > + # This creates the first pack-file in the > + # .git/objects/pack directory. Look for a .promisor. > + git bundle unbundle partial.bdl && > + ls .git/objects/pack/pack-*.promisor >promisor && > + test_line_count = 1 promisor > ' > done Aside from what Junio mentioned, the preceding commit seems to be incomplete here. I.e. I'd expect to see this replace a case where we died or whatever before. What happened if we invoked "unbundle" before?
On 3/7/2022 10:47 AM, Ævar Arnfjörð Bjarmason wrote: > > On Wed, Feb 23 2022, Derrick Stolee via GitGitGadget wrote: > >> From: Derrick Stolee <derrickstolee@github.com> >> >> In order to have a valid pack-file after unbundling a bundle that has >> the 'filter' capability, we need to generate a .promisor file. The >> bundle does not promise _where_ the objects can be found, but we can >> expect that these bundles will be unbundled in repositories with >> appropriate promisor remotes that can find those missing objects. >> >> Use the 'git index-pack --promisor=<message>' option to create this >> .promisor file. Add "from-bundle" as the message to help anyone diagnose >> issues with these promisor packs. >> >> Signed-off-by: Derrick Stolee <derrickstolee@github.com> >> --- >> bundle.c | 4 ++++ >> t/t6020-bundle-misc.sh | 8 +++++++- >> 2 files changed, 11 insertions(+), 1 deletion(-) >> >> diff --git a/bundle.c b/bundle.c >> index e284ef63062..3d97de40ef0 100644 >> --- a/bundle.c >> +++ b/bundle.c >> @@ -631,6 +631,10 @@ int unbundle(struct repository *r, struct bundle_header *header, >> struct child_process ip = CHILD_PROCESS_INIT; >> strvec_pushl(&ip.args, "index-pack", "--fix-thin", "--stdin", NULL); >> >> + /* If there is a filter, then we need to create the promisor pack. */ >> + if (header->filter) >> + strvec_push(&ip.args, "--promisor=from-bundle"); >> + >> if (extra_index_pack_args) { >> strvec_pushv(&ip.args, extra_index_pack_args->v); >> strvec_clear(extra_index_pack_args); >> diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh >> index 39cfefafb65..344af34db1e 100755 >> --- a/t/t6020-bundle-misc.sh >> +++ b/t/t6020-bundle-misc.sh >> @@ -513,7 +513,13 @@ do >> The bundle uses this filter: $filter >> The bundle records a complete history. >> EOF >> - test_cmp expect actual >> + test_cmp expect actual && >> + >> + # This creates the first pack-file in the >> + # .git/objects/pack directory. Look for a .promisor. >> + git bundle unbundle partial.bdl && >> + ls .git/objects/pack/pack-*.promisor >promisor && >> + test_line_count = 1 promisor >> ' >> done > > Aside from what Junio mentioned, the preceding commit seems to be > incomplete here. I.e. I'd expect to see this replace a case where we > died or whatever before. What happened if we invoked "unbundle" before? Looking closely, I think the only difference is that this patch adds the .promisor file. I can push my expanded test to be earlier in the series so we can verify this. Thanks, -Stolee
Derrick Stolee <derrickstolee@github.com> writes: > Of course, looking closer at it... "git bundle unbundle" doesn't > actually store the refs directly in the refspace, but instead only > outputs the refs that it used. True. I was more thinking about equivalence between cd src_repo git clone --no-local --filter=... . ../partial.network.cloned git bundle create --filter=... partial.bndl git clone partial.bndl ../partial.bundle.cloned The two resulting repositories should look very similar except for that the remote.origin.* of the former would expect that it pushes back to where it was cloned from, while the latter would not. > + git init unbundled && > + ( > + cd unbundled && > + # This creates the first pack-file in the > + # .git/objects/pack directory. Look for a .promisor. > + git bundle unbundle ../partial.bdl >ref-list.txt && > + ls .git/objects/pack/pack-*.promisor >promisor && > + test_line_count = 1 promisor And can we enumerate the objects we have in .git/objects, both loose and packed? > + ) && > + > + git clone --filter=blob:none --mirror "file://$(pwd)" cloned && > + git -C cloned for-each-ref \ > + --format="%(objectname) %(refname)" >cloned-refs.txt && > + echo "$(git -C cloned rev-parse HEAD) HEAD" >>cloned-refs.txt && > + test_cmp cloned-refs.txt unbundled/ref-list.txt Likewise here? I think the two should match, and that was what I was wondering if we should enforce. > ' > done > > --- >8 --- > > I also attempted doing a "git clone --bare partial.bdl unbundled.git" to > get the 'git clone' command to actually place the refs. However, 'git clone' > does not set up the repository filter based on the bundle, so it reports > missing blobs (even though there is no checkout). Understandable, as cloning from a bundle, if I recall correctly, was done as yet another special case in "git clone", differently from the main "over the network" code path. And from end-user's point of view, I think updating it is part of introducing the filtered bundle. Thanks.
On 3/7/2022 11:56 AM, Junio C Hamano wrote: > Derrick Stolee <derrickstolee@github.com> writes: > >> Of course, looking closer at it... "git bundle unbundle" doesn't >> actually store the refs directly in the refspace, but instead only >> outputs the refs that it used. > > True. I was more thinking about equivalence between > > cd src_repo > git clone --no-local --filter=... . ../partial.network.cloned > git bundle create --filter=... partial.bndl > git clone partial.bndl ../partial.bundle.cloned > > The two resulting repositories should look very similar except for > that the remote.origin.* of the former would expect that it pushes > back to where it was cloned from, while the latter would not. Makes sense. The one downside is that you list cloning form a partial bundle, but that currently does not work, even if we avoid a checkout. It fails because the clone command is not parsing the filter and properly setting repo-global promisor information. (Again, this is a bigger change to make this possible.) I also had some struggles getting this to work since local clones were actually ignoring the filter. I didn't think it was worth setting up an HTTP or SSH server just for this test. See workaround below. >> + git init unbundled && >> + ( >> + cd unbundled && >> + # This creates the first pack-file in the >> + # .git/objects/pack directory. Look for a .promisor. >> + git bundle unbundle ../partial.bdl >ref-list.txt && >> + ls .git/objects/pack/pack-*.promisor >promisor && >> + test_line_count = 1 promisor > > And can we enumerate the objects we have in .git/objects, both loose > and packed? I can enumerate using 'git rev-list --objects' to compare the unbundled set to the full clone (adding --filter=$filter to the full clone's run and --missing=allow-any to the unbundled one). >> + ) && >> + >> + git clone --filter=blob:none --mirror "file://$(pwd)" cloned && >> + git -C cloned for-each-ref \ >> + --format="%(objectname) %(refname)" >cloned-refs.txt && >> + echo "$(git -C cloned rev-parse HEAD) HEAD" >>cloned-refs.txt && >> + test_cmp cloned-refs.txt unbundled/ref-list.txt > > Likewise here? I think the two should match, and that was what I > was wondering if we should enforce. > >> ' >> done >> >> --- >8 --- >> >> I also attempted doing a "git clone --bare partial.bdl unbundled.git" to >> get the 'git clone' command to actually place the refs. However, 'git clone' >> does not set up the repository filter based on the bundle, so it reports >> missing blobs (even though there is no checkout). > > Understandable, as cloning from a bundle, if I recall correctly, was > done as yet another special case in "git clone", differently from > the main "over the network" code path. And from end-user's point of > view, I think updating it is part of introducing the filtered > bundle. The reason I did not include that here is because of the lack of repository-global promisor/filter config. I do want to loop back and make those updates, but perhaps for this series we should add an error condition into 'git clone' to say "Cannot currently clone from a filtered bundle" to help users understand the issue? Thanks, -Stolee
Derrick Stolee <derrickstolee@github.com> writes: > I also had some struggles getting this to work since local clones > were actually ignoring the filter. I didn't think it was worth > setting up an HTTP or SSH server just for this test. Does "clone --no-local $path" work as a workaround? It should do the same thing as "ssh" codepath except that it uses "sh" instead as the process on the other side is running locally. >>> I also attempted doing a "git clone --bare partial.bdl unbundled.git" to >>> get the 'git clone' command to actually place the refs. However, 'git clone' >>> does not set up the repository filter based on the bundle, so it reports >>> missing blobs (even though there is no checkout). >> >> Understandable, as cloning from a bundle, if I recall correctly, was >> done as yet another special case in "git clone", differently from >> the main "over the network" code path. And from end-user's point of >> view, I think updating it is part of introducing the filtered >> bundle. > > The reason I did not include that here is because of the lack of > repository-global promisor/filter config. I do want to loop back > and make those updates, but perhaps for this series we should add > an error condition into 'git clone' to say "Cannot currently clone > from a filtered bundle" to help users understand the issue? It would be a workable stepwise solution, I would think. It is not like we are robbing an existing feature from users---it merely is that the support of partial cloning over different "transport" is uneven, which is to be expected, especially in earlier phase of introducing a new feature. Thanks.
On 3/7/2022 2:40 PM, Junio C Hamano wrote: > Derrick Stolee <derrickstolee@github.com> writes: > >> I also had some struggles getting this to work since local clones >> were actually ignoring the filter. I didn't think it was worth >> setting up an HTTP or SSH server just for this test. > > Does "clone --no-local $path" work as a workaround? It should do > the same thing as "ssh" codepath except that it uses "sh" instead as > the process on the other side is running locally. I've been trying this: git clone --no-local --filter=$filter "file://$(pwd)" cloned && which "succeeds" with this in the stderr: warning: filtering not recognized by server, ignoring >>>> I also attempted doing a "git clone --bare partial.bdl unbundled.git" to >>>> get the 'git clone' command to actually place the refs. However, 'git clone' >>>> does not set up the repository filter based on the bundle, so it reports >>>> missing blobs (even though there is no checkout). >>> >>> Understandable, as cloning from a bundle, if I recall correctly, was >>> done as yet another special case in "git clone", differently from >>> the main "over the network" code path. And from end-user's point of >>> view, I think updating it is part of introducing the filtered >>> bundle. >> >> The reason I did not include that here is because of the lack of >> repository-global promisor/filter config. I do want to loop back >> and make those updates, but perhaps for this series we should add >> an error condition into 'git clone' to say "Cannot currently clone >> from a filtered bundle" to help users understand the issue? > > It would be a workable stepwise solution, I would think. It is not > like we are robbing an existing feature from users---it merely is > that the support of partial cloning over different "transport" is > uneven, which is to be expected, especially in earlier phase of > introducing a new feature. That was my understanding, too. Thanks, -Stolee
Derrick Stolee <derrickstolee@github.com> writes: > I've been trying this: > > git clone --no-local --filter=$filter "file://$(pwd)" cloned && > > which "succeeds" with this in the stderr: > > warning: filtering not recognized by server, ignoring Hmph, and we won't see it when going over ssh to the same repository? That is puzzling. >>> an error condition into 'git clone' to say "Cannot currently clone >>> from a filtered bundle" to help users understand the issue? >> >> It would be a workable stepwise solution, I would think. It is not >> like we are robbing an existing feature from users---it merely is >> that the support of partial cloning over different "transport" is >> uneven, which is to be expected, especially in earlier phase of >> introducing a new feature. > > That was my understanding, too. Good to see us on the same page. Thanks.
On 3/7/2022 2:54 PM, Junio C Hamano wrote: > Derrick Stolee <derrickstolee@github.com> writes: > >> I've been trying this: >> >> git clone --no-local --filter=$filter "file://$(pwd)" cloned && >> >> which "succeeds" with this in the stderr: >> >> warning: filtering not recognized by server, ignoring > > Hmph, and we won't see it when going over ssh to the same > repository? That is puzzling. Of course, it's not an issue with file://, but an issue with the defaults. In order to test partial clones, I need to enable these config options in the server repo: test_config uploadpack.allowfilter 1 && test_config uploadpack.allowanysha1inwant 1 && Sorry for taking so long to realize this. Thanks, -Stolee
Derrick Stolee <derrickstolee@github.com> writes: >> Hmph, and we won't see it when going over ssh to the same >> repository? That is puzzling. > > Of course, it's not an issue with file://, but an issue with the > defaults. In order to test partial clones, I need to enable these > config options in the server repo: > > test_config uploadpack.allowfilter 1 && > test_config uploadpack.allowanysha1inwant 1 && > > Sorry for taking so long to realize this. No, thanks for finding that out---the fact that I left it at "puzzling" without offering that is a strong evidence that I didn't think of it, either ;-)
diff --git a/bundle.c b/bundle.c index e284ef63062..3d97de40ef0 100644 --- a/bundle.c +++ b/bundle.c @@ -631,6 +631,10 @@ int unbundle(struct repository *r, struct bundle_header *header, struct child_process ip = CHILD_PROCESS_INIT; strvec_pushl(&ip.args, "index-pack", "--fix-thin", "--stdin", NULL); + /* If there is a filter, then we need to create the promisor pack. */ + if (header->filter) + strvec_push(&ip.args, "--promisor=from-bundle"); + if (extra_index_pack_args) { strvec_pushv(&ip.args, extra_index_pack_args->v); strvec_clear(extra_index_pack_args); diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh index 39cfefafb65..344af34db1e 100755 --- a/t/t6020-bundle-misc.sh +++ b/t/t6020-bundle-misc.sh @@ -513,7 +513,13 @@ do The bundle uses this filter: $filter The bundle records a complete history. EOF - test_cmp expect actual + test_cmp expect actual && + + # This creates the first pack-file in the + # .git/objects/pack directory. Look for a .promisor. + git bundle unbundle partial.bdl && + ls .git/objects/pack/pack-*.promisor >promisor && + test_line_count = 1 promisor ' done