Message ID | pull.1644.git.git.1705231010118.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | clone: support cloning of filtered bundles | expand |
Hi Nikolay On 14/01/2024 11:16, Nikolay Edigaryev via GitGitGadget wrote: > From: Nikolay Edigaryev <edigaryev@gmail.com> > > f18b512bbb (bundle: create filtered bundles, 2022-03-09) introduced an > incredibly useful ability to create filtered bundles, which advances > the partial clone/promisor support in Git and allows for archiving > large repositories to object storages like S3 in bundles that are: > > * easy to manage > * bundle is just a single file, it's easier to guarantee atomic > replacements in object storages like S3 and they are faster to > fetch than a bare repository since there's only a single GET > request involved > * incredibly tiny > * no indexes (which may be more than 10 MB for some repositories) > and other fluff, compared to cloning a bare repository > * bundle can be filtered to only contain the tips of refs neccessary > for e.g. code-analysis purposes > > However, in 86fdd94d72 (clone: fail gracefully when cloning filtered > bundle, 2022-03-09) the cloning of such bundles was disabled, with a > note that this behavior is not desired, and it the long-term this > should be possible. > > The commit above states that it's not possible to have this at the > moment due to lack of remote and a repository-global config that > specifies an object filter, yet it's unclear why a remote-specific > config can't be used instead, which is what this change does. As I understand it if you're cloning from a bundle file then then there is no remote so how can we set a remote-specific config? I'm surprised that the proposed change does not require the user to pass "--filter" to "git clone" as I expected that we'd want to check that the filter on the command line was compatible with the filter used to create the bundle. Allowing "git clone" to create a partial clone without the user asking for it by passing the "--filter" option feels like is going to end up confusing users. > +test_expect_success 'cloning from filtered bundle works' ' > + git bundle create partial.bdl --all --filter=blob:none && > + git clone --bare partial.bdl partial 2>err The redirection hides any error message which will make debugging test failures harder. It would be nice to see this test check any config set when cloning and that git commands can run successfully in the repository. Best Wishes Phillip
Hello Phillip, > As I understand it if you're cloning from a bundle file then then there > is no remote so how can we set a remote-specific config? There is a remote, for more details see df61c88979 (clone: also configure url for bare clones, 2010-03-29), which has the following code: strbuf_addf(&key, "remote.%s.url", remote_name); git_config_set(key.buf, repo); strbuf_reset(&key); You can verify this by creating a bundle on Git 2.43.0 with "git create bundle bundle.bundle --all" and then cloning it with "git clone --bare /path/to/bundle.bundle", in my case the following repo-wide configuration file was created: [core] repositoryformatversion = 0 filemode = true bare = true ignorecase = true precomposeunicode = true [remote "origin"] url = /Users/edi/src/cirrus-cli/cli.bundle > I'm surprised that the proposed change does not require the user to pass > "--filter" to "git clone" as I expected that we'd want to check that the > filter on the command line was compatible with the filter used to create > the bundle. Allowing "git clone" to create a partial clone without the > user asking for it by passing the "--filter" option feels like is going > to end up confusing users. Note that currently, when you clone a normal non-filtered bundle with a '--filter' argument specified, no filtering will take place and no error will be thrown. "promisor = true" and "partialclonefilter = ..." options will be set in the repo config, but no .promisor file will be created. This is even more confusing IMO, but that's how it currently on Git 2.43.0. You have a good point, but I feel like completely preventing cloning of filtered bundles and requiring a '--filter' argument is very taxing. If you've already specified a '--filter' when creating a bundle (and thus your intent to use partially cloned data), why do it multiple times? What I propose as an alternative here is to act based on the user's intent when cloning: * when the user specifies no '--filter' argument, do nothing special, allow cloning both types of bundles: normal and filtered (with the logic from this patch) * when the user does specify a '--filter' argument, either: * throw an error explaining that filtering of filtered bundles is not supported * or compare the user's filter specification and the one that is in the bundle and only throw an error if they mismatch Let me know what you think about this (and perhaps you have a more concrete example in mind where this will have negative consequences) and I'll be happy to do a next iteration. On Sun, Jan 14, 2024 at 10:00 PM Phillip Wood <phillip.wood123@gmail.com> wrote: > > Hi Nikolay > > On 14/01/2024 11:16, Nikolay Edigaryev via GitGitGadget wrote: > > From: Nikolay Edigaryev <edigaryev@gmail.com> > > > > f18b512bbb (bundle: create filtered bundles, 2022-03-09) introduced an > > incredibly useful ability to create filtered bundles, which advances > > the partial clone/promisor support in Git and allows for archiving > > large repositories to object storages like S3 in bundles that are: > > > > * easy to manage > > * bundle is just a single file, it's easier to guarantee atomic > > replacements in object storages like S3 and they are faster to > > fetch than a bare repository since there's only a single GET > > request involved > > * incredibly tiny > > * no indexes (which may be more than 10 MB for some repositories) > > and other fluff, compared to cloning a bare repository > > * bundle can be filtered to only contain the tips of refs neccessary > > for e.g. code-analysis purposes > > > > However, in 86fdd94d72 (clone: fail gracefully when cloning filtered > > bundle, 2022-03-09) the cloning of such bundles was disabled, with a > > note that this behavior is not desired, and it the long-term this > > should be possible. > > > > The commit above states that it's not possible to have this at the > > moment due to lack of remote and a repository-global config that > > specifies an object filter, yet it's unclear why a remote-specific > > config can't be used instead, which is what this change does. > > As I understand it if you're cloning from a bundle file then then there > is no remote so how can we set a remote-specific config? > > I'm surprised that the proposed change does not require the user to pass > "--filter" to "git clone" as I expected that we'd want to check that the > filter on the command line was compatible with the filter used to create > the bundle. Allowing "git clone" to create a partial clone without the > user asking for it by passing the "--filter" option feels like is going > to end up confusing users. > > > +test_expect_success 'cloning from filtered bundle works' ' > > + git bundle create partial.bdl --all --filter=blob:none && > > + git clone --bare partial.bdl partial 2>err > > The redirection hides any error message which will make debugging test > failures harder. It would be nice to see this test check any config set > when cloning and that git commands can run successfully in the repository. > > Best Wishes > > Phillip
> Note that currently, when you clone a normal non-filtered bundle with a > '--filter' argument specified, no filtering will take place and no error > will be thrown. "promisor = true" and "partialclonefilter = ..." options > will be set in the repo config, but no .promisor file will be created. > This is even more confusing IMO, but that's how it currently on > Git 2.43.0. I was wrong about this one: the .promisor pack file actually gets created. And the filtering is not being done because the "uploadpack.allowFilter" global option is not enabled by default. Unfortunately the only way to figure this out is to run Git with GIT_TRACE=2, which shows: warning: filtering not recognized by server, ignoring So please feel free to skip this part from the consideration. On Sun, Jan 14, 2024 at 11:39 PM Nikolay Edigaryev <edigaryev@gmail.com> wrote: > > Hello Phillip, > > > As I understand it if you're cloning from a bundle file then then there > > is no remote so how can we set a remote-specific config? > > There is a remote, for more details see df61c88979 (clone: also > configure url for bare clones, 2010-03-29), which has the following > code: > > strbuf_addf(&key, "remote.%s.url", remote_name); > git_config_set(key.buf, repo); > strbuf_reset(&key); > > You can verify this by creating a bundle on Git 2.43.0 with "git create > bundle bundle.bundle --all" and then cloning it with "git clone > --bare /path/to/bundle.bundle", in my case the following repo-wide > configuration file was created: > > [core] > repositoryformatversion = 0 > filemode = true > bare = true > ignorecase = true > precomposeunicode = true > [remote "origin"] > url = /Users/edi/src/cirrus-cli/cli.bundle > > > I'm surprised that the proposed change does not require the user to pass > > "--filter" to "git clone" as I expected that we'd want to check that the > > filter on the command line was compatible with the filter used to create > > the bundle. Allowing "git clone" to create a partial clone without the > > user asking for it by passing the "--filter" option feels like is going > > to end up confusing users. > > Note that currently, when you clone a normal non-filtered bundle with a > '--filter' argument specified, no filtering will take place and no error > will be thrown. "promisor = true" and "partialclonefilter = ..." options > will be set in the repo config, but no .promisor file will be created. > This is even more confusing IMO, but that's how it currently on > Git 2.43.0. > > You have a good point, but I feel like completely preventing cloning of > filtered bundles and requiring a '--filter' argument is very taxing. If > you've already specified a '--filter' when creating a bundle (and thus > your intent to use partially cloned data), why do it multiple times? > > What I propose as an alternative here is to act based on the user's > intent when cloning: > > * when the user specifies no '--filter' argument, do nothing special, > allow cloning both types of bundles: normal and filtered (with the > logic from this patch) > > * when the user does specify a '--filter' argument, either: > * throw an error explaining that filtering of filtered bundles is not > supported > * or compare the user's filter specification and the one that is > in the bundle and only throw an error if they mismatch > > Let me know what you think about this (and perhaps you have a more > concrete example in mind where this will have negative consequences) > and I'll be happy to do a next iteration. > > > On Sun, Jan 14, 2024 at 10:00 PM Phillip Wood <phillip.wood123@gmail.com> wrote: > > > > Hi Nikolay > > > > On 14/01/2024 11:16, Nikolay Edigaryev via GitGitGadget wrote: > > > From: Nikolay Edigaryev <edigaryev@gmail.com> > > > > > > f18b512bbb (bundle: create filtered bundles, 2022-03-09) introduced an > > > incredibly useful ability to create filtered bundles, which advances > > > the partial clone/promisor support in Git and allows for archiving > > > large repositories to object storages like S3 in bundles that are: > > > > > > * easy to manage > > > * bundle is just a single file, it's easier to guarantee atomic > > > replacements in object storages like S3 and they are faster to > > > fetch than a bare repository since there's only a single GET > > > request involved > > > * incredibly tiny > > > * no indexes (which may be more than 10 MB for some repositories) > > > and other fluff, compared to cloning a bare repository > > > * bundle can be filtered to only contain the tips of refs neccessary > > > for e.g. code-analysis purposes > > > > > > However, in 86fdd94d72 (clone: fail gracefully when cloning filtered > > > bundle, 2022-03-09) the cloning of such bundles was disabled, with a > > > note that this behavior is not desired, and it the long-term this > > > should be possible. > > > > > > The commit above states that it's not possible to have this at the > > > moment due to lack of remote and a repository-global config that > > > specifies an object filter, yet it's unclear why a remote-specific > > > config can't be used instead, which is what this change does. > > > > As I understand it if you're cloning from a bundle file then then there > > is no remote so how can we set a remote-specific config? > > > > I'm surprised that the proposed change does not require the user to pass > > "--filter" to "git clone" as I expected that we'd want to check that the > > filter on the command line was compatible with the filter used to create > > the bundle. Allowing "git clone" to create a partial clone without the > > user asking for it by passing the "--filter" option feels like is going > > to end up confusing users. > > > > > +test_expect_success 'cloning from filtered bundle works' ' > > > + git bundle create partial.bdl --all --filter=blob:none && > > > + git clone --bare partial.bdl partial 2>err > > > > The redirection hides any error message which will make debugging test > > failures harder. It would be nice to see this test check any config set > > when cloning and that git commands can run successfully in the repository. > > > > Best Wishes > > > > Phillip
"Nikolay Edigaryev via GitGitGadget" <gitgitgadget@gmail.com> writes: > diff --git a/builtin/clone.c b/builtin/clone.c > index c6357af9498..4b3fedf78ed 100644 > --- a/builtin/clone.c > +++ b/builtin/clone.c > @@ -1227,9 +1227,18 @@ int cmd_clone(int argc, const char **argv, const char *prefix) > > if (fd > 0) > close(fd); > + > + if (has_filter) { > + strbuf_addf(&key, "remote.%s.promisor", remote_name); > + git_config_set(key.buf, "true"); > + strbuf_reset(&key); > + > + strbuf_addf(&key, "remote.%s.partialclonefilter", remote_name); > + git_config_set(key.buf, expand_list_objects_filter_spec(&header.filter)); > + strbuf_reset(&key); > + } > + > -# NEEDSWORK: 'git clone --bare' should be able to clone from a filtered > -# bundle, but that requires a change to promisor/filter config options. Sorry, but this "should be able to" does not make sense to me in the first place. I can understand how an operation to create a narrow clone of an unfiltered bundle and then prepare the resulting repository for future "fattening" by naming the unfiltered bundle file a "promisor", and allow the user to lazily fetch objects that have initially been filtered out of the bundle lazily. But a bundle that were created with objects _omitted_ already? It obviously cannot "promise" to deliber any objects that ought to be reachable from the objects contained in it, so setting the bundle file as promisor in the configuration does not help the resulting repository. Those missing objects must be obtained from somewhere other than that original "filtered" bundle, and if that source of objects that can promise all the objects that are ought to be reachable were specified as the promisor, it would make sense. But the source of this clone operation, i.e. the bundle file that is pointed at by "remote.$remote_name.url", cannot be that promisor.
Junio C Hamano <gitster@pobox.com> writes: > "Nikolay Edigaryev via GitGitGadget" <gitgitgadget@gmail.com> > writes: > >> diff --git a/builtin/clone.c b/builtin/clone.c >> index c6357af9498..4b3fedf78ed 100644 >> --- a/builtin/clone.c >> +++ b/builtin/clone.c >> @@ -1227,9 +1227,18 @@ int cmd_clone(int argc, const char **argv, const char *prefix) >> >> if (fd > 0) >> close(fd); >> + >> + if (has_filter) { >> + strbuf_addf(&key, "remote.%s.promisor", remote_name); >> + git_config_set(key.buf, "true"); >> + strbuf_reset(&key); >> + >> + strbuf_addf(&key, "remote.%s.partialclonefilter", remote_name); >> + git_config_set(key.buf, expand_list_objects_filter_spec(&header.filter)); >> + strbuf_reset(&key); >> + } >> + > >> -# NEEDSWORK: 'git clone --bare' should be able to clone from a filtered >> -# bundle, but that requires a change to promisor/filter config options. > ... > But a bundle that were created with objects _omitted_ already? > ... the source of this clone operation, i.e. the bundle file that is > pointed at by "remote.$remote_name.url", cannot be that promisor. Extending the above a bit, one important way a bundle is used is as a medium for sneaker-net. Instead of making a full clone over the network, if you can create a bundle that records all objects and all refs out of the source repository and then unbundle it in a different place to create a repository, you can tweak the resulting repository by either adding a separete remote or changing the remote.origin.url so that your subsequent fetch goes over the network to the repository you took the initial bundle from. The "tweak the resulting repository" part however MUST be done manually with the current system. If we can optionally record the publically reachable URL of the source repository when we create a bundle file, and "git clone" on the receiving side can read the URL out of the bundle and act on it (e.g., show it to the user and offer to record it as remote.origin.url in the resulting repository---I do not think it is wise to do this silently without letting the user know from security's point of view), then the use of bundle files as a medium for sneaker-netting will become even easier. And once that is done, perhaps allowing a filtered bundle to act as a sneaker-net medium to simulate an initial filtered clone would make sense. The promisor as well as the origin will be the network reachable URL and subsequent fetches (both deliberate ones via "git fetch" as well as lazy on-demand ones that backfills missing objects via the "promisor" access) would become possible. But without such a change to the bundle file format, allowing "clone" to finish and pretend the resulting repository is usable is somewhat irresponsible to the users. The on-demand lazy fetch would fail after this code cloned from such a filtered bundle, no?
Hi Nikolay On 14/01/2024 19:39, Nikolay Edigaryev wrote: > Hello Phillip, > >> As I understand it if you're cloning from a bundle file then then there >> is no remote so how can we set a remote-specific config? > > There is a remote, for more details see df61c88979 (clone: also > configure url for bare clones, 2010-03-29), which has the following > code: > > strbuf_addf(&key, "remote.%s.url", remote_name); > git_config_set(key.buf, repo); > strbuf_reset(&key); > > You can verify this by creating a bundle on Git 2.43.0 with "git create > bundle bundle.bundle --all" and then cloning it with "git clone > --bare /path/to/bundle.bundle", in my case the following repo-wide > configuration file was created: > > [core] > repositoryformatversion = 0 > filemode = true > bare = true > ignorecase = true > precomposeunicode = true > [remote "origin"] > url = /Users/edi/src/cirrus-cli/cli.bundle Oh, thanks for clarifying that I didn't realize we set "origin" to point to the bundle. That means this patch creates a promisor remote config pointing to a bundle that does not contain the missing objects. As Junio said that doesn't make much sense to me as the point of the promisor config is to allow git to lazily fetch the missing objects. Best Wishes Phillip >> I'm surprised that the proposed change does not require the user to pass >> "--filter" to "git clone" as I expected that we'd want to check that the >> filter on the command line was compatible with the filter used to create >> the bundle. Allowing "git clone" to create a partial clone without the >> user asking for it by passing the "--filter" option feels like is going >> to end up confusing users. > > Note that currently, when you clone a normal non-filtered bundle with a > '--filter' argument specified, no filtering will take place and no error > will be thrown. "promisor = true" and "partialclonefilter = ..." options > will be set in the repo config, but no .promisor file will be created. > This is even more confusing IMO, but that's how it currently on > Git 2.43.0. > > You have a good point, but I feel like completely preventing cloning of > filtered bundles and requiring a '--filter' argument is very taxing. If > you've already specified a '--filter' when creating a bundle (and thus > your intent to use partially cloned data), why do it multiple times? > > What I propose as an alternative here is to act based on the user's > intent when cloning: > > * when the user specifies no '--filter' argument, do nothing special, > allow cloning both types of bundles: normal and filtered (with the > logic from this patch) > > * when the user does specify a '--filter' argument, either: > * throw an error explaining that filtering of filtered bundles is not > supported > * or compare the user's filter specification and the one that is > in the bundle and only throw an error if they mismatch > > Let me know what you think about this (and perhaps you have a more > concrete example in mind where this will have negative consequences) > and I'll be happy to do a next iteration. > > > On Sun, Jan 14, 2024 at 10:00 PM Phillip Wood <phillip.wood123@gmail.com> wrote: >> >> Hi Nikolay >> >> On 14/01/2024 11:16, Nikolay Edigaryev via GitGitGadget wrote: >>> From: Nikolay Edigaryev <edigaryev@gmail.com> >>> >>> f18b512bbb (bundle: create filtered bundles, 2022-03-09) introduced an >>> incredibly useful ability to create filtered bundles, which advances >>> the partial clone/promisor support in Git and allows for archiving >>> large repositories to object storages like S3 in bundles that are: >>> >>> * easy to manage >>> * bundle is just a single file, it's easier to guarantee atomic >>> replacements in object storages like S3 and they are faster to >>> fetch than a bare repository since there's only a single GET >>> request involved >>> * incredibly tiny >>> * no indexes (which may be more than 10 MB for some repositories) >>> and other fluff, compared to cloning a bare repository >>> * bundle can be filtered to only contain the tips of refs neccessary >>> for e.g. code-analysis purposes >>> >>> However, in 86fdd94d72 (clone: fail gracefully when cloning filtered >>> bundle, 2022-03-09) the cloning of such bundles was disabled, with a >>> note that this behavior is not desired, and it the long-term this >>> should be possible. >>> >>> The commit above states that it's not possible to have this at the >>> moment due to lack of remote and a repository-global config that >>> specifies an object filter, yet it's unclear why a remote-specific >>> config can't be used instead, which is what this change does. >> >> As I understand it if you're cloning from a bundle file then then there >> is no remote so how can we set a remote-specific config? >> >> I'm surprised that the proposed change does not require the user to pass >> "--filter" to "git clone" as I expected that we'd want to check that the >> filter on the command line was compatible with the filter used to create >> the bundle. Allowing "git clone" to create a partial clone without the >> user asking for it by passing the "--filter" option feels like is going >> to end up confusing users. >> >>> +test_expect_success 'cloning from filtered bundle works' ' >>> + git bundle create partial.bdl --all --filter=blob:none && >>> + git clone --bare partial.bdl partial 2>err >> >> The redirection hides any error message which will make debugging test >> failures harder. It would be nice to see this test check any config set >> when cloning and that git commands can run successfully in the repository. >> >> Best Wishes >> >> Phillip
Hi Nikolay On 14/01/2024 21:26, Nikolay Edigaryev wrote: >> Note that currently, when you clone a normal non-filtered bundle with a >> '--filter' argument specified, no filtering will take place and no error >> will be thrown. "promisor = true" and "partialclonefilter = ..." options >> will be set in the repo config, but no .promisor file will be created. >> This is even more confusing IMO, but that's how it currently on >> Git 2.43.0. > > I was wrong about this one: the .promisor pack file actually gets created. > > And the filtering is not being done because the "uploadpack.allowFilter" > global option is not enabled by default. That makes sense - if you try to make a partial clone from a remote that does not support object filtering we fallback to a full clone and print the warning you mention below. > Unfortunately the only way to figure this out is to run Git with > GIT_TRACE=2, which shows: That seems strange, you should see the warning without having to set GIT_TRACE. I've certainly seen the warning in the past when trying to create a partial clone from a remote did not support them without me setting any special environment variables. Best Wishes Phillip > warning: filtering not recognized by server, ignoring > > So please feel free to skip this part from the consideration. > > > On Sun, Jan 14, 2024 at 11:39 PM Nikolay Edigaryev <edigaryev@gmail.com> wrote: >> >> Hello Phillip, >> >>> As I understand it if you're cloning from a bundle file then then there >>> is no remote so how can we set a remote-specific config? >> >> There is a remote, for more details see df61c88979 (clone: also >> configure url for bare clones, 2010-03-29), which has the following >> code: >> >> strbuf_addf(&key, "remote.%s.url", remote_name); >> git_config_set(key.buf, repo); >> strbuf_reset(&key); >> >> You can verify this by creating a bundle on Git 2.43.0 with "git create >> bundle bundle.bundle --all" and then cloning it with "git clone >> --bare /path/to/bundle.bundle", in my case the following repo-wide >> configuration file was created: >> >> [core] >> repositoryformatversion = 0 >> filemode = true >> bare = true >> ignorecase = true >> precomposeunicode = true >> [remote "origin"] >> url = /Users/edi/src/cirrus-cli/cli.bundle >> >>> I'm surprised that the proposed change does not require the user to pass >>> "--filter" to "git clone" as I expected that we'd want to check that the >>> filter on the command line was compatible with the filter used to create >>> the bundle. Allowing "git clone" to create a partial clone without the >>> user asking for it by passing the "--filter" option feels like is going >>> to end up confusing users. >> >> Note that currently, when you clone a normal non-filtered bundle with a >> '--filter' argument specified, no filtering will take place and no error >> will be thrown. "promisor = true" and "partialclonefilter = ..." options >> will be set in the repo config, but no .promisor file will be created. >> This is even more confusing IMO, but that's how it currently on >> Git 2.43.0. >> >> You have a good point, but I feel like completely preventing cloning of >> filtered bundles and requiring a '--filter' argument is very taxing. If >> you've already specified a '--filter' when creating a bundle (and thus >> your intent to use partially cloned data), why do it multiple times? >> >> What I propose as an alternative here is to act based on the user's >> intent when cloning: >> >> * when the user specifies no '--filter' argument, do nothing special, >> allow cloning both types of bundles: normal and filtered (with the >> logic from this patch) >> >> * when the user does specify a '--filter' argument, either: >> * throw an error explaining that filtering of filtered bundles is not >> supported >> * or compare the user's filter specification and the one that is >> in the bundle and only throw an error if they mismatch >> >> Let me know what you think about this (and perhaps you have a more >> concrete example in mind where this will have negative consequences) >> and I'll be happy to do a next iteration. >> >> >> On Sun, Jan 14, 2024 at 10:00 PM Phillip Wood <phillip.wood123@gmail.com> wrote: >>> >>> Hi Nikolay >>> >>> On 14/01/2024 11:16, Nikolay Edigaryev via GitGitGadget wrote: >>>> From: Nikolay Edigaryev <edigaryev@gmail.com> >>>> >>>> f18b512bbb (bundle: create filtered bundles, 2022-03-09) introduced an >>>> incredibly useful ability to create filtered bundles, which advances >>>> the partial clone/promisor support in Git and allows for archiving >>>> large repositories to object storages like S3 in bundles that are: >>>> >>>> * easy to manage >>>> * bundle is just a single file, it's easier to guarantee atomic >>>> replacements in object storages like S3 and they are faster to >>>> fetch than a bare repository since there's only a single GET >>>> request involved >>>> * incredibly tiny >>>> * no indexes (which may be more than 10 MB for some repositories) >>>> and other fluff, compared to cloning a bare repository >>>> * bundle can be filtered to only contain the tips of refs neccessary >>>> for e.g. code-analysis purposes >>>> >>>> However, in 86fdd94d72 (clone: fail gracefully when cloning filtered >>>> bundle, 2022-03-09) the cloning of such bundles was disabled, with a >>>> note that this behavior is not desired, and it the long-term this >>>> should be possible. >>>> >>>> The commit above states that it's not possible to have this at the >>>> moment due to lack of remote and a repository-global config that >>>> specifies an object filter, yet it's unclear why a remote-specific >>>> config can't be used instead, which is what this change does. >>> >>> As I understand it if you're cloning from a bundle file then then there >>> is no remote so how can we set a remote-specific config? >>> >>> I'm surprised that the proposed change does not require the user to pass >>> "--filter" to "git clone" as I expected that we'd want to check that the >>> filter on the command line was compatible with the filter used to create >>> the bundle. Allowing "git clone" to create a partial clone without the >>> user asking for it by passing the "--filter" option feels like is going >>> to end up confusing users. >>> >>>> +test_expect_success 'cloning from filtered bundle works' ' >>>> + git bundle create partial.bdl --all --filter=blob:none && >>>> + git clone --bare partial.bdl partial 2>err >>> >>> The redirection hides any error message which will make debugging test >>> failures harder. It would be nice to see this test check any config set >>> when cloning and that git commands can run successfully in the repository. >>> >>> Best Wishes >>> >>> Phillip
Hello Junio and Phillip, Thanks a lot for the explanations of how this is supposed to work. It seems that to make this work properly, we'd need to: (1) add an argument (or an option) to 'git bundle create', so that the user will be able to explicitly request the inclusion of a desired remote's URL Without such mechanism in place data leak is possible, e.g. remote with credentials hardcoded in it. (2) extend the 'gitformat-bundle' to include 'url' However, a remote can have multiple URLs and other remote-specific options might be necessary to properly work with it. (3) add an argument (or an option) to 'git clone', so that the user will be able to explicitly request the write of the URL contained in the bundle to the repository's config Otherwise, it's insecure, e.g. someone might craft a bundle with a URL that collects data from the user. I don't want waste anyone's time on this anymore because I've toyed with 'git bundle' a bit more and realized that what I'm trying to accomplish can be done the other way: 1. git init 2. git bundle unbundle <PATH> | <script that swaps hashes and refs in 'git bundle unbundle output' and feeds them to 'git update-ref'> Hopefully this discussion will be useful for people looking to accomplish something similar to what I've described in the initial message. On Mon, Jan 15, 2024 at 6:09 AM Junio C Hamano <gitster@pobox.com> wrote: > > Junio C Hamano <gitster@pobox.com> writes: > > > "Nikolay Edigaryev via GitGitGadget" <gitgitgadget@gmail.com> > > writes: > > > >> diff --git a/builtin/clone.c b/builtin/clone.c > >> index c6357af9498..4b3fedf78ed 100644 > >> --- a/builtin/clone.c > >> +++ b/builtin/clone.c > >> @@ -1227,9 +1227,18 @@ int cmd_clone(int argc, const char **argv, const char *prefix) > >> > >> if (fd > 0) > >> close(fd); > >> + > >> + if (has_filter) { > >> + strbuf_addf(&key, "remote.%s.promisor", remote_name); > >> + git_config_set(key.buf, "true"); > >> + strbuf_reset(&key); > >> + > >> + strbuf_addf(&key, "remote.%s.partialclonefilter", remote_name); > >> + git_config_set(key.buf, expand_list_objects_filter_spec(&header.filter)); > >> + strbuf_reset(&key); > >> + } > >> + > > > >> -# NEEDSWORK: 'git clone --bare' should be able to clone from a filtered > >> -# bundle, but that requires a change to promisor/filter config options. > > ... > > But a bundle that were created with objects _omitted_ already? > > ... the source of this clone operation, i.e. the bundle file that is > > pointed at by "remote.$remote_name.url", cannot be that promisor. > > Extending the above a bit, one important way a bundle is used is as > a medium for sneaker-net. Instead of making a full clone over the > network, if you can create a bundle that records all objects and all > refs out of the source repository and then unbundle it in a > different place to create a repository, you can tweak the resulting > repository by either adding a separete remote or changing the > remote.origin.url so that your subsequent fetch goes over the > network to the repository you took the initial bundle from. > > The "tweak the resulting repository" part however MUST be done > manually with the current system. If we can optionally record the > publically reachable URL of the source repository when we create a > bundle file, and "git clone" on the receiving side can read the URL > out of the bundle and act on it (e.g., show it to the user and offer > to record it as remote.origin.url in the resulting repository---I do > not think it is wise to do this silently without letting the user > know from security's point of view), then the use of bundle files as > a medium for sneaker-netting will become even easier. > > And once that is done, perhaps allowing a filtered bundle to act as > a sneaker-net medium to simulate an initial filtered clone would > make sense. The promisor as well as the origin will be the network > reachable URL and subsequent fetches (both deliberate ones via "git > fetch" as well as lazy on-demand ones that backfills missing objects > via the "promisor" access) would become possible. > > But without such a change to the bundle file format, allowing > "clone" to finish and pretend the resulting repository is usable is > somewhat irresponsible to the users. The on-demand lazy fetch would > fail after this code cloned from such a filtered bundle, no?
diff --git a/builtin/clone.c b/builtin/clone.c index c6357af9498..4b3fedf78ed 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -1227,9 +1227,18 @@ int cmd_clone(int argc, const char **argv, const char *prefix) if (fd > 0) close(fd); + + if (has_filter) { + strbuf_addf(&key, "remote.%s.promisor", remote_name); + git_config_set(key.buf, "true"); + strbuf_reset(&key); + + strbuf_addf(&key, "remote.%s.partialclonefilter", remote_name); + git_config_set(key.buf, expand_list_objects_filter_spec(&header.filter)); + strbuf_reset(&key); + } + bundle_header_release(&header); - if (has_filter) - die(_("cannot clone from filtered bundle")); } transport_set_option(transport, TRANS_OPT_KEEP, "yes"); diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh index 3e6bcbf30cd..f449df00642 100755 --- a/t/t6020-bundle-misc.sh +++ b/t/t6020-bundle-misc.sh @@ -555,16 +555,9 @@ do ' done -# NEEDSWORK: 'git clone --bare' should be able to clone from a filtered -# bundle, but that requires a change to promisor/filter config options. -# For now, we fail gracefully with a helpful error. This behavior can be -# changed in the future to succeed as much as possible. -test_expect_success 'cloning from filtered bundle has useful error' ' - git bundle create partial.bdl \ - --all \ - --filter=blob:none && - test_must_fail git clone --bare partial.bdl partial 2>err && - grep "cannot clone from filtered bundle" err +test_expect_success 'cloning from filtered bundle works' ' + git bundle create partial.bdl --all --filter=blob:none && + git clone --bare partial.bdl partial 2>err ' test_expect_success 'verify catches unreachable, broken prerequisites' '