Message ID | 20200529000432.146618-1-delphij@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v4] fetch: allow adding a filter after initial clone. | expand |
Xin Li <delphij@google.com> writes: > Retroactively adding filter can be useful for existing shallow clones as > they allow users to see earlier change histories without downloading all > git objects in a regular --unshallow fetch. > > Without this patch, users can make a clone partial by editing the > repository configuration to convert the remote into a promisor, like: > > git config core.repositoryFormatVersion 1 > git config extensions.partialClone origin > git fetch --unshallow --filter=blob:none origin > > Since the hard part of making this work is already in place and such > edits can be error-prone, teach Git to perform the required configuration > change automatically instead. > > Instead of bailing out immediately when no promisor is available, make > the code perform a more precise check for any potential problems > (extensions became special in repository version 1, while it can have > any value in version 0, so upgrade should not happen if the repository > have an unsupported configuration that would render it invalid if we > upgraded). Upgrade from v0 to v1 must follow the more strict "no extension" rule, not "no unknown ones" rule, so the above description must be corrected. Perhaps like this? ... so upgrade from version 0 should not happen if the repository has ANY extension. A repository version 1 and later make Git fail if there is any unknown extension, so we need to fail an upgrade only if there is any extension that is unknown to us). You can drop the second paragraph about upgrading from version 1 to a later version if you want, as the only interesting use cases in practice at this point are upgrading from v0 to v1 and staying at v1. > Signed-off-by: Xin Li <delphij@google.com> > Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> I think the updated design looks good. Let's nitpick some styles ;-) > diff --git a/builtin/fetch.c b/builtin/fetch.c > index b5788c16bf..3347d578ea 100644 > --- a/builtin/fetch.c > +++ b/builtin/fetch.c > @@ -1790,9 +1790,6 @@ int cmd_fetch(int argc, const char **argv, const char *prefix) > if (depth || deepen_since || deepen_not.nr) > deepen = 1; > > - if (filter_options.choice && !has_promisor_remote()) > - die("--filter can only be used when extensions.partialClone is set"); > - > if (all) { > if (argc == 1) > die(_("fetch --all does not take a repository argument")); > diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c > index 95d0882417..95669815d4 100644 > --- a/builtin/sparse-checkout.c > +++ b/builtin/sparse-checkout.c > @@ -249,6 +249,8 @@ static int set_config(enum sparse_checkout_mode mode) > { > const char *config_path; > > + if (upgrade_repository_format(1) < 0) > + die(_("unable to upgrade repository format to enable worktreeConfig")); > if (git_config_set_gently("extensions.worktreeConfig", "true")) { > error(_("failed to set extensions.worktreeConfig setting")); > return 1; OK. > diff --git a/cache.h b/cache.h > index 0f0485ecfe..66dcd2f219 100644 > --- a/cache.h > +++ b/cache.h > @@ -1042,6 +1042,7 @@ struct repository_format { > int worktree_config; > int is_bare; > int hash_algo; > + int has_extensions; > char *work_tree; > struct string_list unknown_extensions; > }; > @@ -1056,6 +1057,7 @@ struct repository_format { > .version = -1, \ > .is_bare = -1, \ > .hash_algo = GIT_HASH_SHA1, \ > + .has_extensions = 0, \ I am on the fence between "explicitly initializing to zero value is pointless, especially when we use .designated_initializer" and "especially with .designated_initializer, it adds a documentation value to explicitly initialize a field to its zero value". Unless other reviewers weigh in, I am OK to let this stand as-is. > .unknown_extensions = STRING_LIST_INIT_DUP, \ > } > > diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c > index 256bcfbdfe..3553ad7b0a 100644 > --- a/list-objects-filter-options.c > +++ b/list-objects-filter-options.c > @@ -326,7 +326,8 @@ void partial_clone_register( > > /* Check if it is already registered */ > if (!promisor_remote_find(remote)) { > - git_config_set("core.repositoryformatversion", "1"); > + if (upgrade_repository_format(1) < 0) > + die(_("unable to upgrade repository format to support partial clone")); OK. > diff --git a/repository.h b/repository.h > index 6534fbb7b3..40cc12c7cf 100644 > --- a/repository.h > +++ b/repository.h > @@ -196,4 +196,10 @@ void repo_update_index_if_able(struct repository *, struct lock_file *); > > void prepare_repo_settings(struct repository *r); > > +/* > + * Return 1 if upgrade repository format to target_version succeeded, > + * 0 if no upgrade is necessary; returns -1 when upgrade is not possible. > + */ Do we want to start with "Return" but say "returns" later? Return 1 if ..., 0 if ..., and -1 when upgrade is not possible. > +int upgrade_repository_format(int target_version); > + > #endif /* REPOSITORY_H */ > +int upgrade_repository_format(int target_version) > +{ > + struct strbuf sb = STRBUF_INIT; > + struct strbuf err = STRBUF_INIT; > + struct strbuf repo_version = STRBUF_INIT; > + struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT; > + > + strbuf_git_common_path(&sb, the_repository, "config"); > + read_repository_format(&repo_fmt, sb.buf); > + strbuf_release(&sb); > + > + if (repo_fmt.version >= target_version) > + return 0; OK. That's "already up-to-date" case. > + if (verify_repository_format_eligibility(&repo_fmt, &err, > + target_version) < 0) { I do not think this _eligibility thing should be a separate helper function. One reason is that its name sounds nonsensical ("eligible for what? it deserives to be verified for its repo format?"), another is it makes it unclear what "upgrade" requires by hiding the logic inside that decides the eligibility for upgrading. Besides, there is only one callsite. Open-coding the gist of the helper like this: if (verify_repository_format(&repo_fmt, &err) < 0 || (!repo_fmt.version && repo_fmt.has_extensions)) { should make it a lot clearer to see. If the repository is unusable by the version of Git we are running already, or the repository is v0 and has configuration variable(s) in "extensions.*" section, we refuse to upgrade. Which is slightly different from what you did with the three-way split of verify_repository_format(), which made the "eligibility" thing not to care about unknown extensions in a repository v1 and higher. I actually think we should refuse to update v1 or v2 repository to v3 with a running Git that knows only about v1 (i.e. the repository before upgrading may or may not be something we understand, and if we do not understand it, we shouldn't touch it). > + warning("unable to upgrade repository format from %d to %d: %s", > + repo_fmt.version, target_version, err.buf); > + strbuf_release(&err); > + return -1; > + } > + ... And with the suggested change to eliminate "eligibility" helper, none of the changes below would become necessary, I would think, so I won't say things like "we do not say 'if (result != 0)'; instead we just say 'if (result)'" ;-) Thanks.
(+cc: Jonathan Tan, partial clone expert) Xin Li wrote: > Subject: fetch: allow adding a filter after initial clone. nit: should be no period at the end of the subject line > Signed-off-by: Xin Li <delphij@google.com> > Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Let's see whether I still stand by that. ;-) [...] > --- a/setup.c > +++ b/setup.c > @@ -13,6 +13,9 @@ static int work_tree_config_is_bogus; > static struct startup_info the_startup_info; > struct startup_info *startup_info = &the_startup_info; > > +static int verify_repository_format_eligibility(const struct repository_format *, > + struct strbuf *, int); Odd wrapping, but after Junio's suggestion this goes away. [...] > + warning("unable to upgrade repository format from %d to %d: %s", > + repo_fmt.version, target_version, err.buf); Same wrapping nit: this should use a tab to line up right after the parenthesis (Git uses 8-space tabs). [...] > --- a/t/t1090-sparse-checkout-scope.sh > +++ b/t/t1090-sparse-checkout-scope.sh > @@ -63,7 +63,6 @@ test_expect_success 'in partial clone, sparse checkout only fetches needed blobs > git -C server commit -m message && > > test_config -C client core.sparsecheckout 1 && > - test_config -C client extensions.partialclone origin && Nice. > echo "!/*" >client/.git/info/sparse-checkout && > echo "/a" >>client/.git/info/sparse-checkout && > git -C client fetch --filter=blob:none origin && > diff --git a/t/t2404-worktree-config.sh b/t/t2404-worktree-config.sh > index 286121d8de..9536d10919 100755 > --- a/t/t2404-worktree-config.sh > +++ b/t/t2404-worktree-config.sh > @@ -23,8 +23,10 @@ test_expect_success 'config --worktree without extension' ' > ' > > test_expect_success 'enable worktreeConfig extension' ' > + git config core.repositoryformatversion 1 && > git config extensions.worktreeConfig true && Yes, makes sense. Does this patch need it, or could this go in a separate patch? > - test_cmp_config true extensions.worktreeConfig > + test_cmp_config true extensions.worktreeConfig && > + test_cmp_config 1 core.repositoryformatversion This (both the existing code and the modified version) is strange: we just set the config, so why are we checking it? [...] > --- a/t/t5500-fetch-pack.sh > +++ b/t/t5500-fetch-pack.sh > @@ -999,7 +999,6 @@ fetch_filter_blob_limit_zero () { > test_config -C "$SERVER" uploadpack.allowfilter 1 && > > git clone "$URL" client && > - test_config -C client extensions.partialclone origin && > Nice. [...] > --- a/t/t5702-protocol-v2.sh > +++ b/t/t5702-protocol-v2.sh > @@ -348,7 +348,6 @@ test_expect_success 'partial fetch' ' > rm -rf client "$(pwd)/trace" && > git init client && > SERVER="file://$(pwd)/server" && > - test_config -C client extensions.partialClone "$SERVER" && > > GIT_TRACE_PACKET="$(pwd)/trace" git -C client -c protocol.version=2 \ > fetch --filter=blob:none "$SERVER" master:refs/heads/other && This test is a bit unusual, since it's fetching by URL instead of from a remote. Looks like it comes from commit ba95710a3bdcb2a80495b1d93a0e482dd69905e1 Author: Jonathan Tan <jonathantanmy@google.com> Date: Thu May 3 16:46:56 2018 -0700 {fetch,upload}-pack: support filter in protocol v2 Jonathan, thoughts about this one? Is making it set extensions.partialClone implicitly like the change above a good change? With the nits above and the bits Junio described addressed, this would indeed be Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Thanks.
Regarding t/t2404-worktree-config.sh: On Thu, May 28, 2020 at 6:01 PM Jonathan Nieder <jrnieder@gmail.com> wrote: > Xin Li wrote: > > > diff --git a/t/t2404-worktree-config.sh b/t/t2404-worktree-config.sh > > index 286121d8de..9536d10919 100755 > > --- a/t/t2404-worktree-config.sh > > +++ b/t/t2404-worktree-config.sh > > @@ -23,8 +23,10 @@ test_expect_success 'config --worktree without extension' ' > > ' > > > > test_expect_success 'enable worktreeConfig extension' ' > > + git config core.repositoryformatversion 1 && > > git config extensions.worktreeConfig true && > > Yes, makes sense. Does this patch need it, or could this go in a > separate patch? Yes, this patch needs setting repositoryformatversion to 1 as we would no longer recognize extensions.worktreeConfig=true on version 0 repositories. > > - test_cmp_config true extensions.worktreeConfig > > + test_cmp_config true extensions.worktreeConfig && > > + test_cmp_config 1 core.repositoryformatversion > > This (both the existing code and the modified version) is strange: we > just set the config, so why are we checking it? The check was mainly to match the existing pattern (which sets extensions.worktreeConfig and immediately asserts that they were set). These assertions are not strictly necessary but are harmless, so I don't feel strongly about keeping or removing them.
Junio C Hamano <gitster@pobox.com> writes: > Which is slightly different from what you did with the three-way > split of verify_repository_format(), which made the "eligibility" > thing not to care about unknown extensions in a repository v1 and > higher. I actually think we should refuse to update v1 or v2 > repository to v3 with a running Git that knows only about v1 > (i.e. the repository before upgrading may or may not be something we > understand, and if we do not understand it, we shouldn't touch it). It does not change the conclusion, but I think the above sample situation would not make much sense---a caller that asks this function to upgrade the repository to v3 when the version of Git it is linked in does not understand v3 is simply buggy. But we should still refuse to update v1 to v2 with a version of Git that understands v2 if the repository has some extension that we do not know about, so (1) if upgrading from v0, there must be no "extensions.*"; and (2) if upgrading from other versions, there must be no "extensions.*" we do not recognise. I suggested would still be the reasonably defensive rule. Thanks.
Jonathan Nieder <jrnieder@gmail.com> writes: >> diff --git a/t/t2404-worktree-config.sh b/t/t2404-worktree-config.sh >> index 286121d8de..9536d10919 100755 >> --- a/t/t2404-worktree-config.sh >> +++ b/t/t2404-worktree-config.sh >> @@ -23,8 +23,10 @@ test_expect_success 'config --worktree without extension' ' >> ' >> >> test_expect_success 'enable worktreeConfig extension' ' >> + git config core.repositoryformatversion 1 && >> git config extensions.worktreeConfig true && > > Yes, makes sense. Does this patch need it, or could this go in a > separate patch? I think it is the consequence of unrelated change in this hunk: @@ -506,9 +510,15 @@ static int check_repository_format_gently(const char *gitdir, struct repository_ die("%s", err.buf); } - repository_format_precious_objects = candidate->precious_objects; - set_repository_format_partial_clone(candidate->partial_clone); - repository_format_worktree_config = candidate->worktree_config; + if (candidate->version >= 1) { + repository_format_precious_objects = candidate->precious_objects; + set_repository_format_partial_clone(candidate->partial_clone); + repository_format_worktree_config = candidate->worktree_config; + } else { + repository_format_precious_objects = 0; + set_repository_format_partial_clone(NULL); + repository_format_worktree_config = 0; + } string_list_clear(&candidate->unknown_extensions, 0); if (repository_format_worktree_config) { We used to honor extensions.*, as long as the version of Git understand it, even in a repository whose version is still v0. I am not sure if this backward incompatible change is necessary for the purpose of "allow safe upgrade of repository format version" topic, but as you hinted, it does smell like it belongs to a separate (and much larger and potentially controversial) patch. Thanks for carefully reading.
diff --git a/builtin/fetch.c b/builtin/fetch.c index b5788c16bf..3347d578ea 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -1790,9 +1790,6 @@ int cmd_fetch(int argc, const char **argv, const char *prefix) if (depth || deepen_since || deepen_not.nr) deepen = 1; - if (filter_options.choice && !has_promisor_remote()) - die("--filter can only be used when extensions.partialClone is set"); - if (all) { if (argc == 1) die(_("fetch --all does not take a repository argument")); diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c index 95d0882417..95669815d4 100644 --- a/builtin/sparse-checkout.c +++ b/builtin/sparse-checkout.c @@ -249,6 +249,8 @@ static int set_config(enum sparse_checkout_mode mode) { const char *config_path; + if (upgrade_repository_format(1) < 0) + die(_("unable to upgrade repository format to enable worktreeConfig")); if (git_config_set_gently("extensions.worktreeConfig", "true")) { error(_("failed to set extensions.worktreeConfig setting")); return 1; diff --git a/cache.h b/cache.h index 0f0485ecfe..66dcd2f219 100644 --- a/cache.h +++ b/cache.h @@ -1042,6 +1042,7 @@ struct repository_format { int worktree_config; int is_bare; int hash_algo; + int has_extensions; char *work_tree; struct string_list unknown_extensions; }; @@ -1056,6 +1057,7 @@ struct repository_format { .version = -1, \ .is_bare = -1, \ .hash_algo = GIT_HASH_SHA1, \ + .has_extensions = 0, \ .unknown_extensions = STRING_LIST_INIT_DUP, \ } diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c index 256bcfbdfe..3553ad7b0a 100644 --- a/list-objects-filter-options.c +++ b/list-objects-filter-options.c @@ -326,7 +326,8 @@ void partial_clone_register( /* Check if it is already registered */ if (!promisor_remote_find(remote)) { - git_config_set("core.repositoryformatversion", "1"); + if (upgrade_repository_format(1) < 0) + die(_("unable to upgrade repository format to support partial clone")); /* Add promisor config for the remote */ cfg_name = xstrfmt("remote.%s.promisor", remote); diff --git a/repository.h b/repository.h index 6534fbb7b3..40cc12c7cf 100644 --- a/repository.h +++ b/repository.h @@ -196,4 +196,10 @@ void repo_update_index_if_able(struct repository *, struct lock_file *); void prepare_repo_settings(struct repository *r); +/* + * Return 1 if upgrade repository format to target_version succeeded, + * 0 if no upgrade is necessary; returns -1 when upgrade is not possible. + */ +int upgrade_repository_format(int target_version); + #endif /* REPOSITORY_H */ diff --git a/setup.c b/setup.c index 65fe5ecefb..0759e9f8f9 100644 --- a/setup.c +++ b/setup.c @@ -13,6 +13,9 @@ static int work_tree_config_is_bogus; static struct startup_info the_startup_info; struct startup_info *startup_info = &the_startup_info; +static int verify_repository_format_eligibility(const struct repository_format *, + struct strbuf *, int); + /* * The input parameter must contain an absolute path, and it must already be * normalized. @@ -455,6 +458,7 @@ static int check_repo_format(const char *var, const char *value, void *vdata) if (strcmp(var, "core.repositoryformatversion") == 0) data->version = git_config_int(var, value); else if (skip_prefix(var, "extensions.", &ext)) { + data->has_extensions = 1; /* * record any known extensions here; otherwise, * we fall through to recording it as unknown, and @@ -506,9 +510,15 @@ static int check_repository_format_gently(const char *gitdir, struct repository_ die("%s", err.buf); } - repository_format_precious_objects = candidate->precious_objects; - set_repository_format_partial_clone(candidate->partial_clone); - repository_format_worktree_config = candidate->worktree_config; + if (candidate->version >= 1) { + repository_format_precious_objects = candidate->precious_objects; + set_repository_format_partial_clone(candidate->partial_clone); + repository_format_worktree_config = candidate->worktree_config; + } else { + repository_format_precious_objects = 0; + set_repository_format_partial_clone(NULL); + repository_format_worktree_config = 0; + } string_list_clear(&candidate->unknown_extensions, 0); if (repository_format_worktree_config) { @@ -538,6 +548,34 @@ static int check_repository_format_gently(const char *gitdir, struct repository_ return 0; } +int upgrade_repository_format(int target_version) +{ + struct strbuf sb = STRBUF_INIT; + struct strbuf err = STRBUF_INIT; + struct strbuf repo_version = STRBUF_INIT; + struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT; + + strbuf_git_common_path(&sb, the_repository, "config"); + read_repository_format(&repo_fmt, sb.buf); + strbuf_release(&sb); + + if (repo_fmt.version >= target_version) + return 0; + + if (verify_repository_format_eligibility(&repo_fmt, &err, + target_version) < 0) { + warning("unable to upgrade repository format from %d to %d: %s", + repo_fmt.version, target_version, err.buf); + strbuf_release(&err); + return -1; + } + + strbuf_addf(&repo_version, "%d", target_version); + git_config_set("core.repositoryformatversion", repo_version.buf); + strbuf_release(&repo_version); + return 1; +} + static void init_repository_format(struct repository_format *format) { const struct repository_format fresh = REPOSITORY_FORMAT_INIT; @@ -562,7 +600,7 @@ void clear_repository_format(struct repository_format *format) init_repository_format(format); } -int verify_repository_format(const struct repository_format *format, +static int verify_repository_format_version(const struct repository_format *format, struct strbuf *err) { if (GIT_REPO_VERSION_READ < format->version) { @@ -571,6 +609,18 @@ int verify_repository_format(const struct repository_format *format, return -1; } + return 0; +} + +int verify_repository_format(const struct repository_format *format, + struct strbuf *err) +{ + int result; + + result = verify_repository_format_version(format, err); + if (result != 0) + return (result); + if (format->version >= 1 && format->unknown_extensions.nr) { int i; @@ -585,6 +635,25 @@ int verify_repository_format(const struct repository_format *format, return 0; } +static int verify_repository_format_eligibility(const struct repository_format + *format, struct strbuf *err, int target_version) +{ + int result; + + result = verify_repository_format_version(format, err); + if (result != 0) + return (result); + + if (format->version <= 0 && format->has_extensions && + target_version >= 1) { + strbuf_addf(err, _("extensions found for repository version %d"), + format->version); + return -1; + } + + return 0; +} + void read_gitfile_error_die(int error_code, const char *path, const char *dir) { switch (error_code) { diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh index a3988bd4b8..463dc3a8be 100755 --- a/t/t0410-partial-clone.sh +++ b/t/t0410-partial-clone.sh @@ -30,6 +30,29 @@ test_expect_success 'extensions.partialclone without filter' ' git -C client fetch origin ' +test_expect_success 'convert shallow clone to partial clone' ' + rm -fr server client && + test_create_repo server && + test_commit -C server my_commit 1 && + test_commit -C server my_commit2 1 && + git clone --depth=1 "file://$(pwd)/server" client && + git -C client fetch --unshallow --filter="blob:none" && + test_cmp_config -C client true remote.origin.promisor && + test_cmp_config -C client blob:none remote.origin.partialclonefilter && + test_cmp_config -C client 1 core.repositoryformatversion +' + +test_expect_success 'convert shallow clone to partial clone must fail with any extension' ' + rm -fr server client && + test_create_repo server && + test_commit -C server my_commit 1 && + test_commit -C server my_commit2 1 && + git clone --depth=1 "file://$(pwd)/server" client && + test_cmp_config -C client 0 core.repositoryformatversion && + git -C client config extensions.partialclone origin && + test_must_fail git -C client fetch --unshallow --filter="blob:none" +' + test_expect_success 'missing reflog object, but promised by a commit, passes fsck' ' rm -rf repo && test_create_repo repo && diff --git a/t/t1090-sparse-checkout-scope.sh b/t/t1090-sparse-checkout-scope.sh index 40cc004326..f35a73dd20 100755 --- a/t/t1090-sparse-checkout-scope.sh +++ b/t/t1090-sparse-checkout-scope.sh @@ -63,7 +63,6 @@ test_expect_success 'in partial clone, sparse checkout only fetches needed blobs git -C server commit -m message && test_config -C client core.sparsecheckout 1 && - test_config -C client extensions.partialclone origin && echo "!/*" >client/.git/info/sparse-checkout && echo "/a" >>client/.git/info/sparse-checkout && git -C client fetch --filter=blob:none origin && diff --git a/t/t2404-worktree-config.sh b/t/t2404-worktree-config.sh index 286121d8de..9536d10919 100755 --- a/t/t2404-worktree-config.sh +++ b/t/t2404-worktree-config.sh @@ -23,8 +23,10 @@ test_expect_success 'config --worktree without extension' ' ' test_expect_success 'enable worktreeConfig extension' ' + git config core.repositoryformatversion 1 && git config extensions.worktreeConfig true && - test_cmp_config true extensions.worktreeConfig + test_cmp_config true extensions.worktreeConfig && + test_cmp_config 1 core.repositoryformatversion ' test_expect_success 'config is shared as before' ' diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh index 8c54e34ef1..0f5ff25179 100755 --- a/t/t5500-fetch-pack.sh +++ b/t/t5500-fetch-pack.sh @@ -999,7 +999,6 @@ fetch_filter_blob_limit_zero () { test_config -C "$SERVER" uploadpack.allowfilter 1 && git clone "$URL" client && - test_config -C client extensions.partialclone origin && test_commit -C "$SERVER" two && diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh index 5039e66dc4..8b27fad6cd 100755 --- a/t/t5702-protocol-v2.sh +++ b/t/t5702-protocol-v2.sh @@ -348,7 +348,6 @@ test_expect_success 'partial fetch' ' rm -rf client "$(pwd)/trace" && git init client && SERVER="file://$(pwd)/server" && - test_config -C client extensions.partialClone "$SERVER" && GIT_TRACE_PACKET="$(pwd)/trace" git -C client -c protocol.version=2 \ fetch --filter=blob:none "$SERVER" master:refs/heads/other &&