Message ID | 20241219-toon-clone-refs-v3-1-1484faea3008@iotcl.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v3] builtin/clone: teach git-clone(1) the --revision= option | expand |
Toon Claes <toon@iotcl.com> writes: > The git-clone(1) command has the option `--branch` that allows the user > to select the branch they want HEAD to point to. In a non-bare > repository this also checks out that branch. > > Option `--branch` also accepts a tag. When a tag name is provided, the > commit this tag points to is checked out and HEAD is detached. Thus > `--branch` can be used to clone a repository and check out a ref kept > under `refs/heads` or `refs/tags`. But some other refs might be in use > as well. For example Git forges might use refs like `refs/pull/<id>` and > `refs/merge-requests/<id>` to track pull/merge requests. These refs > cannot be selected upon git-clone(1). > > Add option `--revision` to git-clone(1). This option accepts a fully > qualified reference, or a raw commit hash. This enables the user to > clone and check out any revision they want. `--revision` can be used in > conjunction with `--depth` to do a minimal clone that only contains the > sources for a single revision. This can be useful for automated tests > running in CI systems. I somehow suspect that you want the mental model your users form for this new feature to be "--single-branch --branch=<branch>" mode of "git clone", not just "--branch=<branch>". This is especially true when you bring up the possibility of the "--depth" option. The difference between "git clone" and "git init" followed by "git fetch" is not just "the user does not have to type two commands if the feature is inside clone". That is merely a superficial difference, and the real value is that "clone" leaves configuration to make further work easier. For the "--branch" option the user's mental model for this new feature is based on, "git clone" gives the usual fetch pathspec and the branch.merge configuration to make it build on top of the single remote branch we cloned, i.e. you'd get [remote "origin"] url = $URL fetch = +refs/heads/*:refs/remotes/origin/* [branch "next"] remote = origin merge = refs/heads/next with "git clone --branch=next $URL". But after "git clone --single-branch --branch=next $URL", you'd get [remote "origin"] url = $URL fetch = +refs/heads/next:refs/remotes/origin/next [branch "next"] remote = origin merge = refs/heads/next After all, the user wants to work with this single branch, so we only dedicate a single remote-tracking branch for that and ignore everything else. With that line of thought in mind, let's read on with this question in mind: what configuration should this mode of "git clone" leave, to make further work in the resulting repository easier? > This type of shallow clone could also be achieved with the following set > of commands: > > git init the-repo > cd ./the-repo OK. > git remote add origin <url> Not a great idea, as it is likely to leave remote.origin.fetch refspec that is overly wide. Compared to "--single-branch --branch" case, I suspect that the resulting configuration should just record remote.origin.url but not remote.origin.fetch or branch.*.* at all, or something like that, to make sure we won't track any refs from there. > git fetch --depth=1 origin <commit-id> > git checkout <commit-id> OK, but "git checkout --detach <commit>" would make it clear that we are not getting any local branch. > Adding this new option to git-clone(1) simplifies this not uncommon > use-case. And besides simplifying this, it enables the use of > git-clone(1) over git-fetch(1). This is beneficial in case bundle URIs > are available on the server. Bundle URIs are only used on clone, not on > fetch, so using allowing the user to use git-clone(1) here makes them > benefit from bundle URIs if advertised by the server. That reasoning goes backwards. If this wants to be really a one-time "git fetch", but the lack of features like bundleURI support in it makes your choice of tool unworkable, we should be adding the support to your chosen tool, "git fetch". As I already said, we should sell "clone" not for the stupid "the user do not have to do init and fetch separately" reason, but "we can tell what workflow is expected from the command line options given to 'git clone', and configure the resulting repository appropriately, which is not something init+fetch cannot do". And when you sell it that way, bundleURI becomes totally a non-issue (it becomes a mere bonus cherry-on-top). > diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt > index de8d8f58930ecff305f79480b13ddce10cd96c60..67498dae7c7d0315c7026b4ca2e822e48dcb7479 100644 > --- a/Documentation/git-clone.txt > +++ b/Documentation/git-clone.txt > @@ -221,6 +221,14 @@ objects from the source repository into a pack in the cloned repository. > `--branch` can also take tags and detaches the `HEAD` at that commit > in the resulting repository. > > +`--revision` _<rev>_:: > + This clones the given revision, and that revision only. "that revision only" sounds like this option automatically forces "--depth=1", but I do not think that is what we want. Create a new repository, and fetch the history leading to the given revision _<rev>_ (and nothing else), without making any remote-tracking branch, and without making any local branch, and point `HEAD` to _<rev_>. When creating a non-bare repository, the revision is checked out on a detached `HEAD`. or something? > + The argument can > + be a ref name (e.g. `refs/heads/main`) that peels down to a commit, or a > + raw commit hash. "raw" -> "hexadecimal", "hash" -> "object name", but otherwise looks good. A ref-name may have an example for tag as well, i.e. (e.g., `refs/heads/main` or `refs/tags/v1.0`) > + The given revision is checked out, and for any revision other than a > + branch (i.e. ref starting with `refs/heads/`), the HEAD is detached. "--revision refs/heads/master~4" start with "refs/heads/" but I do not think you want to special case it. You'd want to always detach for consistency and simplicity. If they want branch, the will say "--single-branch --branch" instead of "--revision". > + This option is incompatible with `--branch`, `--mirror`, and `--bare`. How does this intarct with "--single-branch"? It is obvious and natural to ignore "--single-branch" as you are not going to create any local or remote-tracking branches, but should this be made incompatible with "--no-single-branch"? I dunno. > diff --git a/builtin/clone.c b/builtin/clone.c > index 21721db28a12864621d27d2c70ee5c2598cccc0f..391757e5260a902a87bcf3b435fe39c6cd841b3b 100644 > --- a/builtin/clone.c > +++ b/builtin/clone.c > @@ -69,6 +69,7 @@ static char *option_template, *option_depth, *option_since; > static char *option_origin = NULL; > static char *remote_name = NULL; > static char *option_branch = NULL; > +static char *option_rev = NULL; > static struct string_list option_not = STRING_LIST_INIT_NODUP; > static const char *real_git_dir; > static const char *ref_format; > @@ -141,6 +142,8 @@ static struct option builtin_clone_options[] = { > N_("use <name> instead of 'origin' to track upstream")), > OPT_STRING('b', "branch", &option_branch, N_("branch"), > N_("checkout <branch> instead of the remote's HEAD")), > + OPT_STRING(0, "revision", &option_rev, N_("rev"), > + N_("clone single revision <rev> and check out")), > OPT_STRING('u', "upload-pack", &option_upload_pack, N_("path"), > N_("path to git-upload-pack on the remote")), > OPT_STRING(0, "depth", &option_depth, N_("depth"), > @@ -684,6 +687,9 @@ static void update_head(const struct ref *our, const struct ref *remote, > } else if (our) { > struct commit *c = lookup_commit_reference(the_repository, > &our->old_oid); > + if (!c) > + die(_("unable to update HEAD")); This new error condition becomes necessary, because you have no way to verify the value given to --revision until you see it here, which is understandable. What are the failure modes and causes? Can we give a bit more useful diagnostics, like - the named object did not exist on the remote repository at all - failed to fetch the named object from the remote repository - fetched the named object and it turns out not to be a committish - something else? > @@ -968,7 +974,7 @@ int cmd_clone(int argc, > char *repo_to_free = NULL; > char *path = NULL, *dir, *display_repo = NULL; > int dest_exists, real_dest_exists = 0; > - const struct ref *refs, *remote_head; > + const struct ref *refs = NULL, *remote_head; > struct ref *remote_head_points_at = NULL; > const struct ref *our_head_points_at; > char *unborn_head = NULL; Offhand I do not see why this hunk is needed. > diff --git a/parse-options.h b/parse-options.h > index f0801d4532a175b65783689f2a68fb56da2c8e87..72c62311b61f46152d66bcba9328de59fd300df9 100644 > --- a/parse-options.h > +++ b/parse-options.h > @@ -420,6 +420,15 @@ static inline void die_for_incompatible_opt3(int opt1, const char *opt1_name, > 0, ""); > } > > +static inline void die_for_incompatible_opt2(int opt1, const char *opt1_name, > + int opt2, const char *opt2_name) > +{ > + die_for_incompatible_opt4(opt1, opt1_name, > + opt2, opt2_name, > + 0, "", > + 0, ""); > +} > + Funny that we had 4 and 3 but did not need 2 until now ;-) > diff --git a/t/t5620-clone-revision.sh b/t/t5620-clone-revision.sh There already is a topic in flight that uses 5620 for its new test. > @@ -0,0 +1,101 @@ > +#!/bin/sh > + > +test_description='tests for git clone --revision' > +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main > +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME > +TEST_PASSES_SANITIZE_LEAK=true > +. ./test-lib.sh > + > +test_expect_success 'setup' ' > + test_commit --no-tag "initial commit" README "Hello" && > + test_commit --annotate "second commit" README "Hello world" v1.0 && > + test_commit --no-tag "third commit" README "Hello world!" && > + git switch -c feature v1.0 && > + test_commit --no-tag "feature commit" README "Hello world!" && > + git switch main > +' > +test_expect_success 'clone with --revision being a branch' ' > + test_when_finished "rm -rf dst" && > + git clone --revision=refs/heads/feature . dst && > + git rev-parse refs/heads/feature >expect && > + git -C dst rev-parse HEAD >actual && > + test_cmp expect actual && This validates that the checked out commit is the same named commit, but does not see whether we are on a detached HEAD or not. Even after we change the implementation to always detach, up to this point the test will pass (I am not saying it is great. I am saying that the test is too loose to etch the desired behaviour in stone). The next line, however, expects that we create a local branch for it (the design I do not necessarily agree with---if this is wanted, the user can already use --branch with --single-branch to do so). > + git for-each-ref refs/heads/feature > expect && > + git -C dst for-each-ref > actual && Style: no spaces between redirection operator and its target file. > + test_cmp expect actual > +' We should check the configuration in the resulting repository. If we do not care what configuration "clone" leaves, we are not getting the benefit of using it (you can just do init + fetch + checkout instead). This applies to all the remaining tests, so I won't repeat. > +test_expect_success 'clone with --depth and --revision being a branch' ' > + test_when_finished "rm -rf dst" && > + git clone --depth=1 --revision=refs/heads/feature . dst && > + git rev-parse refs/heads/feature >expect && > + git -C dst rev-parse HEAD >actual && > + test_cmp expect actual > +' We are not validating if the depth did its task correctly here. We do not test if the resulting repository has it checked out on the detached HEAD. We do not test if the resulting repository has the local branch of the same name. > +test_expect_success 'clone with --revision being a tag' ' > + test_when_finished "rm -rf dst" && > + git clone --revision=refs/tags/v1.0 . dst && > + git rev-parse refs/tags/v1.0^{} >expect && > + git -C dst rev-parse HEAD >actual && > + test_cmp expect actual > +' It is a good idea to verify an annotated tag, as you never want to write anything but a commit in HEAD. Again, the tests are too loose to validate anything important. > +test_expect_success 'clone with --revision being HEAD' ' > + test_when_finished "rm -rf dst" && > + git clone --revision=HEAD . dst && > + git rev-parse HEAD >expect && > + git -C dst rev-parse HEAD >actual && > + test_cmp expect actual > +' Ditto. I presume that this wants to detach and without any local or remote-tracking branches left? > +test_expect_success 'clone with --revision being a raw commit hash' ' > + test_when_finished "rm -rf dst" && > + oid=$(git rev-parse refs/heads/feature) && > + git clone --revision=$oid . dst && > + echo $oid >expect && > + git -C dst rev-parse HEAD >actual && > + test_cmp expect actual > +' Ditto. It is common across many of the tests that they are too loose and do not check much beyond the value of HEAD, which wants corrected. My recommendation for the behaviour is for all cases they do not touch either local or remote-tracking branches and always detach HEAD at the commit (or fail if a non-committish was given), which means the all the tests (other than the ones that makes sure the command fails under some conditions) should check that the HEAD is detached at the expected commit and no local or remote-tracking branches created. This is shared among the rest of the test, so I won't repeat. Thanks.
diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt index de8d8f58930ecff305f79480b13ddce10cd96c60..67498dae7c7d0315c7026b4ca2e822e48dcb7479 100644 --- a/Documentation/git-clone.txt +++ b/Documentation/git-clone.txt @@ -221,6 +221,14 @@ objects from the source repository into a pack in the cloned repository. `--branch` can also take tags and detaches the `HEAD` at that commit in the resulting repository. +`--revision` _<rev>_:: + This clones the given revision, and that revision only. The argument can + be a ref name (e.g. `refs/heads/main`) that peels down to a commit, or a + raw commit hash. + The given revision is checked out, and for any revision other than a + branch (i.e. ref starting with `refs/heads/`), the HEAD is detached. + This option is incompatible with `--branch`, `--mirror`, and `--bare`. + `-u` _<upload-pack>_:: `--upload-pack` _<upload-pack>_:: When given, and the repository to clone from is accessed diff --git a/builtin/clone.c b/builtin/clone.c index 21721db28a12864621d27d2c70ee5c2598cccc0f..391757e5260a902a87bcf3b435fe39c6cd841b3b 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -69,6 +69,7 @@ static char *option_template, *option_depth, *option_since; static char *option_origin = NULL; static char *remote_name = NULL; static char *option_branch = NULL; +static char *option_rev = NULL; static struct string_list option_not = STRING_LIST_INIT_NODUP; static const char *real_git_dir; static const char *ref_format; @@ -141,6 +142,8 @@ static struct option builtin_clone_options[] = { N_("use <name> instead of 'origin' to track upstream")), OPT_STRING('b', "branch", &option_branch, N_("branch"), N_("checkout <branch> instead of the remote's HEAD")), + OPT_STRING(0, "revision", &option_rev, N_("rev"), + N_("clone single revision <rev> and check out")), OPT_STRING('u', "upload-pack", &option_upload_pack, N_("path"), N_("path to git-upload-pack on the remote")), OPT_STRING(0, "depth", &option_depth, N_("depth"), @@ -684,6 +687,9 @@ static void update_head(const struct ref *our, const struct ref *remote, } else if (our) { struct commit *c = lookup_commit_reference(the_repository, &our->old_oid); + if (!c) + die(_("unable to update HEAD")); + /* --branch specifies a non-branch (i.e. tags), detach HEAD */ refs_update_ref(get_main_ref_store(the_repository), msg, "HEAD", &c->object.oid, NULL, REF_NO_DEREF, @@ -889,7 +895,7 @@ static void write_refspec_config(const char *src_ref_prefix, struct strbuf key = STRBUF_INIT; struct strbuf value = STRBUF_INIT; - if (option_mirror || !option_bare) { + if (!option_rev && (option_mirror || !option_bare)) { if (option_single_branch && !option_mirror) { if (option_branch) { if (starts_with(our_head_points_at->name, "refs/tags/")) @@ -968,7 +974,7 @@ int cmd_clone(int argc, char *repo_to_free = NULL; char *path = NULL, *dir, *display_repo = NULL; int dest_exists, real_dest_exists = 0; - const struct ref *refs, *remote_head; + const struct ref *refs = NULL, *remote_head; struct ref *remote_head_points_at = NULL; const struct ref *our_head_points_at; char *unborn_head = NULL; @@ -1345,6 +1351,11 @@ int cmd_clone(int argc, transport_set_option(transport, TRANS_OPT_KEEP, "yes"); + die_for_incompatible_opt2(!!option_rev, "--revision", + !!option_branch, "--branch"); + die_for_incompatible_opt2(!!option_rev, "--revision", + option_mirror, "--mirror"); + if (reject_shallow) transport_set_option(transport, TRANS_OPT_REJECT_SHALLOW, "1"); if (option_depth) @@ -1387,7 +1398,27 @@ int cmd_clone(int argc, strvec_push(&transport_ls_refs_options.ref_prefixes, "refs/tags/"); - refs = transport_get_remote_refs(transport, &transport_ls_refs_options); + if (option_rev) { + refspec_clear(&remote->fetch); + refspec_init(&remote->fetch, REFSPEC_FETCH); + refspec_append(&remote->fetch, option_rev); + + strvec_clear(&transport_ls_refs_options.ref_prefixes); + refspec_ref_prefixes(&remote->fetch, + &transport_ls_refs_options.ref_prefixes); + + if (transport_ls_refs_options.ref_prefixes.nr == 0) + /* + * We need to talk to the server to determine the hash + * algorithm, but when no ref prefixes are set the + * server announces all known refs, so ask the server to + * only tell us about HEAD. + */ + strvec_push(&transport_ls_refs_options.ref_prefixes, "HEAD"); + } + + refs = transport_get_remote_refs(transport, + &transport_ls_refs_options); /* * Now that we know what algorithm the remote side is using, let's set @@ -1461,8 +1492,13 @@ int cmd_clone(int argc, } } - if (refs) + if (option_rev) { + struct ref **tail = &mapped_refs; + + get_fetch_map(refs, &remote->fetch.items[0], &tail, 1); + } else if (refs) { mapped_refs = wanted_peer_refs(refs, &remote->fetch); + } if (mapped_refs) { /* @@ -1495,6 +1531,11 @@ int cmd_clone(int argc, if (!our_head_points_at) die(_("Remote branch %s not found in upstream %s"), option_branch, remote_name); + } else if (option_rev) { + our_head_points_at = mapped_refs; + if (!our_head_points_at) + die(_("Remote revision %s not found in upstream %s"), + option_rev, remote_name); } else if (remote_head_points_at) { our_head_points_at = remote_head_points_at; } else if (remote_head) { diff --git a/parse-options.h b/parse-options.h index f0801d4532a175b65783689f2a68fb56da2c8e87..72c62311b61f46152d66bcba9328de59fd300df9 100644 --- a/parse-options.h +++ b/parse-options.h @@ -420,6 +420,15 @@ static inline void die_for_incompatible_opt3(int opt1, const char *opt1_name, 0, ""); } +static inline void die_for_incompatible_opt2(int opt1, const char *opt1_name, + int opt2, const char *opt2_name) +{ + die_for_incompatible_opt4(opt1, opt1_name, + opt2, opt2_name, + 0, "", + 0, ""); +} + /* * Use these assertions for callbacks that expect to be called with NONEG and * NOARG respectively, and do not otherwise handle the "unset" and "arg" diff --git a/t/meson.build b/t/meson.build index 13fe854ba0a18f9b83dbc48651f581198042ffd3..606fd6ff9d09f5c90c08ee4b9e108a5a04e187d9 100644 --- a/t/meson.build +++ b/t/meson.build @@ -721,6 +721,7 @@ integration_tests = [ 't5617-clone-submodules-remote.sh', 't5618-alternate-refs.sh', 't5619-clone-local-ambiguous-transport.sh', + 't5620-clone-revision.sh', 't5700-protocol-v1.sh', 't5701-git-serve.sh', 't5702-protocol-v2.sh', diff --git a/t/t5620-clone-revision.sh b/t/t5620-clone-revision.sh new file mode 100755 index 0000000000000000000000000000000000000000..92fe0b3f1c1268fc29e0e61d543167ea35845066 --- /dev/null +++ b/t/t5620-clone-revision.sh @@ -0,0 +1,101 @@ +#!/bin/sh + +test_description='tests for git clone --revision' +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME + +TEST_PASSES_SANITIZE_LEAK=true +. ./test-lib.sh + +test_expect_success 'setup' ' + test_commit --no-tag "initial commit" README "Hello" && + test_commit --annotate "second commit" README "Hello world" v1.0 && + test_commit --no-tag "third commit" README "Hello world!" && + git switch -c feature v1.0 && + test_commit --no-tag "feature commit" README "Hello world!" && + git switch main +' + +test_expect_success 'clone with --revision being a branch' ' + test_when_finished "rm -rf dst" && + git clone --revision=refs/heads/feature . dst && + git rev-parse refs/heads/feature >expect && + git -C dst rev-parse HEAD >actual && + test_cmp expect actual && + git for-each-ref refs/heads/feature > expect && + git -C dst for-each-ref > actual && + test_cmp expect actual +' + +test_expect_success 'clone with --depth and --revision being a branch' ' + test_when_finished "rm -rf dst" && + git clone --depth=1 --revision=refs/heads/feature . dst && + git rev-parse refs/heads/feature >expect && + git -C dst rev-parse HEAD >actual && + test_cmp expect actual +' + +test_expect_success 'clone with --revision being a tag' ' + test_when_finished "rm -rf dst" && + git clone --revision=refs/tags/v1.0 . dst && + git rev-parse refs/tags/v1.0^{} >expect && + git -C dst rev-parse HEAD >actual && + test_cmp expect actual +' + +test_expect_success 'clone with --revision being HEAD' ' + test_when_finished "rm -rf dst" && + git clone --revision=HEAD . dst && + git rev-parse HEAD >expect && + git -C dst rev-parse HEAD >actual && + test_cmp expect actual +' + +test_expect_success 'clone with --revision being a raw commit hash' ' + test_when_finished "rm -rf dst" && + oid=$(git rev-parse refs/heads/feature) && + git clone --revision=$oid . dst && + echo $oid >expect && + git -C dst rev-parse HEAD >actual && + test_cmp expect actual +' + +test_expect_success 'clone with --revision and --bare' ' + test_when_finished "rm -rf dst" && + git clone --revision=refs/heads/main --bare . dst && + oid=$(git rev-parse refs/heads/main) && + git -C dst cat-file -t $oid > actual && + echo "commit" >expect && + test_cmp expect actual +' + +test_expect_success 'clone with --revision being a short raw commit hash' ' + test_when_finished "rm -rf dst" && + oid=$(git rev-parse --short refs/heads/feature) && + test_must_fail git clone --revision=$oid . dst 2>err && + test_grep "fatal: Remote revision $oid not found in upstream origin" err +' + +test_expect_success 'clone with --revision being a tree hash' ' + test_when_finished "rm -rf dst" && + oid=$(git rev-parse refs/heads/feature^{tree}) && + test_must_fail git clone --revision=$oid . dst 2>err && + test_grep "error: object $oid is a tree, not a commit" err +' + +test_expect_success 'clone with --revision being the parent of a ref fails' ' + test_when_finished "rm -rf dst" && + test_must_fail git clone --revision=refs/heads/main^ . dst +' + +test_expect_success 'clone with --revision and --branch fails' ' + test_when_finished "rm -rf dst" && + test_must_fail git clone --revision=refs/heads/main --branch=main . dst +' + +test_expect_success 'clone with --revision and --mirror fails' ' + test_when_finished "rm -rf dst" && + test_must_fail git clone --revision=refs/heads/main --mirror . dst +' + +test_done