diff mbox series

[v3] builtin/clone: teach git-clone(1) the --revision= option

Message ID 20241219-toon-clone-refs-v3-1-1484faea3008@iotcl.com (mailing list archive)
State New
Headers show
Series [v3] builtin/clone: teach git-clone(1) the --revision= option | expand

Commit Message

Toon Claes Dec. 19, 2024, 11:58 a.m. UTC
The git-clone(1) command has the option `--branch` that allows the user
to select the branch they want HEAD to point to. In a non-bare
repository this also checks out that branch.

Option `--branch` also accepts a tag. When a tag name is provided, the
commit this tag points to is checked out and HEAD is detached. Thus
`--branch` can be used to clone a repository and check out a ref kept
under `refs/heads` or `refs/tags`. But some other refs might be in use
as well. For example Git forges might use refs like `refs/pull/<id>` and
`refs/merge-requests/<id>` to track pull/merge requests. These refs
cannot be selected upon git-clone(1).

Add option `--revision` to git-clone(1). This option accepts a fully
qualified reference, or a raw commit hash. This enables the user to
clone and check out any revision they want. `--revision` can be used in
conjunction with `--depth` to do a minimal clone that only contains the
sources for a single revision. This can be useful for automated tests
running in CI systems.

This type of shallow clone could also be achieved with the following set
of commands:

    git init the-repo
    cd ./the-repo
    git remote add origin <url>
    git fetch --depth=1 origin <commit-id>
    git checkout <commit-id>

Adding this new option to git-clone(1) simplifies this not uncommon
use-case. And besides simplifying this, it enables the use of
git-clone(1) over git-fetch(1). This is beneficial in case bundle URIs
are available on the server. Bundle URIs are only used on clone, not on
fetch, so using allowing the user to use git-clone(1) here makes them
benefit from bundle URIs if advertised by the server.

Signed-off-by: Toon Claes <toon@iotcl.com>
---
I've submitted a similar change before[1], but this is a little bit of
an alternative approach. There was a request[2] to also make it possible
to clone a revision by a commit hash, so I've reworked the patch to
enable that. Because of this I've renamed the option from `--ref` to
`--revision`.

In the previous patch the code was heavily intertwined with handling
`--single-branch`, `--branch`, and `--no-tags`. This make the code a lot
more complicated. In this version of the patch I've made the new option
incompatible with those and added a separate code path to deal with it
specifically.

[1]: https://lore.kernel.org/git/20240927085438.1010431-1-toon@iotcl.com/
[2]: https://lore.kernel.org/git/ZzNJGHMlxGQyFV_c@kitsune.suse.cz/
---
Changes in v3:
- Fail early when the revision was not found on the remote, instead of
  creating a clone that's in an invalid state.
- State more clearly in the commit message adding this option is useful
  for a not uncommon use-case.
- Be explicit in the documentation the ref needs to peel down to a
  commit.
- Die in case we try to update_head() to an object that's not a commit.
- Allow combining `--revision` with `--bare`.
- Add die_for_incompatible_opt2() to parse-options.h and use it for the
  options that are not compatible with the new `--revision` option.
- Small tweaks to the added tests.
- Small touchups on commit messages.
- Link to v2: https://lore.kernel.org/r/20241129-toon-clone-refs-v2-1-dca4c19a3510@iotcl.com
---
Range-diff versus v2:

1:  b5cf7362be ! 1:  592a22fdbd builtin/clone: teach git-clone(1) the --revision= option
    @@ Commit message
         under `refs/heads` or `refs/tags`. But some other refs might be in use
         as well. For example Git forges might use refs like `refs/pull/<id>` and
         `refs/merge-requests/<id>` to track pull/merge requests. These refs
    -    cannot selected upon git-clone(1).
    +    cannot be selected upon git-clone(1).
     
         Add option `--revision` to git-clone(1). This option accepts a fully
         qualified reference, or a raw commit hash. This enables the user to
    -    clone and checkout any revision they want. `--revision` can be used in
    +    clone and check out any revision they want. `--revision` can be used in
         conjunction with `--depth` to do a minimal clone that only contains the
    -    sources for a single revision. This can be useful for automated tests.
    +    sources for a single revision. This can be useful for automated tests
    +    running in CI systems.
     
         This type of shallow clone could also be achieved with the following set
         of commands:
    @@ Commit message
             git fetch --depth=1 origin <commit-id>
             git checkout <commit-id>
     
    -    Unfortunately, this approach uses git-fetch(1) instead of git-clone(1),
    -    and only on git-clone(1) the bundle URIs advertised by the server are
    -    used. By adding this option `--revision` to git-clone(1) allows us to
    -    get the same end result, while benefiting from bundle URIs if advertised
    -    by the server.
    +    Adding this new option to git-clone(1) simplifies this not uncommon
    +    use-case. And besides simplifying this, it enables the use of
    +    git-clone(1) over git-fetch(1). This is beneficial in case bundle URIs
    +    are available on the server. Bundle URIs are only used on clone, not on
    +    fetch, so using allowing the user to use git-clone(1) here makes them
    +    benefit from bundle URIs if advertised by the server.
     
         Signed-off-by: Toon Claes <toon@iotcl.com>
     
    @@ Documentation/git-clone.txt: objects from the source repository into a pack in t
      
     +`--revision` _<rev>_::
     +	This clones the given revision, and that revision only. The argument can
    -+	be a symbolic ref name (e.g. `refs/heads/main`), or a raw commit hash.
    -+	Unless the revision points to a branch (i.e. ref starting with
    -+	`refs/heads/`), the HEAD is detached.
    ++	be a ref name (e.g. `refs/heads/main`) that peels down to a commit, or a
    ++	raw commit hash.
    ++	The given revision is checked out, and for any revision other than a
    ++	branch (i.e. ref starting with `refs/heads/`), the HEAD is detached.
     +	This option is incompatible with `--branch`, `--mirror`, and `--bare`.
     +
      `-u` _<upload-pack>_::
    @@ builtin/clone.c: static struct option builtin_clone_options[] = {
      		   N_("path to git-upload-pack on the remote")),
      	OPT_STRING(0, "depth", &option_depth, N_("depth"),
     @@ builtin/clone.c: static void update_head(const struct ref *our, const struct ref *remote,
    - 			install_branch_config(0, head, remote_name, our->name);
    - 		}
      	} else if (our) {
    --		struct commit *c = lookup_commit_reference(the_repository,
    --							   &our->old_oid);
    --		/* --branch specifies a non-branch (i.e. tags), detach HEAD */
    --		refs_update_ref(get_main_ref_store(the_repository), msg,
    --				"HEAD", &c->object.oid, NULL, REF_NO_DEREF,
    --				UPDATE_REFS_DIE_ON_ERR);
    -+		struct commit *c =
    -+			lookup_commit_reference(the_repository, &our->old_oid);
    + 		struct commit *c = lookup_commit_reference(the_repository,
    + 							   &our->old_oid);
    ++		if (!c)
    ++			die(_("unable to update HEAD"));
     +
    -+		if (c)
    -+			/* --branch specifies a non-branch (i.e. tags), detach HEAD */
    -+			refs_update_ref(get_main_ref_store(the_repository), msg,
    -+					"HEAD", &c->object.oid, NULL,
    -+					REF_NO_DEREF, UPDATE_REFS_DIE_ON_ERR);
    - 	} else if (remote) {
    - 		/*
    - 		 * We know remote HEAD points to a non-branch, or
    + 		/* --branch specifies a non-branch (i.e. tags), detach HEAD */
    + 		refs_update_ref(get_main_ref_store(the_repository), msg,
    + 				"HEAD", &c->object.oid, NULL, REF_NO_DEREF,
     @@ builtin/clone.c: static void write_refspec_config(const char *src_ref_prefix,
      	struct strbuf key = STRBUF_INIT;
      	struct strbuf value = STRBUF_INIT;
    @@ builtin/clone.c: int cmd_clone(int argc,
      
      	transport_set_option(transport, TRANS_OPT_KEEP, "yes");
      
    -+	if (option_rev) {
    -+		if (option_branch)
    -+			die(_("options '%s' and '%s' cannot be used together"),
    -+			    "--revision", "--branch");
    -+		if (option_mirror)
    -+			die(_("options '%s' and '%s' cannot be used together"),
    -+			    "--revision", "--mirror");
    -+		if (option_bare)
    -+			die(_("options '%s' and '%s' cannot be used together"),
    -+			    "--revision", "--bare");
    -+	}
    ++	die_for_incompatible_opt2(!!option_rev, "--revision",
    ++				  !!option_branch, "--branch");
    ++	die_for_incompatible_opt2(!!option_rev, "--revision",
    ++				  option_mirror, "--mirror");
     +
      	if (reject_shallow)
      		transport_set_option(transport, TRANS_OPT_REJECT_SHALLOW, "1");
    @@ builtin/clone.c: int cmd_clone(int argc,
      			    option_branch, remote_name);
     +	} else if (option_rev) {
     +		our_head_points_at = mapped_refs;
    ++		if (!our_head_points_at)
    ++			die(_("Remote revision %s not found in upstream %s"),
    ++			    option_rev, remote_name);
      	} else if (remote_head_points_at) {
      		our_head_points_at = remote_head_points_at;
      	} else if (remote_head) {
     
    + ## parse-options.h ##
    +@@ parse-options.h: static inline void die_for_incompatible_opt3(int opt1, const char *opt1_name,
    + 				  0, "");
    + }
    + 
    ++static inline void die_for_incompatible_opt2(int opt1, const char *opt1_name,
    ++					     int opt2, const char *opt2_name)
    ++{
    ++	die_for_incompatible_opt4(opt1, opt1_name,
    ++				  opt2, opt2_name,
    ++				  0, "",
    ++				  0, "");
    ++}
    ++
    + /*
    +  * Use these assertions for callbacks that expect to be called with NONEG and
    +  * NOARG respectively, and do not otherwise handle the "unset" and "arg"
    +
    + ## t/meson.build ##
    +@@ t/meson.build: integration_tests = [
    +   't5617-clone-submodules-remote.sh',
    +   't5618-alternate-refs.sh',
    +   't5619-clone-local-ambiguous-transport.sh',
    ++  't5620-clone-revision.sh',
    +   't5700-protocol-v1.sh',
    +   't5701-git-serve.sh',
    +   't5702-protocol-v2.sh',
    +
      ## t/t5620-clone-revision.sh (new) ##
     @@
     +#!/bin/sh
    @@ t/t5620-clone-revision.sh (new)
     +
     +test_expect_success 'setup' '
     +	test_commit --no-tag "initial commit" README "Hello" &&
    -+	test_commit "second commit" README "Hello world" v1.0 &&
    ++	test_commit --annotate "second commit" README "Hello world" v1.0 &&
     +	test_commit --no-tag "third commit" README "Hello world!" &&
     +	git switch -c feature v1.0 &&
     +	test_commit --no-tag "feature commit" README "Hello world!" &&
    @@ t/t5620-clone-revision.sh (new)
     +test_expect_success 'clone with --revision being a tag' '
     +	test_when_finished "rm -rf dst" &&
     +	git clone --revision=refs/tags/v1.0 . dst &&
    -+	git rev-parse refs/tags/v1.0 >expect &&
    ++	git rev-parse refs/tags/v1.0^{} >expect &&
     +	git -C dst rev-parse HEAD >actual &&
     +	test_cmp expect actual
     +'
    @@ t/t5620-clone-revision.sh (new)
     +	test_cmp expect actual
     +'
     +
    ++test_expect_success 'clone with --revision and --bare' '
    ++	test_when_finished "rm -rf dst" &&
    ++	git clone --revision=refs/heads/main --bare . dst &&
    ++	oid=$(git rev-parse refs/heads/main) &&
    ++	git -C dst cat-file -t $oid > actual &&
    ++	echo "commit" >expect &&
    ++	test_cmp expect actual
    ++'
    ++
     +test_expect_success 'clone with --revision being a short raw commit hash' '
     +	test_when_finished "rm -rf dst" &&
     +	oid=$(git rev-parse --short refs/heads/feature) &&
    -+	git clone --revision=$oid . dst 2>err &&
    -+	grep "warning: remote HEAD refers to nonexistent ref, unable to checkout" err
    ++	test_must_fail git clone --revision=$oid . dst 2>err &&
    ++	test_grep "fatal: Remote revision $oid not found in upstream origin" err
     +'
     +
     +test_expect_success 'clone with --revision being a tree hash' '
     +	test_when_finished "rm -rf dst" &&
     +	oid=$(git rev-parse refs/heads/feature^{tree}) &&
    -+	git clone --revision=$oid . dst 2>err &&
    -+	grep "warning: remote HEAD refers to nonexistent ref, unable to checkout" err
    ++	test_must_fail git clone --revision=$oid . dst 2>err &&
    ++	test_grep "error: object $oid is a tree, not a commit" err
     +'
     +
     +test_expect_success 'clone with --revision being the parent of a ref fails' '
    @@ t/t5620-clone-revision.sh (new)
     +	test_must_fail git clone --revision=refs/heads/main --mirror . dst
     +'
     +
    -+test_expect_success 'clone with --revision and --bare fails' '
    -+	test_when_finished "rm -rf dst" &&
    -+	test_must_fail git clone --revision=refs/heads/main --bare . dst
    -+'
    -+
     +test_done
---
 Documentation/git-clone.txt |   8 ++++
 builtin/clone.c             |  49 +++++++++++++++++++--
 parse-options.h             |   9 ++++
 t/meson.build               |   1 +
 t/t5620-clone-revision.sh   | 101 ++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 164 insertions(+), 4 deletions(-)


---

base-commit: d882f382b3d939d90cfa58d17b17802338f05d66
change-id: 20241129-toon-clone-refs-ad3623772f92

Thanks
--
Toon

Comments

Junio C Hamano Dec. 19, 2024, 6:30 p.m. UTC | #1
Toon Claes <toon@iotcl.com> writes:

> The git-clone(1) command has the option `--branch` that allows the user
> to select the branch they want HEAD to point to. In a non-bare
> repository this also checks out that branch.
>
> Option `--branch` also accepts a tag. When a tag name is provided, the
> commit this tag points to is checked out and HEAD is detached. Thus
> `--branch` can be used to clone a repository and check out a ref kept
> under `refs/heads` or `refs/tags`. But some other refs might be in use
> as well. For example Git forges might use refs like `refs/pull/<id>` and
> `refs/merge-requests/<id>` to track pull/merge requests. These refs
> cannot be selected upon git-clone(1).
>
> Add option `--revision` to git-clone(1). This option accepts a fully
> qualified reference, or a raw commit hash. This enables the user to
> clone and check out any revision they want. `--revision` can be used in
> conjunction with `--depth` to do a minimal clone that only contains the
> sources for a single revision. This can be useful for automated tests
> running in CI systems.

I somehow suspect that you want the mental model your users form
for this new feature to be "--single-branch --branch=<branch>" mode
of "git clone", not just "--branch=<branch>".  This is especially true
when you bring up the possibility of the "--depth" option.

The difference between "git clone" and "git init" followed by "git
fetch" is not just "the user does not have to type two commands if
the feature is inside clone".  That is merely a superficial
difference, and the real value is that "clone" leaves configuration
to make further work easier.

For the "--branch" option the user's mental model for this new
feature is based on, "git clone" gives the usual fetch pathspec
and the branch.merge configuration to make it build on top of the
single remote branch we cloned, i.e. you'd get

	[remote "origin"]
		url = $URL
		fetch = +refs/heads/*:refs/remotes/origin/*
	[branch "next"]
		remote = origin
		merge = refs/heads/next

with "git clone --branch=next $URL".  

But after "git clone --single-branch --branch=next $URL", you'd get

	[remote "origin"]
		url = $URL
		fetch = +refs/heads/next:refs/remotes/origin/next
	[branch "next"]
		remote = origin
		merge = refs/heads/next

After all, the user wants to work with this single branch, so we
only dedicate a single remote-tracking branch for that and ignore
everything else.

With that line of thought in mind, let's read on with this question
in mind: what configuration should this mode of "git clone" leave,
to make further work in the resulting repository easier?

> This type of shallow clone could also be achieved with the following set
> of commands:
>
>     git init the-repo
>     cd ./the-repo

OK.

>     git remote add origin <url>

Not a great idea, as it is likely to leave remote.origin.fetch
refspec that is overly wide.  Compared to "--single-branch --branch"
case, I suspect that the resulting configuration should just record
remote.origin.url but not remote.origin.fetch or branch.*.* at all,
or something like that, to make sure we won't track any refs from
there.

>     git fetch --depth=1 origin <commit-id>
>     git checkout <commit-id>

OK, but "git checkout --detach <commit>" would make it clear that we
are not getting any local branch.

> Adding this new option to git-clone(1) simplifies this not uncommon
> use-case. And besides simplifying this, it enables the use of
> git-clone(1) over git-fetch(1). This is beneficial in case bundle URIs
> are available on the server. Bundle URIs are only used on clone, not on
> fetch, so using allowing the user to use git-clone(1) here makes them
> benefit from bundle URIs if advertised by the server.

That reasoning goes backwards.  If this wants to be really a
one-time "git fetch", but the lack of features like bundleURI
support in it makes your choice of tool unworkable, we should be
adding the support to your chosen tool, "git fetch".

As I already said, we should sell "clone" not for the stupid "the
user do not have to do init and fetch separately" reason, but "we
can tell what workflow is expected from the command line options
given to 'git clone', and configure the resulting repository
appropriately, which is not something init+fetch cannot do".

And when you sell it that way, bundleURI becomes totally a
non-issue (it becomes a mere bonus cherry-on-top).

> diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
> index de8d8f58930ecff305f79480b13ddce10cd96c60..67498dae7c7d0315c7026b4ca2e822e48dcb7479 100644
> --- a/Documentation/git-clone.txt
> +++ b/Documentation/git-clone.txt
> @@ -221,6 +221,14 @@ objects from the source repository into a pack in the cloned repository.
>  	`--branch` can also take tags and detaches the `HEAD` at that commit
>  	in the resulting repository.
>  
> +`--revision` _<rev>_::
> +	This clones the given revision, and that revision only.

"that revision only" sounds like this option automatically forces "--depth=1",
but I do not think that is what we want.

	Create a new repository, and fetch the history leading to
	the given revision _<rev>_ (and nothing else), without
	making any remote-tracking branch, and without making any
	local branch, and point `HEAD` to _<rev_>.  When creating a
	non-bare repository, the revision is checked out on a
	detached `HEAD`.

or something?

> +	The argument can
> +	be a ref name (e.g. `refs/heads/main`) that peels down to a commit, or a
> +	raw commit hash.

"raw" -> "hexadecimal", "hash" -> "object name", but otherwise looks
good.  A ref-name may have an example for tag as well, i.e.

	(e.g., `refs/heads/main` or `refs/tags/v1.0`)

> +	The given revision is checked out, and for any revision other than a
> +	branch (i.e. ref starting with `refs/heads/`), the HEAD is detached.

"--revision refs/heads/master~4" start with "refs/heads/" but I do
not think you want to special case it.  You'd want to always detach
for consistency and simplicity.  If they want branch, the will say
"--single-branch --branch" instead of "--revision".

> +	This option is incompatible with `--branch`, `--mirror`, and `--bare`.

How does this intarct with "--single-branch"?  It is obvious and
natural to ignore "--single-branch" as you are not going to create
any local or remote-tracking branches, but should this be made
incompatible with "--no-single-branch"?  I dunno.

> diff --git a/builtin/clone.c b/builtin/clone.c
> index 21721db28a12864621d27d2c70ee5c2598cccc0f..391757e5260a902a87bcf3b435fe39c6cd841b3b 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -69,6 +69,7 @@ static char *option_template, *option_depth, *option_since;
>  static char *option_origin = NULL;
>  static char *remote_name = NULL;
>  static char *option_branch = NULL;
> +static char *option_rev = NULL;
>  static struct string_list option_not = STRING_LIST_INIT_NODUP;
>  static const char *real_git_dir;
>  static const char *ref_format;
> @@ -141,6 +142,8 @@ static struct option builtin_clone_options[] = {
>  		   N_("use <name> instead of 'origin' to track upstream")),
>  	OPT_STRING('b', "branch", &option_branch, N_("branch"),
>  		   N_("checkout <branch> instead of the remote's HEAD")),
> +	OPT_STRING(0, "revision", &option_rev, N_("rev"),
> +		   N_("clone single revision <rev> and check out")),
>  	OPT_STRING('u', "upload-pack", &option_upload_pack, N_("path"),
>  		   N_("path to git-upload-pack on the remote")),
>  	OPT_STRING(0, "depth", &option_depth, N_("depth"),
> @@ -684,6 +687,9 @@ static void update_head(const struct ref *our, const struct ref *remote,
>  	} else if (our) {
>  		struct commit *c = lookup_commit_reference(the_repository,
>  							   &our->old_oid);
> +		if (!c)
> +			die(_("unable to update HEAD"));

This new error condition becomes necessary, because you have no way
to verify the value given to --revision until you see it here, which
is understandable.

What are the failure modes and causes?  Can we give a bit more
useful diagnostics, like

 - the named object did not exist on the remote repository at all
 - failed to fetch the named object from the remote repository
 - fetched the named object and it turns out not to be a committish
 - something else?

> @@ -968,7 +974,7 @@ int cmd_clone(int argc,
>  	char *repo_to_free = NULL;
>  	char *path = NULL, *dir, *display_repo = NULL;
>  	int dest_exists, real_dest_exists = 0;
> -	const struct ref *refs, *remote_head;
> +	const struct ref *refs = NULL, *remote_head;
>  	struct ref *remote_head_points_at = NULL;
>  	const struct ref *our_head_points_at;
>  	char *unborn_head = NULL;

Offhand I do not see why this hunk is needed.

> diff --git a/parse-options.h b/parse-options.h
> index f0801d4532a175b65783689f2a68fb56da2c8e87..72c62311b61f46152d66bcba9328de59fd300df9 100644
> --- a/parse-options.h
> +++ b/parse-options.h
> @@ -420,6 +420,15 @@ static inline void die_for_incompatible_opt3(int opt1, const char *opt1_name,
>  				  0, "");
>  }
>  
> +static inline void die_for_incompatible_opt2(int opt1, const char *opt1_name,
> +					     int opt2, const char *opt2_name)
> +{
> +	die_for_incompatible_opt4(opt1, opt1_name,
> +				  opt2, opt2_name,
> +				  0, "",
> +				  0, "");
> +}
> +

Funny that we had 4 and 3 but did not need 2 until now ;-)

> diff --git a/t/t5620-clone-revision.sh b/t/t5620-clone-revision.sh

There already is a topic in flight that uses 5620 for its new test.

> @@ -0,0 +1,101 @@
> +#!/bin/sh
> +
> +test_description='tests for git clone --revision'
> +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME

> +TEST_PASSES_SANITIZE_LEAK=true
> +. ./test-lib.sh
> +
> +test_expect_success 'setup' '
> +	test_commit --no-tag "initial commit" README "Hello" &&
> +	test_commit --annotate "second commit" README "Hello world" v1.0 &&
> +	test_commit --no-tag "third commit" README "Hello world!" &&
> +	git switch -c feature v1.0 &&
> +	test_commit --no-tag "feature commit" README "Hello world!" &&
> +	git switch main
> +'

> +test_expect_success 'clone with --revision being a branch' '
> +	test_when_finished "rm -rf dst" &&
> +	git clone --revision=refs/heads/feature . dst &&
> +	git rev-parse refs/heads/feature >expect &&
> +	git -C dst rev-parse HEAD >actual &&
> +	test_cmp expect actual &&

This validates that the checked out commit is the same named commit,
but does not see whether we are on a detached HEAD or not.  Even
after we change the implementation to always detach, up to this
point the test will pass (I am not saying it is great.  I am saying
that the test is too loose to etch the desired behaviour in stone).

The next line, however, expects that we create a local branch for it
(the design I do not necessarily agree with---if this is wanted, the
user can already use --branch with --single-branch to do so).

> +	git for-each-ref refs/heads/feature > expect &&
> +	git -C dst for-each-ref > actual &&

Style: no spaces between redirection operator and its target file.

> +	test_cmp expect actual
> +'

We should check the configuration in the resulting repository.  If
we do not care what configuration "clone" leaves, we are not getting
the benefit of using it (you can just do init + fetch + checkout
instead).  This applies to all the remaining tests, so I won't repeat.

> +test_expect_success 'clone with --depth and --revision being a branch' '
> +	test_when_finished "rm -rf dst" &&
> +	git clone --depth=1 --revision=refs/heads/feature . dst &&
> +	git rev-parse refs/heads/feature >expect &&
> +	git -C dst rev-parse HEAD >actual &&
> +	test_cmp expect actual
> +'

We are not validating if the depth did its task correctly here.  We
do not test if the resulting repository has it checked out on the
detached HEAD.  We do not test if the resulting repository has the
local branch of the same name.

> +test_expect_success 'clone with --revision being a tag' '
> +	test_when_finished "rm -rf dst" &&
> +	git clone --revision=refs/tags/v1.0 . dst &&
> +	git rev-parse refs/tags/v1.0^{} >expect &&
> +	git -C dst rev-parse HEAD >actual &&
> +	test_cmp expect actual
> +'

It is a good idea to verify an annotated tag, as you never want to
write anything but a commit in HEAD.

Again, the tests are too loose to validate anything important.

> +test_expect_success 'clone with --revision being HEAD' '
> +	test_when_finished "rm -rf dst" &&
> +	git clone --revision=HEAD . dst &&
> +	git rev-parse HEAD >expect &&
> +	git -C dst rev-parse HEAD >actual &&
> +	test_cmp expect actual
> +'

Ditto.  I presume that this wants to detach and without any local or
remote-tracking branches left?

> +test_expect_success 'clone with --revision being a raw commit hash' '
> +	test_when_finished "rm -rf dst" &&
> +	oid=$(git rev-parse refs/heads/feature) &&
> +	git clone --revision=$oid . dst &&
> +	echo $oid >expect &&
> +	git -C dst rev-parse HEAD >actual &&
> +	test_cmp expect actual
> +'

Ditto.  It is common across many of the tests that they are too
loose and do not check much beyond the value of HEAD, which wants
corrected.  My recommendation for the behaviour is for all cases
they do not touch either local or remote-tracking branches and
always detach HEAD at the commit (or fail if a non-committish was
given), which means the all the tests (other than the ones that
makes sure the command fails under some conditions) should check
that the HEAD is detached at the expected commit and no local or
remote-tracking branches created.  This is shared among the rest of
the test, so I won't repeat.

Thanks.
diff mbox series

Patch

diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index de8d8f58930ecff305f79480b13ddce10cd96c60..67498dae7c7d0315c7026b4ca2e822e48dcb7479 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -221,6 +221,14 @@  objects from the source repository into a pack in the cloned repository.
 	`--branch` can also take tags and detaches the `HEAD` at that commit
 	in the resulting repository.
 
+`--revision` _<rev>_::
+	This clones the given revision, and that revision only. The argument can
+	be a ref name (e.g. `refs/heads/main`) that peels down to a commit, or a
+	raw commit hash.
+	The given revision is checked out, and for any revision other than a
+	branch (i.e. ref starting with `refs/heads/`), the HEAD is detached.
+	This option is incompatible with `--branch`, `--mirror`, and `--bare`.
+
 `-u` _<upload-pack>_::
 `--upload-pack` _<upload-pack>_::
 	When given, and the repository to clone from is accessed
diff --git a/builtin/clone.c b/builtin/clone.c
index 21721db28a12864621d27d2c70ee5c2598cccc0f..391757e5260a902a87bcf3b435fe39c6cd841b3b 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -69,6 +69,7 @@  static char *option_template, *option_depth, *option_since;
 static char *option_origin = NULL;
 static char *remote_name = NULL;
 static char *option_branch = NULL;
+static char *option_rev = NULL;
 static struct string_list option_not = STRING_LIST_INIT_NODUP;
 static const char *real_git_dir;
 static const char *ref_format;
@@ -141,6 +142,8 @@  static struct option builtin_clone_options[] = {
 		   N_("use <name> instead of 'origin' to track upstream")),
 	OPT_STRING('b', "branch", &option_branch, N_("branch"),
 		   N_("checkout <branch> instead of the remote's HEAD")),
+	OPT_STRING(0, "revision", &option_rev, N_("rev"),
+		   N_("clone single revision <rev> and check out")),
 	OPT_STRING('u', "upload-pack", &option_upload_pack, N_("path"),
 		   N_("path to git-upload-pack on the remote")),
 	OPT_STRING(0, "depth", &option_depth, N_("depth"),
@@ -684,6 +687,9 @@  static void update_head(const struct ref *our, const struct ref *remote,
 	} else if (our) {
 		struct commit *c = lookup_commit_reference(the_repository,
 							   &our->old_oid);
+		if (!c)
+			die(_("unable to update HEAD"));
+
 		/* --branch specifies a non-branch (i.e. tags), detach HEAD */
 		refs_update_ref(get_main_ref_store(the_repository), msg,
 				"HEAD", &c->object.oid, NULL, REF_NO_DEREF,
@@ -889,7 +895,7 @@  static void write_refspec_config(const char *src_ref_prefix,
 	struct strbuf key = STRBUF_INIT;
 	struct strbuf value = STRBUF_INIT;
 
-	if (option_mirror || !option_bare) {
+	if (!option_rev && (option_mirror || !option_bare)) {
 		if (option_single_branch && !option_mirror) {
 			if (option_branch) {
 				if (starts_with(our_head_points_at->name, "refs/tags/"))
@@ -968,7 +974,7 @@  int cmd_clone(int argc,
 	char *repo_to_free = NULL;
 	char *path = NULL, *dir, *display_repo = NULL;
 	int dest_exists, real_dest_exists = 0;
-	const struct ref *refs, *remote_head;
+	const struct ref *refs = NULL, *remote_head;
 	struct ref *remote_head_points_at = NULL;
 	const struct ref *our_head_points_at;
 	char *unborn_head = NULL;
@@ -1345,6 +1351,11 @@  int cmd_clone(int argc,
 
 	transport_set_option(transport, TRANS_OPT_KEEP, "yes");
 
+	die_for_incompatible_opt2(!!option_rev, "--revision",
+				  !!option_branch, "--branch");
+	die_for_incompatible_opt2(!!option_rev, "--revision",
+				  option_mirror, "--mirror");
+
 	if (reject_shallow)
 		transport_set_option(transport, TRANS_OPT_REJECT_SHALLOW, "1");
 	if (option_depth)
@@ -1387,7 +1398,27 @@  int cmd_clone(int argc,
 		strvec_push(&transport_ls_refs_options.ref_prefixes,
 			    "refs/tags/");
 
-	refs = transport_get_remote_refs(transport, &transport_ls_refs_options);
+	if (option_rev) {
+		refspec_clear(&remote->fetch);
+		refspec_init(&remote->fetch, REFSPEC_FETCH);
+		refspec_append(&remote->fetch, option_rev);
+
+		strvec_clear(&transport_ls_refs_options.ref_prefixes);
+		refspec_ref_prefixes(&remote->fetch,
+				     &transport_ls_refs_options.ref_prefixes);
+
+		if (transport_ls_refs_options.ref_prefixes.nr == 0)
+			/*
+			 * We need to talk to the server to determine the hash
+			 * algorithm, but when no ref prefixes are set the
+			 * server announces all known refs, so ask the server to
+			 * only tell us about HEAD.
+			 */
+			strvec_push(&transport_ls_refs_options.ref_prefixes, "HEAD");
+	}
+
+	refs = transport_get_remote_refs(transport,
+					 &transport_ls_refs_options);
 
 	/*
 	 * Now that we know what algorithm the remote side is using, let's set
@@ -1461,8 +1492,13 @@  int cmd_clone(int argc,
 		}
 	}
 
-	if (refs)
+	if (option_rev) {
+		struct ref **tail = &mapped_refs;
+
+		get_fetch_map(refs, &remote->fetch.items[0], &tail, 1);
+	} else if (refs) {
 		mapped_refs = wanted_peer_refs(refs, &remote->fetch);
+	}
 
 	if (mapped_refs) {
 		/*
@@ -1495,6 +1531,11 @@  int cmd_clone(int argc,
 		if (!our_head_points_at)
 			die(_("Remote branch %s not found in upstream %s"),
 			    option_branch, remote_name);
+	} else if (option_rev) {
+		our_head_points_at = mapped_refs;
+		if (!our_head_points_at)
+			die(_("Remote revision %s not found in upstream %s"),
+			    option_rev, remote_name);
 	} else if (remote_head_points_at) {
 		our_head_points_at = remote_head_points_at;
 	} else if (remote_head) {
diff --git a/parse-options.h b/parse-options.h
index f0801d4532a175b65783689f2a68fb56da2c8e87..72c62311b61f46152d66bcba9328de59fd300df9 100644
--- a/parse-options.h
+++ b/parse-options.h
@@ -420,6 +420,15 @@  static inline void die_for_incompatible_opt3(int opt1, const char *opt1_name,
 				  0, "");
 }
 
+static inline void die_for_incompatible_opt2(int opt1, const char *opt1_name,
+					     int opt2, const char *opt2_name)
+{
+	die_for_incompatible_opt4(opt1, opt1_name,
+				  opt2, opt2_name,
+				  0, "",
+				  0, "");
+}
+
 /*
  * Use these assertions for callbacks that expect to be called with NONEG and
  * NOARG respectively, and do not otherwise handle the "unset" and "arg"
diff --git a/t/meson.build b/t/meson.build
index 13fe854ba0a18f9b83dbc48651f581198042ffd3..606fd6ff9d09f5c90c08ee4b9e108a5a04e187d9 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -721,6 +721,7 @@  integration_tests = [
   't5617-clone-submodules-remote.sh',
   't5618-alternate-refs.sh',
   't5619-clone-local-ambiguous-transport.sh',
+  't5620-clone-revision.sh',
   't5700-protocol-v1.sh',
   't5701-git-serve.sh',
   't5702-protocol-v2.sh',
diff --git a/t/t5620-clone-revision.sh b/t/t5620-clone-revision.sh
new file mode 100755
index 0000000000000000000000000000000000000000..92fe0b3f1c1268fc29e0e61d543167ea35845066
--- /dev/null
+++ b/t/t5620-clone-revision.sh
@@ -0,0 +1,101 @@ 
+#!/bin/sh
+
+test_description='tests for git clone --revision'
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	test_commit --no-tag "initial commit" README "Hello" &&
+	test_commit --annotate "second commit" README "Hello world" v1.0 &&
+	test_commit --no-tag "third commit" README "Hello world!" &&
+	git switch -c feature v1.0 &&
+	test_commit --no-tag "feature commit" README "Hello world!" &&
+	git switch main
+'
+
+test_expect_success 'clone with --revision being a branch' '
+	test_when_finished "rm -rf dst" &&
+	git clone --revision=refs/heads/feature . dst &&
+	git rev-parse refs/heads/feature >expect &&
+	git -C dst rev-parse HEAD >actual &&
+	test_cmp expect actual &&
+	git for-each-ref refs/heads/feature > expect &&
+	git -C dst for-each-ref > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'clone with --depth and --revision being a branch' '
+	test_when_finished "rm -rf dst" &&
+	git clone --depth=1 --revision=refs/heads/feature . dst &&
+	git rev-parse refs/heads/feature >expect &&
+	git -C dst rev-parse HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'clone with --revision being a tag' '
+	test_when_finished "rm -rf dst" &&
+	git clone --revision=refs/tags/v1.0 . dst &&
+	git rev-parse refs/tags/v1.0^{} >expect &&
+	git -C dst rev-parse HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'clone with --revision being HEAD' '
+	test_when_finished "rm -rf dst" &&
+	git clone --revision=HEAD . dst &&
+	git rev-parse HEAD >expect &&
+	git -C dst rev-parse HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'clone with --revision being a raw commit hash' '
+	test_when_finished "rm -rf dst" &&
+	oid=$(git rev-parse refs/heads/feature) &&
+	git clone --revision=$oid . dst &&
+	echo $oid >expect &&
+	git -C dst rev-parse HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'clone with --revision and --bare' '
+	test_when_finished "rm -rf dst" &&
+	git clone --revision=refs/heads/main --bare . dst &&
+	oid=$(git rev-parse refs/heads/main) &&
+	git -C dst cat-file -t $oid > actual &&
+	echo "commit" >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'clone with --revision being a short raw commit hash' '
+	test_when_finished "rm -rf dst" &&
+	oid=$(git rev-parse --short refs/heads/feature) &&
+	test_must_fail git clone --revision=$oid . dst 2>err &&
+	test_grep "fatal: Remote revision $oid not found in upstream origin" err
+'
+
+test_expect_success 'clone with --revision being a tree hash' '
+	test_when_finished "rm -rf dst" &&
+	oid=$(git rev-parse refs/heads/feature^{tree}) &&
+	test_must_fail git clone --revision=$oid . dst 2>err &&
+	test_grep "error: object $oid is a tree, not a commit" err
+'
+
+test_expect_success 'clone with --revision being the parent of a ref fails' '
+	test_when_finished "rm -rf dst" &&
+	test_must_fail git clone --revision=refs/heads/main^ . dst
+'
+
+test_expect_success 'clone with --revision and --branch fails' '
+	test_when_finished "rm -rf dst" &&
+	test_must_fail git clone --revision=refs/heads/main --branch=main . dst
+'
+
+test_expect_success 'clone with --revision and --mirror fails' '
+	test_when_finished "rm -rf dst" &&
+	test_must_fail git clone --revision=refs/heads/main --mirror . dst
+'
+
+test_done