diff mbox series

[v4] builtin/clone.c: add --reject-shallow option

Message ID pull.865.v4.git.1613891147977.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series [v4] builtin/clone.c: add --reject-shallow option | expand

Commit Message

Li Linchao Feb. 21, 2021, 7:05 a.m. UTC
From: lilinchao <lilinchao@oschina.cn>

In some scenarios, users may want more history than the repository
offered for cloning, which happens to be a shallow repository, can
give them. But because users don't know it is a shallow repository
until they download it to local, users should have the option to
refuse to clone this kind of repository, and may want to exit the
process immediately without creating any unnecessary files.

Althought there is an option '--depth=x' for users to decide how
deep history they can fetch, but as the unshallow cloning's depth
is INFINITY, we can't know exactly the minimun 'x' value that can
satisfy the minimum integrity, so we can't pass 'x' value to --depth,
and expect this can obtain a complete history of a repository.

In other scenarios, if we have an API that allow us to import external
repository, and then perform various operations on the repo.
But if the imported is a shallow one(which is actually possible), it
will affect the subsequent operations. So we can choose to refuse to
clone, and let's just import a normal repository.

This patch offers a new option '--reject-shallow' that can reject to
clone a shallow repository.

Signed-off-by: lilinchao <lilinchao@oschina.cn>
---
    builtin/clone.c: add --reject-shallow option
    
    Changes since v1:
    
     * Rename --no-shallow to --reject-shallow
     * Enable to reject a non-local clone
     * Enable --[no-]reject-shallow from CLI override configuration.
     * Add more testcases.
     * Reword commit messages and relative documentation.
    
    Changes since v3:
    
     * Add support to reject clone shallow repo over https protocol
     * Add testcase to reject clone shallow repo over https:// transport
     * Reword commit messages and relative documentation according
       suggestions from Junio.
    
    Signed-off-by: lilinchao lilinchao@oschina.cn

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-865%2FCactusinhand%2Fgit-clone-options-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-865/Cactusinhand/git-clone-options-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/865

Range-diff vs v3:

 1:  f5ed6b2e4481 ! 1:  ee4fb840a32f builtin/clone.c: add --reject-shallow option
     @@ Commit message
          builtin/clone.c: add --reject-shallow option
      
          In some scenarios, users may want more history than the repository
     -    offered for cloning, which mostly to be a shallow repository, can
     -    give them. But users don't know it is a shallow repository until
     -    they download it to local, users should have the option to refuse
     -    to clone this kind of repository, and may want to exit the process
     -    immediately without creating any unnecessary files.
     +    offered for cloning, which happens to be a shallow repository, can
     +    give them. But because users don't know it is a shallow repository
     +    until they download it to local, users should have the option to
     +    refuse to clone this kind of repository, and may want to exit the
     +    process immediately without creating any unnecessary files.
      
          Althought there is an option '--depth=x' for users to decide how
          deep history they can fetch, but as the unshallow cloning's depth
     @@ Commit message
          satisfy the minimum integrity, so we can't pass 'x' value to --depth,
          and expect this can obtain a complete history of a repository.
      
     -    In other scenarios, given that we have an API that allow us to import
     -    external repository, and then perform various operations on the repo.
     +    In other scenarios, if we have an API that allow us to import external
     +    repository, and then perform various operations on the repo.
          But if the imported is a shallow one(which is actually possible), it
          will affect the subsequent operations. So we can choose to refuse to
     -    clone, and let's just import an unshallow repository.
     +    clone, and let's just import a normal repository.
      
          This patch offers a new option '--reject-shallow' that can reject to
          clone a shallow repository.
     @@ Documentation/git-clone.txt: SYNOPSIS
       	  [--depth <depth>] [--[no-]single-branch] [--no-tags]
       	  [--recurse-submodules[=<pathspec>]] [--[no-]shallow-submodules]
      -	  [--[no-]remote-submodules] [--jobs <n>] [--sparse]
     -+	  [--[no-]remote-submodules] [--jobs <n>] [--sparse] [--reject-shallow]
     ++	  [--[no-]remote-submodules] [--jobs <n>] [--sparse] [--[no-]reject-shallow]
       	  [--filter=<filter>] [--] <repository>
       	  [<directory>]
       
     @@ Documentation/git-clone.txt: objects from the source repository into a pack in t
       --no-checkout::
       	No checkout of HEAD is performed after the clone is complete.
       
     -+--reject-shallow::
     -+	Don't clone a shallow source repository. In some scenarios, clients
     -+	want the cloned repository information to be complete. Otherwise,
     -+	the cloning process should end immediately without creating any files,
     -+	which can save some disk space. This can override `clone.rejectshallow`
     -+	from the configuration:
     -+
     -+	--------------------------------------------------------------------
     -+	$ git -c clone.rejectshallow=false clone --reject-shallow source out
     -+	--------------------------------------------------------------------
     -+
     -+	While there is a way to countermand a configured "I always refuse to
     -+	clone from a shallow repository" with "but let's allow it only this time":
     -+
     -+	----------------------------------------------------------------------
     -+	$ git -c clone.rejectshallow=true clone --no-reject-shallow source out
     -+	----------------------------------------------------------------------
     ++--[no-]reject-shallow::
     ++	Fail if the source repository is a shallow repository. The
     ++	'clone.rejectshallow' configuration variable can be used to
     ++	give the default.
      +
       --bare::
       	Make a 'bare' Git repository.  That is, instead of
       	creating `<directory>` and placing the administrative
      
       ## builtin/clone.c ##
     +@@
     + #include "connected.h"
     + #include "packfile.h"
     + #include "list-objects-filter-options.h"
     ++#include "shallow.h"
     + 
     + /*
     +  * Overall FIXMEs:
      @@ builtin/clone.c: static int option_no_checkout, option_bare, option_mirror, option_single_branch
       static int option_local = -1, option_no_hardlinks, option_shared;
       static int option_no_tags;
       static int option_shallow_submodules;
     -+static int option_no_shallow = -1;  /* unspecified */
     ++static int option_shallow = -1;    /* unspecified */
      +static int config_shallow = -1;    /* unspecified */
       static int deepen;
       static char *option_template, *option_depth, *option_since;
     @@ builtin/clone.c: static struct option builtin_clone_options[] = {
       	OPT__VERBOSITY(&option_verbosity),
       	OPT_BOOL(0, "progress", &option_progress,
       		 N_("force progress reporting")),
     -+	OPT_BOOL(0, "reject-shallow", &option_no_shallow,
     ++	OPT_BOOL(0, "reject-shallow", &option_shallow,
      +		 N_("don't clone shallow repository")),
       	OPT_BOOL('n', "no-checkout", &option_no_checkout,
       		 N_("don't create a checkout")),
     @@ builtin/clone.c: static int path_exists(const char *path)
       int cmd_clone(int argc, const char **argv, const char *prefix)
       {
       	int is_bundle = 0, is_local;
     -+	int is_shallow = 0;
     ++	int local_shallow = 0;
     ++	int reject_shallow = 0;
       	const char *repo_name, *repo, *work_tree, *git_dir;
       	char *path, *dir, *display_repo = NULL;
       	int dest_exists, real_dest_exists = 0;
      @@ builtin/clone.c: int cmd_clone(int argc, const char **argv, const char *prefix)
     + 	 */
     + 	git_config(git_clone_config, NULL);
       
     - 	path = get_repo_path(remote->url[0], &is_bundle);
     - 	is_local = option_local != 0 && path && !is_bundle;
     -+
     -+	/* Detect if the remote repository is shallow */
     -+	if (!access(mkpath("%s/shallow", path), F_OK)) {
     -+		is_shallow = 1;
     ++	/*
     ++	 * If option_shallow is specified from CLI option,
     ++	 * ignore config_shallow from git_clone_config.
     ++	 */
     ++	if (config_shallow != -1) {
     ++		reject_shallow = config_shallow;
      +	}
     -+
     - 	if (is_local) {
     - 		if (option_depth)
     - 			warning(_("--depth is ignored in local clones; use file:// instead."));
     ++	if (option_shallow != -1) {
     ++		reject_shallow = option_shallow;
     ++	}
     + 	/*
     + 	 * apply the remote name provided by --origin only after this second
     + 	 * call to git_config, to ensure it overrides all config-based values.
      @@ builtin/clone.c: int cmd_clone(int argc, const char **argv, const char *prefix)
     - 			warning(_("--shallow-exclude is ignored in local clones; use file:// instead."));
       		if (filter_options.choice)
       			warning(_("--filter is ignored in local clones; use file:// instead."));
     --		if (!access(mkpath("%s/shallow", path), F_OK)) {
     -+		if (is_shallow) {
     + 		if (!access(mkpath("%s/shallow", path), F_OK)) {
     ++			local_shallow = 1;
       			if (option_local > 0)
       				warning(_("source repository is shallow, ignoring --local"));
       			is_local = 0;
      @@ builtin/clone.c: int cmd_clone(int argc, const char **argv, const char *prefix)
     + 			goto cleanup;
       	}
     - 	if (option_local > 0 && !is_local)
     - 		warning(_("--local is ignored"));
     -+
     -+	if (is_shallow) {
     -+		int reject = 0;
     -+
     -+		/* If option_no_shallow is specified from CLI option,
     -+		 * ignore config_shallow from git_clone_config.
     -+		 */
     -+
     -+		if (config_shallow != -1) {
     -+			reject = config_shallow;
     -+		}
     -+		if (option_no_shallow != -1) {
     -+			reject = option_no_shallow;
     -+		}
     -+		if (reject) {
     + 
     ++	if (reject_shallow) {
     ++		if (local_shallow || is_repository_shallow(the_repository)) {
      +			die(_("source repository is shallow, reject to clone."));
      +		}
      +	}
      +
     - 	transport->cloning = 1;
     - 
     - 	transport_set_option(transport, TRANS_OPT_KEEP, "yes");
     + 	update_remote_refs(refs, mapped_refs, remote_head_points_at,
     + 			   branch_top.buf, reflog_msg.buf, transport,
     + 			   !is_local);
      
       ## t/t5606-clone-options.sh ##
     +@@ t/t5606-clone-options.sh: GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
     + export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
     + 
     + . ./test-lib.sh
     ++. "$TEST_DIRECTORY"/lib-httpd.sh
     ++start_httpd
     + 
     + test_expect_success 'setup' '
     + 
      @@ t/t5606-clone-options.sh: test_expect_success 'disallows --bare with --separate-git-dir' '
       
       '
       
     ++test_expect_success 'fail to clone http shallow repository' '
     ++	git clone --depth=1 --no-local parent shallow-repo &&
     ++	git clone --bare --no-local shallow-repo "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" &&
     ++	test_must_fail git clone --reject-shallow $HTTPD_URL/smart/repo.git out 2>err &&
     ++	test_i18ngrep -e "source repository is shallow, reject to clone." err
     ++
     ++'
     ++
      +test_expect_success 'fail to clone shallow repository' '
     ++	rm -rf shallow-repo &&
      +	git clone --depth=1 --no-local parent shallow-repo &&
      +	test_must_fail git clone --reject-shallow shallow-repo out 2>err &&
      +	test_i18ngrep -e "source repository is shallow, reject to clone." err
     @@ t/t5611-clone-config.sh: test_expect_success 'clone -c remote.<remote>.fetch=<re
      +	test_i18ngrep -e "source repository is shallow, reject to clone." err
      +'
      +
     -+test_expect_success 'clone.rejectshallow=false should succeed cloning' '
     ++test_expect_success 'clone.rejectshallow=false should succeed' '
      +	rm -rf child &&
      +	git clone --depth=1 --no-local . child &&
      +	git -c clone.rejectshallow=false clone --no-local child out
      +'
      +
     -+test_expect_success 'clone.rejectshallow=true should succeed cloning normal repo' '
     ++test_expect_success 'clone.rejectshallow=true should succeed with normal repo' '
      +	rm -rf child out &&
      +	git clone --no-local . child &&
      +	git -c clone.rejectshallow=true clone --no-local child out
     @@ t/t5611-clone-config.sh: test_expect_success 'clone -c remote.<remote>.fetch=<re
      +test_expect_success 'option --reject-shallow override clone.rejectshallow' '
      +	rm -rf child out &&
      +	git clone --depth=1 --no-local . child &&
     -+	test_must_fail git clone -c clone.rejectshallow=false  --reject-shallow --no-local child out 2>err &&
     ++	test_must_fail git -c clone.rejectshallow=false clone --reject-shallow --no-local child out 2>err &&
      +	test_i18ngrep -e "source repository is shallow, reject to clone." err
      +'
      +
     -+test_expect_success ' option --no-reject-shallow override clone.rejectshallow' '
     ++test_expect_success 'option --no-reject-shallow override clone.rejectshallow' '
      +	rm -rf child &&
      +	git clone --depth=1 --no-local . child &&
      +	git -c clone.rejectshallow=true clone --no-reject-shallow --no-local child out


 Documentation/config/clone.txt |  4 +++
 Documentation/git-clone.txt    |  7 ++++-
 builtin/clone.c                | 27 +++++++++++++++++++
 t/t5606-clone-options.sh       | 47 ++++++++++++++++++++++++++++++++++
 t/t5611-clone-config.sh        | 32 +++++++++++++++++++++++
 5 files changed, 116 insertions(+), 1 deletion(-)


base-commit: 2283e0e9af55689215afa39c03beb2315ce18e83

Comments

Junio C Hamano Feb. 22, 2021, 6:12 p.m. UTC | #1
[jc: I've CC'ed Jonathan Tan, who is much more knowledgeable than I
am on the transport layer issues, to sanity check my assumption]

"Li Linchao via GitGitGadget" <gitgitgadget@gmail.com> writes:

> @@ -1363,6 +1384,12 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>  			goto cleanup;
>  	}
>  
> +	if (reject_shallow) {
> +		if (local_shallow || is_repository_shallow(the_repository)) {

This may reject to clone from a shallow repository, but at this
point the bulk of the tranfer from the origin repository has already
happened, no?  Rejecting after transferring many megabytes feels a
bit too late.  That is one of the reasons why I kept hinting that
the transport layer needs to be taught an option to reject talking
to a shallow counterpart if we want to add this feature [*1*].

Also, wouldn't "clone --depth=1 --reject-shallow" from a repository
that is not shallow make the_repository a shallow one at this point
and makes it fail?  If the goal of the --reject-shallow option were
to make sure the resulting repository is not shallow, then that is a
technically correct implementation (even though it is wasteful to
transfer a full tree worth of megabytes and then abort), but is the
feature is explained to reject cloning from a shallow one, then
users would be suprised to see ...

> +			die(_("source repository is shallow, reject to clone."));

... this message, when cloning from well known publich repositories
that are not shallow.

I think cloning with --depth=<n> when the source repository is deep
enough, should be allowed, so the cleanest solution for the latter
may be to notice the combination of options that make the resulting
repository shallow (I mentioned --depth=<n>, but there may be others)
and the --reject-shallow option and error out before even talking
to the other side at the time we parse the command line.

Thanks.


[Footnote]

*1* Looking at Documentation/technical/pack-protocol.txt, "git
    fetch" seem to learn if the repository is shallow immediately
    upon contacting "upload-pack" during the Reference Discovery
    phase (we'd see 'shallow' packets if they are shallow). I
    suspect that the right solution would be to teach the codepath
    on the "git fetch" side that accepts, parses, and acts on this
    packet to optionally stop communication and error out when the
    caller asks not to talk with a shallow repository.
Jonathan Tan March 1, 2021, 10:03 p.m. UTC | #2
> This may reject to clone from a shallow repository, but at this
> point the bulk of the tranfer from the origin repository has already
> happened, no?  Rejecting after transferring many megabytes feels a
> bit too late.  That is one of the reasons why I kept hinting that
> the transport layer needs to be taught an option to reject talking
> to a shallow counterpart if we want to add this feature [*1*].

Extending the transport layer in this way might not be too difficult in
the case of native (SSH, git://) protocols and using protocol v0, since
handshake() in transport.c (called indirectly from
transport_get_remote_refs()) writes shallow information to a data
structure that we could potentially expose for the caller to use (before
it calls transport_fetch_refs(). I couldn't see how remote-using
protocols (e.g. HTTP) communicate shallow information, though
(remote-curl.c seems to just keep it for itself), so that will be a more
difficult task. And of course there's the matter of protocol v2, which I
discuss below.

> [Footnote]
> 
> *1* Looking at Documentation/technical/pack-protocol.txt, "git
>     fetch" seem to learn if the repository is shallow immediately
>     upon contacting "upload-pack" during the Reference Discovery
>     phase (we'd see 'shallow' packets if they are shallow). I
>     suspect that the right solution would be to teach the codepath
>     on the "git fetch" side that accepts, parses, and acts on this
>     packet to optionally stop communication and error out when the
>     caller asks not to talk with a shallow repository.

This is true with protocol v0, but protocol v2 bundles all shallow
information (whether coming from the fact that the remote is shallow or
the fact that the fetcher specified --depth etc.) and sends them
together with the packfile. It may be possible to stop packfile download
(saving bandwidth on the client side, at least) once such information is
returned, though.
Junio C Hamano March 1, 2021, 10:34 p.m. UTC | #3
Jonathan Tan <jonathantanmy@google.com> writes:

>> This may reject to clone from a shallow repository, but at this
>> point the bulk of the tranfer from the origin repository has already
>> happened, no?  Rejecting after transferring many megabytes feels a
>> bit too late.  That is one of the reasons why I kept hinting that
>> the transport layer needs to be taught an option to reject talking
>> to a shallow counterpart if we want to add this feature [*1*].
>
> Extending the transport layer in this way might not be too difficult in
> the case of native (SSH, git://) protocols and using protocol v0, since
> handshake() in transport.c (called indirectly from
> transport_get_remote_refs()) writes shallow information to a data
> structure that we could potentially expose for the caller to use (before
> it calls transport_fetch_refs(). I couldn't see how remote-using
> protocols (e.g. HTTP) communicate shallow information, though
> (remote-curl.c seems to just keep it for itself), so that will be a more
> difficult task. And of course there's the matter of protocol v2, which I
> discuss below.
>
>> [Footnote]
>> 
>> *1* Looking at Documentation/technical/pack-protocol.txt, "git
>>     fetch" seem to learn if the repository is shallow immediately
>>     upon contacting "upload-pack" during the Reference Discovery
>>     phase (we'd see 'shallow' packets if they are shallow). I
>>     suspect that the right solution would be to teach the codepath
>>     on the "git fetch" side that accepts, parses, and acts on this
>>     packet to optionally stop communication and error out when the
>>     caller asks not to talk with a shallow repository.
>
> This is true with protocol v0, but protocol v2 bundles all shallow
> information (whether coming from the fact that the remote is shallow or
> the fact that the fetcher specified --depth etc.) and sends them
> together with the packfile. It may be possible to stop packfile download
> (saving bandwidth on the client side, at least) once such information is
> returned, though.

So in short, the "we do not want to clone from a shallow upstream"
would not be possible to implement sensibly without significantly
cleaning up the protocol layers first, which makes the whole thing
pretty much moot.

Thanks for a review and insight.
Li Linchao March 2, 2021, 8:44 a.m. UTC | #4
--------------
lilinchao
>> This may reject to clone from a shallow repository, but at this
>> point the bulk of the tranfer from the origin repository has already
>> happened, no?  Rejecting after transferring many megabytes feels a
>> bit too late.  That is one of the reasons why I kept hinting that
>> the transport layer needs to be taught an option to reject talking
>> to a shallow counterpart if we want to add this feature [*1*].
>
>Extending the transport layer in this way might not be too difficult in
>the case of native (SSH, git://) protocols and using protocol v0, since
>handshake() in transport.c (called indirectly from
>transport_get_remote_refs()) writes shallow information to a data
>structure that we could potentially expose for the caller to use (before
>it calls transport_fetch_refs(). I couldn't see how remote-using
>protocols (e.g. HTTP) communicate shallow information, though
>(remote-curl.c seems to just keep it for itself), so that will be a more
>difficult task. And of course there's the matter of protocol v2, which I
>discuss below.
> 
These discussions were based on PATCH_v4, which is quiet immature then,
In the latest PATCH_v5, for the case of native protocols(ssh, git, file://), they
will eventually call do_fetch_pack_v2() if it's protocol v2,  and then will call
receive_shallow_info() for the case FETCH_GET_PACK, so this is the place
I made the change.
As for HTTPs protocl, as long as it's still smart protocol, which means do not
fallback to dumb protocol, it will also call do_fetch_pack_v2(), and go to 
"check shallow info" trigger in receive_shallow_info().

So, base on PATCH_V5, I have tested protocol v2, which goes like this:

First, I created a shallow repo on gitlab and gitee respectively.

Then tried to clone them in my terminal, (in order not to look too verbose,
I ommited the result when GIT_TRACE_PACKET is on).

$ ./git-clone --reject-shallow https://gitlab.com/Cactusinhand/rugged.git
Cloning into 'rugged'...
fatal: source repository is shallow, reject to clone.

$ ./git-clone --reject-shallow ssh://gitlab.com/Cactusinhand/rugged.git
Cloning into 'rugged'...
fatal: source repository is shallow, reject to clone.

$ ./git-clone --reject-shallow git@gitlab.com:Cactusinhand/rugged.git
Cloning into 'rugged'...
fatal: source repository is shallow, reject to clone.

$ ./git-clone --reject-shallow https://gitee.com/cactusinhand/rugged.git
Cloning into 'rugged'...
fatal: source repository is shallow, reject to clone.

$ ./git-clone --reject-shallow ssh://gitee.com/cactusinhand/rugged.git
Cloning into 'rugged'...
fatal: source repository is shallow, reject to clone.

$ ./git-clone --reject-shallow git@gitee.com:cactusinhand/rugged.git
Cloning into 'rugged'...
fatal: source repository is shallow, reject to clone.

I haven't tested protocol v1, but I've made the change in do_fetch_pack(),
which is for protocol version 0 and version 1.

I hope you can review the latest patch, and give me some suggestions.

Thanks!

>> [Footnote]
>>
>> *1* Looking at Documentation/technical/pack-protocol.txt, "git
>>     fetch" seem to learn if the repository is shallow immediately
>>     upon contacting "upload-pack" during the Reference Discovery
>>     phase (we'd see 'shallow' packets if they are shallow). I
>>     suspect that the right solution would be to teach the codepath
>>     on the "git fetch" side that accepts, parses, and acts on this
>>     packet to optionally stop communication and error out when the
>>     caller asks not to talk with a shallow repository.
>
>This is true with protocol v0, but protocol v2 bundles all shallow
>information (whether coming from the fact that the remote is shallow or
>the fact that the fetcher specified --depth etc.) and sends them
>together with the packfile. It may be possible to stop packfile download
>(saving bandwidth on the client side, at least) once such information is
>returned, though.
>
Junio C Hamano March 3, 2021, 11:59 p.m. UTC | #5
Jonathan Tan <jonathantanmy@google.com> writes:

> This is true with protocol v0, but protocol v2 bundles all shallow
> information (whether coming from the fact that the remote is shallow or
> the fact that the fetcher specified --depth etc.) and sends them
> together with the packfile.

By the above do you mean what happens in FETCH_GET_PACK arm where
receive_shallow_info() is called when "shallow-info" header is seen,
before the code continues to process wanted-refs, packfile-uris and
then finally the packfile?

I do not think it makes much sense to ask any option to make us
shallow (like --depth=<n>) while --reject-shallow is in use (after
all, if the other side is deep enough to make us <n> commits deep,
there is no reason to reject the other side as the source), so your
"whether coming from the fact ..." part, while is a valid
observation, can be ignored in practice (meaning: it is OK to make
"--reject-shallow" be in effect only when we are trying to make a
full clone, and reject combinations of it with --depth=<n> and such
at the command parsing time).

> It may be possible to stop packfile download (saving bandwidth on
> the client side, at least) once such information is returned,
> though.

Just like "upload-pack" does not get upset by a client that comes
only for the initial refs advertisement and disconnects without
asking for any "want" (aka "ls-remote"), the server side should be
prepared to see if the other side cuts off after seeing the
"shallow-info" section header or after seeing the the whole
"shallow-info" section, so we should be able to leave early without
having to download the bulk data.  If the "upload-pack" spends
unnecessary cycles when it happens, then we need to fix that.  Even
if the "fetch" client does not willingly disconnect in the middle,
the network disconnect may happen at any point in the exchange, and
we'd need to be prepared for it.

Do we need to read and parse the "shallow-info" section, or would
the mere presense of the section mean the other side knows this side
needs to futz with the .git/shallow information (either because we
asked to be made shallow with --depth and the like, or because we
tried to clone from them and they are shallow)?

Thanks.
Jonathan Tan March 4, 2021, 1:53 a.m. UTC | #6
> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > This is true with protocol v0, but protocol v2 bundles all shallow
> > information (whether coming from the fact that the remote is shallow or
> > the fact that the fetcher specified --depth etc.) and sends them
> > together with the packfile.
> 
> By the above do you mean what happens in FETCH_GET_PACK arm where
> receive_shallow_info() is called when "shallow-info" header is seen,
> before the code continues to process wanted-refs, packfile-uris and
> then finally the packfile?

Yes.

> I do not think it makes much sense to ask any option to make us
> shallow (like --depth=<n>) while --reject-shallow is in use (after
> all, if the other side is deep enough to make us <n> commits deep,
> there is no reason to reject the other side as the source), so your
> "whether coming from the fact ..." part, while is a valid
> observation, can be ignored in practice (meaning: it is OK to make
> "--reject-shallow" be in effect only when we are trying to make a
> full clone, and reject combinations of it with --depth=<n> and such
> at the command parsing time).
> 
> > It may be possible to stop packfile download (saving bandwidth on
> > the client side, at least) once such information is returned,
> > though.
> 
> Just like "upload-pack" does not get upset by a client that comes
> only for the initial refs advertisement and disconnects without
> asking for any "want" (aka "ls-remote"), the server side should be
> prepared to see if the other side cuts off after seeing the
> "shallow-info" section header or after seeing the the whole
> "shallow-info" section, so we should be able to leave early without
> having to download the bulk data.  If the "upload-pack" spends
> unnecessary cycles when it happens, then we need to fix that.  Even
> if the "fetch" client does not willingly disconnect in the middle,
> the network disconnect may happen at any point in the exchange, and
> we'd need to be prepared for it.
> 
> Do we need to read and parse the "shallow-info" section, or would
> the mere presense of the section mean the other side knows this side
> needs to futz with the .git/shallow information (either because we
> asked to be made shallow with --depth and the like, or because we
> tried to clone from them and they are shallow)?

Reading the documentation, the mere presence should be enough. Yes, I
think upload-pack will spend unnecessary cycles if the channel is
terminated halfway (and I don't know if we can prevent spending these
cycles, since I/O can be buffered). I think it should be possible for
the client to cut off when it sees shallow-info (besides the possible
wastage of cycles and I/O on the server's end).

Having said that, I think this different from the ls-remote case. There,
the server is awaiting another request from the user before sending more
information, but here, the server intends to send everything at once.
diff mbox series

Patch

diff --git a/Documentation/config/clone.txt b/Documentation/config/clone.txt
index 47de36a5fedf..50ebc170bb81 100644
--- a/Documentation/config/clone.txt
+++ b/Documentation/config/clone.txt
@@ -2,3 +2,7 @@  clone.defaultRemoteName::
 	The name of the remote to create when cloning a repository.  Defaults to
 	`origin`, and can be overridden by passing the `--origin` command-line
 	option to linkgit:git-clone[1].
+
+clone.rejectshallow::
+	Reject to clone a repository if it is a shallow one, can be overridden by
+	passing option `--reject-shallow` in command line. See linkgit:git-clone[1]
diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index 02d9c19cec75..cb458123eef6 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -15,7 +15,7 @@  SYNOPSIS
 	  [--dissociate] [--separate-git-dir <git dir>]
 	  [--depth <depth>] [--[no-]single-branch] [--no-tags]
 	  [--recurse-submodules[=<pathspec>]] [--[no-]shallow-submodules]
-	  [--[no-]remote-submodules] [--jobs <n>] [--sparse]
+	  [--[no-]remote-submodules] [--jobs <n>] [--sparse] [--[no-]reject-shallow]
 	  [--filter=<filter>] [--] <repository>
 	  [<directory>]
 
@@ -149,6 +149,11 @@  objects from the source repository into a pack in the cloned repository.
 --no-checkout::
 	No checkout of HEAD is performed after the clone is complete.
 
+--[no-]reject-shallow::
+	Fail if the source repository is a shallow repository. The
+	'clone.rejectshallow' configuration variable can be used to
+	give the default.
+
 --bare::
 	Make a 'bare' Git repository.  That is, instead of
 	creating `<directory>` and placing the administrative
diff --git a/builtin/clone.c b/builtin/clone.c
index 51e844a2de0a..6807eb954366 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -32,6 +32,7 @@ 
 #include "connected.h"
 #include "packfile.h"
 #include "list-objects-filter-options.h"
+#include "shallow.h"
 
 /*
  * Overall FIXMEs:
@@ -50,6 +51,8 @@  static int option_no_checkout, option_bare, option_mirror, option_single_branch
 static int option_local = -1, option_no_hardlinks, option_shared;
 static int option_no_tags;
 static int option_shallow_submodules;
+static int option_shallow = -1;    /* unspecified */
+static int config_shallow = -1;    /* unspecified */
 static int deepen;
 static char *option_template, *option_depth, *option_since;
 static char *option_origin = NULL;
@@ -90,6 +93,8 @@  static struct option builtin_clone_options[] = {
 	OPT__VERBOSITY(&option_verbosity),
 	OPT_BOOL(0, "progress", &option_progress,
 		 N_("force progress reporting")),
+	OPT_BOOL(0, "reject-shallow", &option_shallow,
+		 N_("don't clone shallow repository")),
 	OPT_BOOL('n', "no-checkout", &option_no_checkout,
 		 N_("don't create a checkout")),
 	OPT_BOOL(0, "bare", &option_bare, N_("create a bare repository")),
@@ -858,6 +863,9 @@  static int git_clone_config(const char *k, const char *v, void *cb)
 		free(remote_name);
 		remote_name = xstrdup(v);
 	}
+	if (!strcmp(k, "clone.rejectshallow")) {
+		config_shallow = git_config_bool(k, v);
+	}
 	return git_default_config(k, v, cb);
 }
 
@@ -963,6 +971,8 @@  static int path_exists(const char *path)
 int cmd_clone(int argc, const char **argv, const char *prefix)
 {
 	int is_bundle = 0, is_local;
+	int local_shallow = 0;
+	int reject_shallow = 0;
 	const char *repo_name, *repo, *work_tree, *git_dir;
 	char *path, *dir, *display_repo = NULL;
 	int dest_exists, real_dest_exists = 0;
@@ -1156,6 +1166,16 @@  int cmd_clone(int argc, const char **argv, const char *prefix)
 	 */
 	git_config(git_clone_config, NULL);
 
+	/*
+	 * If option_shallow is specified from CLI option,
+	 * ignore config_shallow from git_clone_config.
+	 */
+	if (config_shallow != -1) {
+		reject_shallow = config_shallow;
+	}
+	if (option_shallow != -1) {
+		reject_shallow = option_shallow;
+	}
 	/*
 	 * apply the remote name provided by --origin only after this second
 	 * call to git_config, to ensure it overrides all config-based values.
@@ -1216,6 +1236,7 @@  int cmd_clone(int argc, const char **argv, const char *prefix)
 		if (filter_options.choice)
 			warning(_("--filter is ignored in local clones; use file:// instead."));
 		if (!access(mkpath("%s/shallow", path), F_OK)) {
+			local_shallow = 1;
 			if (option_local > 0)
 				warning(_("source repository is shallow, ignoring --local"));
 			is_local = 0;
@@ -1363,6 +1384,12 @@  int cmd_clone(int argc, const char **argv, const char *prefix)
 			goto cleanup;
 	}
 
+	if (reject_shallow) {
+		if (local_shallow || is_repository_shallow(the_repository)) {
+			die(_("source repository is shallow, reject to clone."));
+		}
+	}
+
 	update_remote_refs(refs, mapped_refs, remote_head_points_at,
 			   branch_top.buf, reflog_msg.buf, transport,
 			   !is_local);
diff --git a/t/t5606-clone-options.sh b/t/t5606-clone-options.sh
index 52e5789fb050..6170d0513227 100755
--- a/t/t5606-clone-options.sh
+++ b/t/t5606-clone-options.sh
@@ -5,6 +5,8 @@  GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
 . ./test-lib.sh
+. "$TEST_DIRECTORY"/lib-httpd.sh
+start_httpd
 
 test_expect_success 'setup' '
 
@@ -45,6 +47,51 @@  test_expect_success 'disallows --bare with --separate-git-dir' '
 
 '
 
+test_expect_success 'fail to clone http shallow repository' '
+	git clone --depth=1 --no-local parent shallow-repo &&
+	git clone --bare --no-local shallow-repo "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" &&
+	test_must_fail git clone --reject-shallow $HTTPD_URL/smart/repo.git out 2>err &&
+	test_i18ngrep -e "source repository is shallow, reject to clone." err
+
+'
+
+test_expect_success 'fail to clone shallow repository' '
+	rm -rf shallow-repo &&
+	git clone --depth=1 --no-local parent shallow-repo &&
+	test_must_fail git clone --reject-shallow shallow-repo out 2>err &&
+	test_i18ngrep -e "source repository is shallow, reject to clone." err
+
+'
+
+test_expect_success 'fail to clone non-local shallow repository' '
+	rm -rf shallow-repo &&
+	git clone --depth=1 --no-local parent shallow-repo &&
+	test_must_fail git clone --reject-shallow --no-local shallow-repo out 2>err &&
+	test_i18ngrep -e "source repository is shallow, reject to clone." err
+
+'
+
+test_expect_success 'clone shallow repository with --no-reject-shallow' '
+	rm -rf shallow-repo &&
+	git clone --depth=1 --no-local parent shallow-repo &&
+	git clone --no-reject-shallow --no-local shallow-repo clone-repo
+
+'
+
+test_expect_success 'clone normal repository with --reject-shallow' '
+	rm -rf clone-repo &&
+	git clone --no-local parent normal-repo &&
+	git clone --reject-shallow --no-local normal-repo clone-repo
+
+'
+
+test_expect_success 'unspecified any configs or options' '
+	rm -rf shallow-repo clone-repo &&
+	git clone --depth=1 --no-local parent shallow-repo &&
+	git clone shallow-repo clone-repo
+
+'
+
 test_expect_success 'uses "origin" for default remote name' '
 
 	git clone parent clone-default-origin &&
diff --git a/t/t5611-clone-config.sh b/t/t5611-clone-config.sh
index 9f555b87ecdf..18268256bfe0 100755
--- a/t/t5611-clone-config.sh
+++ b/t/t5611-clone-config.sh
@@ -95,6 +95,38 @@  test_expect_success 'clone -c remote.<remote>.fetch=<refspec> --origin=<name>' '
 	test_cmp expect actual
 '
 
+test_expect_success 'clone.rejectshallow=true should fail to clone' '
+	rm -rf child &&
+	git clone --depth=1 --no-local . child &&
+	test_must_fail git -c clone.rejectshallow=true clone --no-local child out 2>err &&
+	test_i18ngrep -e "source repository is shallow, reject to clone." err
+'
+
+test_expect_success 'clone.rejectshallow=false should succeed' '
+	rm -rf child &&
+	git clone --depth=1 --no-local . child &&
+	git -c clone.rejectshallow=false clone --no-local child out
+'
+
+test_expect_success 'clone.rejectshallow=true should succeed with normal repo' '
+	rm -rf child out &&
+	git clone --no-local . child &&
+	git -c clone.rejectshallow=true clone --no-local child out
+'
+
+test_expect_success 'option --reject-shallow override clone.rejectshallow' '
+	rm -rf child out &&
+	git clone --depth=1 --no-local . child &&
+	test_must_fail git -c clone.rejectshallow=false clone --reject-shallow --no-local child out 2>err &&
+	test_i18ngrep -e "source repository is shallow, reject to clone." err
+'
+
+test_expect_success 'option --no-reject-shallow override clone.rejectshallow' '
+	rm -rf child &&
+	git clone --depth=1 --no-local . child &&
+	git -c clone.rejectshallow=true clone --no-reject-shallow --no-local child out
+'
+
 test_expect_success MINGW 'clone -c core.hideDotFiles' '
 	test_commit attributes .gitattributes "" &&
 	rm -rf child &&