Message ID | pull.865.v4.git.1613891147977.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v4] builtin/clone.c: add --reject-shallow option | expand |
[jc: I've CC'ed Jonathan Tan, who is much more knowledgeable than I am on the transport layer issues, to sanity check my assumption] "Li Linchao via GitGitGadget" <gitgitgadget@gmail.com> writes: > @@ -1363,6 +1384,12 @@ int cmd_clone(int argc, const char **argv, const char *prefix) > goto cleanup; > } > > + if (reject_shallow) { > + if (local_shallow || is_repository_shallow(the_repository)) { This may reject to clone from a shallow repository, but at this point the bulk of the tranfer from the origin repository has already happened, no? Rejecting after transferring many megabytes feels a bit too late. That is one of the reasons why I kept hinting that the transport layer needs to be taught an option to reject talking to a shallow counterpart if we want to add this feature [*1*]. Also, wouldn't "clone --depth=1 --reject-shallow" from a repository that is not shallow make the_repository a shallow one at this point and makes it fail? If the goal of the --reject-shallow option were to make sure the resulting repository is not shallow, then that is a technically correct implementation (even though it is wasteful to transfer a full tree worth of megabytes and then abort), but is the feature is explained to reject cloning from a shallow one, then users would be suprised to see ... > + die(_("source repository is shallow, reject to clone.")); ... this message, when cloning from well known publich repositories that are not shallow. I think cloning with --depth=<n> when the source repository is deep enough, should be allowed, so the cleanest solution for the latter may be to notice the combination of options that make the resulting repository shallow (I mentioned --depth=<n>, but there may be others) and the --reject-shallow option and error out before even talking to the other side at the time we parse the command line. Thanks. [Footnote] *1* Looking at Documentation/technical/pack-protocol.txt, "git fetch" seem to learn if the repository is shallow immediately upon contacting "upload-pack" during the Reference Discovery phase (we'd see 'shallow' packets if they are shallow). I suspect that the right solution would be to teach the codepath on the "git fetch" side that accepts, parses, and acts on this packet to optionally stop communication and error out when the caller asks not to talk with a shallow repository.
> This may reject to clone from a shallow repository, but at this > point the bulk of the tranfer from the origin repository has already > happened, no? Rejecting after transferring many megabytes feels a > bit too late. That is one of the reasons why I kept hinting that > the transport layer needs to be taught an option to reject talking > to a shallow counterpart if we want to add this feature [*1*]. Extending the transport layer in this way might not be too difficult in the case of native (SSH, git://) protocols and using protocol v0, since handshake() in transport.c (called indirectly from transport_get_remote_refs()) writes shallow information to a data structure that we could potentially expose for the caller to use (before it calls transport_fetch_refs(). I couldn't see how remote-using protocols (e.g. HTTP) communicate shallow information, though (remote-curl.c seems to just keep it for itself), so that will be a more difficult task. And of course there's the matter of protocol v2, which I discuss below. > [Footnote] > > *1* Looking at Documentation/technical/pack-protocol.txt, "git > fetch" seem to learn if the repository is shallow immediately > upon contacting "upload-pack" during the Reference Discovery > phase (we'd see 'shallow' packets if they are shallow). I > suspect that the right solution would be to teach the codepath > on the "git fetch" side that accepts, parses, and acts on this > packet to optionally stop communication and error out when the > caller asks not to talk with a shallow repository. This is true with protocol v0, but protocol v2 bundles all shallow information (whether coming from the fact that the remote is shallow or the fact that the fetcher specified --depth etc.) and sends them together with the packfile. It may be possible to stop packfile download (saving bandwidth on the client side, at least) once such information is returned, though.
Jonathan Tan <jonathantanmy@google.com> writes: >> This may reject to clone from a shallow repository, but at this >> point the bulk of the tranfer from the origin repository has already >> happened, no? Rejecting after transferring many megabytes feels a >> bit too late. That is one of the reasons why I kept hinting that >> the transport layer needs to be taught an option to reject talking >> to a shallow counterpart if we want to add this feature [*1*]. > > Extending the transport layer in this way might not be too difficult in > the case of native (SSH, git://) protocols and using protocol v0, since > handshake() in transport.c (called indirectly from > transport_get_remote_refs()) writes shallow information to a data > structure that we could potentially expose for the caller to use (before > it calls transport_fetch_refs(). I couldn't see how remote-using > protocols (e.g. HTTP) communicate shallow information, though > (remote-curl.c seems to just keep it for itself), so that will be a more > difficult task. And of course there's the matter of protocol v2, which I > discuss below. > >> [Footnote] >> >> *1* Looking at Documentation/technical/pack-protocol.txt, "git >> fetch" seem to learn if the repository is shallow immediately >> upon contacting "upload-pack" during the Reference Discovery >> phase (we'd see 'shallow' packets if they are shallow). I >> suspect that the right solution would be to teach the codepath >> on the "git fetch" side that accepts, parses, and acts on this >> packet to optionally stop communication and error out when the >> caller asks not to talk with a shallow repository. > > This is true with protocol v0, but protocol v2 bundles all shallow > information (whether coming from the fact that the remote is shallow or > the fact that the fetcher specified --depth etc.) and sends them > together with the packfile. It may be possible to stop packfile download > (saving bandwidth on the client side, at least) once such information is > returned, though. So in short, the "we do not want to clone from a shallow upstream" would not be possible to implement sensibly without significantly cleaning up the protocol layers first, which makes the whole thing pretty much moot. Thanks for a review and insight.
-------------- lilinchao >> This may reject to clone from a shallow repository, but at this >> point the bulk of the tranfer from the origin repository has already >> happened, no? Rejecting after transferring many megabytes feels a >> bit too late. That is one of the reasons why I kept hinting that >> the transport layer needs to be taught an option to reject talking >> to a shallow counterpart if we want to add this feature [*1*]. > >Extending the transport layer in this way might not be too difficult in >the case of native (SSH, git://) protocols and using protocol v0, since >handshake() in transport.c (called indirectly from >transport_get_remote_refs()) writes shallow information to a data >structure that we could potentially expose for the caller to use (before >it calls transport_fetch_refs(). I couldn't see how remote-using >protocols (e.g. HTTP) communicate shallow information, though >(remote-curl.c seems to just keep it for itself), so that will be a more >difficult task. And of course there's the matter of protocol v2, which I >discuss below. > These discussions were based on PATCH_v4, which is quiet immature then, In the latest PATCH_v5, for the case of native protocols(ssh, git, file://), they will eventually call do_fetch_pack_v2() if it's protocol v2, and then will call receive_shallow_info() for the case FETCH_GET_PACK, so this is the place I made the change. As for HTTPs protocl, as long as it's still smart protocol, which means do not fallback to dumb protocol, it will also call do_fetch_pack_v2(), and go to "check shallow info" trigger in receive_shallow_info(). So, base on PATCH_V5, I have tested protocol v2, which goes like this: First, I created a shallow repo on gitlab and gitee respectively. Then tried to clone them in my terminal, (in order not to look too verbose, I ommited the result when GIT_TRACE_PACKET is on). $ ./git-clone --reject-shallow https://gitlab.com/Cactusinhand/rugged.git Cloning into 'rugged'... fatal: source repository is shallow, reject to clone. $ ./git-clone --reject-shallow ssh://gitlab.com/Cactusinhand/rugged.git Cloning into 'rugged'... fatal: source repository is shallow, reject to clone. $ ./git-clone --reject-shallow git@gitlab.com:Cactusinhand/rugged.git Cloning into 'rugged'... fatal: source repository is shallow, reject to clone. $ ./git-clone --reject-shallow https://gitee.com/cactusinhand/rugged.git Cloning into 'rugged'... fatal: source repository is shallow, reject to clone. $ ./git-clone --reject-shallow ssh://gitee.com/cactusinhand/rugged.git Cloning into 'rugged'... fatal: source repository is shallow, reject to clone. $ ./git-clone --reject-shallow git@gitee.com:cactusinhand/rugged.git Cloning into 'rugged'... fatal: source repository is shallow, reject to clone. I haven't tested protocol v1, but I've made the change in do_fetch_pack(), which is for protocol version 0 and version 1. I hope you can review the latest patch, and give me some suggestions. Thanks! >> [Footnote] >> >> *1* Looking at Documentation/technical/pack-protocol.txt, "git >> fetch" seem to learn if the repository is shallow immediately >> upon contacting "upload-pack" during the Reference Discovery >> phase (we'd see 'shallow' packets if they are shallow). I >> suspect that the right solution would be to teach the codepath >> on the "git fetch" side that accepts, parses, and acts on this >> packet to optionally stop communication and error out when the >> caller asks not to talk with a shallow repository. > >This is true with protocol v0, but protocol v2 bundles all shallow >information (whether coming from the fact that the remote is shallow or >the fact that the fetcher specified --depth etc.) and sends them >together with the packfile. It may be possible to stop packfile download >(saving bandwidth on the client side, at least) once such information is >returned, though. >
Jonathan Tan <jonathantanmy@google.com> writes: > This is true with protocol v0, but protocol v2 bundles all shallow > information (whether coming from the fact that the remote is shallow or > the fact that the fetcher specified --depth etc.) and sends them > together with the packfile. By the above do you mean what happens in FETCH_GET_PACK arm where receive_shallow_info() is called when "shallow-info" header is seen, before the code continues to process wanted-refs, packfile-uris and then finally the packfile? I do not think it makes much sense to ask any option to make us shallow (like --depth=<n>) while --reject-shallow is in use (after all, if the other side is deep enough to make us <n> commits deep, there is no reason to reject the other side as the source), so your "whether coming from the fact ..." part, while is a valid observation, can be ignored in practice (meaning: it is OK to make "--reject-shallow" be in effect only when we are trying to make a full clone, and reject combinations of it with --depth=<n> and such at the command parsing time). > It may be possible to stop packfile download (saving bandwidth on > the client side, at least) once such information is returned, > though. Just like "upload-pack" does not get upset by a client that comes only for the initial refs advertisement and disconnects without asking for any "want" (aka "ls-remote"), the server side should be prepared to see if the other side cuts off after seeing the "shallow-info" section header or after seeing the the whole "shallow-info" section, so we should be able to leave early without having to download the bulk data. If the "upload-pack" spends unnecessary cycles when it happens, then we need to fix that. Even if the "fetch" client does not willingly disconnect in the middle, the network disconnect may happen at any point in the exchange, and we'd need to be prepared for it. Do we need to read and parse the "shallow-info" section, or would the mere presense of the section mean the other side knows this side needs to futz with the .git/shallow information (either because we asked to be made shallow with --depth and the like, or because we tried to clone from them and they are shallow)? Thanks.
> Jonathan Tan <jonathantanmy@google.com> writes: > > > This is true with protocol v0, but protocol v2 bundles all shallow > > information (whether coming from the fact that the remote is shallow or > > the fact that the fetcher specified --depth etc.) and sends them > > together with the packfile. > > By the above do you mean what happens in FETCH_GET_PACK arm where > receive_shallow_info() is called when "shallow-info" header is seen, > before the code continues to process wanted-refs, packfile-uris and > then finally the packfile? Yes. > I do not think it makes much sense to ask any option to make us > shallow (like --depth=<n>) while --reject-shallow is in use (after > all, if the other side is deep enough to make us <n> commits deep, > there is no reason to reject the other side as the source), so your > "whether coming from the fact ..." part, while is a valid > observation, can be ignored in practice (meaning: it is OK to make > "--reject-shallow" be in effect only when we are trying to make a > full clone, and reject combinations of it with --depth=<n> and such > at the command parsing time). > > > It may be possible to stop packfile download (saving bandwidth on > > the client side, at least) once such information is returned, > > though. > > Just like "upload-pack" does not get upset by a client that comes > only for the initial refs advertisement and disconnects without > asking for any "want" (aka "ls-remote"), the server side should be > prepared to see if the other side cuts off after seeing the > "shallow-info" section header or after seeing the the whole > "shallow-info" section, so we should be able to leave early without > having to download the bulk data. If the "upload-pack" spends > unnecessary cycles when it happens, then we need to fix that. Even > if the "fetch" client does not willingly disconnect in the middle, > the network disconnect may happen at any point in the exchange, and > we'd need to be prepared for it. > > Do we need to read and parse the "shallow-info" section, or would > the mere presense of the section mean the other side knows this side > needs to futz with the .git/shallow information (either because we > asked to be made shallow with --depth and the like, or because we > tried to clone from them and they are shallow)? Reading the documentation, the mere presence should be enough. Yes, I think upload-pack will spend unnecessary cycles if the channel is terminated halfway (and I don't know if we can prevent spending these cycles, since I/O can be buffered). I think it should be possible for the client to cut off when it sees shallow-info (besides the possible wastage of cycles and I/O on the server's end). Having said that, I think this different from the ls-remote case. There, the server is awaiting another request from the user before sending more information, but here, the server intends to send everything at once.
diff --git a/Documentation/config/clone.txt b/Documentation/config/clone.txt index 47de36a5fedf..50ebc170bb81 100644 --- a/Documentation/config/clone.txt +++ b/Documentation/config/clone.txt @@ -2,3 +2,7 @@ clone.defaultRemoteName:: The name of the remote to create when cloning a repository. Defaults to `origin`, and can be overridden by passing the `--origin` command-line option to linkgit:git-clone[1]. + +clone.rejectshallow:: + Reject to clone a repository if it is a shallow one, can be overridden by + passing option `--reject-shallow` in command line. See linkgit:git-clone[1] diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt index 02d9c19cec75..cb458123eef6 100644 --- a/Documentation/git-clone.txt +++ b/Documentation/git-clone.txt @@ -15,7 +15,7 @@ SYNOPSIS [--dissociate] [--separate-git-dir <git dir>] [--depth <depth>] [--[no-]single-branch] [--no-tags] [--recurse-submodules[=<pathspec>]] [--[no-]shallow-submodules] - [--[no-]remote-submodules] [--jobs <n>] [--sparse] + [--[no-]remote-submodules] [--jobs <n>] [--sparse] [--[no-]reject-shallow] [--filter=<filter>] [--] <repository> [<directory>] @@ -149,6 +149,11 @@ objects from the source repository into a pack in the cloned repository. --no-checkout:: No checkout of HEAD is performed after the clone is complete. +--[no-]reject-shallow:: + Fail if the source repository is a shallow repository. The + 'clone.rejectshallow' configuration variable can be used to + give the default. + --bare:: Make a 'bare' Git repository. That is, instead of creating `<directory>` and placing the administrative diff --git a/builtin/clone.c b/builtin/clone.c index 51e844a2de0a..6807eb954366 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -32,6 +32,7 @@ #include "connected.h" #include "packfile.h" #include "list-objects-filter-options.h" +#include "shallow.h" /* * Overall FIXMEs: @@ -50,6 +51,8 @@ static int option_no_checkout, option_bare, option_mirror, option_single_branch static int option_local = -1, option_no_hardlinks, option_shared; static int option_no_tags; static int option_shallow_submodules; +static int option_shallow = -1; /* unspecified */ +static int config_shallow = -1; /* unspecified */ static int deepen; static char *option_template, *option_depth, *option_since; static char *option_origin = NULL; @@ -90,6 +93,8 @@ static struct option builtin_clone_options[] = { OPT__VERBOSITY(&option_verbosity), OPT_BOOL(0, "progress", &option_progress, N_("force progress reporting")), + OPT_BOOL(0, "reject-shallow", &option_shallow, + N_("don't clone shallow repository")), OPT_BOOL('n', "no-checkout", &option_no_checkout, N_("don't create a checkout")), OPT_BOOL(0, "bare", &option_bare, N_("create a bare repository")), @@ -858,6 +863,9 @@ static int git_clone_config(const char *k, const char *v, void *cb) free(remote_name); remote_name = xstrdup(v); } + if (!strcmp(k, "clone.rejectshallow")) { + config_shallow = git_config_bool(k, v); + } return git_default_config(k, v, cb); } @@ -963,6 +971,8 @@ static int path_exists(const char *path) int cmd_clone(int argc, const char **argv, const char *prefix) { int is_bundle = 0, is_local; + int local_shallow = 0; + int reject_shallow = 0; const char *repo_name, *repo, *work_tree, *git_dir; char *path, *dir, *display_repo = NULL; int dest_exists, real_dest_exists = 0; @@ -1156,6 +1166,16 @@ int cmd_clone(int argc, const char **argv, const char *prefix) */ git_config(git_clone_config, NULL); + /* + * If option_shallow is specified from CLI option, + * ignore config_shallow from git_clone_config. + */ + if (config_shallow != -1) { + reject_shallow = config_shallow; + } + if (option_shallow != -1) { + reject_shallow = option_shallow; + } /* * apply the remote name provided by --origin only after this second * call to git_config, to ensure it overrides all config-based values. @@ -1216,6 +1236,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix) if (filter_options.choice) warning(_("--filter is ignored in local clones; use file:// instead.")); if (!access(mkpath("%s/shallow", path), F_OK)) { + local_shallow = 1; if (option_local > 0) warning(_("source repository is shallow, ignoring --local")); is_local = 0; @@ -1363,6 +1384,12 @@ int cmd_clone(int argc, const char **argv, const char *prefix) goto cleanup; } + if (reject_shallow) { + if (local_shallow || is_repository_shallow(the_repository)) { + die(_("source repository is shallow, reject to clone.")); + } + } + update_remote_refs(refs, mapped_refs, remote_head_points_at, branch_top.buf, reflog_msg.buf, transport, !is_local); diff --git a/t/t5606-clone-options.sh b/t/t5606-clone-options.sh index 52e5789fb050..6170d0513227 100755 --- a/t/t5606-clone-options.sh +++ b/t/t5606-clone-options.sh @@ -5,6 +5,8 @@ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME . ./test-lib.sh +. "$TEST_DIRECTORY"/lib-httpd.sh +start_httpd test_expect_success 'setup' ' @@ -45,6 +47,51 @@ test_expect_success 'disallows --bare with --separate-git-dir' ' ' +test_expect_success 'fail to clone http shallow repository' ' + git clone --depth=1 --no-local parent shallow-repo && + git clone --bare --no-local shallow-repo "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" && + test_must_fail git clone --reject-shallow $HTTPD_URL/smart/repo.git out 2>err && + test_i18ngrep -e "source repository is shallow, reject to clone." err + +' + +test_expect_success 'fail to clone shallow repository' ' + rm -rf shallow-repo && + git clone --depth=1 --no-local parent shallow-repo && + test_must_fail git clone --reject-shallow shallow-repo out 2>err && + test_i18ngrep -e "source repository is shallow, reject to clone." err + +' + +test_expect_success 'fail to clone non-local shallow repository' ' + rm -rf shallow-repo && + git clone --depth=1 --no-local parent shallow-repo && + test_must_fail git clone --reject-shallow --no-local shallow-repo out 2>err && + test_i18ngrep -e "source repository is shallow, reject to clone." err + +' + +test_expect_success 'clone shallow repository with --no-reject-shallow' ' + rm -rf shallow-repo && + git clone --depth=1 --no-local parent shallow-repo && + git clone --no-reject-shallow --no-local shallow-repo clone-repo + +' + +test_expect_success 'clone normal repository with --reject-shallow' ' + rm -rf clone-repo && + git clone --no-local parent normal-repo && + git clone --reject-shallow --no-local normal-repo clone-repo + +' + +test_expect_success 'unspecified any configs or options' ' + rm -rf shallow-repo clone-repo && + git clone --depth=1 --no-local parent shallow-repo && + git clone shallow-repo clone-repo + +' + test_expect_success 'uses "origin" for default remote name' ' git clone parent clone-default-origin && diff --git a/t/t5611-clone-config.sh b/t/t5611-clone-config.sh index 9f555b87ecdf..18268256bfe0 100755 --- a/t/t5611-clone-config.sh +++ b/t/t5611-clone-config.sh @@ -95,6 +95,38 @@ test_expect_success 'clone -c remote.<remote>.fetch=<refspec> --origin=<name>' ' test_cmp expect actual ' +test_expect_success 'clone.rejectshallow=true should fail to clone' ' + rm -rf child && + git clone --depth=1 --no-local . child && + test_must_fail git -c clone.rejectshallow=true clone --no-local child out 2>err && + test_i18ngrep -e "source repository is shallow, reject to clone." err +' + +test_expect_success 'clone.rejectshallow=false should succeed' ' + rm -rf child && + git clone --depth=1 --no-local . child && + git -c clone.rejectshallow=false clone --no-local child out +' + +test_expect_success 'clone.rejectshallow=true should succeed with normal repo' ' + rm -rf child out && + git clone --no-local . child && + git -c clone.rejectshallow=true clone --no-local child out +' + +test_expect_success 'option --reject-shallow override clone.rejectshallow' ' + rm -rf child out && + git clone --depth=1 --no-local . child && + test_must_fail git -c clone.rejectshallow=false clone --reject-shallow --no-local child out 2>err && + test_i18ngrep -e "source repository is shallow, reject to clone." err +' + +test_expect_success 'option --no-reject-shallow override clone.rejectshallow' ' + rm -rf child && + git clone --depth=1 --no-local . child && + git -c clone.rejectshallow=true clone --no-reject-shallow --no-local child out +' + test_expect_success MINGW 'clone -c core.hideDotFiles' ' test_commit attributes .gitattributes "" && rm -rf child &&