Message ID | 20230211222353.1984150-1-andy@halogix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] upload_pack.c: make deepen-not more tree-ish | expand |
Sorry to spam the list with this patch twice, I failed to follow the instructions correctly the first time and sent a diff with the whitespace stipped. - Andrew On Sat, Feb 11, 2023 at 2:23 PM Andrew Wansink <andy@halogix.com> wrote: > > This unlocks `git clone --shallow-exclude=<commit-sha1>` > > git-clone only accepts --shallow-excude arguments where > the argument is a branch or tag because upload_pack only > searches deepen-not arguments for branches and tags. > > Make process_deepen_not search for commit objects if no > branch or tag is found then add them to the deepen_not > list. > > Signed-off-by: Andrew Wansink <wansink@uber.com> > --- > > At Uber we have a lot of patches in CI simultaneously, > the CI jobs will frequently clone the monorepo multiple > times for each patch. They do this to calculate diffs > between a patch and its parent commit. > > One optimisation in this flow is to clone only to a specific > depth, this may or may not work, depending on how old the > patch is. In this case we have to --unshallow or discard > the shallow clone and fully clone the repo. > > This patch would allow us to clone to exactly the depth we > need to find a patch's parent commit. > > t/t5500-fetch-pack.sh | 30 ++++++++++++++++++++++++++++++ > upload-pack.c | 35 +++++++++++++++++++++++++++++++---- > 2 files changed, 61 insertions(+), 4 deletions(-) > > diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh > index d18f2823d86..8d5045cc1b9 100755 > --- a/t/t5500-fetch-pack.sh > +++ b/t/t5500-fetch-pack.sh > @@ -899,6 +899,36 @@ test_expect_success 'shallow clone exclude tag two' ' > ) > ' > > +test_expect_success 'shallow clone exclude commit' ' > + test_create_repo shallow-exclude-commit && > + ( > + cd shallow-exclude-commit && > + test_commit one && > + test_commit two && > + test_commit three && > + commit_two_sha1=$(git log -n 1 --pretty=tformat:%h HEAD^) && > + git clone --shallow-exclude=${commit_two_sha1} "file://$(pwd)/." ../shallow3-by-commit && > + git -C ../shallow3-by-commit log --pretty=tformat:%s HEAD >actual && > + git log -n 1 --pretty=tformat:%s HEAD >expected && > + test_cmp expected actual > + ) > +' > + > +test_expect_success 'shallow clone exclude commit^' ' > + test_create_repo shallow-exclude-commit-carat && > + ( > + cd shallow-exclude-commit-carat && > + test_commit one && > + test_commit two && > + test_commit three && > + commit_two_sha1=$(git log -n 1 --pretty=tformat:%h HEAD^) && > + git clone --shallow-exclude=${commit_two_sha1}^ "file://$(pwd)/." ../shallow23-by-commit && > + git -C ../shallow23-by-commit log --pretty=tformat:%s HEAD >actual && > + git log -n 2 --pretty=tformat:%s HEAD >expected && > + test_cmp expected actual > + ) > +' > + > test_expect_success 'fetch exclude tag one' ' > git -C shallow12 fetch --shallow-exclude one origin && > git -C shallow12 log --pretty=tformat:%s origin/main >actual && > diff --git a/upload-pack.c b/upload-pack.c > index 551f22ffa5d..0c8594f4744 100644 > --- a/upload-pack.c > +++ b/upload-pack.c > @@ -985,10 +985,37 @@ static int process_deepen_not(const char *line, struct string_list *deepen_not, > if (skip_prefix(line, "deepen-not ", &arg)) { > char *ref = NULL; > struct object_id oid; > - if (expand_ref(the_repository, arg, strlen(arg), &oid, &ref) != 1) > - die("git upload-pack: ambiguous deepen-not: %s", line); > - string_list_append(deepen_not, ref); > - free(ref); > + > + switch (expand_ref(the_repository, arg, strlen(arg), &oid, &ref)) { > + case 1: > + // tag or branch matching arg found > + string_list_append(deepen_not, ref); > + free(ref); > + break; > + case 0: { > + // no tags or branches matching arg > + struct object *obj = NULL; > + struct commit *commit = NULL; > + > + if (get_oid(arg, &oid)) > + die("git upload-pack: deepen-not: no ref or object %s", arg); > + > + obj = parse_object(the_repository, &oid); > + if (!obj) > + die("git upload-pack: deepen-not: object could not be parsed: %s", arg); > + > + commit = (struct commit *)peel_to_type(arg, 0, obj, OBJ_COMMIT); > + if (!commit) > + die("git upload-pack: deepen-not: object not a commit: %s", arg); > + > + string_list_append(deepen_not, oid_to_hex(&commit->object.oid)); > + break; > + } > + default: > + // more than 1 tag or branch matches arg > + die("git upload-pack: ambiguous deepen-not: %s", arg); > + } > + > *deepen_rev_list = 1; > return 1; > } > -- > 2.39.1 >
Re-send to the Git mailing-list as setting a font on gmail switched plain-text to HTML and thus, got blocked by mailing-list. On Sun, Feb 12, 2023 at 3:09 PM Son Luong Ngoc <sluongng@gmail.com> wrote: > > Hi Andrew, > > On Sat, Feb 11, 2023 at 11:49 PM Andrew Wansink <andy@halogix.com> wrote: > > > > This unlocks `git clone --shallow-exclude=<commit-sha1>` > > > > git-clone only accepts --shallow-excude arguments where > > the argument is a branch or tag because upload_pack only > > searches deepen-not arguments for branches and tags. > > > > Make process_deepen_not search for commit objects if no > > branch or tag is found then add them to the deepen_not > > list. > > > > Signed-off-by: Andrew Wansink <wansink@uber.com> > > --- > > > > At Uber we have a lot of patches in CI simultaneously, > > the CI jobs will frequently clone the monorepo multiple > > times for each patch. They do this to calculate diffs > > between a patch and its parent commit. > > > > I used to manage a CI system that support monorepo use cases not so long ago. > We had several hosts(VM/Baremetal) on which we spin up containers for CI to run. > > We maintain a bare copy of the monorepo on the host level (cron job / systemd / DaemonSet) and mount this as read-only into each of the CI containers. > > Each of the CI containers would attempt to clone/fetch the monorepo with `--reference-if-able ./path/to/read-only-mount/repo.git` (1) > So that most of the needed objects are already on disk in the shared bare repo. > > > +-----------+ +-----------+ +-----------+ > | container | | container | | container | > +-----------+ +-----------+ +-----------+ > \ | / > (mount) \ | / > +------------+ +--------+ > | bare-repo | <-------------- | Remote | > +------------+ (git-fetch) +--------+ > | > | (maintain) > | > +----------+ > | cron-job | > +----------+ > > (forgive my horrible drawing) > > With this setup, we did not have a need to shallow clone any longer, > and our git-clone in each container is simply a combination of git-ls-remote and a very light-weighted git-fetch. > In some cases, such as a job in the later stages of a CI pipeline, > the host would already download all the needed objects into the bare copy of the repository. > This lets us skip git-fetch entirely when the CI container executes. > > Compared to the shallow clone approach, > our "local cache" approach sped up the clone speed drastically > while allowing developers to interact with git history inside tests a lot easier. > > > One optimisation in this flow is to clone only to a specific > > depth, this may or may not work, depending on how old the > > patch is. In this case we have to --unshallow or discard > > the shallow clone and fully clone the repo. > > > > This patch would allow us to clone to exactly the depth we > > need to find a patch's parent commit. > > Hope it helps, > Son Luong. > > (1): https://git-scm.com/docs/git-clone#Documentation/git-clone.txt---reference-if-ableltrepositorygt
diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh index d18f2823d86..8d5045cc1b9 100755 --- a/t/t5500-fetch-pack.sh +++ b/t/t5500-fetch-pack.sh @@ -899,6 +899,36 @@ test_expect_success 'shallow clone exclude tag two' ' ) ' +test_expect_success 'shallow clone exclude commit' ' + test_create_repo shallow-exclude-commit && + ( + cd shallow-exclude-commit && + test_commit one && + test_commit two && + test_commit three && + commit_two_sha1=$(git log -n 1 --pretty=tformat:%h HEAD^) && + git clone --shallow-exclude=${commit_two_sha1} "file://$(pwd)/." ../shallow3-by-commit && + git -C ../shallow3-by-commit log --pretty=tformat:%s HEAD >actual && + git log -n 1 --pretty=tformat:%s HEAD >expected && + test_cmp expected actual + ) +' + +test_expect_success 'shallow clone exclude commit^' ' + test_create_repo shallow-exclude-commit-carat && + ( + cd shallow-exclude-commit-carat && + test_commit one && + test_commit two && + test_commit three && + commit_two_sha1=$(git log -n 1 --pretty=tformat:%h HEAD^) && + git clone --shallow-exclude=${commit_two_sha1}^ "file://$(pwd)/." ../shallow23-by-commit && + git -C ../shallow23-by-commit log --pretty=tformat:%s HEAD >actual && + git log -n 2 --pretty=tformat:%s HEAD >expected && + test_cmp expected actual + ) +' + test_expect_success 'fetch exclude tag one' ' git -C shallow12 fetch --shallow-exclude one origin && git -C shallow12 log --pretty=tformat:%s origin/main >actual && diff --git a/upload-pack.c b/upload-pack.c index 551f22ffa5d..0c8594f4744 100644 --- a/upload-pack.c +++ b/upload-pack.c @@ -985,10 +985,37 @@ static int process_deepen_not(const char *line, struct string_list *deepen_not, if (skip_prefix(line, "deepen-not ", &arg)) { char *ref = NULL; struct object_id oid; - if (expand_ref(the_repository, arg, strlen(arg), &oid, &ref) != 1) - die("git upload-pack: ambiguous deepen-not: %s", line); - string_list_append(deepen_not, ref); - free(ref); + + switch (expand_ref(the_repository, arg, strlen(arg), &oid, &ref)) { + case 1: + // tag or branch matching arg found + string_list_append(deepen_not, ref); + free(ref); + break; + case 0: { + // no tags or branches matching arg + struct object *obj = NULL; + struct commit *commit = NULL; + + if (get_oid(arg, &oid)) + die("git upload-pack: deepen-not: no ref or object %s", arg); + + obj = parse_object(the_repository, &oid); + if (!obj) + die("git upload-pack: deepen-not: object could not be parsed: %s", arg); + + commit = (struct commit *)peel_to_type(arg, 0, obj, OBJ_COMMIT); + if (!commit) + die("git upload-pack: deepen-not: object not a commit: %s", arg); + + string_list_append(deepen_not, oid_to_hex(&commit->object.oid)); + break; + } + default: + // more than 1 tag or branch matches arg + die("git upload-pack: ambiguous deepen-not: %s", arg); + } + *deepen_rev_list = 1; return 1; }
This unlocks `git clone --shallow-exclude=<commit-sha1>` git-clone only accepts --shallow-excude arguments where the argument is a branch or tag because upload_pack only searches deepen-not arguments for branches and tags. Make process_deepen_not search for commit objects if no branch or tag is found then add them to the deepen_not list. Signed-off-by: Andrew Wansink <wansink@uber.com> --- At Uber we have a lot of patches in CI simultaneously, the CI jobs will frequently clone the monorepo multiple times for each patch. They do this to calculate diffs between a patch and its parent commit. One optimisation in this flow is to clone only to a specific depth, this may or may not work, depending on how old the patch is. In this case we have to --unshallow or discard the shallow clone and fully clone the repo. This patch would allow us to clone to exactly the depth we need to find a patch's parent commit. t/t5500-fetch-pack.sh | 30 ++++++++++++++++++++++++++++++ upload-pack.c | 35 +++++++++++++++++++++++++++++++---- 2 files changed, 61 insertions(+), 4 deletions(-)