diff mbox series

[RFC] upload_pack.c: make deepen-not more tree-ish

Message ID 20230211222353.1984150-1-andy@halogix.com (mailing list archive)
State New, archived
Headers show
Series [RFC] upload_pack.c: make deepen-not more tree-ish | expand

Commit Message

Andrew Wansink Feb. 11, 2023, 10:23 p.m. UTC
This unlocks `git clone --shallow-exclude=<commit-sha1>`

git-clone only accepts --shallow-excude arguments where
the argument is a branch or tag because upload_pack only
searches deepen-not arguments for branches and tags.

Make process_deepen_not search for commit objects if no
branch or tag is found then add them to the deepen_not
list.

Signed-off-by: Andrew Wansink <wansink@uber.com>
---

At Uber we have a lot of patches in CI simultaneously,
the CI jobs will frequently clone the monorepo multiple
times for each patch.  They do this to calculate diffs
between a patch and its parent commit.

One optimisation in this flow is to clone only to a specific
depth, this may or may not work, depending on how old the 
patch is.  In this case we have to --unshallow or discard
the shallow clone and fully clone the repo.

This patch would allow us to clone to exactly the depth we
need to find a patch's parent commit.

 t/t5500-fetch-pack.sh | 30 ++++++++++++++++++++++++++++++
 upload-pack.c         | 35 +++++++++++++++++++++++++++++++----
 2 files changed, 61 insertions(+), 4 deletions(-)

Comments

Andrew Wansink Feb. 11, 2023, 10:40 p.m. UTC | #1
Sorry to spam the list with this patch twice, I failed to follow the
instructions correctly the first time and sent a diff with the
whitespace stipped.

- Andrew


On Sat, Feb 11, 2023 at 2:23 PM Andrew Wansink <andy@halogix.com> wrote:
>
> This unlocks `git clone --shallow-exclude=<commit-sha1>`
>
> git-clone only accepts --shallow-excude arguments where
> the argument is a branch or tag because upload_pack only
> searches deepen-not arguments for branches and tags.
>
> Make process_deepen_not search for commit objects if no
> branch or tag is found then add them to the deepen_not
> list.
>
> Signed-off-by: Andrew Wansink <wansink@uber.com>
> ---
>
> At Uber we have a lot of patches in CI simultaneously,
> the CI jobs will frequently clone the monorepo multiple
> times for each patch.  They do this to calculate diffs
> between a patch and its parent commit.
>
> One optimisation in this flow is to clone only to a specific
> depth, this may or may not work, depending on how old the
> patch is.  In this case we have to --unshallow or discard
> the shallow clone and fully clone the repo.
>
> This patch would allow us to clone to exactly the depth we
> need to find a patch's parent commit.
>
>  t/t5500-fetch-pack.sh | 30 ++++++++++++++++++++++++++++++
>  upload-pack.c         | 35 +++++++++++++++++++++++++++++++----
>  2 files changed, 61 insertions(+), 4 deletions(-)
>
> diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
> index d18f2823d86..8d5045cc1b9 100755
> --- a/t/t5500-fetch-pack.sh
> +++ b/t/t5500-fetch-pack.sh
> @@ -899,6 +899,36 @@ test_expect_success 'shallow clone exclude tag two' '
>         )
>  '
>
> +test_expect_success 'shallow clone exclude commit' '
> +       test_create_repo shallow-exclude-commit &&
> +       (
> +       cd shallow-exclude-commit &&
> +       test_commit one &&
> +       test_commit two &&
> +       test_commit three &&
> +       commit_two_sha1=$(git log -n 1 --pretty=tformat:%h HEAD^) &&
> +       git clone --shallow-exclude=${commit_two_sha1} "file://$(pwd)/." ../shallow3-by-commit &&
> +       git -C ../shallow3-by-commit log --pretty=tformat:%s HEAD >actual &&
> +       git log -n 1 --pretty=tformat:%s HEAD >expected &&
> +       test_cmp expected actual
> +       )
> +'
> +
> +test_expect_success 'shallow clone exclude commit^' '
> +       test_create_repo shallow-exclude-commit-carat &&
> +       (
> +       cd shallow-exclude-commit-carat &&
> +       test_commit one &&
> +       test_commit two &&
> +       test_commit three &&
> +       commit_two_sha1=$(git log -n 1 --pretty=tformat:%h HEAD^) &&
> +       git clone --shallow-exclude=${commit_two_sha1}^ "file://$(pwd)/." ../shallow23-by-commit &&
> +       git -C ../shallow23-by-commit log --pretty=tformat:%s HEAD >actual &&
> +       git log -n 2 --pretty=tformat:%s HEAD >expected &&
> +       test_cmp expected actual
> +       )
> +'
> +
>  test_expect_success 'fetch exclude tag one' '
>         git -C shallow12 fetch --shallow-exclude one origin &&
>         git -C shallow12 log --pretty=tformat:%s origin/main >actual &&
> diff --git a/upload-pack.c b/upload-pack.c
> index 551f22ffa5d..0c8594f4744 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -985,10 +985,37 @@ static int process_deepen_not(const char *line, struct string_list *deepen_not,
>         if (skip_prefix(line, "deepen-not ", &arg)) {
>                 char *ref = NULL;
>                 struct object_id oid;
> -               if (expand_ref(the_repository, arg, strlen(arg), &oid, &ref) != 1)
> -                       die("git upload-pack: ambiguous deepen-not: %s", line);
> -               string_list_append(deepen_not, ref);
> -               free(ref);
> +
> +               switch (expand_ref(the_repository, arg, strlen(arg), &oid, &ref)) {
> +               case 1:
> +                       // tag or branch matching arg found
> +                       string_list_append(deepen_not, ref);
> +                       free(ref);
> +                       break;
> +               case 0: {
> +                       // no tags or branches matching arg
> +                       struct object *obj = NULL;
> +                       struct commit *commit = NULL;
> +
> +                       if (get_oid(arg, &oid))
> +                               die("git upload-pack: deepen-not: no ref or object %s", arg);
> +
> +                       obj = parse_object(the_repository, &oid);
> +                       if (!obj)
> +                               die("git upload-pack: deepen-not: object could not be parsed: %s", arg);
> +
> +                       commit = (struct commit *)peel_to_type(arg, 0, obj, OBJ_COMMIT);
> +                       if (!commit)
> +                               die("git upload-pack: deepen-not: object not a commit: %s", arg);
> +
> +                       string_list_append(deepen_not, oid_to_hex(&commit->object.oid));
> +                       break;
> +               }
> +               default:
> +                       // more than 1 tag or branch matches arg
> +                       die("git upload-pack: ambiguous deepen-not: %s", arg);
> +               }
> +
>                 *deepen_rev_list = 1;
>                 return 1;
>         }
> --
> 2.39.1
>
Son Luong Ngoc Feb. 12, 2023, 2:12 p.m. UTC | #2
Re-send to the Git mailing-list as setting a font on gmail switched
plain-text to HTML and thus, got blocked by mailing-list.

On Sun, Feb 12, 2023 at 3:09 PM Son Luong Ngoc <sluongng@gmail.com> wrote:
>
> Hi Andrew,
>
> On Sat, Feb 11, 2023 at 11:49 PM Andrew Wansink <andy@halogix.com> wrote:
> >
> > This unlocks `git clone --shallow-exclude=<commit-sha1>`
> >
> > git-clone only accepts --shallow-excude arguments where
> > the argument is a branch or tag because upload_pack only
> > searches deepen-not arguments for branches and tags.
> >
> > Make process_deepen_not search for commit objects if no
> > branch or tag is found then add them to the deepen_not
> > list.
> >
> > Signed-off-by: Andrew Wansink <wansink@uber.com>
> > ---
> >
> > At Uber we have a lot of patches in CI simultaneously,
> > the CI jobs will frequently clone the monorepo multiple
> > times for each patch.  They do this to calculate diffs
> > between a patch and its parent commit.
> >
>
> I used to manage a CI system that support monorepo use cases not so long ago.
> We had several hosts(VM/Baremetal) on which we spin up containers for CI to run.
>
> We maintain a bare copy of the monorepo on the host level (cron job / systemd / DaemonSet) and mount this as read-only into each of the CI containers.
>
> Each of the CI containers would attempt to clone/fetch the monorepo with `--reference-if-able ./path/to/read-only-mount/repo.git` (1)
> So that most of the needed objects are already on disk in the shared bare repo.
>
>
> +-----------+  +-----------+  +-----------+
> | container |  | container |  | container |
> +-----------+  +-----------+  +-----------+
>              \       |       /
>       (mount) \      |      /
>               +------------+                 +--------+
>               | bare-repo  | <-------------- | Remote |
>               +------------+   (git-fetch)   +--------+
>                     |
>                     | (maintain)
>                     |
>               +----------+
>               | cron-job |
>               +----------+
>
> (forgive my horrible drawing)
>
> With this setup, we did not have a need to shallow clone any longer,
> and our git-clone in each container is simply a combination of git-ls-remote and a very light-weighted git-fetch.
> In some cases, such as a job in the later stages of a CI pipeline,
> the host would already download all the needed objects into the bare copy of the repository.
> This lets us skip git-fetch entirely when the CI container executes.
>
> Compared to the shallow clone approach,
> our "local cache" approach sped up the clone speed drastically
> while allowing developers to interact with git history inside tests a lot easier.
>
> > One optimisation in this flow is to clone only to a specific
> > depth, this may or may not work, depending on how old the
> > patch is.  In this case we have to --unshallow or discard
> > the shallow clone and fully clone the repo.
> >
> > This patch would allow us to clone to exactly the depth we
> > need to find a patch's parent commit.
>
> Hope it helps,
> Son Luong.
>
> (1): https://git-scm.com/docs/git-clone#Documentation/git-clone.txt---reference-if-ableltrepositorygt
diff mbox series

Patch

diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index d18f2823d86..8d5045cc1b9 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -899,6 +899,36 @@  test_expect_success 'shallow clone exclude tag two' '
 	)
 '
 
+test_expect_success 'shallow clone exclude commit' '
+	test_create_repo shallow-exclude-commit &&
+	(
+	cd shallow-exclude-commit &&
+	test_commit one &&
+	test_commit two &&
+	test_commit three &&
+	commit_two_sha1=$(git log -n 1 --pretty=tformat:%h HEAD^) &&
+	git clone --shallow-exclude=${commit_two_sha1} "file://$(pwd)/." ../shallow3-by-commit &&
+	git -C ../shallow3-by-commit log --pretty=tformat:%s HEAD >actual &&
+	git log -n 1 --pretty=tformat:%s HEAD >expected &&
+	test_cmp expected actual
+	)
+'
+
+test_expect_success 'shallow clone exclude commit^' '
+	test_create_repo shallow-exclude-commit-carat &&
+	(
+	cd shallow-exclude-commit-carat &&
+	test_commit one &&
+	test_commit two &&
+	test_commit three &&
+	commit_two_sha1=$(git log -n 1 --pretty=tformat:%h HEAD^) &&
+	git clone --shallow-exclude=${commit_two_sha1}^ "file://$(pwd)/." ../shallow23-by-commit &&
+	git -C ../shallow23-by-commit log --pretty=tformat:%s HEAD >actual &&
+	git log -n 2 --pretty=tformat:%s HEAD >expected &&
+	test_cmp expected actual
+	)
+'
+
 test_expect_success 'fetch exclude tag one' '
 	git -C shallow12 fetch --shallow-exclude one origin &&
 	git -C shallow12 log --pretty=tformat:%s origin/main >actual &&
diff --git a/upload-pack.c b/upload-pack.c
index 551f22ffa5d..0c8594f4744 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -985,10 +985,37 @@  static int process_deepen_not(const char *line, struct string_list *deepen_not,
 	if (skip_prefix(line, "deepen-not ", &arg)) {
 		char *ref = NULL;
 		struct object_id oid;
-		if (expand_ref(the_repository, arg, strlen(arg), &oid, &ref) != 1)
-			die("git upload-pack: ambiguous deepen-not: %s", line);
-		string_list_append(deepen_not, ref);
-		free(ref);
+
+		switch (expand_ref(the_repository, arg, strlen(arg), &oid, &ref)) {
+		case 1:
+			// tag or branch matching arg found
+			string_list_append(deepen_not, ref);
+			free(ref);
+			break;
+		case 0: {
+			// no tags or branches matching arg
+			struct object *obj = NULL;
+			struct commit *commit = NULL;
+
+			if (get_oid(arg, &oid))
+				die("git upload-pack: deepen-not: no ref or object %s", arg);
+
+			obj = parse_object(the_repository, &oid);
+			if (!obj)
+				die("git upload-pack: deepen-not: object could not be parsed: %s", arg);
+
+			commit = (struct commit *)peel_to_type(arg, 0, obj, OBJ_COMMIT);
+			if (!commit)
+				die("git upload-pack: deepen-not: object not a commit: %s", arg);
+
+			string_list_append(deepen_not, oid_to_hex(&commit->object.oid));
+			break;
+		}
+		default:
+			// more than 1 tag or branch matches arg
+			die("git upload-pack: ambiguous deepen-not: %s", arg);
+		}
+
 		*deepen_rev_list = 1;
 		return 1;
 	}