diff mbox series

[v2] revision: add `--ignore-missing-links` user option

Message ID 20230912155820.136111-1-karthik.188@gmail.com (mailing list archive)
State Superseded
Headers show
Series [v2] revision: add `--ignore-missing-links` user option | expand

Commit Message

Karthik Nayak Sept. 12, 2023, 3:58 p.m. UTC
The revision backend is used by multiple porcelain commands such as
git-rev-list(1) and git-log(1). The backend currently supports ignoring
missing links by setting the `ignore_missing_links` bit. This allows the
revision walk to skip any objects links which are missing. Expose this
bit via an `--ignore-missing-links` user option.

A scenario where this option would be used is to find the boundary
objects between different object directories. Consider a repository with
a main object directory (GIT_OBJECT_DIRECTORY) and one or more alternate
object directories (GIT_ALTERNATE_OBJECT_DIRECTORIES). In such a
repository, enabling this option along with the `--boundary` option for
while disabling the alternate object directory allows us to find the
boundary objects between the main and alternate object directory.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---

Changes from v1:
1. Changes in the commit message and option description to be more specific
and list why and what the changes are.
2. Ensure the new option also works with the existing `--objects` options.
3. More specific testing for boundary commit.

Range diff against v1:

1:  c0a4dca9b0 ! 1:  e3f4d85732 revision: add `--ignore-missing-links` user option
    @@ Commit message
         The revision backend is used by multiple porcelain commands such as
         git-rev-list(1) and git-log(1). The backend currently supports ignoring
         missing links by setting the `ignore_missing_links` bit. This allows the
    -    revision walk to skip any objects links which are missing.
    +    revision walk to skip any objects links which are missing. Expose this
    +    bit via an `--ignore-missing-links` user option.
     
    -    Currently there is no way to use git-rev-list(1) to traverse the objects
    -    of the main object directory (GIT_OBJECT_DIRECTORY) and print the
    -    boundary objects when moving from the main object directory to the
    -    alternate object directories (GIT_ALTERNATE_OBJECT_DIRECTORIES).
    -
    -    By exposing this new flag `--ignore-missing-links`, users can set the
    -    required env variables (GIT_OBJECT_DIRECTORY and
    -    GIT_ALTERNATE_OBJECT_DIRECTORIES) along with the `--boundary` flag to
    -    find the boundary objects between object directories.
    +    A scenario where this option would be used is to find the boundary
    +    objects between different object directories. Consider a repository with
    +    a main object directory (GIT_OBJECT_DIRECTORY) and one or more alternate
    +    object directories (GIT_ALTERNATE_OBJECT_DIRECTORIES). In such a
    +    repository, enabling this option along with the `--boundary` option for
    +    while disabling the alternate object directory allows us to find the
    +    boundary objects between the main and alternate object directory.
     
         Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
     
    @@ Documentation/rev-list-options.txt: explicitly.
      	the bad input was not given.
      
     +--ignore-missing-links::
    -+	When an object points to another object that is missing, pretend as if the
    -+	link did not exist. These missing links are not written to stdout unless
    -+	the --boundary flag is passed.
    ++	During traversal, if an object that is referenced does not
    ++	exist, instead of dying of a repository corruption, pretend as
    ++	if the reference itself does not exist. Running the command
    ++	with the `--boundary` option makes these missing commits,
    ++	together with the commits on the edge of revision ranges
    ++	(i.e. true boundary objects), appear on the output, prefixed
    ++	with '-'.
     +
      ifndef::git-rev-list[]
      --bisect::
      	Pretend as if the bad bisection ref `refs/bisect/bad`
     
    + ## builtin/rev-list.c ##
    +@@ builtin/rev-list.c: static int finish_object(struct object *obj, const char *name UNUSED,
    + {
    + 	struct rev_list_info *info = cb_data;
    + 	if (oid_object_info_extended(the_repository, &obj->oid, NULL, 0) < 0) {
    +-		finish_object__ma(obj);
    ++		if (!info->revs->ignore_missing_links)
    ++			finish_object__ma(obj);
    + 		return 1;
    + 	}
    + 	if (info->revs->verify_objects && !obj->parsed && obj->type != OBJ_COMMIT)
    +
      ## revision.c ##
     @@ revision.c: static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
      		revs->limited = 1;
    @@ t/t6022-rev-list-alternates.sh (new)
     +test_expect_success 'create repository and alternate directory' '
     +	git init main &&
     +	test_commit_bulk -C main 5 &&
    ++	BOUNDARY_COMMIT=$(git -C main rev-parse HEAD) &&
     +	mkdir alt &&
     +	mv main/.git/objects/* alt &&
     +	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt test_commit_bulk --start=6 -C main 5
     +'
     +
    -+# When the alternate odb is provided, all commits are listed.
    ++# when the alternate odb is provided, all commits are listed along with the boundary
    ++# commit.
     +test_expect_success 'rev-list passes with alternate object directory' '
    -+	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt test_stdout_line_count = 10 git -C main rev-list HEAD
    ++	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main rev-list HEAD >actual &&
    ++	test_stdout_line_count = 10 cat actual &&
    ++	grep $BOUNDARY_COMMIT actual
     +'
     +
     +# When the alternate odb is not provided, rev-list fails since the 5th commit's
    @@ t/t6022-rev-list-alternates.sh (new)
     +'
     +
     +# With `--ignore-missing-links`, we stop the traversal when we encounter a
    -+# missing link.
    ++# missing link. The boundary commit is not listed as we haven't used the
    ++# `--boundary` options.
     +test_expect_success 'rev-list only prints main odb commits with --ignore-missing-links' '
    -+	test_stdout_line_count = 5 git -C main rev-list --ignore-missing-links HEAD
    ++	git -C main rev-list --ignore-missing-links HEAD >actual &&
    ++	test_stdout_line_count = 5 cat actual &&
    ++	! grep -$BOUNDARY_COMMIT actual
     +'
     +
     +# With `--ignore-missing-links` and `--boundary`, we can even print those boundary
     +# commits.
     +test_expect_success 'rev-list prints boundary commit with --ignore-missing-links' '
    -+	git -C main rev-list --ignore-missing-links --boundary HEAD >list-output &&
    -+	test_stdout_line_count = 6 cat list-output &&
    -+	test_stdout_line_count = 1 cat list-output | grep "^-"
    ++	git -C main rev-list --ignore-missing-links --boundary HEAD >actual &&
    ++	test_stdout_line_count = 6 cat actual &&
    ++	grep -$BOUNDARY_COMMIT actual
    ++'
    ++
    ++# The `--ignore-missing-links` option should ensure that git-rev-list(1) doesn't
    ++# fail when used alongside `--objects` when a tree is missing.
    ++test_expect_success 'rev-list --ignore-missing-links works with missing tree' '
    ++	echo "foo" >main/file &&
    ++	git -C main add file &&
    ++	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main commit -m"commit 11" &&
    ++	TREE_OID=$(git -C main rev-parse HEAD^{tree}) &&
    ++	mkdir alt/${TREE_OID:0:2} &&
    ++	mv main/.git/objects/${TREE_OID:0:2}/${TREE_OID:2} alt/${TREE_OID:0:2}/ &&
    ++	git -C main rev-list --ignore-missing-links --objects HEAD >actual &&
    ++	! grep $TREE_OID actual
    ++'
    ++
    ++# Similar to above, it should also work when a blob is missing.
    ++test_expect_success 'rev-list --ignore-missing-links works with missing blob' '
    ++	echo "bar" >main/file &&
    ++	git -C main add file &&
    ++	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main commit -m"commit 12" &&
    ++	BLOB_OID=$(git -C main rev-parse HEAD:file) &&
    ++	mkdir alt/${BLOB_OID:0:2} &&
    ++	mv main/.git/objects/${BLOB_OID:0:2}/${BLOB_OID:2} alt/${BLOB_OID:0:2}/ &&
    ++	git -C main rev-list --ignore-missing-links --objects HEAD >actual &&
    ++	! grep $BLOB_OID actual
     +'
     +
     +test_done


 Documentation/rev-list-options.txt |  9 ++++
 builtin/rev-list.c                 |  3 +-
 revision.c                         |  2 +
 t/t6022-rev-list-alternates.sh     | 75 ++++++++++++++++++++++++++++++
 4 files changed, 88 insertions(+), 1 deletion(-)
 create mode 100755 t/t6022-rev-list-alternates.sh

Comments

Taylor Blau Sept. 12, 2023, 5:07 p.m. UTC | #1
On Tue, Sep 12, 2023 at 05:58:20PM +0200, Karthik Nayak wrote:
> The revision backend is used by multiple porcelain commands such as
> git-rev-list(1) and git-log(1). The backend currently supports ignoring
> missing links by setting the `ignore_missing_links` bit. This allows the
> revision walk to skip any objects links which are missing. Expose this
> bit via an `--ignore-missing-links` user option.
>
> A scenario where this option would be used is to find the boundary
> objects between different object directories. Consider a repository with
> a main object directory (GIT_OBJECT_DIRECTORY) and one or more alternate
> object directories (GIT_ALTERNATE_OBJECT_DIRECTORIES). In such a
> repository, enabling this option along with the `--boundary` option for
> while disabling the alternate object directory allows us to find the
> boundary objects between the main and alternate object directory.
>
> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
> ---
>
> Changes from v1:
> 1. Changes in the commit message and option description to be more specific
> and list why and what the changes are.
> 2. Ensure the new option also works with the existing `--objects` options.
> 3. More specific testing for boundary commit.
>
> Range diff against v1:
>
> 1:  c0a4dca9b0 ! 1:  e3f4d85732 revision: add `--ignore-missing-links` user option
>     @@ Commit message
>          The revision backend is used by multiple porcelain commands such as
>          git-rev-list(1) and git-log(1). The backend currently supports ignoring
>          missing links by setting the `ignore_missing_links` bit. This allows the
>     -    revision walk to skip any objects links which are missing.
>     +    revision walk to skip any objects links which are missing. Expose this
>     +    bit via an `--ignore-missing-links` user option.
>
>     -    Currently there is no way to use git-rev-list(1) to traverse the objects
>     -    of the main object directory (GIT_OBJECT_DIRECTORY) and print the
>     -    boundary objects when moving from the main object directory to the
>     -    alternate object directories (GIT_ALTERNATE_OBJECT_DIRECTORIES).
>     -
>     -    By exposing this new flag `--ignore-missing-links`, users can set the
>     -    required env variables (GIT_OBJECT_DIRECTORY and
>     -    GIT_ALTERNATE_OBJECT_DIRECTORIES) along with the `--boundary` flag to
>     -    find the boundary objects between object directories.
>     +    A scenario where this option would be used is to find the boundary
>     +    objects between different object directories. Consider a repository with
>     +    a main object directory (GIT_OBJECT_DIRECTORY) and one or more alternate
>     +    object directories (GIT_ALTERNATE_OBJECT_DIRECTORIES). In such a
>     +    repository, enabling this option along with the `--boundary` option for
>     +    while disabling the alternate object directory allows us to find the
>     +    boundary objects between the main and alternate object directory.
>
>          Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
>
>     @@ Documentation/rev-list-options.txt: explicitly.
>       	the bad input was not given.
>
>      +--ignore-missing-links::
>     -+	When an object points to another object that is missing, pretend as if the
>     -+	link did not exist. These missing links are not written to stdout unless
>     -+	the --boundary flag is passed.
>     ++	During traversal, if an object that is referenced does not
>     ++	exist, instead of dying of a repository corruption, pretend as
>     ++	if the reference itself does not exist. Running the command
>     ++	with the `--boundary` option makes these missing commits,
>     ++	together with the commits on the edge of revision ranges
>     ++	(i.e. true boundary objects), appear on the output, prefixed
>     ++	with '-'.
>      +
>       ifndef::git-rev-list[]
>       --bisect::
>       	Pretend as if the bad bisection ref `refs/bisect/bad`
>
>     + ## builtin/rev-list.c ##
>     +@@ builtin/rev-list.c: static int finish_object(struct object *obj, const char *name UNUSED,
>     + {
>     + 	struct rev_list_info *info = cb_data;
>     + 	if (oid_object_info_extended(the_repository, &obj->oid, NULL, 0) < 0) {
>     +-		finish_object__ma(obj);
>     ++		if (!info->revs->ignore_missing_links)
>     ++			finish_object__ma(obj);
>     + 		return 1;
>     + 	}
>     + 	if (info->revs->verify_objects && !obj->parsed && obj->type != OBJ_COMMIT)
>     +
>       ## revision.c ##
>      @@ revision.c: static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
>       		revs->limited = 1;
>     @@ t/t6022-rev-list-alternates.sh (new)
>      +test_expect_success 'create repository and alternate directory' '
>      +	git init main &&
>      +	test_commit_bulk -C main 5 &&
>     ++	BOUNDARY_COMMIT=$(git -C main rev-parse HEAD) &&
>      +	mkdir alt &&
>      +	mv main/.git/objects/* alt &&
>      +	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt test_commit_bulk --start=6 -C main 5
>      +'
>      +
>     -+# When the alternate odb is provided, all commits are listed.
>     ++# when the alternate odb is provided, all commits are listed along with the boundary
>     ++# commit.
>      +test_expect_success 'rev-list passes with alternate object directory' '
>     -+	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt test_stdout_line_count = 10 git -C main rev-list HEAD
>     ++	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main rev-list HEAD >actual &&
>     ++	test_stdout_line_count = 10 cat actual &&
>     ++	grep $BOUNDARY_COMMIT actual
>      +'
>      +
>      +# When the alternate odb is not provided, rev-list fails since the 5th commit's
>     @@ t/t6022-rev-list-alternates.sh (new)
>      +'
>      +
>      +# With `--ignore-missing-links`, we stop the traversal when we encounter a
>     -+# missing link.
>     ++# missing link. The boundary commit is not listed as we haven't used the
>     ++# `--boundary` options.
>      +test_expect_success 'rev-list only prints main odb commits with --ignore-missing-links' '
>     -+	test_stdout_line_count = 5 git -C main rev-list --ignore-missing-links HEAD
>     ++	git -C main rev-list --ignore-missing-links HEAD >actual &&
>     ++	test_stdout_line_count = 5 cat actual &&
>     ++	! grep -$BOUNDARY_COMMIT actual
>      +'
>      +
>      +# With `--ignore-missing-links` and `--boundary`, we can even print those boundary
>      +# commits.
>      +test_expect_success 'rev-list prints boundary commit with --ignore-missing-links' '
>     -+	git -C main rev-list --ignore-missing-links --boundary HEAD >list-output &&
>     -+	test_stdout_line_count = 6 cat list-output &&
>     -+	test_stdout_line_count = 1 cat list-output | grep "^-"
>     ++	git -C main rev-list --ignore-missing-links --boundary HEAD >actual &&
>     ++	test_stdout_line_count = 6 cat actual &&
>     ++	grep -$BOUNDARY_COMMIT actual
>     ++'
>     ++
>     ++# The `--ignore-missing-links` option should ensure that git-rev-list(1) doesn't
>     ++# fail when used alongside `--objects` when a tree is missing.
>     ++test_expect_success 'rev-list --ignore-missing-links works with missing tree' '
>     ++	echo "foo" >main/file &&
>     ++	git -C main add file &&
>     ++	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main commit -m"commit 11" &&
>     ++	TREE_OID=$(git -C main rev-parse HEAD^{tree}) &&
>     ++	mkdir alt/${TREE_OID:0:2} &&
>     ++	mv main/.git/objects/${TREE_OID:0:2}/${TREE_OID:2} alt/${TREE_OID:0:2}/ &&
>     ++	git -C main rev-list --ignore-missing-links --objects HEAD >actual &&
>     ++	! grep $TREE_OID actual
>     ++'
>     ++
>     ++# Similar to above, it should also work when a blob is missing.
>     ++test_expect_success 'rev-list --ignore-missing-links works with missing blob' '
>     ++	echo "bar" >main/file &&
>     ++	git -C main add file &&
>     ++	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main commit -m"commit 12" &&
>     ++	BLOB_OID=$(git -C main rev-parse HEAD:file) &&
>     ++	mkdir alt/${BLOB_OID:0:2} &&
>     ++	mv main/.git/objects/${BLOB_OID:0:2}/${BLOB_OID:2} alt/${BLOB_OID:0:2}/ &&
>     ++	git -C main rev-list --ignore-missing-links --objects HEAD >actual &&
>     ++	! grep $BLOB_OID actual
>      +'
>      +
>      +test_done
>
>
>  Documentation/rev-list-options.txt |  9 ++++
>  builtin/rev-list.c                 |  3 +-
>  revision.c                         |  2 +
>  t/t6022-rev-list-alternates.sh     | 75 ++++++++++++++++++++++++++++++
>  4 files changed, 88 insertions(+), 1 deletion(-)
>  create mode 100755 t/t6022-rev-list-alternates.sh
>
> diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
> index a4a0cb93b2..8ee713db3d 100644
> --- a/Documentation/rev-list-options.txt
> +++ b/Documentation/rev-list-options.txt
> @@ -227,6 +227,15 @@ explicitly.
>  	Upon seeing an invalid object name in the input, pretend as if
>  	the bad input was not given.
>
> +--ignore-missing-links::
> +	During traversal, if an object that is referenced does not
> +	exist, instead of dying of a repository corruption, pretend as
> +	if the reference itself does not exist. Running the command
> +	with the `--boundary` option makes these missing commits,
> +	together with the commits on the edge of revision ranges
> +	(i.e. true boundary objects), appear on the output, prefixed
> +	with '-'.
> +
>  ifndef::git-rev-list[]
>  --bisect::
>  	Pretend as if the bad bisection ref `refs/bisect/bad`
> diff --git a/builtin/rev-list.c b/builtin/rev-list.c
> index ff715d6918..5239d83c76 100644
> --- a/builtin/rev-list.c
> +++ b/builtin/rev-list.c
> @@ -266,7 +266,8 @@ static int finish_object(struct object *obj, const char *name UNUSED,
>  {
>  	struct rev_list_info *info = cb_data;
>  	if (oid_object_info_extended(the_repository, &obj->oid, NULL, 0) < 0) {
> -		finish_object__ma(obj);
> +		if (!info->revs->ignore_missing_links)
> +			finish_object__ma(obj);
>  		return 1;
>  	}
>  	if (info->revs->verify_objects && !obj->parsed && obj->type != OBJ_COMMIT)


> diff --git a/t/t6022-rev-list-alternates.sh b/t/t6022-rev-list-alternates.sh
> new file mode 100755
> index 0000000000..08d9ffde5f
> --- /dev/null
> +++ b/t/t6022-rev-list-alternates.sh
> @@ -0,0 +1,75 @@
> +#!/bin/sh
> +
> +test_description='handling of alternates in rev-list'
> +
> +TEST_PASSES_SANITIZE_LEAK=true
> +. ./test-lib.sh
> +
> +# We create 5 commits and move them to the alt directory and
> +# create 5 more commits which will stay in the main odb.
> +test_expect_success 'create repository and alternate directory' '
> +	git init main &&

We don't necessarily have to initialize a repository, as the test suite
already does so for us. So we may want to write this instead as:

    test_commit_bulk 5 &&
    git clone --reference=. --shared . alt &&
    test_commit_bulk -C alt --start=6 5

> +# when the alternate odb is provided, all commits are listed along with the boundary
> +# commit.
> +test_expect_success 'rev-list passes with alternate object directory' '
> +	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main rev-list HEAD >actual &&
> +	test_stdout_line_count = 10 cat actual &&
> +	grep $BOUNDARY_COMMIT actual
> +'

Here, I think we'd want to make sure that we have not just 10 lines of
output, but that they are the 10 that we expect, like so:

    git -C alt rev-list --all --objects --no-object-names >actual.raw &&
    {
      git rev-list --all --objects --no-object-names &&
      git -C alt rev-list --all --objects --no-object-names --not \
        --alternate-refs
    } >expect.raw &&
    sort actual.raw >actual &&
    sort expect.raw >expect &&
    test_cmp expect actual

When reviewing this and tweaking some of the tests locally, I found it
useful to have some convenience functions like "hide_alternates" and
"show_alternates" to control whether or not "alt" could see its
alternate or not.

From my review locally, the resulting changes (which can be applied
directly on top of your patch here look like):

--- 8< ---
diff --git a/t/t6022-rev-list-alternates.sh b/t/t6022-rev-list-alternates.sh
index 08d9ffde5f..ef4231b2de 100755
--- a/t/t6022-rev-list-alternates.sh
+++ b/t/t6022-rev-list-alternates.sh
@@ -8,68 +8,86 @@ TEST_PASSES_SANITIZE_LEAK=true
 # We create 5 commits and move them to the alt directory and
 # create 5 more commits which will stay in the main odb.
 test_expect_success 'create repository and alternate directory' '
-	git init main &&
-	test_commit_bulk -C main 5 &&
-	BOUNDARY_COMMIT=$(git -C main rev-parse HEAD) &&
-	mkdir alt &&
-	mv main/.git/objects/* alt &&
-	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt test_commit_bulk --start=6 -C main 5
+	test_commit_bulk 5 &&
+	git clone --reference=. --shared . alt &&
+	test_commit_bulk --start=6 -C alt 5
 '

 # when the alternate odb is provided, all commits are listed along with the boundary
 # commit.
 test_expect_success 'rev-list passes with alternate object directory' '
-	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main rev-list HEAD >actual &&
-	test_stdout_line_count = 10 cat actual &&
-	grep $BOUNDARY_COMMIT actual
+	git -C alt rev-list --all --objects --no-object-names >actual.raw &&
+	{
+		git rev-list --all --objects --no-object-names &&
+		git -C alt rev-list --all --objects --no-object-names --not \
+			--alternate-refs
+	} >expect.raw &&
+	sort actual.raw >actual &&
+	sort expect.raw >expect &&
+	test_cmp expect actual
 '

+alt=alt/.git/objects/info/alternates
+
+hide_alternates () {
+	test -f "$alt.bak" || mv "$alt" "$alt.bak"
+}
+
+show_alternates () {
+	test -f "$alt" || mv "$alt.bak" "$alt"
+}
+
 # When the alternate odb is not provided, rev-list fails since the 5th commit's
 # parent is not present in the main odb.
 test_expect_success 'rev-list fails without alternate object directory' '
-	test_must_fail git -C main rev-list HEAD
+	hide_alternates &&
+	test_must_fail git -C alt rev-list HEAD
 '

 # With `--ignore-missing-links`, we stop the traversal when we encounter a
 # missing link. The boundary commit is not listed as we haven't used the
 # `--boundary` options.
 test_expect_success 'rev-list only prints main odb commits with --ignore-missing-links' '
-	git -C main rev-list --ignore-missing-links HEAD >actual &&
-	test_stdout_line_count = 5 cat actual &&
-	! grep -$BOUNDARY_COMMIT actual
+	hide_alternates &&
+
+	git -C alt rev-list --objects --no-object-names \
+		--ignore-missing-links HEAD >actual.raw &&
+	git -C alt cat-file --batch-check="%(objectname)" \
+		--batch-all-objects >expect.raw &&
+
+	sort actual.raw >actual &&
+	sort expect.raw >expect &&
+	test_cmp expect actual
 '

 # With `--ignore-missing-links` and `--boundary`, we can even print those boundary
 # commits.
 test_expect_success 'rev-list prints boundary commit with --ignore-missing-links' '
-	git -C main rev-list --ignore-missing-links --boundary HEAD >actual &&
-	test_stdout_line_count = 6 cat actual &&
-	grep -$BOUNDARY_COMMIT actual
+	git -C alt rev-list --ignore-missing-links --boundary HEAD >got &&
+	grep "^-$(git rev-parse HEAD)" got
 '

-# The `--ignore-missing-links` option should ensure that git-rev-list(1) doesn't
-# fail when used alongside `--objects` when a tree is missing.
-test_expect_success 'rev-list --ignore-missing-links works with missing tree' '
-	echo "foo" >main/file &&
-	git -C main add file &&
-	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main commit -m"commit 11" &&
-	TREE_OID=$(git -C main rev-parse HEAD^{tree}) &&
-	mkdir alt/${TREE_OID:0:2} &&
-	mv main/.git/objects/${TREE_OID:0:2}/${TREE_OID:2} alt/${TREE_OID:0:2}/ &&
-	git -C main rev-list --ignore-missing-links --objects HEAD >actual &&
-	! grep $TREE_OID actual
+test_expect_success "setup for rev-list --ignore-missing-links with missing objects" '
+	show_alternates &&
+	test_commit -C alt 11
 '

-# Similar to above, it should also work when a blob is missing.
-test_expect_success 'rev-list --ignore-missing-links works with missing blob' '
-	echo "bar" >main/file &&
-	git -C main add file &&
-	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main commit -m"commit 12" &&
-	BLOB_OID=$(git -C main rev-parse HEAD:file) &&
-	mkdir alt/${BLOB_OID:0:2} &&
-	mv main/.git/objects/${BLOB_OID:0:2}/${BLOB_OID:2} alt/${BLOB_OID:0:2}/ &&
-	git -C main rev-list --ignore-missing-links --objects HEAD >actual &&
-	! grep $BLOB_OID actual
-'
+for obj in "HEAD^{tree}" "HEAD:11.t"
+do
+	# The `--ignore-missing-links` option should ensure that git-rev-list(1)
+	# doesn't fail when used alongside `--objects` when a tree/blob is
+	# missing.
+	test_expect_success "rev-list --ignore-missing-links with missing $type" '
+		oid="$(git -C alt rev-parse $obj)" &&
+		path="alt/.git/objects/$(test_oid_to_path $oid)" &&
+
+		mv "$path" "$path.hidden" &&
+		test_when_finished "mv $path.hidden $path" &&
+
+		git -C alt rev-list --ignore-missing-links --objects HEAD \
+			>actual &&
+		! grep $oid actual
+	'
+done

 test_done
--- >8 ---

Thanks,
Taylor
Karthik Nayak Sept. 13, 2023, 9:32 a.m. UTC | #2
On Tue, Sep 12, 2023 at 7:07 PM Taylor Blau <me@ttaylorr.com> wrote:
> > +# We create 5 commits and move them to the alt directory and
> > +# create 5 more commits which will stay in the main odb.
> > +test_expect_success 'create repository and alternate directory' '
> > +     git init main &&
>
> We don't necessarily have to initialize a repository, as the test suite
> already does so for us. So we may want to write this instead as:
>
>     test_commit_bulk 5 &&
>     git clone --reference=. --shared . alt &&
>     test_commit_bulk -C alt --start=6 5
>

I was trying to use the env variable `GIT_ALTERNATE_OBJECT_DIRECTORIES` and
get hence ended up creating a new repository. But I really like the
convenience functions
that you've suggested below. With that, this seems like the way to go.

> > +# when the alternate odb is provided, all commits are listed along with the boundary
> > +# commit.
> > +test_expect_success 'rev-list passes with alternate object directory' '
> > +     GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main rev-list HEAD >actual &&
> > +     test_stdout_line_count = 10 cat actual &&
> > +     grep $BOUNDARY_COMMIT actual
> > +'
>
> Here, I think we'd want to make sure that we have not just 10 lines of
> output, but that they are the 10 that we expect, like so:
>
>     git -C alt rev-list --all --objects --no-object-names >actual.raw &&
>     {
>       git rev-list --all --objects --no-object-names &&
>       git -C alt rev-list --all --objects --no-object-names --not \
>         --alternate-refs
>     } >expect.raw &&
>     sort actual.raw >actual &&
>     sort expect.raw >expect &&
>     test_cmp expect actual
>
> When reviewing this and tweaking some of the tests locally, I found it
> useful to have some convenience functions like "hide_alternates" and
> "show_alternates" to control whether or not "alt" could see its
> alternate or not.
>
> From my review locally, the resulting changes (which can be applied
> directly on top of your patch here look like):
>

This is much better. I didn't know about `test_oid_to_path` and
`test_when_finished`, and overall
your patch looks much nicer and is more thorough in the testing. I'll
add it to the next version.
Will wait a day or two for more feedback before I submit v3.

Thanks again for your review and the patch :)
Taylor Blau Sept. 13, 2023, 5:17 p.m. UTC | #3
On Wed, Sep 13, 2023 at 11:32:13AM +0200, Karthik Nayak wrote:
> > From my review locally, the resulting changes (which can be applied
> > directly on top of your patch here look like):
>
> This is much better. I didn't know about `test_oid_to_path` and
> `test_when_finished`, and overall your patch looks much nicer and is
> more thorough in the testing. I'll add it to the next version.

Thanks for folding it in! I felt bad that I might have stepped on your
toes by saying "here's how I would do it", but by the time I had applied
your patch locally to review it, I had already generated the
aforementioned diff.

> Will wait a day or two for more feedback before I submit v3.

Sounds like a plan :-).

Thanks,
Taylor
diff mbox series

Patch

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index a4a0cb93b2..8ee713db3d 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -227,6 +227,15 @@  explicitly.
 	Upon seeing an invalid object name in the input, pretend as if
 	the bad input was not given.
 
+--ignore-missing-links::
+	During traversal, if an object that is referenced does not
+	exist, instead of dying of a repository corruption, pretend as
+	if the reference itself does not exist. Running the command
+	with the `--boundary` option makes these missing commits,
+	together with the commits on the edge of revision ranges
+	(i.e. true boundary objects), appear on the output, prefixed
+	with '-'.
+
 ifndef::git-rev-list[]
 --bisect::
 	Pretend as if the bad bisection ref `refs/bisect/bad`
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index ff715d6918..5239d83c76 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -266,7 +266,8 @@  static int finish_object(struct object *obj, const char *name UNUSED,
 {
 	struct rev_list_info *info = cb_data;
 	if (oid_object_info_extended(the_repository, &obj->oid, NULL, 0) < 0) {
-		finish_object__ma(obj);
+		if (!info->revs->ignore_missing_links)
+			finish_object__ma(obj);
 		return 1;
 	}
 	if (info->revs->verify_objects && !obj->parsed && obj->type != OBJ_COMMIT)
diff --git a/revision.c b/revision.c
index 2f4c53ea20..cbfcbf6e28 100644
--- a/revision.c
+++ b/revision.c
@@ -2595,6 +2595,8 @@  static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		revs->limited = 1;
 	} else if (!strcmp(arg, "--ignore-missing")) {
 		revs->ignore_missing = 1;
+	} else if (!strcmp(arg, "--ignore-missing-links")) {
+		revs->ignore_missing_links = 1;
 	} else if (opt && opt->allow_exclude_promisor_objects &&
 		   !strcmp(arg, "--exclude-promisor-objects")) {
 		if (fetch_if_missing)
diff --git a/t/t6022-rev-list-alternates.sh b/t/t6022-rev-list-alternates.sh
new file mode 100755
index 0000000000..08d9ffde5f
--- /dev/null
+++ b/t/t6022-rev-list-alternates.sh
@@ -0,0 +1,75 @@ 
+#!/bin/sh
+
+test_description='handling of alternates in rev-list'
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+# We create 5 commits and move them to the alt directory and
+# create 5 more commits which will stay in the main odb.
+test_expect_success 'create repository and alternate directory' '
+	git init main &&
+	test_commit_bulk -C main 5 &&
+	BOUNDARY_COMMIT=$(git -C main rev-parse HEAD) &&
+	mkdir alt &&
+	mv main/.git/objects/* alt &&
+	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt test_commit_bulk --start=6 -C main 5
+'
+
+# when the alternate odb is provided, all commits are listed along with the boundary
+# commit.
+test_expect_success 'rev-list passes with alternate object directory' '
+	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main rev-list HEAD >actual &&
+	test_stdout_line_count = 10 cat actual &&
+	grep $BOUNDARY_COMMIT actual
+'
+
+# When the alternate odb is not provided, rev-list fails since the 5th commit's
+# parent is not present in the main odb.
+test_expect_success 'rev-list fails without alternate object directory' '
+	test_must_fail git -C main rev-list HEAD
+'
+
+# With `--ignore-missing-links`, we stop the traversal when we encounter a
+# missing link. The boundary commit is not listed as we haven't used the
+# `--boundary` options.
+test_expect_success 'rev-list only prints main odb commits with --ignore-missing-links' '
+	git -C main rev-list --ignore-missing-links HEAD >actual &&
+	test_stdout_line_count = 5 cat actual &&
+	! grep -$BOUNDARY_COMMIT actual
+'
+
+# With `--ignore-missing-links` and `--boundary`, we can even print those boundary
+# commits.
+test_expect_success 'rev-list prints boundary commit with --ignore-missing-links' '
+	git -C main rev-list --ignore-missing-links --boundary HEAD >actual &&
+	test_stdout_line_count = 6 cat actual &&
+	grep -$BOUNDARY_COMMIT actual
+'
+
+# The `--ignore-missing-links` option should ensure that git-rev-list(1) doesn't
+# fail when used alongside `--objects` when a tree is missing.
+test_expect_success 'rev-list --ignore-missing-links works with missing tree' '
+	echo "foo" >main/file &&
+	git -C main add file &&
+	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main commit -m"commit 11" &&
+	TREE_OID=$(git -C main rev-parse HEAD^{tree}) &&
+	mkdir alt/${TREE_OID:0:2} &&
+	mv main/.git/objects/${TREE_OID:0:2}/${TREE_OID:2} alt/${TREE_OID:0:2}/ &&
+	git -C main rev-list --ignore-missing-links --objects HEAD >actual &&
+	! grep $TREE_OID actual
+'
+
+# Similar to above, it should also work when a blob is missing.
+test_expect_success 'rev-list --ignore-missing-links works with missing blob' '
+	echo "bar" >main/file &&
+	git -C main add file &&
+	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main commit -m"commit 12" &&
+	BLOB_OID=$(git -C main rev-parse HEAD:file) &&
+	mkdir alt/${BLOB_OID:0:2} &&
+	mv main/.git/objects/${BLOB_OID:0:2}/${BLOB_OID:2} alt/${BLOB_OID:0:2}/ &&
+	git -C main rev-list --ignore-missing-links --objects HEAD >actual &&
+	! grep $BLOB_OID actual
+'
+
+test_done