mbox series

[v2,0/3] rev-list: add support for commits in `--missing`

Message ID 20231016103830.56486-1-karthik.188@gmail.com (mailing list archive)
Headers show
Series rev-list: add support for commits in `--missing` | expand

Message

karthik nayak Oct. 16, 2023, 10:38 a.m. UTC
The `--missing` option in git-rev-list(1) was introduced intitally
to deal with missing blobs in the context of promissory notes.
Eventually the option was extended to also support tree objects in
7c0fe330d5 (rev-list: handle missing tree objects properly,2018-10-05).

This patch series extends the `--missing` option to also add support for
commit objects. We do this by introducing a new flag `MISSING` which is
added whenever we encounter a missing commit during traversal. Then in
`builtin/rev-list` we check for this flag and take the appropriate
action based on the `--missing=*` option used.

This series is an alternate to the patch series I had posted earlier:
https://lore.kernel.org/git/20230908174208.249184-1-karthik.188@gmail.com/.
In that patch, we introduced an option `--ignore-missing-links` which
was added to expose the `ignore_missing_links` bit to the user. The
issue in that patch was that, the option `--ignore-missing-links` didn't
play well the pre-existing `--missing` option. This series avoids that
route and just extends the `--missing` option for commits to solve the
same problem.

Changes from v1:
1. The patch series is now rebased on top of Patrick's work to make
commit-graphs more reliable. With this, we need to be more diligant
around which commits are missing and only parse required commits. So
we add some extra checks here, especially because we don't want to
allow missing commit's tree and parent to be parsed.
2. The tests were preivously testing a commit with no parents, add
an additional commit to ensure that the missing commit's parents aren't
parsed.

Range diff:

1:  be2ac0a331 = 1:  c1c892aa86 revision: rename bit to `do_not_die_on_missing_objects`
2:  b44983967f = 2:  67f6bfeaf7 rev-list: move `show_commit()` to the bottom
3:  306d918aef ! 3:  6d8e3c721f rev-list: add commit object support in `--missing` option
    @@ builtin/rev-list.c: static void show_commit(struct commit *commit, void *data)
      		total_disk_usage += get_object_disk_usage(&commit->object);
      
     
    + ## list-objects.c ##
    +@@ list-objects.c: static void do_traverse(struct traversal_context *ctx)
    + 		 * an uninteresting boundary commit may not have its tree
    + 		 * parsed yet, but we are not going to show them anyway
    + 		 */
    +-		if (!ctx->revs->tree_objects)
    ++		if (!ctx->revs->tree_objects || commit->object.flags & MISSING)
    + 			; /* do not bother loading tree */
    + 		else if (repo_get_commit_tree(the_repository, commit)) {
    + 			struct tree *tree = repo_get_commit_tree(the_repository,
    +
      ## object.h ##
     @@ object.h: void object_array_init(struct object_array *array);
      
    @@ object.h: void object_array_init(struct object_array *array);
       * walker.c:                 0-2
     
      ## revision.c ##
    +@@ revision.c: static int process_parents(struct rev_info *revs, struct commit *commit,
    + 	struct commit_list *parent = commit->parents;
    + 	unsigned pass_flags;
    + 
    +-	if (commit->object.flags & ADDED)
    ++	if (commit->object.flags & ADDED || commit->object.flags & MISSING)
    + 		return 0;
    + 	commit->object.flags |= ADDED;
    + 
     @@ revision.c: static int process_parents(struct rev_info *revs, struct commit *commit,
      	for (parent = commit->parents; parent; parent = parent->next) {
      		struct commit *p = parent->item;
    @@ t/t6022-rev-list-missing.sh (new)
     +# available and we can hide commit 1.
     +test_expect_success 'create repository and alternate directory' '
     +	test_commit 1 &&
    -+	test_commit 2
    ++	test_commit 2 &&
    ++	test_commit 3
     +'
     +
     +for obj in "HEAD~1" "HEAD~1^{tree}" "HEAD:1.t"
    @@ t/t6022-rev-list-missing.sh (new)
     +			git rev-list --objects --no-object-names \
     +				HEAD ^$obj >expect.raw &&
     +
    -+			# Since both the commits have the `1.t` blob, rev-list
    -+			# will print it since its reachable from either commit. Unless
    -+			# the blob itself is missing.
    ++			# Blobs are shared by all commits, so evethough a commit/tree
    ++			# might be skipped, its blob must be accounted for.
     +			if [ $obj != "HEAD:1.t" ]; then
    -+				echo $(git rev-parse HEAD:1.t) >>expect.raw
    ++				echo $(git rev-parse HEAD:1.t) >>expect.raw &&
    ++				echo $(git rev-parse HEAD:2.t) >>expect.raw
     +			fi &&
     +
     +			mv "$path" "$path.hidden" &&
    @@ t/t6022-rev-list-missing.sh (new)
     +	done
     +done
     +
    -+
     +test_done


Karthik Nayak (3):
  revision: rename bit to `do_not_die_on_missing_objects`
  rev-list: move `show_commit()` to the bottom
  rev-list: add commit object support in `--missing` option

 builtin/reflog.c            |  2 +-
 builtin/rev-list.c          | 93 +++++++++++++++++++------------------
 list-objects.c              |  4 +-
 object.h                    |  2 +-
 revision.c                  | 11 +++--
 revision.h                  | 20 ++++----
 t/t6022-rev-list-missing.sh | 74 +++++++++++++++++++++++++++++
 7 files changed, 147 insertions(+), 59 deletions(-)
 create mode 100755 t/t6022-rev-list-missing.sh

Comments

Junio C Hamano Oct. 16, 2023, 4:24 p.m. UTC | #1
Karthik Nayak <karthik.188@gmail.com> writes:

> ... some extra checks here, especially because we don't want to
> allow missing commit's tree and parent to be parsed.

Good changes relative to the previous round.

A bad news is that with this series (especially [PATCH 3/3]) applied
on top of Patrick's "always make sure the commit we found from the
commit-graph at least exists" change, t5310 and t5320 seem to
consistently fail for me.  They pass with the first two steps
applied without [3/3] but that is to be expected as they are both
no-ops.

The change in 3/3 to list-objects.c::do_traverse() seems to be the
culprit.  Reverting that single hunk makes t5310 and t5320 pass
again.  What I find alarming is that two tests that are touched by
this series, t5318 and t6022, do not seem to fail with the hunk
reverted, which means the hunk has no meaningful test coverage for
the purpose of this series.

I didn't even try to understand what is going on, so there may be a
very good reason that the change must be there for do_traverse() to
work correctly.  I dunno.

>     +-	if (commit->object.flags & ADDED)
>     ++	if (commit->object.flags & ADDED || commit->object.flags & MISSING)

This and others are syntactically correct C, but some folks may find
it more readable to see each of the bitwise operations enclosed in a
pair of parentheses when combined by logical operations, i.e.,

	if ((commit->object.flags & ADDED) || (commit->object.flags & MISSING))

In this particular case, we can even say

		if (commit->object.flags & (ADDED | MISSING))

meaning, "if we have either the ADDED or the MISSING bit set", which
may make it even clearer.

Thanks.
karthik nayak Oct. 16, 2023, 7:01 p.m. UTC | #2
On Mon, Oct 16, 2023 at 6:24 PM Junio C Hamano <gitster@pobox.com> wrote:
> Good changes relative to the previous round.
>
> A bad news is that with this series (especially [PATCH 3/3]) applied
> on top of Patrick's "always make sure the commit we found from the
> commit-graph at least exists" change, t5310 and t5320 seem to
> consistently fail for me.  They pass with the first two steps
> applied without [3/3] but that is to be expected as they are both
> no-ops.
>
> The change in 3/3 to list-objects.c::do_traverse() seems to be the
> culprit.  Reverting that single hunk makes t5310 and t5320 pass
> again.

It seems that the new chunk introduced now causes collision because of
the bit field of the MISSING flag being the same as the pre-existing
NEEDS_BITMAP flag. Earlier this wasn't a concern, but now, their paths
have converged at this change.

So changing the bit field for the MISSING flag from `22` to an unused `28`
fixes the issue. I should have run the whole test suite again. Apologies.

>  What I find alarming is that two tests that are touched by
> this series, t5318 and t6022, do not seem to fail with the hunk
> reverted, which means the hunk has no meaningful test coverage for
> the purpose of this series.
>

Well, actually the newly introduced tests t6022 require this block,
but this is specific
to when commit graph is enabled. So what you did see was correct, but
the test would
also fail with `GIT_TEST_COMMIT_GRAPH=1` if the block was removed. That's
exactly why in this series I increased the number of commits in the
test block from 2->3.

> >     +-        if (commit->object.flags & ADDED)
> >     ++        if (commit->object.flags & ADDED || commit->object.flags & MISSING)
>
> This and others are syntactically correct C, but some folks may find
> it more readable to see each of the bitwise operations enclosed in a
> pair of parentheses when combined by logical operations, i.e.,
>
>         if ((commit->object.flags & ADDED) || (commit->object.flags & MISSING))
>
> In this particular case, we can even say
>
>                 if (commit->object.flags & (ADDED | MISSING))
>
> meaning, "if we have either the ADDED or the MISSING bit set", which
> may make it even clearer.

I agree with this, I'll add it in the next block.

Thanks for the quick review, I'll wait a day/two and send v3 with the
changes to fix tests
and cleaner code.
Junio C Hamano Oct. 16, 2023, 8:33 p.m. UTC | #3
Karthik Nayak <karthik.188@gmail.com> writes:

> Well, actually the newly introduced tests t6022 require this block,
> but this is specific
> to when commit graph is enabled.

Ah, of course.

> Thanks for the quick review, I'll wait a day/two and send v3 with the
> changes to fix tests
> and cleaner code.

Thanks.