diff mbox series

[1/3] revision: support tracking uninteresting commits

Message ID a643678c0ff7d1a910b1d6c33a839166e2a6a7b2.1682380788.git.me@ttaylorr.com (mailing list archive)
State Superseded
Headers show
Series pack-bitmap: boundary-based bitmap traversal | expand

Commit Message

Taylor Blau April 25, 2023, midnight UTC
The boundary-based bitmap walk will want to know which commits were
marked UNINTERESTING in the walk used to discover the boundary.

Track which commits are marked as such during list limitation as well as
the topo walk step, though the latter is not used.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 revision.c | 10 +++++++++-
 revision.h |  5 +++++
 2 files changed, 14 insertions(+), 1 deletion(-)

Comments

Derrick Stolee April 25, 2023, 6:15 p.m. UTC | #1
On 4/24/2023 8:00 PM, Taylor Blau wrote:
> The boundary-based bitmap walk will want to know which commits were
> marked UNINTERESTING in the walk used to discover the boundary.
> 
> Track which commits are marked as such during list limitation as well as
> the topo walk step, though the latter is not used.

I was surprised to see this as an addition to revision.c, and
only specific to limit_list() (and its iterative copy). I would
expect this to work for non-topo-order, too. I suppose we couldn't
completely trust that the first visit to a commit includes the
UNINTERESTING flag if there is clock skew in the commit history.

But could we also do this at the caller's end by passing a
callback that checks each walked commit if it is UNINTERESTING
and adds it to a set?

I know that the walking code in builtin/pack-objects.c does
this same computation, but it's muddled with other checks and
the trees are marked as UNINTERESTING at the same time. It
doesn't seem like we can reuse anything directly out of there,
but it did give me the idea to try a callback.

Thanks,
-Stolee
Junio C Hamano April 25, 2023, 6:22 p.m. UTC | #2
Taylor Blau <me@ttaylorr.com> writes:

> The boundary-based bitmap walk will want to know which commits were
> marked UNINTERESTING in the walk used to discover the boundary.
>
> Track which commits are marked as such during list limitation as well as
> the topo walk step, though the latter is not used.

There are many more places, other than these two that are touched by
this patch, where a commit is marked with the UNINTERESTING bit
(e.g.  mark_parents_uninteresting(), process_parents()) and it is
not quite clear if the commits smudged with the UNINTERESTING bit in
these other places are eventually seen by these two places that
somebody else smudged them and get collected.

Or am I wrong to understand that the idea is to collect all
UNINTERESTING commits?

Also don't these two places see the same commit more than once, say
when traversing from two branch tips with UNINTERESTING bit set and
the traversals meet at the fork point of these two branches?  Would
such a commit get added to the array in duplicates?

Thanks.
Taylor Blau April 25, 2023, 6:48 p.m. UTC | #3
On Tue, Apr 25, 2023 at 11:22:25AM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> Or am I wrong to understand that the idea is to collect all
> UNINTERESTING commits?

The idea is to collect UNINTERESTING commits between the tips and their
commit boundary. This field should probably be renamed to indicate that.

Thanks,
Taylor
Taylor Blau May 3, 2023, 9:48 p.m. UTC | #4
On Tue, Apr 25, 2023 at 02:15:49PM -0400, Derrick Stolee wrote:
> On 4/24/2023 8:00 PM, Taylor Blau wrote:
> > The boundary-based bitmap walk will want to know which commits were
> > marked UNINTERESTING in the walk used to discover the boundary.
> >
> > Track which commits are marked as such during list limitation as well as
> > the topo walk step, though the latter is not used.
>
> I was surprised to see this as an addition to revision.c, and
> only specific to limit_list() (and its iterative copy). I would
> expect this to work for non-topo-order, too. I suppose we couldn't
> completely trust that the first visit to a commit includes the
> UNINTERESTING flag if there is clock skew in the commit history.

Yeah, the distinction between limit_list() and the --topo-order code
makes things a little funky here. But I think that's OK, since
`--topo-order` is not likely to be used here, since this is all
bitmap-based traversal. IOW, I think that it would be reasonable to ban
the revision args combination of --use-bitmap-index with --topo-order.

> But could we also do this at the caller's end by passing a
> callback that checks each walked commit if it is UNINTERESTING
> and adds it to a set?

I think I remember trying this when I originally wrote this code, and
ended up discarding the idea because it walked over more commits than we
needed to consider.

Thanks,
Taylor
Taylor Blau May 3, 2023, 10:08 p.m. UTC | #5
On Tue, Apr 25, 2023 at 02:15:49PM -0400, Derrick Stolee wrote:
> I know that the walking code in builtin/pack-objects.c does
> this same computation, but it's muddled with other checks and
> the trees are marked as UNINTERESTING at the same time. It
> doesn't seem like we can reuse anything directly out of there,
> but it did give me the idea to try a callback.

Interesting idea. When you say callback, do you mean a function that we
invoke in place of where we currently call `add_object_array()`? Or do
you mean something else? Curious.

Thanks,
Taylor
Derrick Stolee May 4, 2023, 1:46 p.m. UTC | #6
On 5/3/2023 5:48 PM, Taylor Blau wrote:
> On Tue, Apr 25, 2023 at 02:15:49PM -0400, Derrick Stolee wrote:
>> On 4/24/2023 8:00 PM, Taylor Blau wrote:
>>> The boundary-based bitmap walk will want to know which commits were
>>> marked UNINTERESTING in the walk used to discover the boundary.
>>>
>>> Track which commits are marked as such during list limitation as well as
>>> the topo walk step, though the latter is not used.
>>
>> I was surprised to see this as an addition to revision.c, and
>> only specific to limit_list() (and its iterative copy). I would
>> expect this to work for non-topo-order, too. I suppose we couldn't
>> completely trust that the first visit to a commit includes the
>> UNINTERESTING flag if there is clock skew in the commit history.
> 
> Yeah, the distinction between limit_list() and the --topo-order code
> makes things a little funky here. But I think that's OK, since
> `--topo-order` is not likely to be used here, since this is all
> bitmap-based traversal. IOW, I think that it would be reasonable to ban
> the revision args combination of --use-bitmap-index with --topo-order.
 
I think there is a miscommunication here, since the limit_list()
code will only be run if there is a need to "walk to the end" before
outputting results. --topo-order is an example of something that
triggers this need, but it is disabled by default.

It seems that revs->collect_uninteresting would be a condition that
should signal to enable revs->limited so this code is actually
executed.

Taking a look at how this works with your current patch, the only
thing I can think is that since your initial commits include some
UNINTERESTING commits, that actually triggers revs->limited = 1
in handle_commit(). It's an indirect use, and without such a
commit the boundary would be empty, anyway.

Thanks,
-Stolee
Derrick Stolee May 4, 2023, 1:59 p.m. UTC | #7
On 5/3/2023 6:08 PM, Taylor Blau wrote:
> On Tue, Apr 25, 2023 at 02:15:49PM -0400, Derrick Stolee wrote:
>> I know that the walking code in builtin/pack-objects.c does
>> this same computation, but it's muddled with other checks and
>> the trees are marked as UNINTERESTING at the same time. It
>> doesn't seem like we can reuse anything directly out of there,
>> but it did give me the idea to try a callback.
> 
> Interesting idea. When you say callback, do you mean a function that we
> invoke in place of where we currently call `add_object_array()`? Or do
> you mean something else? Curious.

I was talking about passing a callback into the revision walk
instead of having the revision code create a list, as you
replied to in this response:

>> But could we also do this at the caller's end by passing a
>> callback that checks each walked commit if it is UNINTERESTING
>> and adds it to a set?
> 
> I think I remember trying this when I originally wrote this code, and
> ended up discarding the idea because it walked over more commits than we
> needed to consider.
 
It's interesting that it walked more commits than you wanted. I
suppose it's somehow related to the boundary condition you're
implying by enabling the construction of this list.

Could you describe the situation where more commits are walked
than you want? I imagine we can't actually stop at the boundary
because we need to know that certain commits are actually reachable
from those boundary commits.

Here's an example (assume horizontal levels are equal generation
number for the sake of the walk's stopping condition).

  1 (UNINTERESTING)  2 (INTERESTING)
  |\_____       ____/|
  |      \     /     |
  3       4    5     6
  |       |\__ |     |
  |       |   \|     |
  7       8   [9]    10
  |       |    |     |
  11      12   13    14
  |       |    | ___/|
  |       |    |/    |
  15      16  [17]   18
  |       | ___|____/
  |       |/   |
 (19)    [20]   21

Here, 9, 17, and 20 are the boundary, as they are UNINITERESTING
but reachable from an interesting commit (5, 14, and 18,
respectively). This is sufficient to provide a stopping point
for this single-directional difference "2 --not 1".

It's important that we didn't stop walking at 9, since its
UNINTERESTING bit needed to propagate through 13 to get to 17.

Note that 19 is reachable from 1 but not 2, so we would need to
keep walking if we were looking for the boundary of the symmetric
difference 1...2. But we don't need it here since everything at
that level is marked UNINTERESTING so the walk can stop.

Is this example sufficiently complex for you to describe what
causes the extra commit walking? In such a case, what does the
object array approach add to prevent the extra walking?

Thanks,
-Stolee
Taylor Blau May 5, 2023, 5:30 p.m. UTC | #8
On Thu, May 04, 2023 at 09:59:32AM -0400, Derrick Stolee wrote:
> It's interesting that it walked more commits than you wanted. I
> suppose it's somehow related to the boundary condition you're
> implying by enabling the construction of this list.
>
> Could you describe the situation where more commits are walked
> than you want? I imagine we can't actually stop at the boundary
> because we need to know that certain commits are actually reachable
> from those boundary commits.

I honestly cannot remember, and was unable to reproduce it when I
reworked the substantive portion of this series last night.

For posterity, Stolee and I had an off-list discussion yesterday where
he walked me through his suggestion to implement the boundary search via
a straightforward revision walk, instead of grafting onto cherry-picked
components of the revision internals.

It works great, and I have been unable to trick it into "walking too
much".

Thanks,
Taylor
diff mbox series

Patch

diff --git a/revision.c b/revision.c
index 106ca1ce6c..7177244e7a 100644
--- a/revision.c
+++ b/revision.c
@@ -1446,6 +1446,9 @@  static int limit_list(struct rev_info *revs)
 		if (process_parents(revs, commit, &original_list, NULL) < 0)
 			return -1;
 		if (obj->flags & UNINTERESTING) {
+			if (revs->collect_uninteresting)
+				add_object_array(obj, NULL,
+						 &revs->uninteresting_commits);
 			mark_parents_uninteresting(revs, commit);
 			slop = still_interesting(original_list, date, slop, &interesting_cache);
 			if (slop)
@@ -3072,6 +3075,7 @@  void release_revisions(struct rev_info *revs)
 	diff_free(&revs->pruning);
 	reflog_walk_info_release(revs->reflog_info);
 	release_revisions_topo_walk_info(revs->topo_walk_info);
+	object_array_clear(&revs->uninteresting_commits);
 }
 
 static void add_child(struct rev_info *revs, struct commit *parent, struct commit *child)
@@ -3522,8 +3526,12 @@  static void explore_walk_step(struct rev_info *revs)
 	if (process_parents(revs, c, NULL, NULL) < 0)
 		return;
 
-	if (c->object.flags & UNINTERESTING)
+	if (c->object.flags & UNINTERESTING) {
+		if (revs->collect_uninteresting)
+			add_object_array(&c->object, NULL,
+					 &revs->uninteresting_commits);
 		mark_parents_uninteresting(revs, c);
+	}
 
 	for (p = c->parents; p; p = p->next)
 		test_flag_and_insert(&info->explore_queue, p->item, TOPO_WALK_EXPLORED);
diff --git a/revision.h b/revision.h
index e8f6de9684..1385b29c76 100644
--- a/revision.h
+++ b/revision.h
@@ -124,6 +124,9 @@  struct rev_info {
 	/* Parents of shown commits */
 	struct object_array boundary_commits;
 
+	/* UNINTERESTING commits between the tips and boundary */
+	struct object_array uninteresting_commits;
+
 	/* The end-points specified by the end user */
 	struct rev_cmdline_info cmdline;
 
@@ -183,6 +186,7 @@  struct rev_info {
 			unpacked:1,
 			no_kept_objects:1,
 			boundary:2,
+			collect_uninteresting:1,
 			count:1,
 			left_right:1,
 			left_only:1,
@@ -404,6 +408,7 @@  struct rev_info {
 	.expand_tabs_in_log = -1, \
 	.commit_format = CMIT_FMT_DEFAULT, \
 	.expand_tabs_in_log_default = 8, \
+	.uninteresting_commits = OBJECT_ARRAY_INIT, \
 }
 
 /**