Message ID | a643678c0ff7d1a910b1d6c33a839166e2a6a7b2.1682380788.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | pack-bitmap: boundary-based bitmap traversal | expand |
On 4/24/2023 8:00 PM, Taylor Blau wrote: > The boundary-based bitmap walk will want to know which commits were > marked UNINTERESTING in the walk used to discover the boundary. > > Track which commits are marked as such during list limitation as well as > the topo walk step, though the latter is not used. I was surprised to see this as an addition to revision.c, and only specific to limit_list() (and its iterative copy). I would expect this to work for non-topo-order, too. I suppose we couldn't completely trust that the first visit to a commit includes the UNINTERESTING flag if there is clock skew in the commit history. But could we also do this at the caller's end by passing a callback that checks each walked commit if it is UNINTERESTING and adds it to a set? I know that the walking code in builtin/pack-objects.c does this same computation, but it's muddled with other checks and the trees are marked as UNINTERESTING at the same time. It doesn't seem like we can reuse anything directly out of there, but it did give me the idea to try a callback. Thanks, -Stolee
Taylor Blau <me@ttaylorr.com> writes: > The boundary-based bitmap walk will want to know which commits were > marked UNINTERESTING in the walk used to discover the boundary. > > Track which commits are marked as such during list limitation as well as > the topo walk step, though the latter is not used. There are many more places, other than these two that are touched by this patch, where a commit is marked with the UNINTERESTING bit (e.g. mark_parents_uninteresting(), process_parents()) and it is not quite clear if the commits smudged with the UNINTERESTING bit in these other places are eventually seen by these two places that somebody else smudged them and get collected. Or am I wrong to understand that the idea is to collect all UNINTERESTING commits? Also don't these two places see the same commit more than once, say when traversing from two branch tips with UNINTERESTING bit set and the traversals meet at the fork point of these two branches? Would such a commit get added to the array in duplicates? Thanks.
On Tue, Apr 25, 2023 at 11:22:25AM -0700, Junio C Hamano wrote: > Taylor Blau <me@ttaylorr.com> writes: > > Or am I wrong to understand that the idea is to collect all > UNINTERESTING commits? The idea is to collect UNINTERESTING commits between the tips and their commit boundary. This field should probably be renamed to indicate that. Thanks, Taylor
On Tue, Apr 25, 2023 at 02:15:49PM -0400, Derrick Stolee wrote: > On 4/24/2023 8:00 PM, Taylor Blau wrote: > > The boundary-based bitmap walk will want to know which commits were > > marked UNINTERESTING in the walk used to discover the boundary. > > > > Track which commits are marked as such during list limitation as well as > > the topo walk step, though the latter is not used. > > I was surprised to see this as an addition to revision.c, and > only specific to limit_list() (and its iterative copy). I would > expect this to work for non-topo-order, too. I suppose we couldn't > completely trust that the first visit to a commit includes the > UNINTERESTING flag if there is clock skew in the commit history. Yeah, the distinction between limit_list() and the --topo-order code makes things a little funky here. But I think that's OK, since `--topo-order` is not likely to be used here, since this is all bitmap-based traversal. IOW, I think that it would be reasonable to ban the revision args combination of --use-bitmap-index with --topo-order. > But could we also do this at the caller's end by passing a > callback that checks each walked commit if it is UNINTERESTING > and adds it to a set? I think I remember trying this when I originally wrote this code, and ended up discarding the idea because it walked over more commits than we needed to consider. Thanks, Taylor
On Tue, Apr 25, 2023 at 02:15:49PM -0400, Derrick Stolee wrote: > I know that the walking code in builtin/pack-objects.c does > this same computation, but it's muddled with other checks and > the trees are marked as UNINTERESTING at the same time. It > doesn't seem like we can reuse anything directly out of there, > but it did give me the idea to try a callback. Interesting idea. When you say callback, do you mean a function that we invoke in place of where we currently call `add_object_array()`? Or do you mean something else? Curious. Thanks, Taylor
On 5/3/2023 5:48 PM, Taylor Blau wrote: > On Tue, Apr 25, 2023 at 02:15:49PM -0400, Derrick Stolee wrote: >> On 4/24/2023 8:00 PM, Taylor Blau wrote: >>> The boundary-based bitmap walk will want to know which commits were >>> marked UNINTERESTING in the walk used to discover the boundary. >>> >>> Track which commits are marked as such during list limitation as well as >>> the topo walk step, though the latter is not used. >> >> I was surprised to see this as an addition to revision.c, and >> only specific to limit_list() (and its iterative copy). I would >> expect this to work for non-topo-order, too. I suppose we couldn't >> completely trust that the first visit to a commit includes the >> UNINTERESTING flag if there is clock skew in the commit history. > > Yeah, the distinction between limit_list() and the --topo-order code > makes things a little funky here. But I think that's OK, since > `--topo-order` is not likely to be used here, since this is all > bitmap-based traversal. IOW, I think that it would be reasonable to ban > the revision args combination of --use-bitmap-index with --topo-order. I think there is a miscommunication here, since the limit_list() code will only be run if there is a need to "walk to the end" before outputting results. --topo-order is an example of something that triggers this need, but it is disabled by default. It seems that revs->collect_uninteresting would be a condition that should signal to enable revs->limited so this code is actually executed. Taking a look at how this works with your current patch, the only thing I can think is that since your initial commits include some UNINTERESTING commits, that actually triggers revs->limited = 1 in handle_commit(). It's an indirect use, and without such a commit the boundary would be empty, anyway. Thanks, -Stolee
On 5/3/2023 6:08 PM, Taylor Blau wrote: > On Tue, Apr 25, 2023 at 02:15:49PM -0400, Derrick Stolee wrote: >> I know that the walking code in builtin/pack-objects.c does >> this same computation, but it's muddled with other checks and >> the trees are marked as UNINTERESTING at the same time. It >> doesn't seem like we can reuse anything directly out of there, >> but it did give me the idea to try a callback. > > Interesting idea. When you say callback, do you mean a function that we > invoke in place of where we currently call `add_object_array()`? Or do > you mean something else? Curious. I was talking about passing a callback into the revision walk instead of having the revision code create a list, as you replied to in this response: >> But could we also do this at the caller's end by passing a >> callback that checks each walked commit if it is UNINTERESTING >> and adds it to a set? > > I think I remember trying this when I originally wrote this code, and > ended up discarding the idea because it walked over more commits than we > needed to consider. It's interesting that it walked more commits than you wanted. I suppose it's somehow related to the boundary condition you're implying by enabling the construction of this list. Could you describe the situation where more commits are walked than you want? I imagine we can't actually stop at the boundary because we need to know that certain commits are actually reachable from those boundary commits. Here's an example (assume horizontal levels are equal generation number for the sake of the walk's stopping condition). 1 (UNINTERESTING) 2 (INTERESTING) |\_____ ____/| | \ / | 3 4 5 6 | |\__ | | | | \| | 7 8 [9] 10 | | | | 11 12 13 14 | | | ___/| | | |/ | 15 16 [17] 18 | | ___|____/ | |/ | (19) [20] 21 Here, 9, 17, and 20 are the boundary, as they are UNINITERESTING but reachable from an interesting commit (5, 14, and 18, respectively). This is sufficient to provide a stopping point for this single-directional difference "2 --not 1". It's important that we didn't stop walking at 9, since its UNINTERESTING bit needed to propagate through 13 to get to 17. Note that 19 is reachable from 1 but not 2, so we would need to keep walking if we were looking for the boundary of the symmetric difference 1...2. But we don't need it here since everything at that level is marked UNINTERESTING so the walk can stop. Is this example sufficiently complex for you to describe what causes the extra commit walking? In such a case, what does the object array approach add to prevent the extra walking? Thanks, -Stolee
On Thu, May 04, 2023 at 09:59:32AM -0400, Derrick Stolee wrote: > It's interesting that it walked more commits than you wanted. I > suppose it's somehow related to the boundary condition you're > implying by enabling the construction of this list. > > Could you describe the situation where more commits are walked > than you want? I imagine we can't actually stop at the boundary > because we need to know that certain commits are actually reachable > from those boundary commits. I honestly cannot remember, and was unable to reproduce it when I reworked the substantive portion of this series last night. For posterity, Stolee and I had an off-list discussion yesterday where he walked me through his suggestion to implement the boundary search via a straightforward revision walk, instead of grafting onto cherry-picked components of the revision internals. It works great, and I have been unable to trick it into "walking too much". Thanks, Taylor
diff --git a/revision.c b/revision.c index 106ca1ce6c..7177244e7a 100644 --- a/revision.c +++ b/revision.c @@ -1446,6 +1446,9 @@ static int limit_list(struct rev_info *revs) if (process_parents(revs, commit, &original_list, NULL) < 0) return -1; if (obj->flags & UNINTERESTING) { + if (revs->collect_uninteresting) + add_object_array(obj, NULL, + &revs->uninteresting_commits); mark_parents_uninteresting(revs, commit); slop = still_interesting(original_list, date, slop, &interesting_cache); if (slop) @@ -3072,6 +3075,7 @@ void release_revisions(struct rev_info *revs) diff_free(&revs->pruning); reflog_walk_info_release(revs->reflog_info); release_revisions_topo_walk_info(revs->topo_walk_info); + object_array_clear(&revs->uninteresting_commits); } static void add_child(struct rev_info *revs, struct commit *parent, struct commit *child) @@ -3522,8 +3526,12 @@ static void explore_walk_step(struct rev_info *revs) if (process_parents(revs, c, NULL, NULL) < 0) return; - if (c->object.flags & UNINTERESTING) + if (c->object.flags & UNINTERESTING) { + if (revs->collect_uninteresting) + add_object_array(&c->object, NULL, + &revs->uninteresting_commits); mark_parents_uninteresting(revs, c); + } for (p = c->parents; p; p = p->next) test_flag_and_insert(&info->explore_queue, p->item, TOPO_WALK_EXPLORED); diff --git a/revision.h b/revision.h index e8f6de9684..1385b29c76 100644 --- a/revision.h +++ b/revision.h @@ -124,6 +124,9 @@ struct rev_info { /* Parents of shown commits */ struct object_array boundary_commits; + /* UNINTERESTING commits between the tips and boundary */ + struct object_array uninteresting_commits; + /* The end-points specified by the end user */ struct rev_cmdline_info cmdline; @@ -183,6 +186,7 @@ struct rev_info { unpacked:1, no_kept_objects:1, boundary:2, + collect_uninteresting:1, count:1, left_right:1, left_only:1, @@ -404,6 +408,7 @@ struct rev_info { .expand_tabs_in_log = -1, \ .commit_format = CMIT_FMT_DEFAULT, \ .expand_tabs_in_log_default = 8, \ + .uninteresting_commits = OBJECT_ARRAY_INIT, \ } /**
The boundary-based bitmap walk will want to know which commits were marked UNINTERESTING in the walk used to discover the boundary. Track which commits are marked as such during list limitation as well as the topo walk step, though the latter is not used. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- revision.c | 10 +++++++++- revision.h | 5 +++++ 2 files changed, 14 insertions(+), 1 deletion(-)