Message ID | 20240802073143.56731-2-hanyang.tony@bytedance.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | revision: fix reachable objects being gc'ed in no blob clone repo | expand |
Han Young <hanyang.tony@bytedance.com> writes: > In revision.c, `process_parents()` recursively marks commit parents > as UNINTERESTING if the commit itself is UNINTERESTING. Makes sense. > `--exclude-promisor-objects` is implemented as > "iterate all objects in promisor packfiles, mark them as UNINTERESTING". Also makes sense. > So when we find a commit object in a promisor packfile, we also set its ancestors > as UNINTERESTING, whether the ancestor is a promisor object or not. > This causes normal objects to be falsely marked as promisor objects > and removed by `git repack`. Ahh, that is not desirable. So the need to do something to fix it is well established here. > Signed-off-by: Han Young <hanyang.tony@bytedance.com> > Helped-by: C O Xing Xin <xingxin.xx@bytedance.com> > --- Please order these trailer lines logically in chronological order, i.e. you get helped by others to complete the change and then seal it by signing it off at the end. I'll swap these two while queuing. > diff --git a/revision.c b/revision.c > index 1c0192f522..eacb0c909d 100644 > --- a/revision.c > +++ b/revision.c > @@ -1164,7 +1164,7 @@ static int process_parents(struct rev_info *revs, struct commit *commit, > * wasn't uninteresting), in which case we need > * to mark its parents recursively too.. > */ > - if (commit->object.flags & UNINTERESTING) { > + if (!revs->exclude_promisor_objects && commit->object.flags & UNINTERESTING) { > while (parent) { > struct commit *p = parent->item; > parent = parent->next; But if the iteration is over all objects in certain packfiles to mark them all uninteresting, shouldn't the caller avoid the call to process_parents() in the first place? Letting process_parents() to do other things and only refrain from doing the "this commit is marked uninteresting" part does not quite match what you are trying to do, at least to me. Please check "git blame" to find those who are likely to know better about the code around the area and ask for help from them. I think the bulk of the logic related came from the series that led to f3d618d2 (Merge branch 'jh/fsck-promisors', 2018-02-13), so I added the authors of the series. It apepars to me that its approach to exclude the objects that appear in the promisor packs may be sound, but the design and implementation of it is dubious. Shouldn't it be getting the list of objects with get_object_list() WITHOUT paying any attention to --exclude-promisor-objects flag, and then filtering objects that appear in the promisor packs out of that list, without mucking with the object and commit traversal in revision.c at all? Thanks.
On Sat, Aug 3, 2024 at 12:45 AM Junio C Hamano <gitster@pobox.com> wrote: > > diff --git a/revision.c b/revision.c > > index 1c0192f522..eacb0c909d 100644 > > --- a/revision.c > > +++ b/revision.c > > @@ -1164,7 +1164,7 @@ static int process_parents(struct rev_info *revs, struct commit *commit, > > * wasn't uninteresting), in which case we need > > * to mark its parents recursively too.. > > */ > > - if (commit->object.flags & UNINTERESTING) { > > + if (!revs->exclude_promisor_objects && commit->object.flags & UNINTERESTING) { > > while (parent) { > > struct commit *p = parent->item; > > parent = parent->next; > > But if the iteration is over all objects in certain packfiles to > mark them all uninteresting, shouldn't the caller avoid the call to > process_parents() in the first place? Letting process_parents() to > do other things and only refrain from doing the "this commit is > marked uninteresting" part does not quite match what you are trying > to do, at least to me. Thanks, I agree process_parents() isn't the right place to fix the issue. > It apepars to me that its approach to exclude the objects that > appear in the promisor packs may be sound, but the design and > implementation of it is dubious. Shouldn't it be getting the list > of objects with get_object_list() WITHOUT paying any attention to > --exclude-promisor-objects flag, and then filtering objects that > appear in the promisor packs out of that list, without mucking with > the object and commit traversal in revision.c at all? The problem is --exclude-promisor-objects is an option in revision.c, and this option is used by pack-objects, prune, midx-write and rev-list. I see there are two ways to fix this issue, one is to remove the --exclude-promisor-objects from revision.c, and filter objects in show_commit or show_objects functions. The other place to filter objects is probably in revision walk, maybe in traverse_commit_list? Thanks.
韩仰 <hanyang.tony@bytedance.com> writes: > Thanks, I agree process_parents() isn't the right place to fix the issue. > >> It apepars to me that its approach to exclude the objects that >> appear in the promisor packs may be sound, but the design and >> implementation of it is dubious. Shouldn't it be getting the list >> of objects with get_object_list() WITHOUT paying any attention to >> --exclude-promisor-objects flag, and then filtering objects that >> appear in the promisor packs out of that list, without mucking with >> the object and commit traversal in revision.c at all? > > The problem is --exclude-promisor-objects is an option in revision.c, > and this option is used by pack-objects, prune, midx-write and rev-list. > I see there are two ways to fix this issue, one is to remove the > --exclude-promisor-objects from revision.c, and filter objects in show_commit > or show_objects functions. The other place to filter objects is probably > in revision walk, maybe in traverse_commit_list? Perhaps another simpler approach may be to use is_promisor_object() function and get rid of this initial marking of these objects in prepare_revision_walk() with the for_each_packed_object() loop, which abuses the UNINTERESTING bit. The feature wants to exclude objects contained in these packs, but does not want to exclude objects that are referred to and outside of these packs, so UNINTERESTING bit whose natural behaviour is to propagate down the history is a very bad fit for it. We may be able to lose a lot of existing code paths that say "if exclude_promisor_objects then do this", and filter objects out with "is_promisor_object()" at the output phase near get_revision(). Jonathan, if I am not mistaken, this is almost all your code. Any insights to lend us, even though you may not be very active around here lately? Thanks.
On Tue, Aug 13, 2024 at 12:09 AM Junio C Hamano <gitster@pobox.com> wrote: > Perhaps another simpler approach may be to use is_promisor_object() > function and get rid of this initial marking of these objects in > prepare_revision_walk() with the for_each_packed_object() loop, > which abuses the UNINTERESTING bit. The feature wants to exclude > objects contained in these packs, but does not want to exclude > objects that are referred to and outside of these packs, so > UNINTERESTING bit whose natural behaviour is to propagate down the > history is a very bad fit for it. We may be able to lose a lot of > existing code paths that say "if exclude_promisor_objects then do > this", and filter objects out with "is_promisor_object()" at the > output phase near get_revision(). I tried to go down this route. I removed the for_each_packed_object() loop and filter promisor commits in get_revision_1() instead. However, this only filtered promisor commits, not promisor trees and objects. A combined approach would be keeping the for_each_packed_object() loop, but only mark non-commit objects as UNINTERESTING there, and filter promisor commits in get_revision()?
diff --git a/revision.c b/revision.c index 1c0192f522..eacb0c909d 100644 --- a/revision.c +++ b/revision.c @@ -1164,7 +1164,7 @@ static int process_parents(struct rev_info *revs, struct commit *commit, * wasn't uninteresting), in which case we need * to mark its parents recursively too.. */ - if (commit->object.flags & UNINTERESTING) { + if (!revs->exclude_promisor_objects && commit->object.flags & UNINTERESTING) { while (parent) { struct commit *p = parent->item; parent = parent->next; diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh index 2c30c86e7b..4ee3437685 100755 --- a/t/t0410-partial-clone.sh +++ b/t/t0410-partial-clone.sh @@ -22,6 +22,17 @@ pack_as_from_promisor () { echo $HASH } +pack_commit() { + HASH=$(echo $1 | git -C repo pack-objects .git/objects/pack/pack --missing=allow-any) && + delete_object repo $1 && + echo $HASH +} + +pack_commit_as_promisor() { + HASH=$(pack_commit $1) && + >repo/.git/objects/pack/pack-$HASH.promisor +} + promise_and_delete () { HASH=$(git -C repo rev-parse "$1") && git -C repo tag -a -m message my_annotated_tag "$HASH" && @@ -369,7 +380,16 @@ test_expect_success 'missing tree objects with --missing=allow-promisor and --ex git -C repo rev-list --exclude-promisor-objects --objects HEAD >objs 2>rev_list_err && test_must_be_empty rev_list_err && # 3 commits, no blobs or trees - test_line_count = 3 objs + test_line_count = 3 objs && + + # Pack all three commits into individual packs, and mark the last commit pack as promisor + pack_commit_as_promisor $(git -C repo rev-parse baz) && + pack_commit $(git -C repo rev-parse bar) && + pack_commit $(git -C repo rev-parse foo) && + git -C repo rev-list --exclude-promisor-objects --objects HEAD >objs 2>rev_list_err && + test_must_be_empty rev_list_err && + # commits foo and bar should remain + test_line_count = 2 objs ' test_expect_success 'missing non-root tree object and rev-list' '
In revision.c, `process_parents()` recursively marks commit parents as UNINTERESTING if the commit itself is UNINTERESTING. `--exclude-promisor-objects` is implemented as "iterate all objects in promisor packfiles, mark them as UNINTERESTING". So when we find a commit object in a promisor packfile, we also set its ancestors as UNINTERESTING, whether the ancestor is a promisor object or not. This causes normal objects to be falsely marked as promisor objects and removed by `git repack`. Stop setting the parents of uninteresting commits' to UNINTERESTING when we exclude promisor objects, and add a test to prevent regression. Note that this change would cause rev-list to report incorrect results if `--exclude-promisor-objects` is used with other revision walk filters. But `--exclude-promisor-objects` is for internal use only, so we don't have to worry about users using other filters with `--exclude-promisor-objects`. Signed-off-by: Han Young <hanyang.tony@bytedance.com> Helped-by: C O Xing Xin <xingxin.xx@bytedance.com> --- revision.c | 2 +- t/t0410-partial-clone.sh | 22 +++++++++++++++++++++- 2 files changed, 22 insertions(+), 2 deletions(-)