Message ID | 20240326130902.7111-5-dirk@gouders.net (mailing list archive) |
---|---|
State | New, archived |
Dirk Gouders <dirk@gouders.net> writes:

> diff --git a/Documentation/MyFirstObjectWalk.txt b/Documentation/MyFirstObjectWalk.txt
> index a06c712e46..6901561263 100644
> --- a/Documentation/MyFirstObjectWalk.txt
> +++ b/Documentation/MyFirstObjectWalk.txt
> @@ -754,10 +754,12 @@ points to the same tree object as its grandparent.)
> === Counting Omitted Objects
>
> We also have the capability to enumerate all objects which were omitted by a
> -filter, like with `git log --filter=<spec> --filter-print-omitted`. Asking
> -`traverse_commit_list_filtered()` to populate the `omitted` list means that our
> -object walk does not perform any better than an unfiltered object walk; all
> -reachable objects are walked in order to populate the list.
> +filter, like with `git log --filter=<spec> --filter-print-omitted`. To do this,
> +change `traverse_commit_list()` to `traverse_commit_list_filtered()`, which is
> +able to populate an `omitted` list. This list of filtered objects may have
> +performance implications, however, because despite filtering objects, the possibly
> +much larger set of all reachable objects must be processed in order to
> +populate that list.

It may be just me not reading what is obvious to everybody else
clearly, in which case I am happy to take the above text as-is, but
the updated text that says a "list" may have "performance
implications" reads a bit odd. It would be understandable if you
said "asking for list of filtered objects may have", though.

Are you contrasting a call to traverse_commit_list() and
traverse_commit_list_filtered() and discussing their relative
performance?

Or are you contrasting a call to traverse_commit_list_filtered()
with and without the omitted parameter, and saying that a call with
omitted parameter asks the machinery to do more work so it has to
cost more?

Other than that I had no trouble with this latest round.

Thanks.
Junio C Hamano <gitster@pobox.com> writes:

> Dirk Gouders <dirk@gouders.net> writes:
>
>> diff --git a/Documentation/MyFirstObjectWalk.txt b/Documentation/MyFirstObjectWalk.txt
>> index a06c712e46..6901561263 100644
>> --- a/Documentation/MyFirstObjectWalk.txt
>> +++ b/Documentation/MyFirstObjectWalk.txt
>> @@ -754,10 +754,12 @@ points to the same tree object as its grandparent.)
>> === Counting Omitted Objects
>>
>> We also have the capability to enumerate all objects which were omitted by a
>> -filter, like with `git log --filter=<spec> --filter-print-omitted`. Asking
>> -`traverse_commit_list_filtered()` to populate the `omitted` list means that our
>> -object walk does not perform any better than an unfiltered object walk; all
>> -reachable objects are walked in order to populate the list.
>> +filter, like with `git log --filter=<spec> --filter-print-omitted`. To do this,
>> +change `traverse_commit_list()` to `traverse_commit_list_filtered()`, which is
>> +able to populate an `omitted` list. This list of filtered objects may have
>> +performance implications, however, because despite filtering objects, the possibly
>> +much larger set of all reachable objects must be processed in order to
>> +populate that list.
>
> It may be just me not reading what is obvious to everybody else
> clearly, in which case I am happy to take the above text as-is, but
> the updated text that says a "list" may have "performance
> implications" reads a bit odd. It would be understandable if you
> said "asking for list of filtered objects may have", though.

Oh yes, you are right (as far as I can say): I would change this to
something like:

"Asking for this list of filtered objects may cause performance
implications, however, because in this case, despite filtering objects,
the possibly much larger set of all reachable objects must be processed
in order to populate that list."

(Later in the document, it is suggested to do timing with the two
versions, which kind of follows up on the performance impact that is
focused on, here. So, this doesn't remain an unresolved detail.)

> Are you contrasting a call to traverse_commit_list() and
> traverse_commit_list_filtered() and discussing their relative
> performance?
>
> Or are you contrasting a call to traverse_commit_list_filtered()
> with and without the omitted parameter, and saying that a call with
> omitted parameter asks the machinery to do more work so it has to
> cost more?

This answer has the potential to cause an enhancement request, anyway:

Previously, the document didn't state that
traverse_commit_list_filtered() can be used without asking for an
`omitted` list (and I didn't change that), so the contrast, in my
understanding, is explicitly traverse_commit_list() vs.
traverse_commit_list_filtered().

The second of your cases is only included implicitly, for those who
know or can guess they could use NULL as the pointer to the `omitted`
list.

Thank you for looking at this one more time!

Dirk

> Other than that I had no trouble with this latest round.
>
> Thanks.
Dirk Gouders <dirk@gouders.net> writes:

> Oh yes, you are right (as far as I can say): I would change this to
> something like:
>
> "Asking for this list of filtered objects may cause performance
> implications, however, because in this case, despite filtering objects,
> the possibly much larger set of all reachable objects must be processed
> in order to populate that list."

Better, but the verb "cause" applied to "performance implications"
feels funny. It may "have" implications. Alternatively, it may
"cause" degradations. As implications can be both positive or
negative, it would be better to say "cause performance degradations"
when you know if it is negative.

> (Later in the document, it is suggested to do timing with the two
> versions, which kind of follows up on the performance impact that is
> focused on, here. So, this doesn't remain an unresolved detail.)

Great.

Thanks.
Junio C Hamano <gitster@pobox.com> writes:

> Dirk Gouders <dirk@gouders.net> writes:
>
>> Oh yes, you are right (as far as I can say): I would change this to
>> something like:
>>
>> "Asking for this list of filtered objects may cause performance
>> implications, however, because in this case, despite filtering objects,
>> the possibly much larger set of all reachable objects must be processed
>> in order to populate that list."
>
> Better, but the verb "cause" applied to "performance implications"
> feels funny. It may "have" implications. Alternatively, it may
> "cause" degradations. As implications can be both positive or
> negative, it would be better to say "cause performance degradations"
> when you know if it is negative.

Thank you for the clarification with "implications". I will fix it.

Dirk

>> (Later in the document, it is suggested to do timing with the two
>> versions, which kind of follows up on the performance impact that is
>> focused on, here. So, this doesn't remain an unresolved detail.)
>
> Great.
>
> Thanks.
diff --git a/Documentation/MyFirstObjectWalk.txt b/Documentation/MyFirstObjectWalk.txt
index a06c712e46..6901561263 100644
--- a/Documentation/MyFirstObjectWalk.txt
+++ b/Documentation/MyFirstObjectWalk.txt
@@ -754,10 +754,12 @@ points to the same tree object as its grandparent.)
 === Counting Omitted Objects

 We also have the capability to enumerate all objects which were omitted by a
-filter, like with `git log --filter=<spec> --filter-print-omitted`. Asking
-`traverse_commit_list_filtered()` to populate the `omitted` list means that our
-object walk does not perform any better than an unfiltered object walk; all
-reachable objects are walked in order to populate the list.
+filter, like with `git log --filter=<spec> --filter-print-omitted`. To do this,
+change `traverse_commit_list()` to `traverse_commit_list_filtered()`, which is
+able to populate an `omitted` list. This list of filtered objects may have
+performance implications, however, because despite filtering objects, the possibly
+much larger set of all reachable objects must be processed in order to
+populate that list.

 First, add the `struct oidset` and related items we will use to iterate it:

@@ -778,8 +780,9 @@ static void walken_object_walk(
 ...
 ----

-Modify the call to `traverse_commit_list_filtered()` to include your `omitted`
-object:
+Replace the call to `traverse_commit_list()` with
+`traverse_commit_list_filtered()` and pass a pointer to the `omitted` oidset
+defined and initialized above:

 ----
 ...
Before the changes to count omitted objects, the function
traverse_commit_list() was used and its call cannot be changed to pass
a pointer to an oidset to record omitted objects.

Fix the text to clarify that we now use another traversal function to
be able to pass the pointer to the introduced oidset.

Helped-by: Kyle Lippincott <spectral@google.com>
Signed-off-by: Dirk Gouders <dirk@gouders.net>
---
 Documentation/MyFirstObjectWalk.txt | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)
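
The cost difference discussed in the thread (a filtered walk with and without a request for the `omitted` list) can be illustrated with a toy model. This is only a sketch and does not use git's code or API: `toy_object`, `toy_omitted`, `toy_stats`, and `toy_walk()` are hypothetical names, the "filter" is a simple depth cutoff, and the assumption is that a walk may skip a filtered subtree only when nobody asked for the omitted objects to be recorded.

```c
#include <assert.h>
#include <stddef.h>

/* Toy object with up to two children (hypothetical; not git's struct object). */
struct toy_object {
	int id;
	const struct toy_object *child[2];
};

/* Hypothetical stand-in for the `omitted` oidset: it only counts entries. */
struct toy_omitted {
	size_t count;
};

struct toy_stats {
	size_t visited;	/* objects the walk had to process */
	size_t shown;	/* objects that passed the filter */
};

/*
 * Walk with a depth filter: objects deeper than max_depth are filtered out.
 * With omitted == NULL the walk prunes a filtered subtree immediately.
 * With omitted != NULL it must descend, so every filtered object is recorded.
 */
static void toy_walk(const struct toy_object *obj, int depth, int max_depth,
		     struct toy_omitted *omitted, struct toy_stats *stats)
{
	size_t i;

	assert(stats != NULL);
	if (!obj)
		return;

	if (depth > max_depth) {
		if (!omitted)
			return;	/* nobody asked for the list: skip the subtree */
		omitted->count++;
	} else {
		stats->shown++;
	}
	stats->visited++;

	for (i = 0; i < 2; i++)
		toy_walk(obj->child[i], depth + 1, max_depth, omitted, stats);
}
```

In this model, walking a seven-node binary tree with `max_depth` 0 processes one object when `omitted` is NULL and all seven when it is not, which mirrors the point that asking for the list forces the walk over the possibly much larger set of reachable objects.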