[v4,4/5] MyFirstObjectWalk: fix description for counting omitted objects

Message ID 20240326130902.7111-5-dirk@gouders.net (mailing list archive)
State New, archived

Commit Message

Dirk Gouders March 26, 2024, 1:08 p.m. UTC
Before the changes to count omitted objects, the function
traverse_commit_list() was used and its call cannot be changed to pass
a pointer to an oidset to record omitted objects.

Fix the text to clarify that we now use another traversal function to
be able to pass the pointer to the introduced oidset.

Helped-by: Kyle Lippincott <spectral@google.com>
Signed-off-by: Dirk Gouders <dirk@gouders.net>
---
 Documentation/MyFirstObjectWalk.txt | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

Comments

Junio C Hamano March 26, 2024, 5 p.m. UTC | #1
Dirk Gouders <dirk@gouders.net> writes:

> diff --git a/Documentation/MyFirstObjectWalk.txt b/Documentation/MyFirstObjectWalk.txt
> index a06c712e46..6901561263 100644
> --- a/Documentation/MyFirstObjectWalk.txt
> +++ b/Documentation/MyFirstObjectWalk.txt
> @@ -754,10 +754,12 @@ points to the same tree object as its grandparent.)
>  === Counting Omitted Objects
>  
>  We also have the capability to enumerate all objects which were omitted by a
> -filter, like with `git log --filter=<spec> --filter-print-omitted`. Asking
> -`traverse_commit_list_filtered()` to populate the `omitted` list means that our
> -object walk does not perform any better than an unfiltered object walk; all
> -reachable objects are walked in order to populate the list.
> +filter, like with `git log --filter=<spec> --filter-print-omitted`. To do this,
> +change `traverse_commit_list()` to `traverse_commit_list_filtered()`, which is
> +able to populate an `omitted` list.  This list of filtered objects may have
> +performance implications, however, because despite filtering objects, the possibly
> +much larger set of all reachable objects must be processed in order to
> +populate that list.

It may be just me not reading what is obvious to everybody else
clearly, in which case I am happy to take the above text as-is, but
the updated text that says a "list" may have "performance
implications" reads a bit odd.  It would be understandable if you
said "asking for list of filtered objects may have", though.

Are you contrasting a call to traverse_commit_list() and
traverse_commit_list_filtered() and discussing their relative
performance?  

Or are you contrasting a call to traverse_commit_list_filtered()
with and without the omitted parameter, and saying that a call with
omitted parameter asks the machinery to do more work so it has to
cost more?

Other than that I had no trouble with this latest round.

Thanks.
Dirk Gouders March 26, 2024, 8:09 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> Dirk Gouders <dirk@gouders.net> writes:
>
>> diff --git a/Documentation/MyFirstObjectWalk.txt b/Documentation/MyFirstObjectWalk.txt
>> index a06c712e46..6901561263 100644
>> --- a/Documentation/MyFirstObjectWalk.txt
>> +++ b/Documentation/MyFirstObjectWalk.txt
>> @@ -754,10 +754,12 @@ points to the same tree object as its grandparent.)
>>  === Counting Omitted Objects
>>  
>>  We also have the capability to enumerate all objects which were omitted by a
>> -filter, like with `git log --filter=<spec> --filter-print-omitted`. Asking
>> -`traverse_commit_list_filtered()` to populate the `omitted` list means that our
>> -object walk does not perform any better than an unfiltered object walk; all
>> -reachable objects are walked in order to populate the list.
>> +filter, like with `git log --filter=<spec> --filter-print-omitted`. To do this,
>> +change `traverse_commit_list()` to `traverse_commit_list_filtered()`, which is
>> +able to populate an `omitted` list.  This list of filtered objects may have
>> +performance implications, however, because despite filtering objects, the possibly
>> +much larger set of all reachable objects must be processed in order to
>> +populate that list.
>
> It may be just me not reading what is obvious to everybody else
> clearly, in which case I am happy to take the above text as-is, but
> the updated text that says a "list" may have "performance
> implications" reads a bit odd.  It would be understandable if you
> said "asking for list of filtered objects may have", though.

Oh yes, you are right (as far as I can tell): I would change this to
something like:

"Asking for this list of filtered objects may cause performance
implications, however, because in this case, despite filtering objects,
the possibly much larger set of all reachable objects must be processed
in order to populate that list."

(Later in the document, it is suggested to do timing with the two
versions, which kind of follows up on the performance impact that is
focused on, here.  So, this doesn't remain an unresolved detail.)

> Are you contrasting a call to traverse_commit_list() and
> traverse_commit_list_filtered() and discussing their relative
> performance?  
>
> Or are you contrasting a call to traverse_commit_list_filtered()
> with and without the omitted parameter, and saying that a call with
> omitted parameter asks the machinery to do more work so it has to
> cost more?

This answer has the potential to cause an enhancement request, anyway:

Previously, the document didn't state that
traverse_commit_list_filtered() can be used without asking for an
`omitted` list (and I didn't change that), so the contrast, as I
understand it, is explicitly between traverse_commit_list() and
traverse_commit_list_filtered().

The second of your cases is only included implicitly, for those who
know or can guess that they could pass NULL as the pointer to the
`omitted` list.
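
To make the two cases concrete, here is a toy model in C.  It is only
a sketch and not Git's code: `toy_object`, `toy_stats` and `toy_walk()`
are hypothetical names, the "filter" is a plain depth limit, and a
counter stands in for the `struct oidset` that the real
traverse_commit_list_filtered() takes.  It only illustrates why asking
for the omitted objects forces the walk to visit everything reachable,
while passing NULL lets the walk prune.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy object: trees have children, blobs have none. */
struct toy_object {
	const struct toy_object *children[4];
	size_t nr_children;
};

struct toy_stats {
	size_t visited;	/* objects the walk had to look at */
	size_t shown;	/* objects that passed the filter */
};

/*
 * Walk `obj` with an assumed depth filter: objects deeper than
 * `max_depth` are filtered out.  If `omitted` is NULL the walk stops
 * descending at `max_depth`; otherwise it keeps going and counts every
 * filtered-out object in `*omitted`.
 */
void toy_walk(const struct toy_object *obj, size_t depth, size_t max_depth,
	      struct toy_stats *stats, size_t *omitted)
{
	size_t i;

	assert(obj && stats);

	stats->visited++;
	if (depth <= max_depth)
		stats->shown++;
	else if (omitted)
		(*omitted)++;

	/* Nobody asked for the omitted objects: prune below the limit. */
	if (depth >= max_depth && !omitted)
		return;

	for (i = 0; i < obj->nr_children; i++)
		toy_walk(obj->children[i], depth + 1, max_depth,
			 stats, omitted);
}
```

In this model, a root tree with two subtrees of two blobs each and a
depth limit of 0 is walked by looking at a single object when `omitted`
is NULL, but at all seven objects when a counter is passed.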

Thank you for looking at this one more time!

Dirk

> Other than that I had no trouble with this latest round.
>
> Thanks.
Junio C Hamano March 26, 2024, 8:24 p.m. UTC | #3
Dirk Gouders <dirk@gouders.net> writes:

> Oh yes, you are right (as far as I can tell): I would change this to
> something like:
>
> "Asking for this list of filtered objects may cause performance
> implications, however, because in this case, despite filtering objects,
> the possibly much larger set of all reachable objects must be processed
> in order to populate that list."

Better, but the verb "cause" applied to "performance implications"
feels funny.  It may "have" implications.  Alternatively, it may
"cause" degradations.  As implications can be either positive or
negative, it would be better to say "cause performance degradations"
when you know the effect is negative.

> (Later in the document, it is suggested to do timing with the two
> versions, which kind of follows up on the performance impact that is
> focused on, here.  So, this doesn't remain an unresolved detail.)

Great.

Thanks.
Dirk Gouders March 27, 2024, 6:30 a.m. UTC | #4
Junio C Hamano <gitster@pobox.com> writes:

> Dirk Gouders <dirk@gouders.net> writes:
>
>> Oh yes, you are right (as far as I can tell): I would change this to
>> something like:
>>
>> "Asking for this list of filtered objects may cause performance
>> implications, however, because in this case, despite filtering objects,
>> the possibly much larger set of all reachable objects must be processed
>> in order to populate that list."
>
> Better, but the verb "cause" applied to "performance implications"
> feels funny.  It may "have" implications.  Alternatively, it may
> "cause" degradations.  As implications can be either positive or
> negative, it would be better to say "cause performance degradations"
> when you know the effect is negative.

Thank you for the clarification regarding "implications";
I will fix it.

Dirk

>> (Later in the document, it is suggested to do timing with the two
>> versions, which kind of follows up on the performance impact that is
>> focused on, here.  So, this doesn't remain an unresolved detail.)
>
> Great.
>
> Thanks.
Patch

diff --git a/Documentation/MyFirstObjectWalk.txt b/Documentation/MyFirstObjectWalk.txt
index a06c712e46..6901561263 100644
--- a/Documentation/MyFirstObjectWalk.txt
+++ b/Documentation/MyFirstObjectWalk.txt
@@ -754,10 +754,12 @@  points to the same tree object as its grandparent.)
 === Counting Omitted Objects
 
 We also have the capability to enumerate all objects which were omitted by a
-filter, like with `git log --filter=<spec> --filter-print-omitted`. Asking
-`traverse_commit_list_filtered()` to populate the `omitted` list means that our
-object walk does not perform any better than an unfiltered object walk; all
-reachable objects are walked in order to populate the list.
+filter, like with `git log --filter=<spec> --filter-print-omitted`. To do this,
+change `traverse_commit_list()` to `traverse_commit_list_filtered()`, which is
+able to populate an `omitted` list.  This list of filtered objects may have
+performance implications, however, because despite filtering objects, the possibly
+much larger set of all reachable objects must be processed in order to
+populate that list.
 
 First, add the `struct oidset` and related items we will use to iterate it:
 
@@ -778,8 +780,9 @@  static void walken_object_walk(
 	...
 ----
 
-Modify the call to `traverse_commit_list_filtered()` to include your `omitted`
-object:
+Replace the call to `traverse_commit_list()` with
+`traverse_commit_list_filtered()` and pass a pointer to the `omitted` oidset
+defined and initialized above:
 
 ----
 	...