trace2: log progress time and throughput

Message ID	20200512214420.36329-1-emilyshaffer@google.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=6g9E=62=vger.kernel.org=git-owner@kernel.org> Date: Tue, 12 May 2020 14:44:20 -0700 Message-Id: <20200512214420.36329-1-emilyshaffer@google.com> Mime-Version: 1.0 Subject: [PATCH] trace2: log progress time and throughput From: Emily Shaffer <emilyshaffer@google.com> To: git@vger.kernel.org Cc: Emily Shaffer <emilyshaffer@google.com>, Josh Steadmon <steadmon@google.com> Content-Type: text/plain; charset="UTF-8" Sender: git-owner@vger.kernel.org Precedence: bulk
Series	trace2: log progress time and throughput \| expand trace2: log progress time and throughput

Emily Shaffer May 12, 2020, 9:44 p.m. UTC

Rather than teaching only one operation, like 'git fetch', how to write
down throughput to traces, we can learn about a wide range of user
operations that may seem slow by adding tooling to the progress library
itself. Operations which display progress are likely to be slow-running
and the kind of thing we want to monitor for performance anyways. By
showing object counts and data transfer size, we should be able to
make some derived measurements to ensure operations are scaling the way
we expect.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
One note: by putting trace collection into the progress library, we end
up with data events which have titles like "Receiving objects" - not
very machine-parseable. An alternative might be to ask for a
machine-readable title in the progress struct, but I didn't think it was
worth the code churn. However, I don't have experience with processing
the trace data after it's been collected, so if this is a bigger problem
than I think, please say so and I'll figure something out.

CI run here, although it failed on the same error Junio noted today[1]:
https://github.com/nasamuffin/git/runs/668457062

 - Emily

[1]: https://lore.kernel.org/git/xmqqtv0kc2q1.fsf@gitster.c.googlers.com

 progress.c                  | 17 +++++++++++++++++
 t/t0500-progress-display.sh | 26 ++++++++++++++++++++++++++
 2 files changed, 43 insertions(+)

Junio C Hamano May 12, 2020, 10:14 p.m. UTC | #1

Emily Shaffer <emilyshaffer@google.com> writes:

> Rather than teaching only one operation, like 'git fetch', how to write
> down throughput to traces, we can learn about a wide range of user
> operations that may seem slow by adding tooling to the progress library
> itself. Operations which display progress are likely to be slow-running
> and the kind of thing we want to monitor for performance anyways. By
> showing object counts and data transfer size, we should be able to
> make some derived measurements to ensure operations are scaling the way
> we expect.

Excellent observation ;-)

>
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
> One note: by putting trace collection into the progress library, we end
> up with data events which have titles like "Receiving objects" - not
> very machine-parseable. An alternative might be to ask for a
> machine-readable title in the progress struct, but I didn't think it was
> worth the code churn.

Perhaps make it an optional member of the struct, and convert only a
selected few callers to set it and see how well it helps, while
leaving all the other callers set NULL to the member to keep showing
not-very-useful titles?

> CI run here, although it failed on the same error Junio noted today[1]:

I think Dscho's latest patch, even though it didn't mention my message,
is to fix that error.  So with that queued in the same batch, we'll
see a full CI coverage again.

Will queue.  Thanks.

Josh Steadmon May 13, 2020, 7:46 p.m. UTC | #2

On 2020.05.12 14:44, Emily Shaffer wrote:
> Rather than teaching only one operation, like 'git fetch', how to write
> down throughput to traces, we can learn about a wide range of user
> operations that may seem slow by adding tooling to the progress library
> itself. Operations which display progress are likely to be slow-running
> and the kind of thing we want to monitor for performance anyways. By
> showing object counts and data transfer size, we should be able to
> make some derived measurements to ensure operations are scaling the way
> we expect.
> 
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
> One note: by putting trace collection into the progress library, we end
> up with data events which have titles like "Receiving objects" - not
> very machine-parseable. An alternative might be to ask for a
> machine-readable title in the progress struct, but I didn't think it was
> worth the code churn. However, I don't have experience with processing
> the trace data after it's been collected, so if this is a bigger problem
> than I think, please say so and I'll figure something out.
> 
> CI run here, although it failed on the same error Junio noted today[1]:
> https://github.com/nasamuffin/git/runs/668457062
> 
>  - Emily
> 
> [1]: https://lore.kernel.org/git/xmqqtv0kc2q1.fsf@gitster.c.googlers.com


I like Junio's idea of adding an optional machine-readable field in the
progress struct, but I don't think it is necessarily a blocker for this
change. Everything looks good to me.

Reviewed-by: Josh Steadmon <steadmon@google.com>

Derrick Stolee May 15, 2020, 10:59 a.m. UTC | #3

On 5/12/2020 5:44 PM, Emily Shaffer wrote:
> Rather than teaching only one operation, like 'git fetch', how to write
> down throughput to traces, we can learn about a wide range of user
> operations that may seem slow by adding tooling to the progress library
> itself. Operations which display progress are likely to be slow-running
> and the kind of thing we want to monitor for performance anyways. By
> showing object counts and data transfer size, we should be able to
> make some derived measurements to ensure operations are scaling the way
> we expect.
> 
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
> One note: by putting trace collection into the progress library, we end
> up with data events which have titles like "Receiving objects" - not
> very machine-parseable. An alternative might be to ask for a
> machine-readable title in the progress struct, but I didn't think it was
> worth the code churn. However, I don't have experience with processing
> the trace data after it's been collected, so if this is a bigger problem
> than I think, please say so and I'll figure something out.
> 
> CI run here, although it failed on the same error Junio noted today[1]:
> https://github.com/nasamuffin/git/runs/668457062
> 
>  - Emily
> 
> [1]: https://lore.kernel.org/git/xmqqtv0kc2q1.fsf@gitster.c.googlers.com
> 
>  progress.c                  | 17 +++++++++++++++++
>  t/t0500-progress-display.sh | 26 ++++++++++++++++++++++++++
>  2 files changed, 43 insertions(+)
> 
> diff --git a/progress.c b/progress.c
> index 75633e9c5e..6d2dcff0b6 100644
> --- a/progress.c
> +++ b/progress.c
> @@ -265,6 +265,7 @@ static struct progress *start_progress_delay(const char *title, uint64_t total,
>  	progress->title_len = utf8_strwidth(title);
>  	progress->split = 0;
>  	set_progress_signal();
> +	trace2_region_enter("progress", title, the_repository);
>  	return progress;
>  }
>  
> @@ -320,6 +321,22 @@ void stop_progress(struct progress **p_progress)
>  {
>  	finish_if_sparse(*p_progress);
>  
> +	if (p_progress && *p_progress) {
> +		trace2_data_intmax("progress", the_repository, "total_objects",
> +				   (*p_progress)->total);

Should this be "total_objects"? Progress can iterate over lots of things,
such as cache entries. Perhaps leave it as "total"?

> +
> +		if ((*p_progress)->throughput)
> +			trace2_data_intmax("progress", the_repository,
> +					   "total_bytes",
> +					   (*p_progress)->throughput->curr_total);

I like the extra detail here for the specific kind of progress used in
network transfer.

> +	}
> +
> +	trace2_region_leave("progress",
> +			    p_progress && *p_progress
> +				? (*p_progress)->title
> +				: NULL,
> +			    the_repository);
> +
>  	stop_progress_msg(p_progress, _("done"));
>  }

This trace2_region_leave() needs to be conditional on the progress
being non-null. The pattern used by consumers of the progress API is:

	if (my_progress_condition)
		start_progress(&progress);

	do {
		display_progress(&progress, count);
	} while (condition);

	stop_progress(&progress);

The condition to show progress or not is conditional on an option that
is external to the progress API.

The fix for this patch is simple: make the trace2_region_leave() be
conditional on "p_progress && *p_progress".

This leads to an extra problem: if a user uses an option such as "--quiet",
then the trace2 regions won't appear at all. This becomes even more important
when thinking about scripts or tools that have stderr as a non-TTY, which
disables progress most of the time.

It's best to have trace2 data be consistent across commands. I think this can
be accomplished, but it is a more invasive change to the rest of the codebase.
It requires invoking the progress API in all cases, and having the progress
API conditionally initialize the progress struct. The new pattern would look
like this:

	start_progress(&progress, my_progress_condition);

	do {
		display_progress(&progress, count);
	} while (condition);

	stop_progress(&progress);

Then, start_progress() (and variants) could always start the trace2 region,
and stop_progress() could always end the trace2 region.

>  
> diff --git a/t/t0500-progress-display.sh b/t/t0500-progress-display.sh
> index d2d088d9a0..1ed1df351c 100755
> --- a/t/t0500-progress-display.sh
> +++ b/t/t0500-progress-display.sh
> @@ -283,4 +283,30 @@ test_expect_success 'cover up after throughput shortens a lot' '
>  	test_i18ncmp expect out
>  '
>  
> +test_expect_success 'progress generates traces' '
> +	cat >in <<-\EOF &&
> +	throughput 102400 1000
> +	update
> +	progress 10
> +	throughput 204800 2000
> +	update
> +	progress 20
> +	throughput 307200 3000
> +	update
> +	progress 30
> +	throughput 409600 4000
> +	update
> +	progress 40
> +	EOF
> +
> +	GIT_TRACE2_EVENT="$(pwd)/trace.event" test-tool progress --total=40 \
> +		"Working hard" <in 2>stderr &&
> +
> +	# t0212/parse_events.perl intentionally omits regions and data.
> +	grep -e "region_enter" -e "\"category\":\"progress\"" trace.event &&
> +	grep -e "region_leave" -e "\"category\":\"progress\"" trace.event &&
> +	grep "\"key\":\"total_objects\",\"value\":\"40\"" trace.event &&
> +	grep "\"key\":\"total_bytes\",\"value\":\"409600\"" trace.event
> +'
> +

Thanks for the test! While this is sufficient to test the trace2 in the
happy path, I think that if you run a Git command such as `git commit-graph write`
that automatically quiets progress in the test suite (non-TTY stderr) you will
find the trace2 logs malformed due to excess end regions.

Thanks,
-Stolee

Junio C Hamano May 15, 2020, 3:27 p.m. UTC | #4

Derrick Stolee <stolee@gmail.com> writes:

>> +	if (p_progress && *p_progress) {
>> +		trace2_data_intmax("progress", the_repository, "total_objects",
>> +				   (*p_progress)->total);
>
> Should this be "total_objects"? Progress can iterate over lots of things,
> such as cache entries. Perhaps leave it as "total"?

Good point.  

I think the literal strings like "total_objects" we see in this
patch can later be customized in the same future change that lets us
give a more useful output than "Receiving objects" by having a
handful of customizable strings in the progress struct as hinted by
Emily under the three-dash line.  So I do not mind leaving it as-is.
If a reroll has to come without making these strings customizable,
it may be an improvement to change it to "total" if we do not forget,
but as long as we know where we are going in the longer term, I do
not think it is a big deal.

>> +
>> +		if ((*p_progress)->throughput)
>> +			trace2_data_intmax("progress", the_repository,
>> +					   "total_bytes",
>> +					   (*p_progress)->throughput->curr_total);
>
> I like the extra detail here for the specific kind of progress used in
> network transfer.

The curr_total field must be counted in bytes, by the fact that
strbuf_humanise_bytes() is called by throughput_string() to show it,
so "total_bytes" does make perfect sense.  The field might want to
get renamed to reflect this but obviously that's the kind of change
that we would not want in the middle of a more meaningful change
like this one.

>> +	}
>> +
>> +	trace2_region_leave("progress",
>> +			    p_progress && *p_progress
>> +				? (*p_progress)->title
>> +				: NULL,
>> +			    the_repository);
>> +
>>  	stop_progress_msg(p_progress, _("done"));
>>  }
>
> This trace2_region_leave() needs to be conditional on the progress
> being non-null. The pattern used by consumers of the progress API is:
>
> 	if (my_progress_condition)
> 		start_progress(&progress);
>
> 	do {
> 		display_progress(&progress, count);
> 	} while (condition);
>
> 	stop_progress(&progress);
>
> The condition to show progress or not is conditional on an option that
> is external to the progress API.
>
> The fix for this patch is simple: make the trace2_region_leave() be
> conditional on "p_progress && *p_progress".

Makes sense.

> This leads to an extra problem: if a user uses an option such as "--quiet",
> then the trace2 regions won't appear at all. This becomes even more important
> when thinking about scripts or tools that have stderr as a non-TTY, which
> disables progress most of the time.
>
> It's best to have trace2 data be consistent across commands. I think this can
> be accomplished, but it is a more invasive change to the rest of the codebase.

True, because "--quiet" means we cannot piggyback on the progress code.

> It requires invoking the progress API in all cases, and having the progress
> API conditionally initialize the progress struct. The new pattern would look
> like this:
>
> 	start_progress(&progress, my_progress_condition);
>
> 	do {
> 		display_progress(&progress, count);
> 	} while (condition);
>
> 	stop_progress(&progress);
>
> Then, start_progress() (and variants) could always start the trace2 region,
> and stop_progress() could always end the trace2 region.

OK.  Good analysis.

Thanks.

Junio C Hamano May 15, 2020, 4:09 p.m. UTC | #5

Junio C Hamano <gitster@pobox.com> writes:

> Derrick Stolee <stolee@gmail.com> writes:
> ...
>> This trace2_region_leave() needs to be conditional on the progress
>> being non-null. The pattern used by consumers of the progress API is:
>>
>> 	if (my_progress_condition)
>> 		start_progress(&progress);
>>
>> 	do {
>> 		display_progress(&progress, count);
>> 	} while (condition);
>>
>> 	stop_progress(&progress);
>>
>> The condition to show progress or not is conditional on an option that
>> is external to the progress API.
>>
>> The fix for this patch is simple: make the trace2_region_leave() be
>> conditional on "p_progress && *p_progress".

Oops.  That means that we need to apply the fix before -rc1 to
'master' X-<.

Something like this?

-- >8 --
Subject: progress: call trace2_region_leave() only after calling _enter()
From: Derrick Stolee <dstolee@microsoft.com>

A user of progress API calls start_progress() conditionally and
depends on the display_progress() and stop_progress() functions to
become no-op when start_progress() hasn't been called.

As we added a call to trace2_region_enter() to start_progress(), the
calls to other trace2 API calls from the progress API functions must
make sure that these trace2 calls are skipped when start_progress()
hasn't been called on the progress struct.  Specifically, do not
call trace2_region_leave() from stop_progress() when we haven't
called start_progress(), which would have called the matching
trace2_region_enter().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 progress.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/progress.c b/progress.c
index 6d2dcff0b6..3eda914518 100644
--- a/progress.c
+++ b/progress.c
@@ -329,13 +329,9 @@ void stop_progress(struct progress **p_progress)
 			trace2_data_intmax("progress", the_repository,
 					   "total_bytes",
 					   (*p_progress)->throughput->curr_total);
-	}

-	trace2_region_leave("progress",
-			    p_progress && *p_progress
-				? (*p_progress)->title
-				: NULL,
-			    the_repository);
+		trace2_region_leave("progress", (*p_progress)->title, the_repository);
+	}

 	stop_progress_msg(p_progress, _("done"));
 }

Derrick Stolee May 15, 2020, 4:49 p.m. UTC | #6

On 5/15/2020 12:09 PM, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
>> Derrick Stolee <stolee@gmail.com> writes:
>> ...
>>> This trace2_region_leave() needs to be conditional on the progress
>>> being non-null. The pattern used by consumers of the progress API is:
>>>
>>> 	if (my_progress_condition)
>>> 		start_progress(&progress);
>>>
>>> 	do {
>>> 		display_progress(&progress, count);
>>> 	} while (condition);
>>>
>>> 	stop_progress(&progress);
>>>
>>> The condition to show progress or not is conditional on an option that
>>> is external to the progress API.
>>>
>>> The fix for this patch is simple: make the trace2_region_leave() be
>>> conditional on "p_progress && *p_progress".
> 
> Oops.  That means that we need to apply the fix before -rc1 to
> 'master' X-<.

Sorry I didn't catch this in time for rc0!

> Something like this?

The patch below is exactly what I would have recommended. It looks like
I wrote it, too! ;)

> -- >8 --
> Subject: progress: call trace2_region_leave() only after calling _enter()
> From: Derrick Stolee <dstolee@microsoft.com>
> 
> A user of progress API calls start_progress() conditionally and
> depends on the display_progress() and stop_progress() functions to
> become no-op when start_progress() hasn't been called.
> 
> As we added a call to trace2_region_enter() to start_progress(), the
> calls to other trace2 API calls from the progress API functions must
> make sure that these trace2 calls are skipped when start_progress()
> hasn't been called on the progress struct.  Specifically, do not
> call trace2_region_leave() from stop_progress() when we haven't
> called start_progress(), which would have called the matching
> trace2_region_enter().
> 
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  progress.c | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/progress.c b/progress.c
> index 6d2dcff0b6..3eda914518 100644
> --- a/progress.c
> +++ b/progress.c
> @@ -329,13 +329,9 @@ void stop_progress(struct progress **p_progress)
>  			trace2_data_intmax("progress", the_repository,
>  					   "total_bytes",
>  					   (*p_progress)->throughput->curr_total);
> -	}
>  
> -	trace2_region_leave("progress",
> -			    p_progress && *p_progress
> -				? (*p_progress)->title
> -				: NULL,
> -			    the_repository);
> +		trace2_region_leave("progress", (*p_progress)->title, the_repository);
> +	}
>  
>  	stop_progress_msg(p_progress, _("done"));
>  }

Thanks,
-Stolee

Junio C Hamano May 15, 2020, 5 p.m. UTC | #7

Derrick Stolee <stolee@gmail.com> writes:

>> Something like this?
>
> The patch below is exactly what I would have recommended. It looks like
> I wrote it, too! ;)

Thanks, will queue.

Jeff Hostetler May 15, 2020, 7:37 p.m. UTC | #8

On 5/12/20 5:44 PM, Emily Shaffer wrote:
> Rather than teaching only one operation, like 'git fetch', how to write
> down throughput to traces, we can learn about a wide range of user
> operations that may seem slow by adding tooling to the progress library
> itself. Operations which display progress are likely to be slow-running
> and the kind of thing we want to monitor for performance anyways. By
> showing object counts and data transfer size, we should be able to
> make some derived measurements to ensure operations are scaling the way
> we expect.
> 
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>

We need to be careful here.  The region_enter and _leave calls need
to always be properly nested.  Having an implicit region within the
progress code means the code path between all of the different start_ 
and stop_progress calls need to not interleave with explicit regions.

How about putting something like this in stop_progress:

     struct json_writer jw = JSON_WRITER_INIT;
     jw_object_begin(&jw, 0);
     jw_object_intmax(&jw, "total", p->total);
     if (p->throughput)
         jw_object_intmax(&jw, "throughput", p->throughput->curr_total);
     /*
      * and so on...
      * and maybe include the elapsed time in since the start_progress.
      */
     jw_end(&jw);
     trace2_data_json("progress", NULL, p->title, &jw);
     jw_release(&jw);

That will give you a single record summary of the progress meter
and you won't have to worry about any interleaving.

You could also add a bit to `struct progress` to let you opt-in
or opt-out of the message on a case-by-case basis.

Jeff



> ---
> One note: by putting trace collection into the progress library, we end
> up with data events which have titles like "Receiving objects" - not
> very machine-parseable. An alternative might be to ask for a
> machine-readable title in the progress struct, but I didn't think it was
> worth the code churn. However, I don't have experience with processing
> the trace data after it's been collected, so if this is a bigger problem
> than I think, please say so and I'll figure something out.
> 
> CI run here, although it failed on the same error Junio noted today[1]:
> https://github.com/nasamuffin/git/runs/668457062
> 
>   - Emily
> 
> [1]: https://lore.kernel.org/git/xmqqtv0kc2q1.fsf@gitster.c.googlers.com
> 
>   progress.c                  | 17 +++++++++++++++++
>   t/t0500-progress-display.sh | 26 ++++++++++++++++++++++++++
>   2 files changed, 43 insertions(+)
> 
> diff --git a/progress.c b/progress.c
> index 75633e9c5e..6d2dcff0b6 100644
> --- a/progress.c
> +++ b/progress.c
> @@ -265,6 +265,7 @@ static struct progress *start_progress_delay(const char *title, uint64_t total,
>   	progress->title_len = utf8_strwidth(title);
>   	progress->split = 0;
>   	set_progress_signal();
> +	trace2_region_enter("progress", title, the_repository);
>   	return progress;
>   }
>   
> @@ -320,6 +321,22 @@ void stop_progress(struct progress **p_progress)
>   {
>   	finish_if_sparse(*p_progress);
>   
> +	if (p_progress && *p_progress) {
> +		trace2_data_intmax("progress", the_repository, "total_objects",
> +				   (*p_progress)->total);
> +
> +		if ((*p_progress)->throughput)
> +			trace2_data_intmax("progress", the_repository,
> +					   "total_bytes",
> +					   (*p_progress)->throughput->curr_total);
> +	}
> +
> +	trace2_region_leave("progress",
> +			    p_progress && *p_progress
> +				? (*p_progress)->title
> +				: NULL,
> +			    the_repository);
> +
>   	stop_progress_msg(p_progress, _("done"));
>   }
>   
> diff --git a/t/t0500-progress-display.sh b/t/t0500-progress-display.sh
> index d2d088d9a0..1ed1df351c 100755
> --- a/t/t0500-progress-display.sh
> +++ b/t/t0500-progress-display.sh
> @@ -283,4 +283,30 @@ test_expect_success 'cover up after throughput shortens a lot' '
>   	test_i18ncmp expect out
>   '
>   
> +test_expect_success 'progress generates traces' '
> +	cat >in <<-\EOF &&
> +	throughput 102400 1000
> +	update
> +	progress 10
> +	throughput 204800 2000
> +	update
> +	progress 20
> +	throughput 307200 3000
> +	update
> +	progress 30
> +	throughput 409600 4000
> +	update
> +	progress 40
> +	EOF
> +
> +	GIT_TRACE2_EVENT="$(pwd)/trace.event" test-tool progress --total=40 \
> +		"Working hard" <in 2>stderr &&
> +
> +	# t0212/parse_events.perl intentionally omits regions and data.
> +	grep -e "region_enter" -e "\"category\":\"progress\"" trace.event &&
> +	grep -e "region_leave" -e "\"category\":\"progress\"" trace.event &&
> +	grep "\"key\":\"total_objects\",\"value\":\"40\"" trace.event &&
> +	grep "\"key\":\"total_bytes\",\"value\":\"409600\"" trace.event
> +'
> +
>   test_done
>

Jeff Hostetler May 15, 2020, 7:44 p.m. UTC | #9

On 5/15/20 3:37 PM, Jeff Hostetler wrote:
> 
> 
> On 5/12/20 5:44 PM, Emily Shaffer wrote:
>> Rather than teaching only one operation, like 'git fetch', how to write
>> down throughput to traces, we can learn about a wide range of user
>> operations that may seem slow by adding tooling to the progress library
>> itself. Operations which display progress are likely to be slow-running
>> and the kind of thing we want to monitor for performance anyways. By
>> showing object counts and data transfer size, we should be able to
>> make some derived measurements to ensure operations are scaling the way
>> we expect.
>>
>> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> 
> We need to be careful here.  The region_enter and _leave calls need
> to always be properly nested.  Having an implicit region within the
> progress code means the code path between all of the different start_ 
> and stop_progress calls need to not interleave with explicit regions.
> 
> How about putting something like this in stop_progress:
> 
>      struct json_writer jw = JSON_WRITER_INIT;
>      jw_object_begin(&jw, 0);
>      jw_object_intmax(&jw, "total", p->total);
>      if (p->throughput)
>          jw_object_intmax(&jw, "throughput", p->throughput->curr_total);
>      /*
>       * and so on...
>       * and maybe include the elapsed time in since the start_progress.
>       */
>      jw_end(&jw);
>      trace2_data_json("progress", NULL, p->title, &jw);
>      jw_release(&jw);
> 
> That will give you a single record summary of the progress meter
> and you won't have to worry about any interleaving.
> 
> You could also add a bit to `struct progress` to let you opt-in
> or opt-out of the message on a case-by-case basis.
> 
> Jeff

Hit send too quickly...

One of the advantages of a "data_json" event is that it will keep
the multiple values together in one event.  This makes it easier to
correlate them during post processing.

For example, you could grep for "\"data_json\".*\"progress\".*<title>"
and easily parse and see the total and time and etc in one record.

Ævar Arnfjörð Bjarmason June 21, 2021, 1:24 a.m. UTC | #10

On Tue, May 12 2020, Emily Shaffer wrote:

[Replying to a change long-since merged into git.git's "master"]

> Rather than teaching only one operation, like 'git fetch', how to write
> down throughput to traces, we can learn about a wide range of user
> operations that may seem slow by adding tooling to the progress library
> itself. Operations which display progress are likely to be slow-running
> and the kind of thing we want to monitor for performance anyways. By
> showing object counts and data transfer size, we should be able to
> make some derived measurements to ensure operations are scaling the way
> we expect.

Did you end up using this data for anything?

> [...]
> @@ -320,6 +321,22 @@ void stop_progress(struct progress **p_progress)
>  {
>  	finish_if_sparse(*p_progress);
>  
> +	if (p_progress && *p_progress) {
> +		trace2_data_intmax("progress", the_repository, "total_objects",
> +				   (*p_progress)->total);

We start progress bars for various things in git, yet the trace2 data
calls every such progress bar with a total "total_objects", even though
we may not be counting anything to do with objects.

Wouldn't simply s/total_objects/total/ make more sense here, do you rely
on the name of the current key?

Elijah Newren June 21, 2021, 1:55 p.m. UTC | #11

On Sun, Jun 20, 2021 at 6:32 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Tue, May 12 2020, Emily Shaffer wrote:
>
> [Replying to a change long-since merged into git.git's "master"]
>
> > Rather than teaching only one operation, like 'git fetch', how to write
> > down throughput to traces, we can learn about a wide range of user
> > operations that may seem slow by adding tooling to the progress library
> > itself. Operations which display progress are likely to be slow-running
> > and the kind of thing we want to monitor for performance anyways. By
> > showing object counts and data transfer size, we should be able to
> > make some derived measurements to ensure operations are scaling the way
> > we expect.
>
> Did you end up using this data for anything?

I know you were asking Emily, but independently, I found it useful
while doing merge-ort and diffcore-rename optimizations.  I thought it
was a clever idea for quickly adding more measurement regions easily,
and wished I had thought of it myself earlier.

Ævar Arnfjörð Bjarmason June 21, 2021, 2:51 p.m. UTC | #12

On Mon, Jun 21 2021, Elijah Newren wrote:

> On Sun, Jun 20, 2021 at 6:32 PM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>
>> On Tue, May 12 2020, Emily Shaffer wrote:
>>
>> [Replying to a change long-since merged into git.git's "master"]
>>
>> > Rather than teaching only one operation, like 'git fetch', how to write
>> > down throughput to traces, we can learn about a wide range of user
>> > operations that may seem slow by adding tooling to the progress library
>> > itself. Operations which display progress are likely to be slow-running
>> > and the kind of thing we want to monitor for performance anyways. By
>> > showing object counts and data transfer size, we should be able to
>> > make some derived measurements to ensure operations are scaling the way
>> > we expect.
>>
>> Did you end up using this data for anything?
>
> I know you were asking Emily, but independently, I found it useful
> while doing merge-ort and diffcore-rename optimizations.  I thought it
> was a clever idea for quickly adding more measurement regions easily,
> and wished I had thought of it myself earlier.

Indeed, the data's useful. I'm just wondering about the key name.

I saw after I sent this that this was brought up when the patch was
discussed, and the name was just left in there:

https://lore.kernel.org/git/8f159f13-ed61-61ea-8e9a-c1ffbc5fddb3@gmail.com/

Elijah Newren June 21, 2021, 8:28 p.m. UTC | #13

On Mon, Jun 21, 2021 at 7:54 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Mon, Jun 21 2021, Elijah Newren wrote:
>
> > On Sun, Jun 20, 2021 at 6:32 PM Ævar Arnfjörð Bjarmason
> > <avarab@gmail.com> wrote:
> >>
> >> On Tue, May 12 2020, Emily Shaffer wrote:
> >>
> >> [Replying to a change long-since merged into git.git's "master"]
> >>
> >> > Rather than teaching only one operation, like 'git fetch', how to write
> >> > down throughput to traces, we can learn about a wide range of user
> >> > operations that may seem slow by adding tooling to the progress library
> >> > itself. Operations which display progress are likely to be slow-running
> >> > and the kind of thing we want to monitor for performance anyways. By
> >> > showing object counts and data transfer size, we should be able to
> >> > make some derived measurements to ensure operations are scaling the way
> >> > we expect.
> >>
> >> Did you end up using this data for anything?
> >
> > I know you were asking Emily, but independently, I found it useful
> > while doing merge-ort and diffcore-rename optimizations.  I thought it
> > was a clever idea for quickly adding more measurement regions easily,
> > and wished I had thought of it myself earlier.
>
> Indeed, the data's useful. I'm just wondering about the key name.
>
> I saw after I sent this that this was brought up when the patch was
> discussed, and the name was just left in there:
>
> https://lore.kernel.org/git/8f159f13-ed61-61ea-8e9a-c1ffbc5fddb3@gmail.com/

Yeah, I didn't comment on the key name because the name wasn't
important for my purposes: I didn't have scripts or anything looking
at the logs, just me reading them.

"total_objects" worked just fine (and was slightly more precise
wording) for my case, but since this code is used in lots of areas,
I'd be totally fine to see it switched to "total".

Taylor Blau June 23, 2021, 2:55 a.m. UTC | #14

On Mon, Jun 21, 2021 at 03:24:47AM +0200, Ævar Arnfjörð Bjarmason wrote:
> > [...]
> > @@ -320,6 +321,22 @@ void stop_progress(struct progress **p_progress)
> >  {
> >  	finish_if_sparse(*p_progress);
> >
> > +	if (p_progress && *p_progress) {
> > +		trace2_data_intmax("progress", the_repository, "total_objects",
> > +				   (*p_progress)->total);
>
> We start progress bars for various things in git, yet the trace2 data
> calls every such progress bar with a total "total_objects", even though
> we may not be counting anything to do with objects.
>
> Wouldn't simply s/total_objects/total/ make more sense here, do you rely
> on the name of the current key?

Yeah, I think that the latter of just "total" makes more sense here. I
was going to comment on the fact that "(*p_progress)->total" could be
written simply as "*p_progress->total", but I'm (a) not sure that I
actually prefer the latter to the former, and (b) I find that kind of
style comment generally useless.

But it may make sense to sidestep the whole thing and have a "struct
progress *progress = *p_progress" (that is assigned after we check
p_progress to make sure it's non-NULL) like in stop_progress_msg, which
would clean up a lot of this.

--- 8< ---

Subject: [PATCH] progress.c: avoid repeatedly dereferencing p_progress

stop_progress() takes a double-pointer to a "progress" struct, and
dereferences it twice in each use except one (checking whether
*p_progress is NULL or not).

Mirror the implementation of stop_progress_msg() below by having a local
single-pointer to a progress struct (which is the dereference of
p_progress) to avoid repeatedly dereferencing it.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 progress.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/progress.c b/progress.c
index 680c6a8bf9..390c76b22a 100644
--- a/progress.c
+++ b/progress.c
@@ -319,21 +319,25 @@ static void finish_if_sparse(struct progress *progress)

 void stop_progress(struct progress **p_progress)
 {
+	struct progress *progress;
+
 	if (!p_progress)
 		BUG("don't provide NULL to stop_progress");

-	finish_if_sparse(*p_progress);
+	progress = *p_progress;

-	if (*p_progress) {
+	finish_if_sparse(progress);
+
+	if (progress) {
 		trace2_data_intmax("progress", the_repository, "total_objects",
-				   (*p_progress)->total);
+				   progress->total);

-		if ((*p_progress)->throughput)
+		if (progress->throughput)
 			trace2_data_intmax("progress", the_repository,
 					   "total_bytes",
-					   (*p_progress)->throughput->curr_total);
+					   progress->throughput->curr_total);

-		trace2_region_leave("progress", (*p_progress)->title, the_repository);
+		trace2_region_leave("progress", progress->title, the_repository);
 	}

 	stop_progress_msg(p_progress, _("done"));
--
2.31.1.163.ga65ce7f831

Chris Torek June 23, 2021, 3:29 a.m. UTC | #15

On Tue, Jun 22, 2021 at 7:56 PM Taylor Blau <me@ttaylorr.com> wrote:
>... I was going to comment on the fact that "(*p_progress)->total" could
> be written simply as "*p_progress->total", but I'm (a) not sure that I
> actually prefer the latter to the former, and (b) I find that kind of
> style comment generally useless.

Also, it can't. :-) The binding order is wrong; *p_progress->total binds as
*(p_progress->total), and `p_progress` has to be followed first, so this
just doesn't work.

(Go does precedence and operator syntax a bit better than does C, but
even in Go one must still parenthesize certain pointer-following operations.)

> But it may make sense to sidestep the whole thing and have a "struct
> progress *progress = *p_progress" (that is assigned after we check
> p_progress to make sure it's non-NULL) like in stop_progress_msg, which
> would clean up a lot of this.

This is the way to go.

Chris

Taylor Blau June 23, 2021, 3:42 a.m. UTC | #16

On Tue, Jun 22, 2021 at 08:29:35PM -0700, Chris Torek wrote:
> On Tue, Jun 22, 2021 at 7:56 PM Taylor Blau <me@ttaylorr.com> wrote:
> >... I was going to comment on the fact that "(*p_progress)->total" could
> > be written simply as "*p_progress->total", but I'm (a) not sure that I
> > actually prefer the latter to the former, and (b) I find that kind of
> > style comment generally useless.
>
> Also, it can't. :-) The binding order is wrong; *p_progress->total binds as
> *(p_progress->total), and `p_progress` has to be followed first, so this
> just doesn't work.

Ack, serves me right for starting a discussion based on operator
precedence. Yes, you're right, I was mistaken and forgot that -> binds
with highest precedence in C (above *, which is why this doesn't work).

In any case, my confusion is probably a good reason to avoid this
entirely by manipulating a variable which stores *p_progress.

Thanks,
Taylor

trace2: log progress time and throughput

Commit Message

Comments

Patch