diff mbox series

[RFC] trace2 API: don't save a copy of constant "thread_name"

Message ID RFC-patch-1.1-8563d017137-20221007T010829Z-avarab@gmail.com (mailing list archive)
State New, archived
Headers show
Series [RFC] trace2 API: don't save a copy of constant "thread_name" | expand

Commit Message

Ævar Arnfjörð Bjarmason Oct. 7, 2022, 1:10 a.m. UTC
Since ee4512ed481 (trace2: create new combined trace facility,
2019-02-22) the "thread_name" member of "struct tr2tls_thread_ctx" has
been copied from the caller, but those callers have always passed a
constant string:

	$ git -P grep '^\s*trace2_thread_start\('
	Documentation/technical/api-trace2.txt: trace2_thread_start("preload_thread");
	builtin/fsmonitor--daemon.c:    trace2_thread_start("fsm-health");
	builtin/fsmonitor--daemon.c:    trace2_thread_start("fsm-listen");
	compat/simple-ipc/ipc-unix-socket.c:    trace2_thread_start("ipc-worker");
	compat/simple-ipc/ipc-unix-socket.c:    trace2_thread_start("ipc-accept");
	compat/simple-ipc/ipc-win32.c:  trace2_thread_start("ipc-server");
	t/helper/test-fsmonitor-client.c:       trace2_thread_start("hammer");
	t/helper/test-simple-ipc.c:     trace2_thread_start("multiple");

This isn't needed for optimization, but apparently[1] there's been
some confusion about the non-const-ness of the previous "struct
strbuf".

Using the caller's string here makes this more straightforward, as
it's now clear that we're not dynamically constructing these. It's
also what the progress API does with its "title" string.

Since we know we're hardcoding these thread names let's BUG() out when
we see that the length of the name plus the length of the prefix would
exceed the maximum length for the "perf" format.

1. https://lore.kernel.org/git/82f1672e180afcd876505a4354bd9952f70db49e.1664900407.git.gitgitgadget@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---

On Thu, Oct 06 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>> So, we don't need to strdup() and store that "preload_thread" anywhere.
>> It's already a constant string we have hardcoded in the program. We just
>> need to save a pointer to it.
>
> That sounds even simpler.

A cleaned up version of the test code I had on top of "master", RFC
because I may still be missing some context here. E.g. maybe there's a
plan to dynamically construct these thread names?

 json-writer.c          | 17 +++++++++++++++++
 json-writer.h          |  4 ++++
 trace2/tr2_tgt_event.c |  2 +-
 trace2/tr2_tgt_perf.c  | 10 +++++++---
 trace2/tr2_tls.c       | 14 +++++---------
 trace2/tr2_tls.h       |  9 +++++++--
 6 files changed, 41 insertions(+), 15 deletions(-)

Comments

Junio C Hamano Oct. 7, 2022, 1:16 a.m. UTC | #1
Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> A cleaned up version of the test code I had on top of "master", RFC
> because I may still be missing some context here. E.g. maybe there's a
> plan to dynamically construct these thread names?

That's nice to learn, indeed.

> +void jw_object_string_thread(struct json_writer *jw, const char *thread_name,
> +			     int thread_id)
> +{
> +	object_common(jw, "thread");
> +	strbuf_addch(&jw->json, '"');
> +	jw_strbuf_add_thread_name(&jw->json, thread_name, thread_id);
> +	strbuf_addch(&jw->json, '"');
> +}

...

> @@ -107,9 +109,11 @@ static void perf_fmt_prepare(const char *event_name,
>  	}
>  
>  	strbuf_addf(buf, "d%d | ", tr2_sid_depth());
> -	strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME,
> -		    ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME,
> -		    event_name);
> +	oldlen = buf->len;
> +	jw_strbuf_add_thread_name(buf, ctx->thread_name, ctx->thread_id);
> +	padlen = TR2_MAX_THREAD_NAME - (buf->len - oldlen);;
> +	strbuf_addf(buf, "%-*s | %-*s | ", padlen, "",
> +		    TR2FMT_PERF_MAX_EVENT_NAME, event_name);

Having to do strbuf_addf() many times may negatively affect perf_*
stuff, if this code is invoked in the hot path.  I however tend to
treat anything that involves an I/O not performance critical, and
this certainly falls into that category.
Ævar Arnfjörð Bjarmason Oct. 7, 2022, 10:03 a.m. UTC | #2
On Thu, Oct 06 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> A cleaned up version of the test code I had on top of "master", RFC
>> because I may still be missing some context here. E.g. maybe there's a
>> plan to dynamically construct these thread names?
>
> That's nice to learn, indeed.
>
>> +void jw_object_string_thread(struct json_writer *jw, const char *thread_name,
>> +			     int thread_id)
>> +{
>> +	object_common(jw, "thread");
>> +	strbuf_addch(&jw->json, '"');
>> +	jw_strbuf_add_thread_name(&jw->json, thread_name, thread_id);
>> +	strbuf_addch(&jw->json, '"');
>> +}
>
> ...
>
>> @@ -107,9 +109,11 @@ static void perf_fmt_prepare(const char *event_name,
>>  	}
>>  
>>  	strbuf_addf(buf, "d%d | ", tr2_sid_depth());
>> -	strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME,
>> -		    ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME,
>> -		    event_name);
>> +	oldlen = buf->len;
>> +	jw_strbuf_add_thread_name(buf, ctx->thread_name, ctx->thread_id);
>> +	padlen = TR2_MAX_THREAD_NAME - (buf->len - oldlen);;
>> +	strbuf_addf(buf, "%-*s | %-*s | ", padlen, "",
>> +		    TR2FMT_PERF_MAX_EVENT_NAME, event_name);
>
> Having to do strbuf_addf() many times may negatively affect perf_*
> stuff, if this code is invoked in the hot path.  I however tend to
> treat anything that involves an I/O not performance critical, and
> this certainly falls into that category.

Yes, and that function already called strbuf_addf() 5-7 times, this adds
one more, but only if "thread_id" is > 0.

The reason I added jw_object_string_thread() was to avoid the malloc() &
free() of a temporary "struct strbuf", it would have been more
straightforward to call jw_object_string() like that.

I don't think anyone cares about the raw performance of the "perf"
output, but the "JSON" one needs to be fast(er).

But even that output will malloc()/free() for each line it emits, and
often multiple times within one line (e.g. each time we format a
double).

So if we do want to optimize this in terms of memory use the lowest
hanging fruit seems to be to just have a per-thread "scratch" buffer
we'd write to, we could also observe that we're writing to a file and
just directly write to it in most cases (although we'd need to be
careful to write partial-and-still-invalid JSON lines in that case...).
Jeff Hostetler Oct. 10, 2022, 7:05 p.m. UTC | #3
On 10/6/22 9:10 PM, Ævar Arnfjörð Bjarmason wrote:
> Since ee4512ed481 (trace2: create new combined trace facility,
> 2019-02-22) the "thread_name" member of "struct tr2tls_thread_ctx" has
> been copied from the caller, but those callers have always passed a
> constant string:
> 
> 	$ git -P grep '^\s*trace2_thread_start\('
> 	Documentation/technical/api-trace2.txt: trace2_thread_start("preload_thread");
> 	builtin/fsmonitor--daemon.c:    trace2_thread_start("fsm-health");
> 	builtin/fsmonitor--daemon.c:    trace2_thread_start("fsm-listen");
> 	compat/simple-ipc/ipc-unix-socket.c:    trace2_thread_start("ipc-worker");
> 	compat/simple-ipc/ipc-unix-socket.c:    trace2_thread_start("ipc-accept");
> 	compat/simple-ipc/ipc-win32.c:  trace2_thread_start("ipc-server");
> 	t/helper/test-fsmonitor-client.c:       trace2_thread_start("hammer");
> 	t/helper/test-simple-ipc.c:     trace2_thread_start("multiple");
> 
> This isn't needed for optimization, but apparently[1] there's been
> some confusion about the non-const-ness of the previous "struct
> strbuf".
> 
> Using the caller's string here makes this more straightforward, as
> it's now clear that we're not dynamically constructing these. It's
> also what the progress API does with its "title" string.
> 
> Since we know we're hardcoding these thread names let's BUG() out when
> we see that the length of the name plus the length of the prefix would
> exceed the maximum length for the "perf" format.
> 
> 1. https://lore.kernel.org/git/82f1672e180afcd876505a4354bd9952f70db49e.1664900407.git.gitgitgadget@gmail.com/
> 
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>

PLEASE DON'T DO THIS.

If you don't like my patch, fine.  Let's discuss it.  But DON'T submit
a new one to replace it.  Or worse, try to inject it into the middle
of an existing series.


Yes, current callers are passing a string literal and thread-start
could take a "const char*" to it, but there is no way to guarantee
that that is safe if someone decides to dynamically construct their
thread-name and pass it in (since we don't know the lifetime of that
pointer).  So it is safer to copy it into the thread context so that
it can be used by later trace messages.


[...]
> +void jw_strbuf_add_thread_name(struct strbuf *buf, const char *thread_name,
> +			       int thread_id);
> +void jw_object_string_thread(struct json_writer *jw, const char *thread_name,
> +			     int thread_id);

This violates a separation of concerns.  json-writer is ONLY concerned
with formatting valid JSON from basic data types.  It does not know
about threads or thread contexts.

`js_strbuf_add_thread_name()` also violates the json-writer conventions
-- that it takes a "struct json_writer *" pointer.  There is nothing
about JSON here.

You might write a helper (inside of tr2_tgt_event.c) that formats a
thread-name from the id and hint, but that is specific to the Event
target -- not to JSON, nor the JSON writer.

But then again, why make every trace message from every target format
that "th%0d:%s" when we could save some time and format it in the
thread-start and just USE it.


[...]
> @@ -107,9 +109,11 @@ static void perf_fmt_prepare(const char *event_name,
>   	}
>   
>   	strbuf_addf(buf, "d%d | ", tr2_sid_depth());
> -	strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME,
> -		    ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME,
> -		    event_name);
> +	oldlen = buf->len;
> +	jw_strbuf_add_thread_name(buf, ctx->thread_name, ctx->thread_id);

This stands out as very wrong.  The _Perf target does not use JSON
at all, yet here we are calling a jw_ routine.  Again, that code is
in the wrong place.


I'm going to clip the rest of this commit, since the above invalidates
it.

Jeff
Jeff Hostetler Oct. 10, 2022, 7:16 p.m. UTC | #4
On 10/7/22 6:03 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Thu, Oct 06 2022, Junio C Hamano wrote:
> 
>> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>>
>>> A cleaned up version of the test code I had on top of "master", RFC
>>> because I may still be missing some context here. E.g. maybe there's a
>>> plan to dynamically construct these thread names?
>>
>> That's nice to learn, indeed.
>>
>>> +void jw_object_string_thread(struct json_writer *jw, const char *thread_name,
>>> +			     int thread_id)
>>> +{
>>> +	object_common(jw, "thread");
>>> +	strbuf_addch(&jw->json, '"');
>>> +	jw_strbuf_add_thread_name(&jw->json, thread_name, thread_id);
>>> +	strbuf_addch(&jw->json, '"');
>>> +}
>>
>> ...
>>
>>> @@ -107,9 +109,11 @@ static void perf_fmt_prepare(const char *event_name,
>>>   	}
>>>   
>>>   	strbuf_addf(buf, "d%d | ", tr2_sid_depth());
>>> -	strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME,
>>> -		    ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME,
>>> -		    event_name);
>>> +	oldlen = buf->len;
>>> +	jw_strbuf_add_thread_name(buf, ctx->thread_name, ctx->thread_id);
>>> +	padlen = TR2_MAX_THREAD_NAME - (buf->len - oldlen);;
>>> +	strbuf_addf(buf, "%-*s | %-*s | ", padlen, "",
>>> +		    TR2FMT_PERF_MAX_EVENT_NAME, event_name);
>>
>> Having to do strbuf_addf() many times may negatively affect perf_*
>> stuff, if this code is invoked in the hot path.  I however tend to
>> treat anything that involves an I/O not performance critical, and
>> this certainly falls into that category.
> 
> Yes, and that function already called strbuf_addf() 5-7 times, this adds
> one more, but only if "thread_id" is > 0.
> 
> The reason I added jw_object_string_thread() was to avoid the malloc() &
> free() of a temporary "struct strbuf", it would have been more
> straightforward to call jw_object_string() like that.
> 
> I don't think anyone cares about the raw performance of the "perf"
> output, but the "JSON" one needs to be fast(er).
> 
> But even that output will malloc()/free() for each line it emits, and
> often multiple times within one line (e.g. each time we format a
> double).
> 
> So if we do want to optimize this in terms of memory use the lowest
> hanging fruit seems to be to just have a per-thread "scratch" buffer
> we'd write to, we could also observe that we're writing to a file and
> just directly write to it in most cases (although we'd need to be
> careful to write partial-and-still-invalid JSON lines in that case...).
> 

WRT optimizing memory usage.  We're talking about ~25 byte buffer
per thread.  Most commands execute in 1 thread -- if they read the
index they may have ~10 threads (depending on the size of the index
and if preload-index is enabled).  So, I don't think we really need
to optimize this.  Threading is used extensively in fsmonitor-daemon,
but it creates a fixed thread-pool at startup, so it may have ~12
threads.  Again, not worth optimizing for the thread-name field.

Now, if you want to optimize over all trace2 events (a completely
different topic), you could create a large scratch strbuf buffer in
each thread context and use it so that we don't have to malloc/free
during each trace message.  That might be worth while.


We must not do partial writes to the trace2 files as we're
constructing fields.  The trace2 files are opened with O_APPEND
so that we get the atomic lseek(2)+write(2) so that lines get
written without overwrites when multiple threads and/or processes
are tracing.

Also, when writing to a named pipe, we get "message" semantics
on write() boundaries, which makes post-processing easier.

Jeff
Ævar Arnfjörð Bjarmason Oct. 11, 2022, 12:52 p.m. UTC | #5
On Mon, Oct 10 2022, Jeff Hostetler wrote:

> On 10/6/22 9:10 PM, Ævar Arnfjörð Bjarmason wrote:
>> Since ee4512ed481 (trace2: create new combined trace facility,
>> 2019-02-22) the "thread_name" member of "struct tr2tls_thread_ctx" has
>> been copied from the caller, but those callers have always passed a
>> constant string:
>> 	$ git -P grep '^\s*trace2_thread_start\('
>> 	Documentation/technical/api-trace2.txt: trace2_thread_start("preload_thread");
>> 	builtin/fsmonitor--daemon.c:    trace2_thread_start("fsm-health");
>> 	builtin/fsmonitor--daemon.c:    trace2_thread_start("fsm-listen");
>> 	compat/simple-ipc/ipc-unix-socket.c:    trace2_thread_start("ipc-worker");
>> 	compat/simple-ipc/ipc-unix-socket.c:    trace2_thread_start("ipc-accept");
>> 	compat/simple-ipc/ipc-win32.c:  trace2_thread_start("ipc-server");
>> 	t/helper/test-fsmonitor-client.c:       trace2_thread_start("hammer");
>> 	t/helper/test-simple-ipc.c:     trace2_thread_start("multiple");
>> This isn't needed for optimization, but apparently[1] there's been
>> some confusion about the non-const-ness of the previous "struct
>> strbuf".
>> Using the caller's string here makes this more straightforward, as
>> it's now clear that we're not dynamically constructing these. It's
>> also what the progress API does with its "title" string.
>> Since we know we're hardcoding these thread names let's BUG() out
>> when
>> we see that the length of the name plus the length of the prefix would
>> exceed the maximum length for the "perf" format.
>> 1. https://lore.kernel.org/git/82f1672e180afcd876505a4354bd9952f70db49e.1664900407.git.gitgitgadget@gmail.com/
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>
> PLEASE DON'T DO THIS.
>
> If you don't like my patch, fine.  Let's discuss it.  But DON'T submit
> a new one to replace it.  Or worse, try to inject it into the middle
> of an existing series.

I'm not seeking to replace your series, or to tick you off, sorry if it
came across like that.

I just thought (and still think) that we were at a point in the
discussion where it seemed clear that I wasn't quite managing to get
across to you what I meant, so sending that in the form of working code
should clarify things.

Per Junio's "That's nice to learn, indeed." in
<xmqqo7uoh1q0.fsf@gitster.g> it seems to have had that intended effect
on him. It's marked as an RFC, so not-a-thing-to-pick-up, but just for
discussion.

> Yes, current callers are passing a string literal and thread-start
> could take a "const char*" to it, but there is no way to guarantee
> that that is safe if someone decides to dynamically construct their
> thread-name and pass it in (since we don't know the lifetime of that
> pointer).  So it is safer to copy it into the thread context so that
> it can be used by later trace messages.

I think that's a defensible opinion, but I also think it's fair to say
that:

 * This seems to be *the* motivation for why you're doing things the way
   you're doing them, and at least to this reviewer that wasn't really
   coming across...

 * ...nor the context of why we'd need that sort of guarded API in this
   case, but not e.g. for another widely-used API like start_progress().

   See 791afae2924 (progress.c tests: make start/stop commands on stdin,
   2022-02-03) for a case where we're using that where we need to work
   around its behavior (and no, I didn't make the underlying API that
   way, it's just a commit of mine where I'm having to work with it).

I think designing our internal APIs to not be quite so guarded is fine,
and we do that in various other contexts (progress, etc.). We control
both the API and its users, so just leaving a "this must be a constant"
should be enough.

But even if you want to be paranoid about it there's much easier ways to
do that which give you more of the safety you seem to want. E.g. this on
top of master (and easily adjusted on top of this RFC patch):
	
	diff --git a/trace2.h b/trace2.h
	index 88d906ea830..1c3a98fb30f 100644
	--- a/trace2.h
	+++ b/trace2.h
	@@ -306,12 +306,18 @@ void trace2_exec_result_fl(const char *file, int line, int exec_id, int code);
	  *
	  * Thread names should be descriptive, like "preload_index".
	  * Thread names will be decorated with an instance number automatically.
	+ * Thread names must point to data that won't change after it's passed
	+ * into this function. Once trace2_thread_exit() is called it can be
	+ * free'd.
	  */
	 void trace2_thread_start_fl(const char *file, int line,
	 			    const char *thread_name);
	 
	+/*
	+ * The "" is to assure us that API users pass only constant strings
	+ */
	 #define trace2_thread_start(thread_name) \
	-	trace2_thread_start_fl(__FILE__, __LINE__, (thread_name))
	+	trace2_thread_start_fl(__FILE__, __LINE__, (thread_name ""))
	 
	 /*
	  * Emit a 'thread_exit' event.  This must be called from inside the

Will pass, as we only pass it constant strings, but if someone were to
pass a variable it'll blow up, at which point we could provide some
inline macro/function that would do the required xstrdup().

All of which I think is *still* being too paranoid, but which I think
*if* you want the paranoia is much more explicit about what we're trying
to accomplish with said paranoida, and where the compiler will help you.

> [...]
>> +void jw_strbuf_add_thread_name(struct strbuf *buf, const char *thread_name,
>> +			       int thread_id);
>> +void jw_object_string_thread(struct json_writer *jw, const char *thread_name,
>> +			     int thread_id);
>
> This violates a separation of concerns.  json-writer is ONLY concerned
> with formatting valid JSON from basic data types.  It does not know
> about threads or thread contexts.
>
> `js_strbuf_add_thread_name()` also violates the json-writer conventions
> -- that it takes a "struct json_writer *" pointer.  There is nothing
> about JSON here.
>
> You might write a helper (inside of tr2_tgt_event.c) that formats a
> thread-name from the id and hint, but that is specific to the Event
> target -- not to JSON, nor the JSON writer.

That's fair, more on that below.

> But then again, why make every trace message from every target format
> that "th%0d:%s" when we could save some time and format it in the
> thread-start and just USE it.

If you actually care about this being fasterer -- and only reason for
posting this RFC patch is to try to tease out *why* that is -- then this
part of your concern can be trivially mitigated with having a struct
member like:

	char thread_id_str[3];

We'd then just snprintf() into that in tr2tls_create_self(). Then when
we print the thread to the JSON or log you'd do so without any
strbuf_addf(), just a strbuf_addstr() or strbuf_add().

I think that micro-optimization isn't needed in this case, but it *is*
easy to do .

> [...]
>> @@ -107,9 +109,11 @@ static void perf_fmt_prepare(const char *event_name,
>>   	}
>>     	strbuf_addf(buf, "d%d | ", tr2_sid_depth());
>> -	strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME,
>> -		    ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME,
>> -		    event_name);
>> +	oldlen = buf->len;
>> +	jw_strbuf_add_thread_name(buf, ctx->thread_name, ctx->thread_id);
>
> This stands out as very wrong.  The _Perf target does not use JSON
> at all, yet here we are calling a jw_ routine.  Again, that code is
> in the wrong place.
>
> I'm going to clip the rest of this commit, since the above invalidates
> it.

A helper function being in the wrong place invalidates the whole commit?

I think you're right that this jw_strbuf_add_thread_name() helper should
live somewhere else, probably in thread-utils.c.

So, pretending that it's in whatever place you'd be comfortable with,
and using whatever naming convention you'd prefer. What do you think
about the rest of the commit?

You snippet it just as you were getting to the meaty part of it, namely:

 * With this approach we can BUG() out as soon as we try to construct
   the main thread if its name is bad, we don't need to wait until
   runtime when a child thread runs into the limit.

 * We no longer need the whole thread-creation-time string duplication,
   associated storage in the struct etc.

 * That struct member is "const", addresing your initial concern of
   (from the upthread commit message):

	Using a (non-const) `strbuf` structure for it caused some
	confusion in the past because it implied that someone could
	rename a thread after it was created.  That usage was not
	intended.

   Although I think (and I'm possibly misreading it) that your
   commentary here is saying that even that's not enough, i.e. we can't
   just leave it at a "const" here, but must assume that an API user
   will disregard that and modify it after it's passed to us anyway.
Ævar Arnfjörð Bjarmason Oct. 11, 2022, 1:31 p.m. UTC | #6
On Mon, Oct 10 2022, Jeff Hostetler wrote:

> On 10/7/22 6:03 AM, Ævar Arnfjörð Bjarmason wrote:
>> On Thu, Oct 06 2022, Junio C Hamano wrote:
>> 
>>> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>>>
>>>> A cleaned up version of the test code I had on top of "master", RFC
>>>> because I may still be missing some context here. E.g. maybe there's a
>>>> plan to dynamically construct these thread names?
>>>
>>> That's nice to learn, indeed.
>>>
>>>> +void jw_object_string_thread(struct json_writer *jw, const char *thread_name,
>>>> +			     int thread_id)
>>>> +{
>>>> +	object_common(jw, "thread");
>>>> +	strbuf_addch(&jw->json, '"');
>>>> +	jw_strbuf_add_thread_name(&jw->json, thread_name, thread_id);
>>>> +	strbuf_addch(&jw->json, '"');
>>>> +}
>>>
>>> ...
>>>
>>>> @@ -107,9 +109,11 @@ static void perf_fmt_prepare(const char *event_name,
>>>>   	}
>>>>     	strbuf_addf(buf, "d%d | ", tr2_sid_depth());
>>>> -	strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME,
>>>> -		    ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME,
>>>> -		    event_name);
>>>> +	oldlen = buf->len;
>>>> +	jw_strbuf_add_thread_name(buf, ctx->thread_name, ctx->thread_id);
>>>> +	padlen = TR2_MAX_THREAD_NAME - (buf->len - oldlen);;
>>>> +	strbuf_addf(buf, "%-*s | %-*s | ", padlen, "",
>>>> +		    TR2FMT_PERF_MAX_EVENT_NAME, event_name);
>>>
>>> Having to do strbuf_addf() many times may negatively affect perf_*
>>> stuff, if this code is invoked in the hot path.  I however tend to
>>> treat anything that involves an I/O not performance critical, and
>>> this certainly falls into that category.
>> Yes, and that function already called strbuf_addf() 5-7 times, this
>> adds
>> one more, but only if "thread_id" is > 0.
>> The reason I added jw_object_string_thread() was to avoid the
>> malloc() &
>> free() of a temporary "struct strbuf", it would have been more
>> straightforward to call jw_object_string() like that.
>> I don't think anyone cares about the raw performance of the "perf"
>> output, but the "JSON" one needs to be fast(er).
>> But even that output will malloc()/free() for each line it emits,
>> and
>> often multiple times within one line (e.g. each time we format a
>> double).
>> So if we do want to optimize this in terms of memory use the lowest
>> hanging fruit seems to be to just have a per-thread "scratch" buffer
>> we'd write to, we could also observe that we're writing to a file and
>> just directly write to it in most cases (although we'd need to be
>> careful to write partial-and-still-invalid JSON lines in that case...).
>> 

I left more extensive commentary in the side-thread in
https://lore.kernel.org/git/221011.86lepmo5dn.gmgdl@evledraar.gmail.com/,
just a quick reply here.

> WRT optimizing memory usage.  We're talking about ~25 byte buffer
> per thread.  Most commands execute in 1 thread -- if they read the
> index they may have ~10 threads (depending on the size of the index
> and if preload-index is enabled).  So, I don't think we really need
> to optimize this.  Threading is used extensively in fsmonitor-daemon,
> but it creates a fixed thread-pool at startup, so it may have ~12
> threads.  Again, not worth optimizing for the thread-name field.

Yes, I agree it's not worth optimizing.

The reason for commenting on this part is that it isn't clear to me why
your proposed patch then isn't doing the more obvious "it's not worth
optimizing" pattern, per Junio's [1] comment on the initial version.

The "flex array" method is seemingly taking pains to reduce the runtime
memory use of these by embedding this string in the space reserved for
the struct.

So it's just meant as a question for you & the proposed patch.

> Now, if you want to optimize over all trace2 events (a completely
> different topic), you could create a large scratch strbuf buffer in
> each thread context and use it so that we don't have to malloc/free
> during each trace message.  That might be worth while.

*nod*

> We must not do partial writes to the trace2 files as we're
> constructing fields.  The trace2 files are opened with O_APPEND
> so that we get the atomic lseek(2)+write(2) so that lines get
> written without overwrites when multiple threads and/or processes
> are tracing.
>
> Also, when writing to a named pipe, we get "message" semantics
> on write() boundaries, which makes post-processing easier.

*nod*

1. https://lore.kernel.org/git/xmqq8rwcjttq.fsf@gitster.g/
2. https://lore.kernel.org/git/RFC-patch-1.1-8563d017137-20221007T010829Z-avarab@gmail.com/
Junio C Hamano Oct. 11, 2022, 2:40 p.m. UTC | #7
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> Per Junio's "That's nice to learn, indeed." in
> <xmqqo7uoh1q0.fsf@gitster.g> it seems to have had that intended effect
> on him.

I was commenting on the goal, i.e. you "may still be missing some
context here, maybe there's a plan to ...", and I meant that the
plan of the overall effort is something that is nice to learn before
going further.  I was not endorsing the method you are taking to
achieve that goal, though.

FWIW, I find my code sent in as a comment easier to read than my
prose alone for any topic, but that is only because it is "my" code
is easy to read for "me".  I am sure others would find it
unnecessary burden to figure out what the alternative/replacement I
send out intends to do and why it does so in the way it does, and
would rather appreciate if I explained these things in prose that is
easy to understand and rich in "why", which alternative/replacement
code would solely lack.  Code snippet helps illustrate points on
"how", but is often a poor replacement for proper explanation
because it is a bad medium to convey "why".

Same would apply to your code.  For others, including me, it often
is a lot of work to figure out what your code is trying to do, and
more importantly why it does what it tries to do in the way it does.

After all, when you are having hard time communicating why you want
to do things differently from the patch author in prose, code
snippet would probably be the worst primary medium to do so, because
it is full of "how exactly" with little "why".
Jeff Hostetler Oct. 12, 2022, 1:31 p.m. UTC | #8
On 10/11/22 9:31 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Oct 10 2022, Jeff Hostetler wrote:
> 
>> On 10/7/22 6:03 AM, Ævar Arnfjörð Bjarmason wrote:
>>> On Thu, Oct 06 2022, Junio C Hamano wrote:
>>>
>>>> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>>>>
>>>>> A cleaned up version of the test code I had on top of "master", RFC
>>>>> because I may still be missing some context here. E.g. maybe there's a
>>>>> plan to dynamically construct these thread names?
>>>>
[...]

> I left more extensive commentary in the side-thread in
> https://lore.kernel.org/git/221011.86lepmo5dn.gmgdl@evledraar.gmail.com/,
> just a quick reply here.
> 
>> WRT optimizing memory usage.  We're talking about ~25 byte buffer
>> per thread.  Most commands execute in 1 thread -- if they read the
>> index they may have ~10 threads (depending on the size of the index
>> and if preload-index is enabled).  So, I don't think we really need
>> to optimize this.  Threading is used extensively in fsmonitor-daemon,
>> but it creates a fixed thread-pool at startup, so it may have ~12
>> threads.  Again, not worth optimizing for the thread-name field.
> 
> Yes, I agree it's not worth optimizing.
> 
> The reason for commenting on this part is that it isn't clear to me why
> your proposed patch then isn't doing the more obvious "it's not worth
> optimizing" pattern, per Junio's [1] comment on the initial version.
> 
> The "flex array" method is seemingly taking pains to reduce the runtime
> memory use of these by embedding this string in the space reserved for
> the struct.
> 
> So it's just meant as a question for you & the proposed patch.

I think we're converging on some common understanding (and I
think we've gone around on this topic more than enough).  :-)

I really was just trying to get rid of the strbuf and make it
a fixed string -- I chose a flex-array rather than just detaching
the buffer from a local in the thread-start code.  I should have
done the latter.  I saw the flex-array as a fixed-size object
that can't be replaced or extended (without recreating the
thread-local storage) -- yes, people could overwrite existing
bytes in-place in the flex-array, but who does that??


I understood what you were asking (illustrated in your RFC).
That is, I understood the "what/how" you wanted to do to refactor /
redesign the field, but I couldn't understand the "why".  That
is, why you've taken such interest in this field (and such
a relatively unimportant change).  We've spent nearly a week
discussing it and we both agree that the optimization that I
didn't suggest isn't worth doing.  (I'm paraphrasing slightly.) :-)

So, rather than continuing with the back-n-forth, let me skip
over the remaining questions in this thread and prepare a re-roll.
Hopefully, I can simplify and more clearly explain the method to
my madness and we can move on.


>> Now, if you want to optimize over all trace2 events (a completely
>> different topic), you could create a large scratch strbuf buffer in
>> each thread context and use it so that we don't have to malloc/free
>> during each trace message.  That might be worth while.
> 
> *nod*

I'll make a note to revisit this idea in a future series.

Thanks
Jeff
diff mbox series

Patch

diff --git a/json-writer.c b/json-writer.c
index f1cfd8fa8c6..569a75bee51 100644
--- a/json-writer.c
+++ b/json-writer.c
@@ -161,6 +161,23 @@  void jw_object_string(struct json_writer *jw, const char *key, const char *value
 	append_quoted_string(&jw->json, value);
 }
 
+void jw_strbuf_add_thread_name(struct strbuf *sb, const char *thread_name,
+			       int thread_id)
+{
+	if (thread_id)
+		strbuf_addf(sb, "th%02d:", thread_id);
+	strbuf_addstr(sb, thread_name);
+}
+
+void jw_object_string_thread(struct json_writer *jw, const char *thread_name,
+			     int thread_id)
+{
+	object_common(jw, "thread");
+	strbuf_addch(&jw->json, '"');
+	jw_strbuf_add_thread_name(&jw->json, thread_name, thread_id);
+	strbuf_addch(&jw->json, '"');
+}
+
 void jw_object_intmax(struct json_writer *jw, const char *key, intmax_t value)
 {
 	object_common(jw, key);
diff --git a/json-writer.h b/json-writer.h
index 209355e0f12..269c203b119 100644
--- a/json-writer.h
+++ b/json-writer.h
@@ -75,6 +75,10 @@  void jw_release(struct json_writer *jw);
 void jw_object_begin(struct json_writer *jw, int pretty);
 void jw_array_begin(struct json_writer *jw, int pretty);
 
+void jw_strbuf_add_thread_name(struct strbuf *buf, const char *thread_name,
+			       int thread_id);
+void jw_object_string_thread(struct json_writer *jw, const char *thread_name,
+			     int thread_id);
 void jw_object_string(struct json_writer *jw, const char *key,
 		      const char *value);
 void jw_object_intmax(struct json_writer *jw, const char *key, intmax_t value);
diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
index 37a3163be12..1308cf05df4 100644
--- a/trace2/tr2_tgt_event.c
+++ b/trace2/tr2_tgt_event.c
@@ -90,7 +90,7 @@  static void event_fmt_prepare(const char *event_name, const char *file,
 
 	jw_object_string(jw, "event", event_name);
 	jw_object_string(jw, "sid", tr2_sid_get());
-	jw_object_string(jw, "thread", ctx->thread_name.buf);
+	jw_object_string_thread(jw, ctx->thread_name, ctx->thread_id);
 
 	/*
 	 * In brief mode, only emit <time> on these 2 event types.
diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c
index 8cb792488c8..ab21277eb36 100644
--- a/trace2/tr2_tgt_perf.c
+++ b/trace2/tr2_tgt_perf.c
@@ -69,6 +69,8 @@  static void perf_fmt_prepare(const char *event_name,
 			     const char *category, struct strbuf *buf)
 {
 	int len;
+	size_t oldlen;
+	int padlen;
 
 	strbuf_setlen(buf, 0);
 
@@ -107,9 +109,11 @@  static void perf_fmt_prepare(const char *event_name,
 	}
 
 	strbuf_addf(buf, "d%d | ", tr2_sid_depth());
-	strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME,
-		    ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME,
-		    event_name);
+	oldlen = buf->len;
+	jw_strbuf_add_thread_name(buf, ctx->thread_name, ctx->thread_id);
+	padlen = TR2_MAX_THREAD_NAME - (buf->len - oldlen);;
+	strbuf_addf(buf, "%-*s | %-*s | ", padlen, "",
+		    TR2FMT_PERF_MAX_EVENT_NAME, event_name);
 
 	len = buf->len + TR2FMT_PERF_REPO_WIDTH;
 	if (repo)
diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
index 7da94aba522..aa9aeb67fca 100644
--- a/trace2/tr2_tls.c
+++ b/trace2/tr2_tls.c
@@ -36,6 +36,9 @@  struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
 {
 	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx));
 
+	if (strlen(thread_name) + TR2_MAX_THREAD_NAME_PREFIX > TR2_MAX_THREAD_NAME)
+		BUG("too long thread name '%s'", thread_name);
+
 	/*
 	 * Implicitly "tr2tls_push_self()" to capture the thread's start
 	 * time in array_us_start[0].  For the main thread this gives us the
@@ -45,15 +48,9 @@  struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
 	ctx->array_us_start = (uint64_t *)xcalloc(ctx->alloc, sizeof(uint64_t));
 	ctx->array_us_start[ctx->nr_open_regions++] = us_thread_start;
 
+	ctx->thread_name = thread_name;
 	ctx->thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
 
-	strbuf_init(&ctx->thread_name, 0);
-	if (ctx->thread_id)
-		strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id);
-	strbuf_addstr(&ctx->thread_name, thread_name);
-	if (ctx->thread_name.len > TR2_MAX_THREAD_NAME)
-		strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME);
-
 	pthread_setspecific(tr2tls_key, ctx);
 
 	return ctx;
@@ -95,7 +92,6 @@  void tr2tls_unset_self(void)
 
 	pthread_setspecific(tr2tls_key, NULL);
 
-	strbuf_release(&ctx->thread_name);
 	free(ctx->array_us_start);
 	free(ctx);
 }
@@ -113,7 +109,7 @@  void tr2tls_pop_self(void)
 	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
 
 	if (!ctx->nr_open_regions)
-		BUG("no open regions in thread '%s'", ctx->thread_name.buf);
+		BUG("no open regions in thread '%s'", ctx->thread_name);
 
 	ctx->nr_open_regions--;
 }
diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index b1e327a928e..f600eb22551 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -4,12 +4,17 @@ 
 #include "strbuf.h"
 
 /*
- * Arbitry limit for thread names for column alignment.
+ * Arbitry limit for thread names for column alignment. The overall
+ * max length is TR2_MAX_THREAD_NAME, and the
+ * TR2_MAX_THREAD_NAME_PREFIX is the length of the formatted
+ * '"th%02d:", ctx->thread_id' prefix which is added when "thread_id >
+ * 0".
  */
+#define TR2_MAX_THREAD_NAME_PREFIX (5)
 #define TR2_MAX_THREAD_NAME (24)
 
 struct tr2tls_thread_ctx {
-	struct strbuf thread_name;
+	const char *thread_name;
 	uint64_t *array_us_start;
 	int alloc;
 	int nr_open_regions; /* plays role of "nr" in ALLOC_GROW */