Message ID | RFC-patch-1.1-8563d017137-20221007T010829Z-avarab@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] trace2 API: don't save a copy of constant "thread_name" | expand |
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > A cleaned up version of the test code I had on top of "master", RFC > because I may still be missing some context here. E.g. maybe there's a > plan to dynamically construct these thread names? That's nice to learn, indeed. > +void jw_object_string_thread(struct json_writer *jw, const char *thread_name, > + int thread_id) > +{ > + object_common(jw, "thread"); > + strbuf_addch(&jw->json, '"'); > + jw_strbuf_add_thread_name(&jw->json, thread_name, thread_id); > + strbuf_addch(&jw->json, '"'); > +} ... > @@ -107,9 +109,11 @@ static void perf_fmt_prepare(const char *event_name, > } > > strbuf_addf(buf, "d%d | ", tr2_sid_depth()); > - strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME, > - ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME, > - event_name); > + oldlen = buf->len; > + jw_strbuf_add_thread_name(buf, ctx->thread_name, ctx->thread_id); > + padlen = TR2_MAX_THREAD_NAME - (buf->len - oldlen);; > + strbuf_addf(buf, "%-*s | %-*s | ", padlen, "", > + TR2FMT_PERF_MAX_EVENT_NAME, event_name); Having to do strbuf_addf() many times may negatively affect perf_* stuff, if this code is invoked in the hot path. I however tend to treat anything that involves an I/O not performance critical, and this certainly falls into that category.
On Thu, Oct 06 2022, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >> A cleaned up version of the test code I had on top of "master", RFC >> because I may still be missing some context here. E.g. maybe there's a >> plan to dynamically construct these thread names? > > That's nice to learn, indeed. > >> +void jw_object_string_thread(struct json_writer *jw, const char *thread_name, >> + int thread_id) >> +{ >> + object_common(jw, "thread"); >> + strbuf_addch(&jw->json, '"'); >> + jw_strbuf_add_thread_name(&jw->json, thread_name, thread_id); >> + strbuf_addch(&jw->json, '"'); >> +} > > ... > >> @@ -107,9 +109,11 @@ static void perf_fmt_prepare(const char *event_name, >> } >> >> strbuf_addf(buf, "d%d | ", tr2_sid_depth()); >> - strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME, >> - ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME, >> - event_name); >> + oldlen = buf->len; >> + jw_strbuf_add_thread_name(buf, ctx->thread_name, ctx->thread_id); >> + padlen = TR2_MAX_THREAD_NAME - (buf->len - oldlen);; >> + strbuf_addf(buf, "%-*s | %-*s | ", padlen, "", >> + TR2FMT_PERF_MAX_EVENT_NAME, event_name); > > Having to do strbuf_addf() many times may negatively affect perf_* > stuff, if this code is invoked in the hot path. I however tend to > treat anything that involves an I/O not performance critical, and > this certainly falls into that category. Yes, and that function already called strbuf_addf() 5-7 times, this adds one more, but only if "thread_id" is > 0. The reason I added jw_object_string_thread() was to avoid the malloc() & free() of a temporary "struct strbuf", it would have been more straightforward to call jw_object_string() like that. I don't think anyone cares about the raw performance of the "perf" output, but the "JSON" one needs to be fast(er). But even that output will malloc()/free() for each line it emits, and often multiple times within one line (e.g. each time we format a double). So if we do want to optimize this in terms of memory use the lowest hanging fruit seems to be to just have a per-thread "scratch" buffer we'd write to, we could also observe that we're writing to a file and just directly write to it in most cases (although we'd need to be careful to write partial-and-still-invalid JSON lines in that case...).
On 10/6/22 9:10 PM, Ævar Arnfjörð Bjarmason wrote: > Since ee4512ed481 (trace2: create new combined trace facility, > 2019-02-22) the "thread_name" member of "struct tr2tls_thread_ctx" has > been copied from the caller, but those callers have always passed a > constant string: > > $ git -P grep '^\s*trace2_thread_start\(' > Documentation/technical/api-trace2.txt: trace2_thread_start("preload_thread"); > builtin/fsmonitor--daemon.c: trace2_thread_start("fsm-health"); > builtin/fsmonitor--daemon.c: trace2_thread_start("fsm-listen"); > compat/simple-ipc/ipc-unix-socket.c: trace2_thread_start("ipc-worker"); > compat/simple-ipc/ipc-unix-socket.c: trace2_thread_start("ipc-accept"); > compat/simple-ipc/ipc-win32.c: trace2_thread_start("ipc-server"); > t/helper/test-fsmonitor-client.c: trace2_thread_start("hammer"); > t/helper/test-simple-ipc.c: trace2_thread_start("multiple"); > > This isn't needed for optimization, but apparently[1] there's been > some confusion about the non-const-ness of the previous "struct > strbuf". > > Using the caller's string here makes this more straightforward, as > it's now clear that we're not dynamically constructing these. It's > also what the progress API does with its "title" string. > > Since we know we're hardcoding these thread names let's BUG() out when > we see that the length of the name plus the length of the prefix would > exceed the maximum length for the "perf" format. > > 1. https://lore.kernel.org/git/82f1672e180afcd876505a4354bd9952f70db49e.1664900407.git.gitgitgadget@gmail.com/ > > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> PLEASE DON'T DO THIS. If you don't like my patch, fine. Let's discuss it. But DON'T submit a new one to replace it. Or worse, try to inject it into the middle of an existing series. Yes, current callers are passing a string literal and thread-start could take a "const char*" to it, but there is no way to guarantee that that is safe if someone decides to dynamically construct their thread-name and pass it in (since we don't know the lifetime of that pointer). So it is safer to copy it into the thread context so that it can be used by later trace messages. [...] > +void jw_strbuf_add_thread_name(struct strbuf *buf, const char *thread_name, > + int thread_id); > +void jw_object_string_thread(struct json_writer *jw, const char *thread_name, > + int thread_id); This violates a separation of concerns. json-writer is ONLY concerned with formatting valid JSON from basic data types. It does not know about threads or thread contexts. `js_strbuf_add_thread_name()` also violates the json-writer conventions -- that it takes a "struct json_writer *" pointer. There is nothing about JSON here. You might write a helper (inside of tr2_tgt_event.c) that formats a thread-name from the id and hint, but that is specific to the Event target -- not to JSON, nor the JSON writer. But then again, why make every trace message from every target format that "th%0d:%s" when we could save some time and format it in the thread-start and just USE it. [...] > @@ -107,9 +109,11 @@ static void perf_fmt_prepare(const char *event_name, > } > > strbuf_addf(buf, "d%d | ", tr2_sid_depth()); > - strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME, > - ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME, > - event_name); > + oldlen = buf->len; > + jw_strbuf_add_thread_name(buf, ctx->thread_name, ctx->thread_id); This stands out as very wrong. The _Perf target does not use JSON at all, yet here we are calling a jw_ routine. Again, that code is in the wrong place. I'm going to clip the rest of this commit, since the above invalidates it. Jeff
On 10/7/22 6:03 AM, Ævar Arnfjörð Bjarmason wrote: > > On Thu, Oct 06 2022, Junio C Hamano wrote: > >> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: >> >>> A cleaned up version of the test code I had on top of "master", RFC >>> because I may still be missing some context here. E.g. maybe there's a >>> plan to dynamically construct these thread names? >> >> That's nice to learn, indeed. >> >>> +void jw_object_string_thread(struct json_writer *jw, const char *thread_name, >>> + int thread_id) >>> +{ >>> + object_common(jw, "thread"); >>> + strbuf_addch(&jw->json, '"'); >>> + jw_strbuf_add_thread_name(&jw->json, thread_name, thread_id); >>> + strbuf_addch(&jw->json, '"'); >>> +} >> >> ... >> >>> @@ -107,9 +109,11 @@ static void perf_fmt_prepare(const char *event_name, >>> } >>> >>> strbuf_addf(buf, "d%d | ", tr2_sid_depth()); >>> - strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME, >>> - ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME, >>> - event_name); >>> + oldlen = buf->len; >>> + jw_strbuf_add_thread_name(buf, ctx->thread_name, ctx->thread_id); >>> + padlen = TR2_MAX_THREAD_NAME - (buf->len - oldlen);; >>> + strbuf_addf(buf, "%-*s | %-*s | ", padlen, "", >>> + TR2FMT_PERF_MAX_EVENT_NAME, event_name); >> >> Having to do strbuf_addf() many times may negatively affect perf_* >> stuff, if this code is invoked in the hot path. I however tend to >> treat anything that involves an I/O not performance critical, and >> this certainly falls into that category. > > Yes, and that function already called strbuf_addf() 5-7 times, this adds > one more, but only if "thread_id" is > 0. > > The reason I added jw_object_string_thread() was to avoid the malloc() & > free() of a temporary "struct strbuf", it would have been more > straightforward to call jw_object_string() like that. > > I don't think anyone cares about the raw performance of the "perf" > output, but the "JSON" one needs to be fast(er). > > But even that output will malloc()/free() for each line it emits, and > often multiple times within one line (e.g. each time we format a > double). > > So if we do want to optimize this in terms of memory use the lowest > hanging fruit seems to be to just have a per-thread "scratch" buffer > we'd write to, we could also observe that we're writing to a file and > just directly write to it in most cases (although we'd need to be > careful to write partial-and-still-invalid JSON lines in that case...). > WRT optimizing memory usage. We're talking about ~25 byte buffer per thread. Most commands execute in 1 thread -- if they read the index they may have ~10 threads (depending on the size of the index and if preload-index is enabled). So, I don't think we really need to optimize this. Threading is used extensively in fsmonitor-daemon, but it creates a fixed thread-pool at startup, so it may have ~12 threads. Again, not worth optimizing for the thread-name field. Now, if you want to optimize over all trace2 events (a completely different topic), you could create a large scratch strbuf buffer in each thread context and use it so that we don't have to malloc/free during each trace message. That might be worth while. We must not do partial writes to the trace2 files as we're constructing fields. The trace2 files are opened with O_APPEND so that we get the atomic lseek(2)+write(2) so that lines get written without overwrites when multiple threads and/or processes are tracing. Also, when writing to a named pipe, we get "message" semantics on write() boundaries, which makes post-processing easier. Jeff
On Mon, Oct 10 2022, Jeff Hostetler wrote: > On 10/6/22 9:10 PM, Ævar Arnfjörð Bjarmason wrote: >> Since ee4512ed481 (trace2: create new combined trace facility, >> 2019-02-22) the "thread_name" member of "struct tr2tls_thread_ctx" has >> been copied from the caller, but those callers have always passed a >> constant string: >> $ git -P grep '^\s*trace2_thread_start\(' >> Documentation/technical/api-trace2.txt: trace2_thread_start("preload_thread"); >> builtin/fsmonitor--daemon.c: trace2_thread_start("fsm-health"); >> builtin/fsmonitor--daemon.c: trace2_thread_start("fsm-listen"); >> compat/simple-ipc/ipc-unix-socket.c: trace2_thread_start("ipc-worker"); >> compat/simple-ipc/ipc-unix-socket.c: trace2_thread_start("ipc-accept"); >> compat/simple-ipc/ipc-win32.c: trace2_thread_start("ipc-server"); >> t/helper/test-fsmonitor-client.c: trace2_thread_start("hammer"); >> t/helper/test-simple-ipc.c: trace2_thread_start("multiple"); >> This isn't needed for optimization, but apparently[1] there's been >> some confusion about the non-const-ness of the previous "struct >> strbuf". >> Using the caller's string here makes this more straightforward, as >> it's now clear that we're not dynamically constructing these. It's >> also what the progress API does with its "title" string. >> Since we know we're hardcoding these thread names let's BUG() out >> when >> we see that the length of the name plus the length of the prefix would >> exceed the maximum length for the "perf" format. >> 1. https://lore.kernel.org/git/82f1672e180afcd876505a4354bd9952f70db49e.1664900407.git.gitgitgadget@gmail.com/ >> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> > > PLEASE DON'T DO THIS. > > If you don't like my patch, fine. Let's discuss it. But DON'T submit > a new one to replace it. Or worse, try to inject it into the middle > of an existing series. I'm not seeking to replace your series, or to tick you off, sorry if it came across like that. I just thought (and still think) that we were at a point in the discussion where it seemed clear that I wasn't quite managing to get across to you what I meant, so sending that in the form of working code should clarify things. Per Junio's "That's nice to learn, indeed." in <xmqqo7uoh1q0.fsf@gitster.g> it seems to have had that intended effect on him. It's marked as an RFC, so not-a-thing-to-pick-up, but just for discussion. > Yes, current callers are passing a string literal and thread-start > could take a "const char*" to it, but there is no way to guarantee > that that is safe if someone decides to dynamically construct their > thread-name and pass it in (since we don't know the lifetime of that > pointer). So it is safer to copy it into the thread context so that > it can be used by later trace messages. I think that's a defensible opinion, but I also think it's fair to say that: * This seems to be *the* motivation for why you're doing things the way you're doing them, and at least to this reviewer that wasn't really coming across... * ...nor the context of why we'd need that sort of guarded API in this case, but not e.g. for another widely-used API like start_progress(). See 791afae2924 (progress.c tests: make start/stop commands on stdin, 2022-02-03) for a case where we're using that where we need to work around its behavior (and no, I didn't make the underlying API that way, it's just a commit of mine where I'm having to work with it). I think designing our internal APIs to not be quite so guarded is fine, and we do that in various other contexts (progress, etc.). We control both the API and its users, so just leaving a "this must be a constant" should be enough. But even if you want to be paranoid about it there's much easier ways to do that which give you more of the safety you seem to want. E.g. this on top of master (and easily adjusted on top of this RFC patch): diff --git a/trace2.h b/trace2.h index 88d906ea830..1c3a98fb30f 100644 --- a/trace2.h +++ b/trace2.h @@ -306,12 +306,18 @@ void trace2_exec_result_fl(const char *file, int line, int exec_id, int code); * * Thread names should be descriptive, like "preload_index". * Thread names will be decorated with an instance number automatically. + * Thread names must point to data that won't change after it's passed + * into this function. Once trace2_thread_exit() is called it can be + * free'd. */ void trace2_thread_start_fl(const char *file, int line, const char *thread_name); +/* + * The "" is to assure us that API users pass only constant strings + */ #define trace2_thread_start(thread_name) \ - trace2_thread_start_fl(__FILE__, __LINE__, (thread_name)) + trace2_thread_start_fl(__FILE__, __LINE__, (thread_name "")) /* * Emit a 'thread_exit' event. This must be called from inside the Will pass, as we only pass it constant strings, but if someone were to pass a variable it'll blow up, at which point we could provide some inline macro/function that would do the required xstrdup(). All of which I think is *still* being too paranoid, but which I think *if* you want the paranoia is much more explicit about what we're trying to accomplish with said paranoida, and where the compiler will help you. > [...] >> +void jw_strbuf_add_thread_name(struct strbuf *buf, const char *thread_name, >> + int thread_id); >> +void jw_object_string_thread(struct json_writer *jw, const char *thread_name, >> + int thread_id); > > This violates a separation of concerns. json-writer is ONLY concerned > with formatting valid JSON from basic data types. It does not know > about threads or thread contexts. > > `js_strbuf_add_thread_name()` also violates the json-writer conventions > -- that it takes a "struct json_writer *" pointer. There is nothing > about JSON here. > > You might write a helper (inside of tr2_tgt_event.c) that formats a > thread-name from the id and hint, but that is specific to the Event > target -- not to JSON, nor the JSON writer. That's fair, more on that below. > But then again, why make every trace message from every target format > that "th%0d:%s" when we could save some time and format it in the > thread-start and just USE it. If you actually care about this being fasterer -- and only reason for posting this RFC patch is to try to tease out *why* that is -- then this part of your concern can be trivially mitigated with having a struct member like: char thread_id_str[3]; We'd then just snprintf() into that in tr2tls_create_self(). Then when we print the thread to the JSON or log you'd do so without any strbuf_addf(), just a strbuf_addstr() or strbuf_add(). I think that micro-optimization isn't needed in this case, but it *is* easy to do . > [...] >> @@ -107,9 +109,11 @@ static void perf_fmt_prepare(const char *event_name, >> } >> strbuf_addf(buf, "d%d | ", tr2_sid_depth()); >> - strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME, >> - ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME, >> - event_name); >> + oldlen = buf->len; >> + jw_strbuf_add_thread_name(buf, ctx->thread_name, ctx->thread_id); > > This stands out as very wrong. The _Perf target does not use JSON > at all, yet here we are calling a jw_ routine. Again, that code is > in the wrong place. > > I'm going to clip the rest of this commit, since the above invalidates > it. A helper function being in the wrong place invalidates the whole commit? I think you're right that this jw_strbuf_add_thread_name() helper should live somewhere else, probably in thread-utils.c. So, pretending that it's in whatever place you'd be comfortable with, and using whatever naming convention you'd prefer. What do you think about the rest of the commit? You snippet it just as you were getting to the meaty part of it, namely: * With this approach we can BUG() out as soon as we try to construct the main thread if its name is bad, we don't need to wait until runtime when a child thread runs into the limit. * We no longer need the whole thread-creation-time string duplication, associated storage in the struct etc. * That struct member is "const", addresing your initial concern of (from the upthread commit message): Using a (non-const) `strbuf` structure for it caused some confusion in the past because it implied that someone could rename a thread after it was created. That usage was not intended. Although I think (and I'm possibly misreading it) that your commentary here is saying that even that's not enough, i.e. we can't just leave it at a "const" here, but must assume that an API user will disregard that and modify it after it's passed to us anyway.
On Mon, Oct 10 2022, Jeff Hostetler wrote: > On 10/7/22 6:03 AM, Ævar Arnfjörð Bjarmason wrote: >> On Thu, Oct 06 2022, Junio C Hamano wrote: >> >>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: >>> >>>> A cleaned up version of the test code I had on top of "master", RFC >>>> because I may still be missing some context here. E.g. maybe there's a >>>> plan to dynamically construct these thread names? >>> >>> That's nice to learn, indeed. >>> >>>> +void jw_object_string_thread(struct json_writer *jw, const char *thread_name, >>>> + int thread_id) >>>> +{ >>>> + object_common(jw, "thread"); >>>> + strbuf_addch(&jw->json, '"'); >>>> + jw_strbuf_add_thread_name(&jw->json, thread_name, thread_id); >>>> + strbuf_addch(&jw->json, '"'); >>>> +} >>> >>> ... >>> >>>> @@ -107,9 +109,11 @@ static void perf_fmt_prepare(const char *event_name, >>>> } >>>> strbuf_addf(buf, "d%d | ", tr2_sid_depth()); >>>> - strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME, >>>> - ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME, >>>> - event_name); >>>> + oldlen = buf->len; >>>> + jw_strbuf_add_thread_name(buf, ctx->thread_name, ctx->thread_id); >>>> + padlen = TR2_MAX_THREAD_NAME - (buf->len - oldlen);; >>>> + strbuf_addf(buf, "%-*s | %-*s | ", padlen, "", >>>> + TR2FMT_PERF_MAX_EVENT_NAME, event_name); >>> >>> Having to do strbuf_addf() many times may negatively affect perf_* >>> stuff, if this code is invoked in the hot path. I however tend to >>> treat anything that involves an I/O not performance critical, and >>> this certainly falls into that category. >> Yes, and that function already called strbuf_addf() 5-7 times, this >> adds >> one more, but only if "thread_id" is > 0. >> The reason I added jw_object_string_thread() was to avoid the >> malloc() & >> free() of a temporary "struct strbuf", it would have been more >> straightforward to call jw_object_string() like that. >> I don't think anyone cares about the raw performance of the "perf" >> output, but the "JSON" one needs to be fast(er). >> But even that output will malloc()/free() for each line it emits, >> and >> often multiple times within one line (e.g. each time we format a >> double). >> So if we do want to optimize this in terms of memory use the lowest >> hanging fruit seems to be to just have a per-thread "scratch" buffer >> we'd write to, we could also observe that we're writing to a file and >> just directly write to it in most cases (although we'd need to be >> careful to write partial-and-still-invalid JSON lines in that case...). >> I left more extensive commentary in the side-thread in https://lore.kernel.org/git/221011.86lepmo5dn.gmgdl@evledraar.gmail.com/, just a quick reply here. > WRT optimizing memory usage. We're talking about ~25 byte buffer > per thread. Most commands execute in 1 thread -- if they read the > index they may have ~10 threads (depending on the size of the index > and if preload-index is enabled). So, I don't think we really need > to optimize this. Threading is used extensively in fsmonitor-daemon, > but it creates a fixed thread-pool at startup, so it may have ~12 > threads. Again, not worth optimizing for the thread-name field. Yes, I agree it's not worth optimizing. The reason for commenting on this part is that it isn't clear to me why your proposed patch then isn't doing the more obvious "it's not worth optimizing" pattern, per Junio's [1] comment on the initial version. The "flex array" method is seemingly taking pains to reduce the runtime memory use of these by embedding this string in the space reserved for the struct. So it's just meant as a question for you & the proposed patch. > Now, if you want to optimize over all trace2 events (a completely > different topic), you could create a large scratch strbuf buffer in > each thread context and use it so that we don't have to malloc/free > during each trace message. That might be worth while. *nod* > We must not do partial writes to the trace2 files as we're > constructing fields. The trace2 files are opened with O_APPEND > so that we get the atomic lseek(2)+write(2) so that lines get > written without overwrites when multiple threads and/or processes > are tracing. > > Also, when writing to a named pipe, we get "message" semantics > on write() boundaries, which makes post-processing easier. *nod* 1. https://lore.kernel.org/git/xmqq8rwcjttq.fsf@gitster.g/ 2. https://lore.kernel.org/git/RFC-patch-1.1-8563d017137-20221007T010829Z-avarab@gmail.com/
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > Per Junio's "That's nice to learn, indeed." in > <xmqqo7uoh1q0.fsf@gitster.g> it seems to have had that intended effect > on him. I was commenting on the goal, i.e. you "may still be missing some context here, maybe there's a plan to ...", and I meant that the plan of the overall effort is something that is nice to learn before going further. I was not endorsing the method you are taking to achieve that goal, though. FWIW, I find my code sent in as a comment easier to read than my prose alone for any topic, but that is only because it is "my" code is easy to read for "me". I am sure others would find it unnecessary burden to figure out what the alternative/replacement I send out intends to do and why it does so in the way it does, and would rather appreciate if I explained these things in prose that is easy to understand and rich in "why", which alternative/replacement code would solely lack. Code snippet helps illustrate points on "how", but is often a poor replacement for proper explanation because it is a bad medium to convey "why". Same would apply to your code. For others, including me, it often is a lot of work to figure out what your code is trying to do, and more importantly why it does what it tries to do in the way it does. After all, when you are having hard time communicating why you want to do things differently from the patch author in prose, code snippet would probably be the worst primary medium to do so, because it is full of "how exactly" with little "why".
On 10/11/22 9:31 AM, Ævar Arnfjörð Bjarmason wrote: > > On Mon, Oct 10 2022, Jeff Hostetler wrote: > >> On 10/7/22 6:03 AM, Ævar Arnfjörð Bjarmason wrote: >>> On Thu, Oct 06 2022, Junio C Hamano wrote: >>> >>>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: >>>> >>>>> A cleaned up version of the test code I had on top of "master", RFC >>>>> because I may still be missing some context here. E.g. maybe there's a >>>>> plan to dynamically construct these thread names? >>>> [...] > I left more extensive commentary in the side-thread in > https://lore.kernel.org/git/221011.86lepmo5dn.gmgdl@evledraar.gmail.com/, > just a quick reply here. > >> WRT optimizing memory usage. We're talking about ~25 byte buffer >> per thread. Most commands execute in 1 thread -- if they read the >> index they may have ~10 threads (depending on the size of the index >> and if preload-index is enabled). So, I don't think we really need >> to optimize this. Threading is used extensively in fsmonitor-daemon, >> but it creates a fixed thread-pool at startup, so it may have ~12 >> threads. Again, not worth optimizing for the thread-name field. > > Yes, I agree it's not worth optimizing. > > The reason for commenting on this part is that it isn't clear to me why > your proposed patch then isn't doing the more obvious "it's not worth > optimizing" pattern, per Junio's [1] comment on the initial version. > > The "flex array" method is seemingly taking pains to reduce the runtime > memory use of these by embedding this string in the space reserved for > the struct. > > So it's just meant as a question for you & the proposed patch. I think we're converging on some common understanding (and I think we've gone around on this topic more than enough). :-) I really was just trying to get rid of the strbuf and make it a fixed string -- I chose a flex-array rather than just detaching the buffer from a local in the thread-start code. I should have done the latter. I saw the flex-array as a fixed-size object that can't be replaced or extended (without recreating the thread-local storage) -- yes, people could overwrite existing bytes in-place in the flex-array, but who does that?? I understood what you were asking (illustrated in your RFC). That is, I understood the "what/how" you wanted to do to refactor / redesign the field, but I couldn't understand the "why". That is, why you've taken such interest in this field (and such a relatively unimportant change). We've spent nearly a week discussing it and we both agree that the optimization that I didn't suggest isn't worth doing. (I'm paraphrasing slightly.) :-) So, rather than continuing with the back-n-forth, let me skip over the remaining questions in this thread and prepare a re-roll. Hopefully, I can simplify and more clearly explain the method to my madness and we can move on. >> Now, if you want to optimize over all trace2 events (a completely >> different topic), you could create a large scratch strbuf buffer in >> each thread context and use it so that we don't have to malloc/free >> during each trace message. That might be worth while. > > *nod* I'll make a note to revisit this idea in a future series. Thanks Jeff
diff --git a/json-writer.c b/json-writer.c index f1cfd8fa8c6..569a75bee51 100644 --- a/json-writer.c +++ b/json-writer.c @@ -161,6 +161,23 @@ void jw_object_string(struct json_writer *jw, const char *key, const char *value append_quoted_string(&jw->json, value); } +void jw_strbuf_add_thread_name(struct strbuf *sb, const char *thread_name, + int thread_id) +{ + if (thread_id) + strbuf_addf(sb, "th%02d:", thread_id); + strbuf_addstr(sb, thread_name); +} + +void jw_object_string_thread(struct json_writer *jw, const char *thread_name, + int thread_id) +{ + object_common(jw, "thread"); + strbuf_addch(&jw->json, '"'); + jw_strbuf_add_thread_name(&jw->json, thread_name, thread_id); + strbuf_addch(&jw->json, '"'); +} + void jw_object_intmax(struct json_writer *jw, const char *key, intmax_t value) { object_common(jw, key); diff --git a/json-writer.h b/json-writer.h index 209355e0f12..269c203b119 100644 --- a/json-writer.h +++ b/json-writer.h @@ -75,6 +75,10 @@ void jw_release(struct json_writer *jw); void jw_object_begin(struct json_writer *jw, int pretty); void jw_array_begin(struct json_writer *jw, int pretty); +void jw_strbuf_add_thread_name(struct strbuf *buf, const char *thread_name, + int thread_id); +void jw_object_string_thread(struct json_writer *jw, const char *thread_name, + int thread_id); void jw_object_string(struct json_writer *jw, const char *key, const char *value); void jw_object_intmax(struct json_writer *jw, const char *key, intmax_t value); diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c index 37a3163be12..1308cf05df4 100644 --- a/trace2/tr2_tgt_event.c +++ b/trace2/tr2_tgt_event.c @@ -90,7 +90,7 @@ static void event_fmt_prepare(const char *event_name, const char *file, jw_object_string(jw, "event", event_name); jw_object_string(jw, "sid", tr2_sid_get()); - jw_object_string(jw, "thread", ctx->thread_name.buf); + jw_object_string_thread(jw, ctx->thread_name, ctx->thread_id); /* * In brief mode, only emit <time> on these 2 event types. diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c index 8cb792488c8..ab21277eb36 100644 --- a/trace2/tr2_tgt_perf.c +++ b/trace2/tr2_tgt_perf.c @@ -69,6 +69,8 @@ static void perf_fmt_prepare(const char *event_name, const char *category, struct strbuf *buf) { int len; + size_t oldlen; + int padlen; strbuf_setlen(buf, 0); @@ -107,9 +109,11 @@ static void perf_fmt_prepare(const char *event_name, } strbuf_addf(buf, "d%d | ", tr2_sid_depth()); - strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME, - ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME, - event_name); + oldlen = buf->len; + jw_strbuf_add_thread_name(buf, ctx->thread_name, ctx->thread_id); + padlen = TR2_MAX_THREAD_NAME - (buf->len - oldlen);; + strbuf_addf(buf, "%-*s | %-*s | ", padlen, "", + TR2FMT_PERF_MAX_EVENT_NAME, event_name); len = buf->len + TR2FMT_PERF_REPO_WIDTH; if (repo) diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c index 7da94aba522..aa9aeb67fca 100644 --- a/trace2/tr2_tls.c +++ b/trace2/tr2_tls.c @@ -36,6 +36,9 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name, { struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx)); + if (strlen(thread_name) + TR2_MAX_THREAD_NAME_PREFIX > TR2_MAX_THREAD_NAME) + BUG("too long thread name '%s'", thread_name); + /* * Implicitly "tr2tls_push_self()" to capture the thread's start * time in array_us_start[0]. For the main thread this gives us the @@ -45,15 +48,9 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name, ctx->array_us_start = (uint64_t *)xcalloc(ctx->alloc, sizeof(uint64_t)); ctx->array_us_start[ctx->nr_open_regions++] = us_thread_start; + ctx->thread_name = thread_name; ctx->thread_id = tr2tls_locked_increment(&tr2_next_thread_id); - strbuf_init(&ctx->thread_name, 0); - if (ctx->thread_id) - strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id); - strbuf_addstr(&ctx->thread_name, thread_name); - if (ctx->thread_name.len > TR2_MAX_THREAD_NAME) - strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME); - pthread_setspecific(tr2tls_key, ctx); return ctx; @@ -95,7 +92,6 @@ void tr2tls_unset_self(void) pthread_setspecific(tr2tls_key, NULL); - strbuf_release(&ctx->thread_name); free(ctx->array_us_start); free(ctx); } @@ -113,7 +109,7 @@ void tr2tls_pop_self(void) struct tr2tls_thread_ctx *ctx = tr2tls_get_self(); if (!ctx->nr_open_regions) - BUG("no open regions in thread '%s'", ctx->thread_name.buf); + BUG("no open regions in thread '%s'", ctx->thread_name); ctx->nr_open_regions--; } diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h index b1e327a928e..f600eb22551 100644 --- a/trace2/tr2_tls.h +++ b/trace2/tr2_tls.h @@ -4,12 +4,17 @@ #include "strbuf.h" /* - * Arbitry limit for thread names for column alignment. + * Arbitry limit for thread names for column alignment. The overall + * max length is TR2_MAX_THREAD_NAME, and the + * TR2_MAX_THREAD_NAME_PREFIX is the length of the formatted + * '"th%02d:", ctx->thread_id' prefix which is added when "thread_id > + * 0". */ +#define TR2_MAX_THREAD_NAME_PREFIX (5) #define TR2_MAX_THREAD_NAME (24) struct tr2tls_thread_ctx { - struct strbuf thread_name; + const char *thread_name; uint64_t *array_us_start; int alloc; int nr_open_regions; /* plays role of "nr" in ALLOC_GROW */